Active Learning | Climate Change AI

Active Learning

Agile Modeling for Bioacoustic Monitoring

Jenny Hamer, Rob Laber, and Tom Denton, NeurIPS 2023

ICLR 2023
- Keynote: Bistra Dilkina (University of Southern California)

Venue	Title
ICLR 2025	Balancing quantity and representativeness in constrained geospatial dataset design (Papers Track) Abstract: Effective geospatial machine learning (GeoML) relies on high-quality, large-scale datasets, yet geospatial data collection is often costly and logistically challenging. Creating new geospatial datasets frequently requires on-site labeling of data, including collecting data through surveys or scientific instruments, which leads to variable costs across different regions or groups. To address this, we propose a sampling method that jointly maximizes dataset size and representative composition with respect to cost constraints. We evaluate our proposed sampling method by training GeoML models on the selected subsets and comparing their performance to models trained on randomly sampled data. We find that our method leads to improved performance over standard data collection baselines. These findings provide guidance on when to prioritize representation or dataset size and highlight the need for further research into how sampling strategies can enhance model performance. Authors: Livia Betti (University of Colorado at Boulder); Esther Rolf (CU Boulder)
ICLR 2025	Atlantes: A system of GPS transformers for global-scale real-time maritime intelligence (Papers Track) Abstract: Billions of humans depend on healthy oceans for prosperity and sustenance. Unsustainable exploitation of the oceans, exacerbated by climate change, is threatening coastal communities worldwide. Accurate and timely monitoring of maritime activity is an essential step to effective governance and to inform future policy. In support of this complex global-scale effort, we built Atlantes, a machine-learning-based system that provides the first-ever real-time view of vessel behavior at global scale. Atlantes leverages a series of bespoke transformers to distill a high-volume (100M/day) continuous stream of GPS messages emitted by hundreds of thousands of vessels into real-time behavioral classification. The combination of low latency and high performance enables operationally relevant decision-making and successful interventions on the high seas, where illegal and exploitative activity is common. Atlantes is already in use by hundreds of organizations worldwide. Here we provide an overview of the machine learning strategy and modeling architecture that enables this system to function efficiently and cost-effectively at global scale and in real time. Authors: Henry Herzog (Allen Institute for AI)
NeurIPS 2024	Harnessing AI for Wildfire Defense: An approach to Predict and Mitigate Global Fire Risk (Papers Track) Abstract: Wildfires pose a critical threat to wildlife, economies, properties, and human lives globally, making accurate risk assessment essential for effective management and mitigation. This study introduces a novel machine learning-based approach utilizing a Convolutional Neural Network (CNN) to evaluate wildfire risks across diverse ecosystems. Leveraging a comprehensive dataset of remote-sensed variables (including topography, vegetation health indicators, and climatic conditions), our model operates at a spatial resolution of 1000 meters per pixel, providing enhanced precision in predicting wildfire occurrences. The CNN outperforms state-of-the-art models, achieving a fire detection ratio of 0.82 and a no-fire detection ratio of 0.87. The results demonstrate that most dataset variables are crucial for accurate risk assessment, although some are non-essential. By integrating data from regions around the globe, this study underscores the feasibility and effectiveness of implementing globally scalable wildfire prediction tools. Authors: Hassan Ashfaq (Ghulam Ishaq Khan Institute of Engineering Sciences and Technology)
NeurIPS 2024	Towards Using Machine Learning to Generatively Simulate EV Charging in Urban Areas (Papers Track) Abstract: This study addresses the challenge of predicting electric vehicle (EV) charging profiles in urban locations with limited data. Utilizing a neural network architecture, we aim to uncover latent charging profiles influenced by spatio-temporal factors. Our model focuses on peak power demand and daily load shapes, providing insights into charging behavior. Our results indicate significant impacts from the type of Basic Administrative Units on predicted load curves, which contributes to the understanding and optimization of EV charging infrastructure in urban settings and allows Distribution System Operators (DSO) to more efficiently plan EV charging infrastructure expansion. Authors: Marek Miltner (Stanford University; Czech Technical University); Jakub Zíka (CTU); Daniel Vašata (Czech Technical University in Prague, Faculty of Information Technology); Artem Bryksa (CTU); Magda Friedjungová (Czech Technical University in Prague, Faculty of Information Technology); Ondřej Štogl (CTU); Ram Rajagopal (Stanford University); Oldřich Starý (CTU)
ICLR 2024	Calibrating Earth System Models with Bayesian Optimal Experimental Design (Proposals Track) Abstract: Earth system models (ESMs) are complex climate simulations that are critical for projecting future climate change and its impacts. However, running ESMs is extremely computationally expensive, limiting the number of simulations that can be performed. This results in significant uncertainty in key climate metrics estimated from ESM ensembles. We propose a Bayesian optimal experimental design (BOED) approach to efficiently calibrate ESM simulations to observational data by actively selecting the most informative input parameters. BOED optimises the expected information gain (EIG) to select the ESM input parameter to reduce the final uncertainty estimates in the climate metrics of interest. Initial results on a synthetic benchmark demonstrate our approach can more efficiently reduce uncertainty compared to common sampling schemes like Latin hypercube sampling. Authors: Tim Reichelt (University of Oxford); Shahine Bouabid (University of Oxford); Luke Ong (University of Oxford); Duncan Watson-Parris (University of California San Diego); Tom Rainforth (University of Oxford)
NeurIPS 2023	ALAS: Active Learning for Autoconversion Rates Prediction from Satellite Data (Papers Track) Abstract: High-resolution simulations, such as the ICOsahedral Non-hydrostatic Large-Eddy Model (ICON-LEM), provide valuable insights into the complex interactions among aerosols, clouds, and precipitation, which are the major contributors to climate change uncertainty. However, due to their exorbitant computational costs, they can only be employed for a limited period and geographical area. To address this, we propose a more cost-effective method powered by an emerging machine learning approach to better understand the intricate dynamics of the climate system. Our approach involves active learning techniques by leveraging high-resolution climate simulation as an oracle that is queried based on an abundant amount of unlabeled data drawn from satellite observations. In particular, we aim to predict autoconversion rates, a crucial step in precipitation formation, while significantly reducing the need for a large number of labeled instances. In this study, we present novel methods: custom query strategy fusion for labeling instances -- weight fusion (WiFi) and merge fusion (MeFi) -- along with active feature selection based on SHAP. These methods are designed to tackle real-world challenges -- in this case, climate change, with a specific focus on the prediction of autoconversion rates -- due to their simplicity and practicality in application. Authors: Maria C Novitasari (University College London); Johannes Quaas (Universität Leipzig); Miguel Rodrigues (University College London)
NeurIPS 2023	Agile Modeling for Bioacoustic Monitoring (Tutorials Track) Abstract: Bird, insect, and other wild animal populations are rapidly declining, highlighting the need for better monitoring, understanding, and protection of Earth’s remaining wild places. However, direct monitoring of biodiversity is difficult. Passive Acoustic Monitoring (PAM) enables detection of the vocalizing species in an ecosystem, many of which can be difficult or impossible to detect by satellite or camera trap. Large-scale PAM deployments using low-cost devices allow measuring changes over time and responses to environmental changes, and targeted deployments can discover and monitor endangered or invasive species. Machine learning methods are needed to analyze the thousands or even millions of hours of audio produced by large-scale deployments. But there are a massive number of potential signals to target for bioacoustic measurement, and many of the most interesting lack training data. Many rare species are difficult to observe. Detecting specific call-types and juvenile calls can give further insight into behavior and population health, but almost no structured datasets exist for these use-cases. No single classifier can address all of these needs, so practitioners regularly need to create new classifiers to address novel problems. Soundscape annotation efforts are very expensive, and machine learning experts are scarce, creating a bottleneck on analysis. We aim to eliminate the bottleneck by providing an efficient, self-contained active learning workflow for biologists. In this tutorial, we present an integrated workflow for analyzing large unlabeled bioacoustic datasets, adapting new agile modeling techniques to audio. Our goal is to allow experts to create a new high quality classifier for a novel class with under one hour of effort. We achieve this by leveraging transfer learning from high-quality bioacoustic models, vector search over audio databases, and lightweight Python notebook UX. The workflow can begin from a single example, proceeds through an efficient active learning loop, and finally applies the produced classifier to a large mass of unlabeled data to produce insights for ecologists and land managers. Authors: Tom Denton (Google); Jenny Hamer (Google Research); Rob Laber (Google)
NeurIPS 2022	Accessible Large-Scale Plant Pathology Recognition (Papers Track) Abstract: Plant diseases are costly and threaten agricultural production and food security worldwide. Climate change is increasing the frequency and severity of plant diseases and pests. Therefore, detection and early remediation can have a significant impact, especially in developing countries. However, AI solutions are still far from being in production. The current process for plant disease diagnosis consists of manual identification and scoring by humans, which is time-consuming, low-supply, and expensive. Although computer vision models have shown promise for efficient and automated plant disease identification, there are limitations for real-world applications: a notable variation in visual symptoms of a single disease, different light and weather conditions, and the complexity of the models. In this work, we study the performance of efficient classification models and training "tricks" to solve this problem. Our analysis represents a plausible solution for these ecological disasters and might help to assist producers worldwide. More information available at: https://github.com/mv-lab/mlplants Authors: Marcos V. Conde (University of Würzburg); Dmitry Gordeev (H2O.ai)
NeurIPS 2022	Continual VQA for Disaster Response Systems (Papers Track) Abstract: Visual Question Answering (VQA) is a multi-modal task that involves answering questions from an input image, semantically understanding the contents of the image and answering it in natural language. Using VQA for disaster management is an important line of research due to the scope of problems that are answered by the VQA system. However, the main challenge is the delay caused by the generation of labels in the assessment of the affected areas. To tackle this, we deployed a pre-trained CLIP model, which is trained on image-text pairs. However, we empirically see that the model has poor zero-shot performance. Thus, we instead use pre-trained embeddings of text and image from this model for our supervised training and surpass previous state-of-the-art results on the FloodNet dataset. We expand this to a continual setting, which is a more real-life scenario. We tackle the problem of catastrophic forgetting using various experience replay methods. Authors: Aditya Kane (Pune Institute of Computer Technology); V Manushree (Manipal Institute Of Technology); Sahil S Khose (Georgia Institute of Technology)
NeurIPS 2022	Disaster Risk Monitoring Using Satellite Imagery (Tutorials Track) Abstract: Natural disasters such as flood, wildfire, drought, and severe storms wreak havoc throughout the world, causing billions of dollars in damages, and uprooting communities, ecosystems, and economies. Unfortunately, flooding events are on the rise due to climate change and sea level rise. The ability to detect and quantify them can help us minimize their adverse impacts on the economy and human lives. Using satellites to study floods is advantageous since physical access to flooded areas is limited and deploying instruments in potential flood zones can be dangerous. We are proposing a hands-on tutorial to highlight the use of satellite imagery and computer vision to study natural disasters. Specifically, we aim to demonstrate the development and deployment of a flood detection model using Sentinel-1 satellite data. The tutorial will cover relevant fundamental concepts as well as the full development workflow of a deep learning-based application. We will include important considerations such as common pitfalls, data scarcity, augmentation, transfer learning, fine-tuning, and details of each step in the workflow. Importantly, the tutorial will also include a case study on how the application was used by authorities in response to a flood event. We believe this tutorial will enable machine learning practitioners of all levels to develop new technologies that tackle the risks posed by climate change. We expect to deliver the following learning outcomes: (1) develop various deep learning-based computer vision solutions using hardware-accelerated open-source tools that are optimized for real-time deployment; (2) create an optimized pipeline for the machine learning development workflow; (3) understand different performance metrics for model evaluation that are relevant for real-world datasets and data imbalances; (4) understand the public sector’s efforts to support climate action initiatives and point out where the audience can contribute. Authors: Kevin Lee (NVIDIA); Siddha Ganju (NVIDIA); Edoardo Nemni (UNOSAT)
NeurIPS 2021	Machine Learning in Automating Carbon Sequestration Site Assessment (Proposals Track) Abstract: Carbon capture and sequestration are viewed as an indispensable component to achieve the Paris Agreement climate goal, i.e., keeping global warming within 2 degrees Celsius of pre-industrial levels. Once captured, most CO2 needs to be stored securely for at least decades, preferably in deep underground geological formations. It is economical to inject and store CO2 near/around a depleted gas/oil reservoir or well, where a geological trap for CO2 with good sealing properties and some minimum infrastructure exist. In this proposal, with our preliminary work, it is shown that Machine Learning tools like Optical Character Recognition and Natural Language Processing can aid in screening and selection of injection sites for CO2 storage, facilitate identification of possible CO2 leakage paths in the subsurface, and assist in locating a depleted gas/oil well suitable for CO2 injection and long-term storage. The automated process based on ML tools can also drastically decrease the decision-making cycle time in the site selection and assessment phase by reducing human effort. In the longer term, we expect ML tools like Deep Neural Networks to be utilized in CO2 storage monitoring, injection optimization etc. By injecting CO2 into a trapping geological underground formation in a safe and sustainable manner, the Energy industry can contribute substantially to reducing global warming and achieving the goals of the Paris Agreement by the end of this century. Authors: Jay Chen (Shell); Ligang Lu (Shell); Mohamed Sidahmed (Shell); Taixu Bai (Shell); Ilyana Folmar (Shell); Puneet Seth (Shell); Manoj Sarfare (Shell); Duane Mikulencak (Shell); Ihab Akil (Shell)
ICML 2021	Machine Learning for Climate Change: Guiding Discovery of Sorbent Materials for Direct Air Capture of CO2 (Proposals Track) Abstract: The global climate crisis requires interdisciplinary collaboration. The same is true for making significant strides in materials discovery for direct air capture (DAC) of carbon dioxide (CO2). DAC is an emerging technology that captures CO2 directly from the atmosphere and it is part of the solution to achieving carbon neutrality by 2050. The proposed project is a collaborative effort that tackles climate change by using machine learning to guide scientists to novel, optimized, advanced sorbent materials for direct air capture of CO2. Immediate impacts will include high throughput machine learning tools for developing new, cost-effective CO2 sorption materials, and continued, expanded collaborations with potential domestic and international stakeholders. Authors: Diana L Ortiz-Montalvo (NIST); Aaron Gilad Kusne (NIST); Austin McDannald (NIST); Daniel Siderius (NIST); Kamal Choudhary (NIST); Taner Yildirim (NIST)
NeurIPS 2020	ACED: Accelerated Computational Electrochemical systems Discovery (Proposals Track) Abstract: Large-scale electrification is vital to addressing the climate crisis, but many engineering challenges remain to fully electrifying both the chemical industry and transportation. In both of these areas, new electrochemical materials and systems will be critical, but developing these systems currently relies heavily on computationally expensive first-principles simulations as well as human-time-intensive experimental trial and error. We propose to develop an automated workflow that accelerates these computational steps by introducing both automated error handling in generating the first-principles training data as well as physics-informed machine learning surrogates to further reduce computational cost. It will also have the capacity to include automated experiments "in the loop" in order to dramatically accelerate the overall materials discovery pipeline. Authors: Rachel C Kurchin (CMU); Eric Muckley (Citrine Informatics); Lance Kavalsky (CMU); Vinay Hegde (Citrine Informatics); Dhairya Gandhi (Julia Computing); Xiaoyu Sun (CMU); Matthew Johnson (MIT); Alan Edelman (MIT); James Saal (Citrine Informatics); Christopher V Rackauckas (Massachusetts Institute of Technology); Bryce Meredig (Citrine Informatics); Viral Shah (Julia Computing); Venkat Viswanathan (Carnegie Mellon University)
NeurIPS 2020	Artificial Intelligence, Machine Learning and Modeling for Understanding the Oceans and Climate Change (Proposals Track) Abstract: These changes will have a drastic impact on almost all forms of life in the ocean with further consequences on food security, ecosystem services in coastal and inland communities. Despite these impacts, scientific data and infrastructures are still lacking to understand and quantify the consequences of these perturbations on the marine ecosystem. Understanding this phenomenon is not only an urgent but also a scientifically demanding task. Consequently, it is a problem that must be addressed with a scientific cohort approach, where multi-disciplinary teams collaborate to bring the best of different scientific areas. In this proposal paper, we describe our newly launched four-year project focused on developing new artificial intelligence, machine learning, and mathematical modeling tools to contribute to the understanding of the structure, functioning, and underlying mechanisms and dynamics of the global ocean symbiome and its relation with climate change. These actions should enable the understanding of our oceans and predict and mitigate the consequences of climate change. Authors: Nayat Sánchez Pi (Inria); Luis Martí (Inria); André Abreu (Fondation Tara Océans); Olivier Bernard (Inria); Colomban de Vargas (CNRS); Damien Eveillard (Univ. Nantes); Alejandro Maass (CMM, U. Chile); Pablo Marquet (PUC); Jacques Sainte-Marie (Inria); Julien Salomin (Inria); Marc Schoenauer (INRIA); Michele Sebag (LRI, CNRS, France)
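
Several entries above (for example ALAS and the bioacoustic tutorial) describe pool-based active learning, in which a model repeatedly asks an oracle to label the unlabeled instances it is least certain about. The following is a minimal sketch of that general loop and not the method of any listed paper. The nearest-centroid classifier, the margin-based uncertainty score, and all function names (`fit_centroids`, `margin`, `active_learning_loop`) are illustrative assumptions, and the data is synthetic.

```python
import numpy as np


def fit_centroids(X, y):
    """Return one mean vector per class (classes 0 and 1)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])


def margin(X, centroids):
    """Absolute gap between distances to the two centroids (small = uncertain)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.abs(d[:, 0] - d[:, 1])


def active_learning_loop(X_pool, oracle, seed_idx, n_queries):
    """Query the oracle for the most uncertain pool point, n_queries times.

    Assumes seed_idx contains at least one example of each class, and that
    oracle(i) returns the 0/1 label of pool item i (hypothetical interface).
    """
    labeled = list(seed_idx)
    labels = {i: oracle(i) for i in labeled}
    for _ in range(n_queries):
        idx = np.array(labeled)
        y = np.array([labels[i] for i in labeled])
        centroids = fit_centroids(X_pool[idx], y)
        scores = margin(X_pool, centroids)
        scores[idx] = np.inf  # never re-query an already labeled point
        query = int(np.argmin(scores))
        labels[query] = oracle(query)
        labeled.append(query)
    idx = np.array(labeled)
    y = np.array([labels[i] for i in labeled])
    return fit_centroids(X_pool[idx], y), labeled


# Synthetic pool: two Gaussian blobs standing in for unlabeled observations.
rng = np.random.default_rng(0)
X_pool = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y_true = np.array([0] * 100 + [1] * 100)

centroids, labeled = active_learning_loop(
    X_pool, oracle=lambda i: int(y_true[i]), seed_idx=[0, 100], n_queries=10
)
```

In practice the oracle is the expensive step (a high-resolution simulation or a human annotator), so the loop trades a little extra computation per round for fewer labels overall.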