Active Learning


Workshop Papers

Venue Title
ICLR 2024 Calibrating Earth System Models with Bayesian Optimal Experimental Design (Proposals Track)

Abstract: Earth system models (ESMs) are complex climate simulations that are critical for projecting future climate change and its impacts. However, running ESMs is extremely computationally expensive, limiting the number of simulations that can be performed. This results in significant uncertainty in key climate metrics estimated from ESM ensembles. We propose a Bayesian optimal experimental design (BOED) approach to efficiently calibrate ESM simulations to observational data by actively selecting the most informative input parameters. BOED optimises the expected information gain (EIG) to select the ESM input parameter to reduce the final uncertainty estimates in the climate metrics of interest. Initial results on a synthetic benchmark demonstrate our approach can more efficiently reduce uncertainty compared to common sampling schemes like Latin hypercube sampling.

Authors: Tim Reichelt (University of Oxford); Shahine Bouabid (University of Oxford); Luke Ong (University of Oxford); Duncan Watson-Parris (University of California San Diego); Tom Rainforth (University of Oxford)
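
The expected information gain (EIG) objective mentioned in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' method: it assumes a hypothetical one-parameter model with a standard normal prior and Gaussian observation noise, and it uses a plain nested Monte Carlo estimator. All function and variable names are introduced here for illustration only.

```python
import math
import random

def log_likelihood(y, theta, design, noise_std):
    """Log density of y under N(design * theta, noise_std^2) (assumed toy model)."""
    resid = (y - design * theta) / noise_std
    return -0.5 * resid * resid - math.log(noise_std * math.sqrt(2.0 * math.pi))

def nested_mc_eig(design, noise_std=1.0, n_outer=400, n_inner=400, seed=0):
    """Nested Monte Carlo estimate of EIG(design) = E[log p(y|theta,d) - log p(y|d)]."""
    rng = random.Random(seed)
    # Prior samples reused to approximate the marginal likelihood p(y | design).
    inner_thetas = [rng.gauss(0.0, 1.0) for _ in range(n_inner)]
    total = 0.0
    for _ in range(n_outer):
        theta = rng.gauss(0.0, 1.0)                      # draw from the prior
        y = design * theta + rng.gauss(0.0, noise_std)   # simulate an outcome
        log_lik = log_likelihood(y, theta, design, noise_std)
        logs = [log_likelihood(y, t, design, noise_std) for t in inner_thetas]
        peak = max(logs)  # log-mean-exp for numerical stability
        log_marginal = peak + math.log(sum(math.exp(v - peak) for v in logs) / n_inner)
        total += log_lik - log_marginal
    return total / n_outer

def analytic_eig(design, noise_std=1.0):
    """Closed-form EIG for this linear-Gaussian toy model, used as a sanity check."""
    return 0.5 * math.log(1.0 + (design / noise_std) ** 2)

# Score a few hypothetical candidate designs and pick the most informative one.
candidates = [0.5, 1.0, 2.0]
estimates = {d: nested_mc_eig(d) for d in candidates}
best_design = max(estimates, key=estimates.get)
```

In this toy setting the estimate can be compared against `analytic_eig`; for an expensive simulator no closed form exists, which is why sample-based estimators of this kind are used.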

NeurIPS 2023 ALAS: Active Learning for Autoconversion Rates Prediction from Satellite Data (Papers Track)

Abstract: High-resolution simulations, such as the ICOsahedral Non-hydrostatic Large-Eddy Model (ICON-LEM), provide valuable insights into the complex interactions among aerosols, clouds, and precipitation, which are the major contributors to climate change uncertainty. However, due to their exorbitant computational costs, they can only be employed for a limited period and geographical area. To address this, we propose a more cost-effective method powered by an emerging machine learning approach to better understand the intricate dynamics of the climate system. Our approach involves active learning techniques by leveraging high-resolution climate simulation as an oracle that is queried based on an abundant amount of unlabeled data drawn from satellite observations. In particular, we aim to predict autoconversion rates, a crucial step in precipitation formation, while significantly reducing the need for a large number of labeled instances. In this study, we present novel methods: custom query strategy fusion for labeling instances -- weight fusion (WiFi) and merge fusion (MeFi) -- along with active feature selection based on SHAP. These methods are designed to tackle real-world challenges -- in this case, climate change, with a specific focus on the prediction of autoconversion rates -- due to their simplicity and practicality in application.

Authors: Maria C Novitasari (University College London); Johannes Quaas (Universität Leipzig); Miguel Rodrigues (University College London)
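
The pool-based setup described in the abstract above (an expensive simulation acting as the oracle, with abundant unlabeled inputs as the pool) can be sketched in a few lines. This is a generic illustration only: the `oracle` function, the one-dimensional pool, the nearest-neighbour predictor, and the farthest-first query rule are all assumptions made for the example, and they are not the WiFi or MeFi strategies proposed in the paper.

```python
import math

def oracle(x):
    """Hypothetical stand-in for an expensive high-resolution simulation."""
    return math.sin(3.0 * x)

def predict(x, labeled):
    """1-nearest-neighbour prediction from the labeled set, a list of (x, y)."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def mean_abs_error(pool, labeled):
    """Evaluation only: calls the oracle on every pool point to measure error."""
    return sum(abs(oracle(x) - predict(x, labeled)) for x in pool) / len(pool)

def query_score(x, labeled):
    """Distance to the closest labeled input; larger means less explored."""
    return min(abs(x - lx) for lx, _ in labeled)

# Unlabeled pool of candidate inputs; start with a single labeled point.
pool = [2.0 * i / 199 for i in range(200)]
labeled = [(pool[0], oracle(pool[0]))]
unlabeled = pool[1:]
error_before = mean_abs_error(pool, labeled)

# Active learning loop: query the oracle only where the score is highest.
for _ in range(15):
    x_query = max(unlabeled, key=lambda x: query_score(x, labeled))
    labeled.append((x_query, oracle(x_query)))
    unlabeled.remove(x_query)

error_after = mean_abs_error(pool, labeled)
```

The loop spends only 15 oracle calls on a pool of 200 candidates; swapping `query_score` for a model-uncertainty measure would give an uncertainty-sampling variant of the same loop.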

NeurIPS 2023 Agile Modeling for Bioacoustic Monitoring (Tutorials Track)

Abstract: Bird, insect, and other wild animal populations are rapidly declining, highlighting the need for better monitoring, understanding, and protection of Earth’s remaining wild places. However, direct monitoring of biodiversity is difficult. Passive Acoustic Monitoring (PAM) enables detection of the vocalizing species in an ecosystem, many of which can be difficult or impossible to detect by satellite or camera trap. Large-scale PAM deployments using low-cost devices allow measuring changes over time and responses to environmental changes, and targeted deployments can discover and monitor endangered or invasive species. Machine learning methods are needed to analyze the thousands or even millions of hours of audio produced by large-scale deployments. But there are a massive number of potential signals to target for bioacoustic measurement, and many of the most interesting lack training data. Many rare species are difficult to observe. Detecting specific call-types and juvenile calls can give further insight into behavior and population health, but almost no structured datasets exist for these use-cases. No single classifier can address all of these needs, so practitioners regularly need to create new classifiers to address novel problems. Soundscape annotation efforts are very expensive, and machine learning experts are scarce, creating a bottleneck on analysis. We aim to eliminate the bottleneck by providing an efficient, self-contained active learning workflow for biologists. In this tutorial, we present an integrated workflow for analyzing large unlabeled bioacoustic datasets, adapting new agile modeling techniques to audio. Our goal is to allow experts to create a new high quality classifier for a novel class with under one hour of effort. We achieve this by leveraging transfer learning from high-quality bioacoustic models, vector search over audio databases, and lightweight Python notebook UX. 
The workflow can begin from a single example, proceeds through an efficient active learning loop, and finally applies the produced classifier to a large mass of unlabeled data to produce insights for ecologists and land managers.

Authors: Tom Denton (Google); Jenny Hamer (Google Research); Rob Laber (Google)

NeurIPS 2022 Accessible Large-Scale Plant Pathology Recognition (Papers Track)

Abstract: Plant diseases are costly and threaten agricultural production and food security worldwide. Climate change is increasing the frequency and severity of plant diseases and pests. Therefore, detection and early remediation can have a significant impact, especially in developing countries. However, AI solutions are yet far from being in production. The current process for plant disease diagnostic consists of manual identification and scoring by humans, which is time-consuming, low-supply, and expensive. Although computer vision models have shown promise for efficient and automated plant disease identification, there are limitations for real-world applications: a notable variation in visual symptoms of a single disease, different light and weather conditions, and the complexity of the models. In this work, we study the performance of efficient classification models and training "tricks" to solve this problem. Our analysis represents a plausible solution for these ecological disasters and might help to assist producers worldwide. More information available at:

Authors: Marcos V. Conde (University of Würzburg); Dmitry Gordeev (

NeurIPS 2022 Continual VQA for Disaster Response Systems (Papers Track)

Abstract: Visual Question Answering (VQA) is a multi-modal task that involves answering questions from an input image, semantically understanding the contents of the image and answering it in natural language. Using VQA for disaster management is an important line of research due to the scope of problems that are answered by the VQA system. However, the main challenge is the delay caused by the generation of labels in the assessment of the affected areas. To tackle this, we deployed a pre-trained CLIP model, which is trained on visual-image pairs. However, we empirically see that the model has poor zero-shot performance. Thus, we instead use pre-trained embeddings of text and image from this model for our supervised training and surpass previous state-of-the-art results on the FloodNet dataset. We expand this to a continual setting, which is a more real-life scenario. We tackle the problem of catastrophic forgetting using various experience replay methods.

Authors: Aditya Kane (Pune Institute of Computer Technology); V Manushree (Manipal Institute Of Technology); Sahil S Khose (Georgia Institute of Technology)
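
The abstract above does not say which experience replay methods were used. As a hedged illustration of the general idea, the sketch below shows one common generic variant: a fixed-size replay buffer filled by reservoir sampling, whose stored examples are mixed into each new training batch. The class name, the buffer capacity, and the toy `(task_id, index)` examples are assumptions made for this example.

```python
import random

class ReservoirReplayBuffer:
    """Fixed-size memory; every example seen so far is kept with equal probability."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.num_seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.num_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Reservoir sampling: overwrite a random slot with prob. capacity / num_seen.
            slot = self.rng.randrange(self.num_seen)
            if slot < self.capacity:
                self.items[slot] = example

    def sample(self, batch_size):
        k = min(batch_size, len(self.items))
        return self.rng.sample(self.items, k)

# Simulate three tasks arriving one after another (hypothetical toy stream).
buffer = ReservoirReplayBuffer(capacity=50)
for task_id in range(3):
    new_examples = [(task_id, i) for i in range(200)]
    for start in range(0, len(new_examples), 20):
        batch = new_examples[start:start + 20]
        replayed = buffer.sample(10)       # old examples that counter forgetting
        training_batch = batch + replayed  # a model update would consume this
        for example in batch:
            buffer.add(example)

tasks_in_buffer = {task for task, _ in buffer.items}
```

Because the buffer keeps a uniform sample of everything seen so far, later batches still contain examples from earlier tasks, which is the mechanism replay relies on to limit catastrophic forgetting.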

NeurIPS 2022 Disaster Risk Monitoring Using Satellite Imagery (Tutorials Track)

Abstract: Natural disasters such as flood, wildfire, drought, and severe storms wreak havoc throughout the world, causing billions of dollars in damages, and uprooting communities, ecosystems, and economies. Unfortunately, flooding events are on the rise due to climate change and sea level rise. The ability to detect and quantify them can help us minimize their adverse impacts on the economy and human lives. Using satellites to study flood is advantageous since physical access to flooded areas is limited and deploying instruments in potential flood zones can be dangerous. We are proposing a hands-on tutorial to highlight the use of satellite imagery and computer vision to study natural disasters. Specifically, we aim to demonstrate the development and deployment of a flood detection model using Sentinel-1 satellite data. The tutorial will cover relevant fundamental concepts as well as the full development workflow of a deep learning-based application. We will include important considerations such as common pitfalls, data scarcity, augmentation, transfer learning, fine-tuning, and details of each step in the workflow. Importantly, the tutorial will also include a case study on how the application was used by authorities in response to a flood event. We believe this tutorial will enable machine learning practitioners of all levels to develop new technologies that tackle the risks posed by climate change. We expect to deliver the below learning outcomes:
• Develop various deep learning-based computer vision solutions using hardware-accelerated open-source tools that are optimized for real-time deployment
• Create an optimized pipeline for the machine learning development workflow
• Understand different performance metrics for model evaluation that are relevant for real world datasets and data imbalances
• Understand the public sector’s efforts to support climate action initiatives and point out where the audience can contribute

Authors: Kevin Lee (NVIDIA); Siddha Ganju (NVIDIA); Edoardo Nemni (UNOSAT)

NeurIPS 2021 Machine Learning in Automating Carbon Sequestration Site Assessment (Proposals Track)

Abstract: Carbon capture and sequestration are viewed as an indispensable component to achieve the Paris Agreement climate goal, i.e., keep the global warming within 2 degrees Celsius from pre-industrial levels. Once captured, most CO2 needs to be stored securely for at least decades, preferably in deep underground geological formations. It is economical to inject and store CO2 near/around a depleted gas/oil reservoir or well, where a geological trap for CO2 with good sealing properties and some minimum infrastructure exist. In this proposal, with our preliminary work, it is shown that Machine Learning tools like Optical Character Recognition and Natural Language Processing can aid in screening and selection of injection sites for CO2 storage, facilitate identification of possible CO2 leakage paths in the subsurface, and assist in locating a depleted gas/oil well suitable for CO2 injection and long-term storage. The automated process based on ML tools can also drastically decrease the decision-making cycle time in site selection and assessment phase by reducing human effort. In the longer term, we expect ML tools like Deep Neural Networks to be utilized in CO2 storage monitoring, injection optimization etc. By injecting CO2 into a trapping geological underground formation in a safe and sustainable manner, the Energy industry can contribute substantially to reducing global warming and achieving the goals of the Paris Agreement by the end of this century.

Authors: Jay Chen (Shell); Ligang Lu (Shell); Mohamed Sidahmed (Shell); Taixu Bai (Shell); Ilyana Folmar (Shell); Puneet Seth (Shell); Manoj Sarfare (Shell); Duane Mikulencak (Shell); Ihab Akil (Shell)

ICML 2021 Machine Learning for Climate Change: Guiding Discovery of Sorbent Materials for Direct Air Capture of CO2 (Proposals Track)

Abstract: The global climate crisis requires interdisciplinary collaboration. The same is true for making significant strides in materials discovery for direct air capture (DAC) of carbon dioxide (CO2). DAC is an emerging technology that captures CO2 directly from the atmosphere and it is part of the solution to achieving carbon neutrality by 2050. The proposed project is a collaborative effort that tackles climate change by using machine learning to guide scientists to novel, optimized, advanced sorbent materials for direct air capture of CO2. Immediate impacts will include high throughput machine learning tools for developing new, cost-effective CO2 sorption materials, and continued, expanded collaborations with potential domestic and international stakeholders.

Authors: Diana L Ortiz-Montalvo (NIST); Aaron Gilad Kusne (NIST); Austin McDannald (NIST); Daniel Siderius (NIST); Kamal Choudhary (NIST); Taner Yildirim (NIST)

NeurIPS 2020 ACED: Accelerated Computational Electrochemical systems Discovery (Proposals Track)

Abstract: Large-scale electrification is vital to addressing the climate crisis, but many engineering challenges remain to fully electrifying both the chemical industry and transportation. In both of these areas, new electrochemical materials and systems will be critical, but developing these systems currently relies heavily on computationally expensive first-principles simulations as well as human-time-intensive experimental trial and error. We propose to develop an automated workflow that accelerates these computational steps by introducing both automated error handling in generating the first-principles training data as well as physics-informed machine learning surrogates to further reduce computational cost. It will also have the capacity to include automated experiments "in the loop" in order to dramatically accelerate the overall materials discovery pipeline.

Authors: Rachel C Kurchin (CMU); Eric Muckley (Citrine Informatics); Lance Kavalsky (CMU); Vinay Hegde (Citrine Informatics); Dhairya Gandhi (Julia Computing); Xiaoyu Sun (CMU); Matthew Johnson (MIT); Alan Edelman (MIT); James Saal (Citrine Informatics); Christopher V Rackauckas (Massachusetts Institute of Technology); Bryce Meredig (Citrine Informatics); Viral Shah (Julia Computing); Venkat Viswanathan (Carnegie Mellon University)

NeurIPS 2020 Artificial Intelligence, Machine Learning and Modeling for Understanding the Oceans and Climate Change (Proposals Track)

Abstract: These changes will have a drastic impact on almost all forms of life in the ocean with further consequences on food security, ecosystem services in coastal and inland communities. Despite these impacts, scientific data and infrastructures are still lacking to understand and quantify the consequences of these perturbations on the marine ecosystem. Understanding this phenomenon is not only an urgent but also a scientifically demanding task. Consequently, it is a problem that must be addressed with a scientific cohort approach, where multi-disciplinary teams collaborate to bring the best of different scientific areas. In this proposal paper, we describe our newly launched four-year project focused on developing new artificial intelligence, machine learning, and mathematical modeling tools to contribute to the understanding of the structure, functioning, and underlying mechanisms and dynamics of the global ocean symbiome and its relation with climate change. These actions should enable the understanding of our oceans and predict and mitigate the consequences of climate change.

Authors: Nayat Sánchez Pi (Inria); Luis Martí (Inria); André Abreu (Fondation Tara Océans); Olivier Bernard (Inria); Colomban de Vargas (CNRS); Damien Eveillard (Univ. Nantes); Alejandro Maass (CMM, U. Chile); Pablo Marquet (PUC); Jacques Sainte-Marie (Inria); Julien Salomin (Inria); Marc Schoenauer (INRIA); Michele Sebag (LRI, CNRS, France)