Natural Language Processing

Tutorials

Blog Posts

Webinars

Workshop Papers

Venue Title
NeurIPS 2023 Flamingo: Environmental Impact Factor Matching for Life Cycle Assessment with Zero-Shot ML (Papers Track)

Abstract: Consumer products contribute to >75% of global greenhouse gas (GHG) emissions, primarily through indirect contributions from the supply chain. Measurement of GHG emissions associated with products is crucial to quantify the impact of GHG emission abatement actions. Life cycle assessment (LCA), the scientific discipline for measuring GHG emissions, estimates the environmental impact of a product. Scaling LCA to millions of products is challenging as it requires extensive manual analysis by domain experts. To avoid repetitive analysis, environmental impact factors (EIF) of common materials and products are published for use by experts. However, finding appropriate EIFs for even a single product can require hundreds of hours of manual work, especially for complex products. We present Flamingo, an algorithm that leverages neural language models to automatically identify an appropriate EIF given a text description. A key challenge in automation is that EIF databases are incomplete. Flamingo uses industry sector classification as an intermediate layer to identify when there are no good matches in the database. On a dataset of 664 products, Flamingo achieves an EIF matching precision of 75%.

Authors: Bharathan Balaji (Amazon); Venkata Sai Gargeya Vunnava (Amazon); Nina Domingo (Amazon); Shikhar Gupta (Amazon); Harsh Gupta (Amazon); Geoffrey Guest (Amazon); Aravind Srinivasan (Amazon); Kellen Axten (Amazon); Jared Kramer (Amazon)

NeurIPS 2023 How to Recycle: General Vision-Language Model without Task Tuning for Predicting Object Recyclability (Papers Track)

Abstract: Waste segregation and recycling play a crucial role in fostering environmental sustainability. However, discerning whether a material is recyclable poses a formidable challenge, primarily because of inadequate recycling guidelines to accommodate a diverse spectrum of objects and their varying conditions. We investigated the role of vision-language models in addressing this challenge. We curated a dataset consisting of >1000 images across 11 disposal categories for optimal discarding and assessed the applicability of general vision-language models for recyclability classification. Our results show that the Contrastive Language-Image Pre-training (CLIP) model, which is pretrained to understand the relationship between images and text, demonstrated remarkable performance in the zero-shot recyclability classification task, with an accuracy of 89%. Our results underscore the potential of general vision-language models in addressing real-world challenges, such as automated waste sorting, by harnessing the inherent associations between visual and textual information.

Authors: Eliot Park (Harvard Medical School); Eddy Pan (Harvard Medical School); Shreya Johri (Harvard Medical School); Pranav Rajpurkar (Harvard Medical School)

NeurIPS 2023 ClimateX: Do LLMs Accurately Assess Human Expert Confidence in Climate Statements? (Papers Track)

Abstract: Evaluating the accuracy of outputs generated by Large Language Models (LLMs) is especially important in the climate science and policy domain. We introduce the Expert Confidence in Climate Statements (ClimateX) dataset, a novel, curated, expert-labeled dataset consisting of 8094 climate statements collected from the latest Intergovernmental Panel on Climate Change (IPCC) reports, labeled with their associated confidence levels. Using this dataset, we show that recent LLMs can classify human expert confidence in climate-related statements, especially in a few-shot learning setting, but with limited (up to 47%) accuracy. Overall, models exhibit consistent and significant over-confidence on low and medium confidence statements. We highlight implications of our results for climate communication, LLM evaluation strategies, and the use of LLMs in information retrieval systems.

Authors: Romain Lacombe (Stanford University); Kerrie Wu (Stanford University); Eddie Dilworth (Stanford University)

NeurIPS 2023 Proof-of-concept: Using ChatGPT to Translate and Modernize an Earth System Model from Fortran to Python/JAX (Papers Track)

Abstract: Earth system models (ESMs) are vital for understanding past, present, and future climate, but they suffer from legacy technical infrastructure. ESMs are primarily implemented in Fortran, a language that poses a high barrier of entry for early career scientists and lacks a GPU runtime, which has become essential for continued advancement as GPU power increases and CPU scaling slows. Fortran also lacks differentiability — the capacity to differentiate through numerical code — which enables hybrid models that integrate machine learning methods. Converting an ESM from Fortran to Python/JAX could resolve these issues. This work presents a semi-automated method for translating individual model components from Fortran to Python/JAX using a large language model (GPT-4). By translating the photosynthesis model from the Community Earth System Model (CESM), we demonstrate that the Python/JAX version results in up to 100x faster runtimes using GPU parallelization, and enables parameter estimation via automatic differentiation. The Python code is also easy to read and run and could be used by instructors in the classroom. This work illustrates a path towards the ultimate goal of making climate models fast, inclusive, and differentiable.

Authors: Anthony Zhou (Columbia University), Linnia Hawkins (Columbia University), Pierre Gentine (Columbia University)

NeurIPS 2023 Understanding Climate Legislation Decisions with Machine Learning (Proposals Track)

Abstract: Effective action is crucial in order to avert climate disaster. Key in enacting change is the swift adoption of climate positive legislation which advocates for climate change mitigation and adaptation. This is because government legislation can result in far-reaching impact, due to the relationships between climate policy, technology, and market forces. To advocate for legislation, current strategies aim to identify potential levers and obstacles, presenting an opportunity for the application of recent advances in machine learning language models. Here we propose a machine learning pipeline to analyse climate legislation, aiming to investigate the feasibility of natural language processing for the classification of climate legislation texts, to predict policy voting outcomes. By providing a model of the decision making process, the proposed pipeline can enhance transparency and aid policy advocates and decision makers in understanding legislative decisions, thereby providing a tool to monitor and understand legislative decisions towards climate positive impact.

Authors: Jeff Clark (University of Bristol); Michelle Wan (University of Cambridge); Raul Santos Rodriguez (University of Bristol)

NeurIPS 2023 Mapping the Landscape of Artificial Intelligence in Climate Change Research: A Meta-Analysis on Impact and Applications (Proposals Track)

Abstract: This proposal advocates a comprehensive and systematic analysis aimed at mapping and characterizing the intricate landscape of Artificial Intelligence and Machine Learning applications and their impacts within the domain of climate change research, both in adaptation and mitigation efforts. Notably, a significant upswing in this interdisciplinary intersection has been observed since 2020. Utilizing advanced topic clustering techniques and qualitative analysis, we have discerned 12 distinct macro areas that supplement, enrich, and expand upon those identified in prior research. The primary objective of this undertaking is to furnish a data-rich panoramic view and informative insights regarding the functions and tools of the mentioned disciplines. Our intention is to offer valuable guidance to the scholarly community and propel further research endeavors, encouraging meticulous examinations of research trends and gaps in addressing the formidable challenges posed by climate change and the climate crisis.

Authors: Christian Burmester (Osnabrück University); Teresa Scantamburlo (University of Venice)

ICLR 2023 CaML: Carbon Footprinting of Products with Zero-Shot Semantic Text Similarity (Papers Track)

Abstract: Estimating the embodied carbon in products is a key step towards understanding their impact, and undertaking mitigation actions. Precise carbon attribution is challenging at scale, requiring both domain expertise and granular supply chain data. As a first-order approximation, standard reports use Economic Input-Output based Life Cycle Assessment (EIO-LCA) which estimates carbon emissions per dollar at an industry sector level using transactions between different parts of the economy. For EIO-LCA, an expert needs to map each product to one of upwards of 1000 potential industry sectors. We present CaML, an algorithm to automate EIO-LCA using semantic text similarity matching by leveraging the text descriptions of the product and the industry sector. CaML outperforms the previous manually intensive method, yielding a MAPE of 22% with no domain labels.

Authors: Bharathan Balaji (Amazon); Venkata Sai Gargeya Vunnava (Amazon); Geoffrey Guest (Amazon); Jared Kramer (Amazon)
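
The CaML abstract above reports its result as a mean absolute percentage error (MAPE). As a reminder of how that metric is computed, here is a minimal sketch in Python. The function name `mape` and the sample numbers are illustrative assumptions, not values or code from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a percentage.

    Assumes every value in `actual` is non-zero, since each error is
    divided by the corresponding actual value.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical carbon footprints (kg CO2e) for three products.
actual_footprints = [10.0, 20.0, 40.0]
estimated_footprints = [12.0, 15.0, 40.0]
score = mape(actual_footprints, estimated_footprints)
print(f"MAPE: {score:.1f}%")
```

For these made-up numbers the per-product errors are 20%, 25%, and 0%, so the mean is about 15%. A lower MAPE means the estimates sit closer to the reference values on average.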

ICLR 2023 Mapping global innovation networks around clean energy technologies (Proposals Track)

Abstract: Reaching net zero emissions requires rapid innovation and scale-up of clean tech. In this context, clean tech innovation networks (CTINs) can play a crucial role by pooling necessary resources and competences and enabling knowledge transfers between different actors. However, existing evidence on CTINs is limited due to a lack of comprehensive data. Here, we develop a machine learning framework to identify CTINs from announcements on social media to map the global CTIN landscape. Specifically, we classify the social media announcements regarding the type of technology (e.g., hydrogen, solar), interaction type (e.g., equity investment, R&D collaboration), and status (e.g., commencement, update). We then extract referenced organizations via entity recognition. Thereby, we generate a large-scale dataset of CTINs across different technologies, countries, and over time. This allows us to compare characteristics of CTINs, such as the geographic proximity of actors, and to investigate the association between network evolution and technology innovation and diffusion. As a direct implication, our work helps policy makers to promote CTINs by identifying current barriers and needs.

Authors: Malte Toetzke (ETH Zurich); Francesco Re (ETH Zurich); Benedict Probst (ETH Zurich); Stefan Feuerriegel (LMU Munich); Laura Diaz Anadon (University of Cambridge); Volker Hoffmann (ETH Zurich)

ICLR 2023 Mining Effective Strategies for Climate Change Communication (Papers Track)

Abstract: With the goal of understanding effective strategies to communicate about climate change, we build interpretable models to rank tweets related to climate change with respect to the engagement they generate. Our models are based on the Bradley-Terry model of pairwise comparison outcomes and use a combination of the tweets’ topic and metadata features to do the ranking. To remove confounding factors related to author popularity and minimise noise, they are trained on pairs of tweets that are from the same author and around the same time period and have a sufficiently large difference in engagement. The models achieve good accuracy on a held-out set of pairs. We show that we can interpret the parameters of the trained model to identify the topic and metadata features that contribute to high engagement. Among other observations, we see that topics related to climate projections, human cost and deaths tend to have low engagement while those related to mitigation and adaptation strategies have high engagement. We hope the insights gained from this study will help craft effective climate communication to promote engagement, thereby lending strength to efforts to tackle climate change.

Authors: Aswin Suresh (EPFL); Lazar Milikic (EPFL); Francis Murray (EPFL); Yurui Zhu (EPFL); Matthias Grossglauser (École Polytechnique Fédérale de Lausanne (EPFL))
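
The abstract above ranks tweets with a Bradley-Terry model over pairs of tweets from the same author. A minimal sketch of that idea follows, assuming a linear score over tweet features so that P(a beats b) = sigmoid(w · (x_a − x_b)). The two binary features, the toy pairs, the L2 penalty, and all function names are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_bradley_terry(pairs, n_features, lr=0.5, steps=2000, l2=0.01):
    """Fit feature weights by gradient ascent on the pairwise log-likelihood.

    pairs: list of (winner_features, loser_features), where the winner is
    the tweet with higher engagement. A small L2 penalty (an assumption
    added here) keeps the weights finite on perfectly separable toy data.
    """
    diffs = [[a - b for a, b in zip(win, lose)] for win, lose in pairs]
    w = [0.0] * n_features
    for _ in range(steps):
        grad = [-l2 * wk for wk in w]
        for d in diffs:
            p = sigmoid(sum(wk * dk for wk, dk in zip(w, d)))
            for k, dk in enumerate(d):
                # d/dw log sigmoid(w . d) = (1 - sigmoid(w . d)) * d
                grad[k] += (1.0 - p) * dk / len(diffs)
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w


def win_probability(w, x_a, x_b):
    """Probability that tweet a out-engages tweet b under the fitted model."""
    return sigmoid(sum(wk * (a - b) for wk, a, b in zip(w, x_a, x_b)))


# Hypothetical features: [mentions_mitigation, mentions_projection]
pairs = [
    ([1, 0], [0, 1]),
    ([1, 0], [0, 0]),
    ([1, 1], [0, 1]),
    ([0, 0], [0, 1]),
]
weights = fit_bradley_terry(pairs, n_features=2)
print(weights)
```

Because the model is linear in the features, the sign and size of each fitted weight can be read directly as that feature's contribution to engagement, which is the kind of interpretability the abstract describes.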

NeurIPS 2022 Deep Climate Change: A Dataset and Adaptive domain pre-trained Language Models for Climate Change Related Tasks (Papers Track)

Abstract: The quantity and quality of literature around climate change (CC) and its impacts are increasing yearly. Yet, this field has received limited attention in the Natural Language Processing (NLP) community. With the help of large Language Models (LMs) and transfer learning, NLP can support policymakers, researchers, and climate activists in making sense of large-scale and complex CC-related texts. CC-related texts include specific language that general language models cannot represent accurately. Therefore we collected a climate change corpus consisting of over 360 thousand abstracts of top climate scientists' articles from trustworthy sources covering large temporal and spatial scales. Comparing the GPT2 LM with our 'climateGPT2 models', which are fine-tuned on the CC-related corpus, on the downstream tasks of claim generation (text generation) and fact-checking shows that the climateGPT2 models perform better than GPT2. The climateGPT2 models decrease the validation loss to 1.08 for claim generation from 43.4 obtained by GPT2. We found that climateGPT2 models improved the masked language model objective for the fact-checking task by increasing the F1 score from 0.67 to 0.72.

Authors: Saeid Vaghefi (University of Zürich); Veruska Muccione (University of Zürich); Christian Huggel (University of Zürich); Hamed Khashehchi (2w2e GmbH); Markus Leippold (University of Zürich)

NeurIPS 2022 Temperature impacts on hate speech online: evidence from four billion tweets (Papers Track)

Abstract: Human aggression is no longer limited to the physical space but exists in the form of hate speech on social media. Here, we examine the effect of temperature on the occurrence of hate speech on Twitter and interpret the results in the context of climate change, human behavior and mental health. Employing supervised machine learning models, we identify hate speech in a data set of four billion geolocated tweets from over 750 US cities (2014 – 2020). We statistically evaluate the changes in daily hate tweets against changes in local temperature, isolating the temperature influence from confounding factors using binned panel-regression models. We find a low prevalence of hate tweets in moderate temperatures and observe sharp increases of up to 12% for colder and up to 22% for hotter temperatures, indicating that not only hot but also cold temperatures increase aggressive tendencies. Further, we observe that for extreme temperatures hate speech also increases as a percentage of total tweeting activity, crowding out non-hate speech. The quasi-quadratic shape of the temperature-hate tweet curve is robust across varying climate zones, income groups, religious and political beliefs. The prevalence of the results across climatic and socioeconomic splits points to limits in adaptation. Our results illuminate hate speech online as an impact channel through which temperature alters societal aggression.

Authors: Annika Stechemesser (Potsdam Institute for Climate Impact Research); Anders Levermann (Potsdam Institute for Climate Impact Research); Leonie Wenz (Potsdam Institute for Climate Impact Research)

NeurIPS 2022 TCFD-NLP: Assessing alignment of climate disclosures using NLP for the financial markets (Papers Track)

Abstract: Climate-related disclosure is increasing in importance as companies and stakeholders alike aim to reduce their environmental impact and exposure to climate-induced risk. Companies primarily disclose this information in annual or other lengthy documents where climate information is not the sole focus. To assess the quality of a company's climate-related disclosure, these documents, often hundreds of pages long, must be reviewed manually by climate experts. We propose a more efficient approach to assessing climate-related financial information. We construct a model leveraging TF-IDF, sentence transformers and multi-label k nearest neighbors (kNN). The developed model is capable of assessing alignment of climate disclosures at scale, with a level of granularity and transparency that will support decision-making in the financial markets with relevant climate information. In this paper, we discuss the data that enabled this project, the methodology, and how the resulting model can drive climate impact.

Authors: Rylen Sampson (Manifest Climate); Aysha Cotterill (Manifest Climate); Quoc Tien Au (Manifest Climate)

NeurIPS 2022 Climate Policy Tracker: Pipeline for automated analysis of public climate policies (Papers Track)

Abstract: The number of standardized policy documents regarding climate policy and their publication frequency is significantly increasing. The documents are long and tedious for manual analysis, especially for policy experts, lawmakers, and citizens who lack access or domain expertise to utilize data analytics tools. Potential consequences of such a situation include reduced citizen governance and involvement in climate policies and an overall surge in analytics costs, making the documents less accessible to the public. In this work, we use a Latent Dirichlet Allocation-based pipeline for the automatic summarization and analysis of 10 years of national energy and climate plans (NECPs) for the period from 2021 to 2030, established by 27 Member States of the European Union. We focus on analyzing policy framing, the language used to describe specific issues, to detect essential nuances in the way governments frame their climate policies and achieve climate goals. The methods leverage topic modeling and clustering for the comparative analysis of policy documents across different countries. It allows for easier integration in potential user-friendly applications for the development of theories and processes of climate policy. This would further lead to better citizen governance and engagement over climate policies and public policy research.

Authors: Artur Żółkowski (Warsaw University of Technology); Mateusz Krzyziński (Warsaw University of Technology); Piotr Wilczyński (Warsaw University of Technology); Stanisław Giziński (University of Warsaw); Emilia Wiśnios (University of Warsaw); Bartosz Pieliński (University of Warsaw); Julian Sienkiewicz (Warsaw University of Technology); Przemysław Biecek (Warsaw University of Technology)

NeurIPS 2022 Topic correlation networks inferred from open-ended survey responses reveal signatures of ideology behind carbon tax opinion (Papers Track)

Abstract: Ideology can often render policy design ineffective by overriding what, at face value, are rational incentives. A timely example is carbon pricing, whose public support is strongly influenced by ideology. As a system of ideas, ideology expresses itself in the way people explain themselves and the world. As an object of study, ideology is then amenable to a generative modelling approach within the text-as-data paradigm. Here, we analyze the structure of ideology underlying carbon tax opinion using topic models. An idea, termed a topic, is operationalized as the fixed set of proportions with which words are used when talking about it. We characterize ideology through the relational structure between topics. To access this latent structure, we use the highly expressive Structural Topic Model to infer topics and the weights with which individual opinions mix topics. We fit the model to a large dataset of open-ended survey responses of Canadians elaborating on their support of or opposition to the tax. We propose and evaluate statistical measures of ideology in our data, such as dimensionality and heterogeneity. Finally, we discuss the implications of the results for transition policy in particular, and of our approach to analyzing ideology for computational social science in general.

Authors: Maximilian Puelma Touzel (Mila)

NeurIPS 2022 Analyzing the global energy discourse with machine learning (Proposals Track)

Abstract: To transform our economy towards net-zero emissions, industrial development of clean energy technologies (CETs) to replace fossil energy technologies (FETs) is crucial. Although the media has great power in influencing consumer behavior and decision making in business and politics, its role in the energy transformation is still underexplored. In this paper, we analyze the global energy discourse via machine learning. For this, we collect a large-scale dataset with ~5 million news articles from seven of the world’s major CO2 emitting countries, covering eight CETs and four FETs. Using machine learning, we then analyze the content of news articles on a highly granular level and along several dimensions, namely relevance (for the energy discourse), context (e.g., costs, regulation, investment), and connotations (e.g., high/increasing vs. low/decreasing costs). By linking empirical discourse patterns to investment and deployment data of CETs and FETs, this study advances the current understanding about the role of the media in the energy transformation. Thereby, it enables businesses, investors, and policy makers to respond more effectively to sensitive topics in the media discourse and leverage windows of opportunity for scaling CETs.

Authors: Malte Toetzke (ETH Zurich); Benedict Probst (ETH Zurich); Yasin Tatar (ETH Zurich); Stefan Feuerriegel (LMU Munich); Volker Hoffmann (ETH Zurich)

NeurIPS 2022 CliMedBERT: A Pre-trained Language Model for Climate and Health-related Text (Proposals Track)

Abstract: Climate change is threatening human health to an unprecedented degree and in many ways. These threats are expected to grow unless effective and evidence-based policies are developed and acted upon to minimize or eliminate them. Attaining such a task requires the highest degree of the flow of knowledge from science into policy. The multidisciplinary and location-specific nature of published science, together with its vastness, makes it challenging to keep track of novel work in this area and makes traditional knowledge synthesis methods inefficient in infusing science into policy. To this end, we consider developing multiple domain-specific language models (LMs) with different variations from Climate- and Health-related information, which can serve as a foundational step toward capturing available knowledge to enable solving different tasks, such as detecting similarities between climate- and health-related concepts, fact-checking, relation extraction, evidence of health effects to policy text generation, and more. To our knowledge, this is the first work that proposes developing multiple domain-specific language models for the considered domains. We will make the developed models, resources, and codebase available for the researchers.

Authors: Babak Jalalzadeh Fard (University of Nebraska Medical Center); Sadid A. Hasan (Microsoft); Jesse E. Bell (University of Nebraska Medical Center)

AAAI FSS 2022 AI-Based Text Analysis for Evaluating Food Waste Policies

Abstract: Food waste is a major contributor to climate change, making the reduction of food waste one of the most important strategies to preserve threatened ecosystems and increase economic benefits. To evaluate the impact of food waste policies in this arena and provide actionable guidance to policymakers, we conducted an AI-based text analysis of food waste policy provisions. Specifically, we a) identified commonalities across state policy texts, b) clustered states by shared policy text, and c) examined relationships between state cluster memberships and food waste. This approach generated state clusters but demonstrated very limited convergent validity with policy ratings provided by subject matter experts and no predictive validity with food waste. We discuss the potential of using supervised machine learning to analyze food waste policy text as a next step.

Authors: John Aitken (The MITRE Corporation), Denali Rao (The MITRE Corporation), Balca Alaybek (The MITRE Corporation), Amber Sprenger (The MITRE Corporation), Grace Mika (The MITRE Corporation), Rob Hartman (The MITRE Corporation) and Laura Leets (The MITRE Corporation)

AAAI FSS 2022 KnowUREnvironment: An Automated Knowledge Graph for Climate Change and Environmental Issues

Abstract: Despite climate change being one of the greatest threats to humanity, many people are still in denial or lack motivation for appropriate action. A structured source of knowledge can help increase public awareness while also helping crucial natural language understanding tasks such as information retrieval, question answering, and recommendation systems. We introduce KnowUREnvironment – a knowledge graph for climate change and related environmental issues, extracted from the scientific literature. We automatically identify 210,230 domain-specific entities/concepts and encode how these concepts are interrelated with 411,860 RDF triples backed up with evidence from the literature, without using any supervision or human intervention. Human evaluation shows our extracted triples are syntactically and factually correct (81.69% syntactic correctness and 75.85% precision). The proposed framework can be easily extended to any domain that can benefit from such a knowledge graph.

Authors: Md Saiful Islam (University of Rochester), Adiba Proma (University of Rochester), Yilin Zhou (University of Rochester), Syeda Nahida Akter (Carnegie Mellon University), Caleb Wohn (University of Rochester) and Ehsan Hoque (University of Rochester)

AAAI FSS 2022 ClimateBert: A Pretrained Language Model for Climate-Related Text

Abstract: In recent years, large pretrained language models (LM) have revolutionized the field of natural language processing (NLP). However, while pretraining on general language has been shown to work very well for common language, it has been observed that niche language poses problems. In particular, climate-related texts include specific language that common LMs cannot represent accurately. We argue that this shortcoming of today's LMs limits the applicability of modern NLP to the broad field of text processing of climate-related texts. As a remedy, we propose ClimateBert, a transformer-based language model that is further pretrained on over 2 million paragraphs of climate-related texts, crawled from various sources such as common news, research articles, and climate reporting of companies. We find that ClimateBert leads to a 48% improvement on a masked language model objective which, in turn, leads to lowering error rates by 3.57% to 35.71% for various climate-related downstream tasks like text classification, sentiment analysis, and fact-checking.

Authors: Nicolas Webersinke (FAU Erlangen-Nürnberg), Mathias Kraus (FAU Erlangen-Nürnberg), Julia Anna Bingler (ETH Zurich) and Markus Leippold (UZH Zurich)

AAAI FSS 2022 The Impact of TCFD Reporting - A New Application of Zero-Shot Analysis to Climate-Related Financial Disclosures

Abstract: We examine climate-related disclosures in 3,335 reports based on a sample of 188 banks that officially endorsed the recommendations of the Task Force for Climate-related Financial Disclosures (TCFD). In doing so, we introduce a new application of zero-shot text classification based on the BART model and an MNLI task. By developing a set of robust and fine-grained labels, we show that zero-shot analysis provides high accuracy in analyzing companies’ climate-related reporting without further model training. We are able to demonstrate that banks that support the TCFD increase their level of disclosure after officially declaring their support for the guidelines, although we also find significant differences depending on the topic of disclosure. Our findings yield important conclusions for the design of climate-related disclosures.

Authors: Alix Auzepy (Justus-Liebig-Universität Gießen), Elena Tönjes (Justus-Liebig-Universität Gießen) and Christoph Funk (Justus-Liebig-Universität Gießen)

AAAI FSS 2022 Using Natural Language Processing for Automating the Identification of Climate Action Interlinkages within the Sustainable Development Goals

Abstract: Climate action, Goal 13 of the UN Sustainable Development Goals (SDG), cuts across almost all SDGs. Achieving climate goals can reinforce the achievements in many other goals, but at the same time climate mitigation and adaptation measures may generate trade-offs, such as levelling the cost of energy and transitioning away from fossil fuels. Leveraging the synergies and minimizing the trade-offs among the climate goals and other SDGs is an imperative task for ensuring policy coherence. Understanding the interlinkages between climate action and other SDGs can help inform about the synergies and trade-offs. This paper presents a novel methodology by using natural language processing (NLP) to automate the process of systematically identifying the key interlinkages between climate action and SDGs from a large amount of climate literature. A qualitative SDG interlinkages model for climate action was automatically generated and visualized in a network graph. This work contributes to the conference thematic topic on using AI for policy alignment for climate change goals, SDGs and associated environmental, social and governance (ESG) frameworks.

Authors: Xin Zhou (Institute for Global Environmental Strategies (IGES)), Kshitij Jain (Google Inc.), Mustafa Moinuddin (Institute for Global Environmental Strategies (IGES)) and Patrick McSharry (Carnegie Mellon University Africa; Oxford Man Institute of Quantitative Finance, Oxford University)

NeurIPS 2021 A Deep Learning application towards transparent communication for Payment for Forest Environmental Services (PES) (Proposals Track)

Abstract: Deforestation accounts for more than 20% of global emissions. Payments for Environmental Services (PES) is seen by both policy makers and practitioners as an effective market-based instrument to provide financial incentives for forest owners, particularly poor and indigenous households in developing countries. It is a critical instrument to protect forests, and ultimately to mitigate climate change and reduce emissions from deforestation. However, previous studies have pointed out that a key challenge for PES is to ensure transparent payment to local people, due to i) weak monitoring and evaluation and ii) indigenous people's lack of access to e-banking and difficulty complying with the procedural and administrative paperwork required to receive payments. Specifically, the number and complexity of forms, along with language barriers, are a key issue; and most transactions need several intermediaries and incur transaction costs which reduce the payments reaching landowners. To address these issues, we propose a communication platform that links across the stakeholders and processes. Our proposal will utilize Machine Learning techniques to lower the language barrier and provide technology solutions to help indigenous people to access payments. This would also help improve the effectiveness and transparency of PES schemes. Specifically, we propose the use of Natural Language Processing techniques in providing a speech-to-text and auto translation capability, and the use of Graph Neural Network to provide link predictions of transaction types, volumes and values. The pathway to impact will be forest protection and local livelihood through providing financial incentives, and subsequently contribution to more carbon sequestration and storage – a key issue in climate change mitigation.

Authors: Lan HOANG (IBM Research); Thuy Thu Phan (Center for International Forestry Research (CIFOR))

NeurIPS 2021 A NLP-based Analysis of Alignment of Organizations' Climate-Related Risk Disclosures with Material Risks and Metrics (Proposals Track)

Abstract: The Sustainability Accounting Standards Board (SASB) establishes standards to guide the disclosures of material sustainability and ESG (Environment, Social, Governance)-related information across industries. The availability of quality, comparable and decision-useful information is required to assess risks and opportunities later integrated into financial decision-making. Particularly, standardized, industry-specific climate risk metrics and topics can support these efforts. SASB’s latest climate risk technical bulletin introduces three climate-related risks that are financially material - physical, transition and regulatory risks - and maps these across industries. The main objective of this work is to create a framework that can analyze climate related risk disclosures using an AI-based tool that automatically extracts and categorizes climate-related risks and related metrics from company disclosures based on SASB’s latest climate risk guidance. This process will help with automating large-scale analysis and add much-needed transparency vis-a-vis the current state of climate-related disclosures, while also assessing how far along companies are currently disclosing information on climate risks relevant to their industry. As it stands, this much needed type of analysis is made mostly manually or using third-party metrics, often opaque and biased, as proxies. In this work, we will first create a climate risk glossary that will be trained on a large amount of climate risk text. By combining climate risk keywords in this glossary with recent advances in natural language processing (NLP), we will then be able to quantitatively and qualitatively compare climate risk information in different sectors and industries using a novel climate risk score that will be based on SASB standards.

Authors: Elham Kheradmand (University of Montreal); Didier Serre (Clearsum); Manuel Morales (University of Montreal); Cedric B Robert (Clearsum)

ICML 2021 Challenges in Applying Audio Classification Models to Datasets Containing Crucial Biodiversity Information (Papers Track)

Abstract: The acoustic signature of a natural soundscape can reveal consequences of climate change on biodiversity. Hardware costs, human labor time, and expertise dedicated to labeling audio are impediments to conducting acoustic surveys across a representative portion of an ecosystem. These barriers are quickly eroding away with the advent of low-cost, easy to use, open source hardware and the expansion of the machine learning field providing pre-trained neural networks to test on retrieved acoustic data. One consistent challenge in passive acoustic monitoring (PAM) is a lack of reliability from neural networks on audio recordings collected in the field that contain crucial biodiversity information that otherwise show promising results from publicly available training and test sets. To demonstrate this challenge, we tested a hybrid recurrent neural network (RNN) and convolutional neural network (CNN) binary classifier trained for bird presence/absence on two Peruvian bird audiosets. The RNN achieved an area under the receiver operating characteristics (AUROC) of 95% on a dataset collected from Xeno-canto and Google’s AudioSet ontology in contrast to 65% across a stratified random sample of field recordings collected from the Madre de Dios region of the Peruvian Amazon. In an attempt to alleviate this discrepancy, we applied various audio data augmentation techniques in the network’s training process which led to an AUROC of 77% across the field recordings.

Authors: Jacob G Ayers (UC San Diego); Yaman Jandali (University of California, San Diego); Yoo-Jin Hwang (Harvey Mudd College); Erika Joun (University of California, San Diego); Gabriel Steinberg (Binghamton University); Mathias Tobler (San Diego Zoo Wildlife Alliance); Ian Ingram (San Diego Zoo Wildlife Alliance); Ryan Kastner (University of California San Diego); Curt Schurgers (University of California San Diego)

ICML 2021 Automated Identification of Climate Risk Disclosures in Annual Corporate Reports (Papers Track)

Abstract: It is important for policymakers to understand which financial policies are effective in increasing climate risk disclosure in corporate reporting. We use machine learning to automatically identify disclosures of five different types of climate-related risks. For this purpose, we have created a dataset of over 120 manually-annotated annual reports by European firms. Applying our approach to reporting of 337 firms over the last 20 years, we find that risk disclosure is increasing. Disclosure of transition risks grows more dynamically than physical risks, and there are marked differences across industries. Country-specific dynamics indicate that regulatory environments potentially have an important role to play for increasing disclosure.

Authors: David Friederich (University of Bern); Lynn Kaack (ETH Zurich); Sasha Luccioni (Mila); Bjarne Steffen (ETH Zurich)

ICML 2021 TweetDrought: A Deep-Learning Drought Impacts Recognizer based on Twitter Data (Papers Track)

Abstract: Acquiring a better understanding of drought impacts becomes increasingly vital under a warming climate. Traditional drought indices describe mainly biophysical variables and not impacts on social, economic, and environmental systems. We utilized natural language processing and Bidirectional Encoder Representations from Transformers (BERT)-based transfer learning to fine-tune the model on the data from the news-based Drought Impact Report (DIR) and then apply it to recognize seven types of drought impacts based on the filtered Twitter data from the United States. Our model achieved a satisfying macro-F1 score of 0.89 on the DIR test set. The model was then applied to California tweets and validated with keyword-based labels. The macro-F1 score was 0.58. However, due to the limitation of keywords, we also spot-checked tweets with controversial labels. 83.5% of BERT labels were correct compared to the keyword labels. Overall, the fine-tuned BERT-based recognizer provided proper predictions and valuable information on drought impacts. The interpretation and analysis of the model were consistent with experiential domain expertise.

Authors: Beichen Zhang (University of Nebraska-Lincoln); Frank Schilder (Thomson Reuters); Kelly Smith (National Drought Mitigation Center); Michael Hayes (University of Nebraska-Lincoln); Sherri Harms (University of Nebraska-Kearney); Tsegaye Tadesse (University of Nebraska-Lincoln)

ICML 2021 DeepPolicyTracker: Tracking Changes In Environmental Policy In The Brazilian Federal Official Gazette With Deep Learning (Papers Track)

Abstract: Even though most of its energy generation comes from renewable sources, Brazil is one of the largest emitters of greenhouse gases in the world, due to intense farming and deforestation of biomes, such as the Amazon Rainforest, whose preservation is essential for compliance with the Paris Agreement. Still, regardless of lobbies or prevailing political orientation, all government legal actions are published daily in the Federal Official Gazette. However, with hundreds of decrees issued every day by the authorities, it is absolutely burdensome to manually analyze all these processes and find out which ones can pose serious environmental hazards. In this paper, we propose the DeepPolicyTracker, a promising deep learning model that uses a state-of-the-art pre-trained natural language model to classify government acts and track harmful changes in the environmental policies. We also provide the used dataset annotated by domain experts and show some results already obtained. In the future, this system should serve to scale up the high-quality tracking of all official documents with a minimum of human supervision and contribute to increasing society's awareness of every government action.

Authors: Flávio N Cação (University of Sao Paulo); Anna Helena Reali Costa (Universidade de São Paulo); Natalie Unterstell (Política por Inteiro); Liuca Yonaha (Política por Inteiro); Taciana Stec (Política por Inteiro); Fábio Ishisaki (Política por Inteiro)

ICML 2021 BERT Classification of Paris Agreement Climate Action Plans (Papers Track)

Abstract: As the volume of text-based information on climate policy increases, natural language processing (NLP) tools can distill information from text to better inform decision making on climate policy. We investigate how large pretrained transformers based on the BERT architecture classify sentences on a dataset of climate action plans which countries submitted to the United Nations following the 2015 Paris Agreement. We use the document header structure to assign noisy policy-relevant labels such as mitigation, adaptation, energy, and land use to text elements. Our models provide an improvement in out-of-sample classification over simple heuristics though fall short of the consistency observed between human annotators. We hope to extend this framework to a wider class of textual climate change data such as climate legislation and corporate social responsibility filings and build tools to streamline the extraction of information from these documents for climate change researchers.

Authors: Tom Corringham (Scripps Institution of Oceanography); Daniel Spokoyny (Carnegie Mellon University); Eric Xiao (University of California San Diego); Christopher Cha (University of California San Diego); Colin Lemarchand (University of California San Diego); Mandeep Syal (University of California San Diego); Ethan Olson (University of California San Diego); Alexander Gershunov (Scripps Institution of Oceanography)

ICML 2021 Powering Effective Climate Communication with a Climate Knowledge Base (Proposals Track)

Abstract: While many accept climate change and its growing impacts, few converse about it well, limiting the adoption speed of societal changes necessary to address it. In order to make effective climate communication easier, we aim to build a system that presents to any individual the climate information predicted to best motivate and inspire them to take action given their unique set of personal values. To alleviate the cold-start problem, the system relies on a knowledge base (ClimateKB) of causes and effects of climate change, and their associations to personal values. Since no such comprehensive ClimateKB exists, we revisit knowledge base construction techniques and build a ClimateKB from free text. We plan to open source the ClimateKB and associated code to encourage future research and applications.

Authors: Kameron B. Rodrigues (Stanford University); Shweta Khushu (SkySpecs Inc); Mukut Mukherjee (ClimateMind); Andrew Banister (Climate Mind); Anthony Hevia (ClimateMind); Sampath Duddu (ClimateMind); Nikita Bhutani (Megagon Labs)

ICML 2021 From Talk to Action with Accountability: Monitoring the Public Discussion of Policy Makers with Deep Neural Networks and Topic Modelling (Proposals Track)

Abstract: Decades of research on climate have provided a consensus that human activity has changed the climate and we are currently heading into a climate crisis. While public discussion and research efforts on climate change mitigation have increased, potential solutions need to not only be discussed but also effectively deployed. For preventing mismanagement and holding policy makers accountable, transparency and degree of information about government processes have been shown to be crucial. However, currently the quantity of information about climate change discussions and the range of sources make it increasingly difficult for the public and civil society to maintain an overview to hold politicians accountable. In response, we propose a multi-source topic aggregation system (MuSTAS) which processes policy makers' speech and rhetoric from several publicly available sources into an easily digestible topic summary. MuSTAS uses novel multi-source hybrid latent Dirichlet allocation to model topics from a variety of documents. This topic digest will serve the general public and civil society in assessing where, how, and when politicians talk about climate and climate policies, enabling them to hold politicians accountable for their actions to mitigate climate change and lack thereof.

Authors: Vili Hätönen (Emblica); Fiona Melzer (University of Edinburgh)

ICML 2021 NeuralNERE: Neural Named Entity Relationship Extraction for End-to-End Climate Change Knowledge Graph Construction (Proposals Track)

Abstract: This paper proposes an end-to-end Neural Named Entity Relationship Extraction model (called NeuralNERE) for climate change knowledge graph (KG) construction, directly from the raw text of relevant news articles. The proposed model will not only remove the need for any kind of human supervision for building knowledge bases for climate change KG construction (used in the case of supervised or dictionary-based KG construction methods), but will also prove to be highly valuable for analyzing climate change by summarising relationships between different factors responsible for climate change, extracting useful insights & reasoning on pivotal events, and helping industry leaders in making more informed future decisions. Additionally, we also introduce the Science Daily Climate Change dataset (called SciDCC) that contains over 11k climate change news articles scraped from the Science Daily website, which could be used for extracting prior knowledge for constructing climate change KGs.

Authors: Prakamya Mishra (Independent Researcher); Rohan Mittal (Independent Researcher)

NeurIPS 2020 Analyzing Sustainability Reports Using Natural Language Processing (Papers Track)

Abstract: Climate change is a far-reaching, global phenomenon that will impact many aspects of our society, including the global stock market. In recent years, companies have increasingly been aiming to both mitigate their environmental impact and adapt their practices to the changing climate context. This is reported via increasingly exhaustive reports, which cover many types of sustainability measures, often under the umbrella of Environmental, Social, and Governance (ESG) disclosures. However, given this abundance of data, sustainability analysts are obliged to comb through hundreds of pages of reports in order to find relevant information. We have leveraged recent progress in Natural Language Processing (NLP) to create a custom model, ClimateQA, which allows the analysis of financial reports in order to identify climate-relevant sections using a question answering approach. We present this tool and the methodology that we used to develop it in the present article.

Authors: Sasha Luccioni (Mila); Emi Baylor (McGill); Nicolas Duchene (Universite de Montreal)

NeurIPS 2020 Using attention to model long-term dependencies in occupancy behavior (Papers Track)

Abstract: Over the past years, more and more models have been published that aim to capture relationships in human residential behavior. Most of these models are different Markov variants or regression models that have a strong assumption bias and are therefore unable to capture complex long-term dependencies and the diversity in occupant behavior. This work shows that attention-based models are able to capture complex long-term dependencies in occupancy behavior and at the same time adequately depict the diversity in behavior across the entire population and different socio-demographic groups. By combining an autoregressive generative model with an imputation model, the advantages of two data sets are combined and new data are generated which are beneficial for multiple use cases (e.g. generation of consistent household energy demand profiles). The two-step approach generates synthetic activity schedules that have similar statistical properties to the empirically collected schedules and do not contain direct information about single individuals. Therefore, the presented approach forms the basis to make data on occupant behavior freely available, so that further investigations based on the synthetic data can be carried out without a large data application effort. In future work it is planned to take interpersonal dependencies into account in order to be able to generate entire household behavior profiles.

Authors: Max Kleinebrahm (Karlsruhe Institut für Technologie); Jacopo Torriti (University Reading); Russell McKenna (University of Aberdeen); Armin Ardone (Karlsruhe Institut für Technologie); Wolf Fichtner (Karlsruhe Institute of Technology)

NeurIPS 2020 Narratives and Needs: Analyzing Experiences of Cyclone Amphan Using Twitter Discourse (Papers Track)

Abstract: People often turn to social media to comment upon and share information about major global events. Accordingly, social media is receiving increasing attention as a rich data source for understanding people's social, political and economic experiences of extreme weather events. In this paper, we contribute two novel methodologies that leverage Twitter discourse to characterize narratives and identify unmet needs in response to Cyclone Amphan, which affected 18 million people in May 2020.

Authors: Ancil S Crayton (Booz Allen Hamilton); Joao Fonseca (NOVA Information Management School); Kanav Mehra (Independent Researcher); Jared Ross (Booz Allen Hamilton); Marcelo Sandoval-Castañeda (New York University Abu Dhabi); Michelle Ng (International Water Management Institute); Rachel von Gnechten (International Water Management Institute)

NeurIPS 2020 Emerging Trends of Sustainability Reporting in the ICT Industry: Insights from Discriminative Topic Mining (Papers Track)

Abstract: The Information and Communication Technologies (ICT) industry has a considerable climate change impact and accounts for approximately 3 percent of global carbon emissions. Despite the increasing availability of sustainability reports provided by ICT companies, we still lack a systematic understanding of what has been disclosed at an industry level. In this paper, we make the first major effort to use modern unsupervised learning methods to investigate the sustainability reporting themes and trends of the ICT industry over the past two decades. We build a cross-sector dataset containing 22,534 environmental reports from 1999 to 2019, of which 2,187 are ICT specific. We then apply CatE, a text embedding based topic modeling method, to mine specific keywords that ICT companies use to report on climate change and energy. As a result, we identify (1) important shifts in ICT companies' climate change narratives from physical metrics towards climate-related disasters, (2) key organizations with large influence on ICT companies, and (3) ICT companies' increasing focus on data center and server energy efficiency.

Authors: Lin Shi (Stanford University); Nhi Truong Vu (Stanford University)

NeurIPS 2020 Climate-FEVER: A Dataset for Verification of Real-World Climate Claims (Papers Track)

Abstract: Our goal is to introduce CLIMATE-FEVER, a new publicly available dataset for verification of climate change-related claims. By providing a dataset for the research community, we aim to help and encourage work on improving algorithms for retrieving climate-specific information and detecting fake news in social and mass media to reduce the impact of misinformation on the formation of public opinion on climate change. We adapt the methodology of FEVER (Thorne et al., 2018), the largest dataset of artificially designed claims, to real-life claims collected from the Internet. Although during this process, we could count on the support of renowned climate scientists, it turned out to be no easy task. We discuss the surprising, subtle complexity of modeling real-world climate-related claims within the FEVER framework, which provides a valuable challenge for general natural language understanding. We hope that our work will mark the beginning of an exciting long-term joint effort by the climate science and AI community to develop robust algorithms to verify the facts for climate-related claims.

Authors: Markus Leippold (University of Zurich); Thomas Diggelmann (ETH Zurich)

NeurIPS 2020 ClimaText: A Dataset for Climate Change Topic Detection (Papers Track)

Abstract: Climate change communication in the mass media and other textual sources may affect and shape public perception. Extracting climate change information from these sources is an important task, e.g., for filtering content and e-discovery, sentiment analysis, automatic summarization, question-answering, and fact-checking. However, automating this process is a challenge, as climate change is a complex, fast-moving, and often ambiguous topic with scarce resources for popular text-based AI tasks. In this paper, we introduce ClimaText, a dataset for sentence-based climate change topic detection, which we make publicly available. We explore different approaches to identify the climate change topic in various text sources. We find that popular keyword-based models are not adequate for such a complex and evolving task. Context-based algorithms like BERT (Devlin et al., 2018) can detect, in addition to many trivial cases, a variety of complex and implicit topic patterns. Nevertheless, our analysis reveals a great potential for improvement in several directions, such as capturing the discussion on indirect effects of climate change. Hence, we hope this work can serve as a good starting point for further research on this topic.

Authors: Markus Leippold (University of Zurich); Francesco Saverio Varini (ETH)

NeurIPS 2020 Expert-in-the-loop Systems Towards Safety-critical Machine Learning Technology in Wildfire Intelligence (Proposals Track)

Abstract: With the advent of climate change, wildfires are becoming more frequent and severe across several regions worldwide. To prevent and mitigate their effects, wildfire intelligence plays a pivotal role, e.g. to monitor the evolution of wildfires and for early detection in high-risk areas such as wildland-urban-interface regions. Recent works have proposed deep learning solutions for fire detection tasks; however, the currently limited databases prevent reliable real-world deployments. We propose the development of expert-in-the-loop systems that combine the benefits of semi-automated data annotation with relevant domain knowledge expertise. Through this approach we aim to improve the data curation process and contribute to the generation of large-scale image databases for relevant wildfire tasks and empower the application of machine learning techniques in wildfire intelligence in real scenarios.

Authors: Maria João Sousa (IDMEC, Instituto Superior Técnico, Universidade de Lisboa); Alexandra Moutinho (IDMEC, Instituto Superior Técnico, Universidade de Lisboa); Miguel Almeida (ADAI, University of Coimbra)