Chemistry & Materials


Workshop Papers

Venue Title
ICLR 2024 Analyzing the secondary wastewater-treatment process using Faster R-CNN and YOLOv5 object detection algorithms (Papers Track)

Abstract: The activated sludge (AS) process is the most common type of secondary wastewater treatment, applied worldwide. Due to the complexity of microbial communities, imbalances between the different types of bacteria may occur and disturb the process, with pronounced economic and environmental consequences. Microscopic inspection of the morphology of flocs and microorganisms provides key information on AS properties and function. This is a time-consuming, highly skilled, and expensive process that is not readily available in all locations. Thus, most wastewater-treatment plants (WWTPs) do not carry out this essential analysis, resulting in frequent operational faults. In this study, we develop a novel deep learning (DL) object detection algorithm to analyze and monitor the AS process based on a unique microscopic image database of flocs and microorganisms. Specifically, we applied YOLOv5 and Faster R-CNN algorithms as tools for segmentation and object detection to analyze the wastewater. The mean average precision (mAP) of the YOLOv5 was 0.67, outperforming the Faster R-CNN by 15%. Histogram equalization preprocessing of both bright-field and phase-contrast images significantly improved the results of the algorithm in all classes. In the case of YOLOv5, the mAP increased by 16.67%, to 0.77, where the AP of protozoa, filaments, and open floc classes outperformed the previous model by over 20%. These results demonstrate the potential of leveraging DL algorithms to enhance the analysis and monitoring of WWTPs in an affordable manner, consequently reducing environmental pollution caused by contaminated effluent. The fundamental challenge addressed herein has important global relevance, especially in an era in which the demand for high-quality wastewater reuse is expected to increase dramatically.

Authors: Offir Inbar (Tel-Aviv University); Moni Shahar (Tel Aviv University); Jacob Gidron (Tel-Aviv University); Ido Cohen (Tel-Aviv University); Dror Avisar (Tel-Aviv University)
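
The abstract above credits histogram equalization preprocessing with a large part of the detection gain. For readers unfamiliar with the technique, the following is a minimal sketch of global histogram equalization on an 8-bit grayscale image, written in plain Python. It is an illustration only and not the authors' pipeline: the function name `equalize_histogram`, the nested-list image representation, and the choice of the standard CDF-based mapping are all assumptions made here.

```python
def equalize_histogram(image, levels=256):
    """Return a histogram-equalized copy of a grayscale image.

    `image` is a list of rows, each a list of integer intensities in
    the range [0, levels - 1]. (Hypothetical helper, for illustration.)
    """
    # Count how often each intensity level occurs.
    hist = [0] * levels
    for row in image:
        for px in row:
            hist[px] += 1

    # Build the cumulative distribution function (CDF) of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    # Smallest non-zero CDF value, i.e. the count of the darkest level present.
    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == cdf_min:
        # Empty or constant image: nothing to stretch, return a copy.
        return [list(row) for row in image]

    # Map each level so that the CDF becomes approximately linear,
    # sending the darkest level present to 0 and the brightest to levels - 1.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]

    return [[lut[px] for px in row] for row in image]


# Usage: a low-contrast patch whose intensities span only 100..130
# is stretched to span the full 0..255 range.
patch = [[100, 100, 110],
         [110, 120, 130]]
print(equalize_histogram(patch))
```

In practice an image library would typically be used instead; the sketch only shows why low-contrast microscope images gain dynamic range before being passed to a detector.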

ICLR 2024 Explaining Zeolite Synthesis-Structure Relationships using Aggregated SHAP Analysis (Papers Track)

Abstract: Zeolites, crystalline aluminosilicate materials with well-defined porous structures, have emerged as versatile materials with applications in carbon capture. Hydrothermal synthesis is a widely used method for zeolite production, offering control over crystallinity and pore size. However, the intricate interplay of synthesis parameters necessitates a comprehensive understanding to optimize the synthesis process. We train a supervised classification machine learning model on ZeoSyn (a dataset of zeolite synthesis routes) to predict the zeolite framework product given a synthesis route. Subsequently, we leverage SHapley Additive Explanations (SHAP) to reveal key synthesis-structure relationships in zeolites. To that end, we introduce an aggregation SHAP approach to extend such analysis to explain the formation of composite building units (CBUs) of zeolites. Analysis at this unprecedented scale sheds light on key synthesis parameters driving zeolite crystallization.

Authors: Elton Pan (MIT)

ICLR 2024 Literature Mining with Large Language Models to Assist the Development of Sustainable Building Materials (Papers Track)

Abstract: The concrete industry is one of the significant sources of carbon emissions, and the urgency of its decarbonization requires a shift to alternative materials. However, the absence of a systematic knowledge summary remains a challenge for further development of sustainable building materials. This work offers a cost-efficient strategy for information extraction tasks in complex terminology settings using small (2.8B) large language models (LLMs) with well-designed instruction-completion schemes and fine-tuning strategies, introducing a dataset cataloging civil engineering applications of alternative materials. The Multiple Choice instruction scheme significantly improves model accuracies in entity inference from non-Noun-Phrase sources, with supervised fine-tuning benefiting from straightforward tokenized representations of choices. We also demonstrate the utility of the dataset by extracting valuable insights into promising applications of alternative materials from knowledge graph representations.

Authors: Yifei Duan (Massachusetts Institute of Technology); Yixi Tian (Massachusetts Institute of Technology); Soumya Ghosh (IBM Research); Richard Goodwin (IBM T.J. Watson Research Center); Vineeth Venugopal (Massachusetts Institute of Technology); Jeremy Gregory (Massachusetts Institute of Technology); Jie Chen (IBM Research); Elsa Olivetti (Massachusetts Institute of Technology)

NeurIPS 2023 Scaling Sodium-ion Battery Development with NLP (Papers Track)

Abstract: Sodium-ion batteries (SIBs) have been gaining attention for applications like grid-scale energy storage, largely owing to the abundance of sodium and an expected favorable $/kWh figure. SIBs can leverage the well-established manufacturing knowledge of Lithium-ion Batteries (LIBs), but several materials synthesis and performance challenges for electrode materials need to be addressed. This work extracts a large database of challenges restricting the performance and synthesis of SIB cathode active materials (CAMs) and pairs them with corresponding mitigation strategies from the SIB literature by employing custom natural language processing (NLP) tools. The derived insights enable scientists in research and industry to navigate a large number of proposed strategies and focus on impactful scalability-informed mitigation strategies to accelerate the transition from lab to commercialization.

Authors: Mrigi Munjal (Massachusetts Institute of Technology); Thorben Pein (TU Munich); Vineeth Venugopal (Massachusetts Institute of Technology); Kevin Huang (Massachusetts Institute of Technology); Elsa Olivetti (Massachusetts Institute of Technology)

NeurIPS 2023 Predicting Adsorption Energies for Catalyst Screening with Transfer Learning Using Crystal Hamiltonian Graph Neural Network (Proposals Track)

Abstract: As the world moves towards a clean energy future to mitigate the risks of climate change, the discovery of new catalyst materials plays a significant role in enabling the sustainable production and transformation of energy [2]. The development and verification of fast, accurate, and efficient artificial intelligence and machine learning techniques is critical to shortening time-intensive calculations, reducing costs, and improving computational feasibility. We propose applying the Crystal Hamiltonian Graph Neural Network (CHGNet) on the OC20 dataset in order to iteratively perform structure-to-energy and forces calculations and identify the lowest energy across relaxed structures for a given adsorbate-surface combination. CHGNet's predictions will be compared and benchmarked to corresponding values calculated by density functional theory (DFT) [7] and other models to determine its efficacy.

Authors: Angelina Chen (Foothill College/Lawrence Berkeley National Lab); Hui Zheng (Lawrence Berkeley National Lab); Paula Harder (Mila)

ICLR 2023 Graph Neural Network Generated Metal-Organic Frameworks for Carbon Capture (Proposals Track)

Abstract: The level of carbon dioxide (CO2) in our atmosphere is rapidly rising and is projected to double today's levels to reach 1,000 ppm by 2100 under certain scenarios, primarily driven by anthropogenic sources. Technology that can capture CO2 from anthropogenic sources, remove it from the atmosphere and sequester it at the gigaton scale by 2050 is required to stop and reverse the impact of climate change. Metal-organic frameworks (MOFs) have been a promising technology in various applications including gas separation as well as CO2 capture from point-source flue gases or removal from the atmosphere. MOFs offer unmatched surface area through their highly porous crystalline structure, and MOF technology has the potential to become a leading adsorption-based CO2 separation technology providing high surface area, structural stability and chemical tunability. Due to its complex structure, MOF crystal structure (atoms and bonds) cannot be easily represented in tabular format for machine learning (ML) applications, whereas graph neural networks (GNNs) have already been explored in the representation of simpler chemical molecules. In addition to the difficulty in MOF data representation, an infinite number of combinations can be created for MOF crystals, which makes ML applications more suitable to alleviate dependency on subject matter experts (SMEs) than conventional computational methods. In this work, we propose training GNNs in a variational autoencoder (VAE) setting to create an end-to-end workflow for the generation of new MOF crystal structures directly from the data within the crystallographic information files (CIFs) and conditioned by additional CO2 performance values.

Authors: Zikri Bayraktar (Schlumberger Doll Research); Shahnawaz Molla (Schlumberger Doll Research); Sharath Mahavadi (Schlumberger Doll Research)

NeurIPS 2022 AutoML for Climate Change: A Call to Action (Papers Track)

Abstract: The challenge that climate change poses to humanity has spurred a rapidly developing field of artificial intelligence research focused on climate change applications. The climate change ML (CCML) community works on a diverse, challenging set of problems which often involve physics-constrained ML or heterogeneous spatiotemporal data. It would be desirable to use automated machine learning (AutoML) techniques to automatically find high-performing architectures and hyperparameters for a given dataset. In this work, we benchmark popular AutoML libraries on three high-leverage CCML applications: climate modeling, wind power forecasting, and catalyst discovery. We find that out-of-the-box AutoML libraries currently fail to meaningfully surpass the performance of human-designed CCML models. However, we also identify a few key weaknesses, which stem from the fact that most AutoML techniques are tailored to computer vision and NLP applications. For example, while dozens of search spaces have been designed for image and language data, none have been designed for spatiotemporal data. Addressing these key weaknesses can lead to the discovery of novel architectures that yield substantial performance gains across numerous CCML applications. Therefore, we present a call to action to the AutoML community, since there are a number of concrete, promising directions for future work in the space of AutoML for CCML. We release our code and a list of resources at

Authors: Renbo Tu (University of Toronto); Nicholas Roberts (University of Wisconsin-Madison); Vishak Prasad C (Indian Institute Of Technology, Bombay); Sibasis Nayak (Indian Institute of Technology, Bombay); Paarth Jain (Indian Institute of Technology Bombay); Frederic Sala (University of Wisconsin-Madison); Ganesh Ramakrishnan (IIT Bombay); Ameet Talwalkar (CMU); Willie Neiswanger (Stanford University); Colin White (Abacus.AI)

ICML 2021 A multi-task learning approach to enhance sustainable biomolecule production in engineered microorganisms (Proposals Track)

Abstract: A sustainable alternative to sourcing many materials humans need is metabolic engineering: a field that aims to engineer microorganisms into biological factories that convert renewable feedstocks into valuable biomolecules (e.g., jet fuel, medicine). Microorganism factories must be genetically optimized using predictable DNA sequence tools; however, for many organisms, the exact DNA sequence signals defining their genetic control systems are poorly understood. To better decipher these DNA signals, we propose a multi-task learning approach that uses deep learning and feature attribution methods to identify DNA sequence signals that control gene expression in the methanotroph M. buryatense. This bacterium consumes methane, a potent greenhouse gas. If successful, this work would enhance our ability to build gene expression tools to more effectively engineer M. buryatense into an efficient biomolecule factory that can divert methane pollution into valuable, everyday materials.

Authors: Erin Wilson (University of Washington); Mary Lidstrom (University of Washington); David Beck (University of Washington)