Chemistry & Materials

Open Catalyst Project: An Introduction to Machine Learning for Material Discovery

Adeesh Kolluru, Muhammed Shuaibi, Abhishek Das, Brandon Wood, Janice Lan, Anuroop Sriram, Zachary Ulissi, and Larry Zitnick, NeurIPS 2021

Blog Posts

Deep learning of nanoporous materials for chemical separations

Gustavo Perez, October 08, 2023

A real world dataset for estimating battery safety and capacity

Prof. Jingzhao Zhang (Assistant professor at Institute for Interdisciplinary Information Sciences, Tsinghua University) & Prof. Guannan He (Assistant professor at College of Engineering, Peking University), November 22, 2023

Open Catalyst Project Tutorial: An Introduction to Machine Learning for Material Discovery

Prof. John Kitchin (Carnegie Mellon University) and Dr. Zachary Ulissi (Meta’s Fundamental AI Research Lab), September 19, 2023

Accelerating Material Discovery for High-Performance Chemical Separation using AI

Subhransu Maji (University of Massachusetts, Amherst); Peng Bai (University of Massachusetts, Amherst), 2022

ICML 2021
- Solomon Assefa: Addressing Enterprise Decarbonization and Climate Resiliency Goals with Advances in AI, Cloud, and Quantum Computing (Invited Talk)

Venue	Title
NeurIPS 2025	Enabling Machine Learning-Assisted Discovery of Polyamines for Solid-State CO₂ Capture (Papers Track) Abstract: Necessitated by the impact of global climate change, the efficient direct air capture (DAC) of CO₂ is one of the technologies with the potential to contribute to the goal of net zero emissions. Solid amine adsorption has shown the most promise among existing approaches due to its energy efficiency and scalability. To estimate adsorption for these polyamines, we introduce a computational framework that combines fragment-based polymer generation with Density Functional Theory, molecular dynamics relaxations, and grand canonical Monte Carlo sampling. Through this efficient workflow, we generated a large library of polymers with adsorption data, potentially supporting machine learning models for inverse design of polymers with optimized CO₂ adsorption. Computational experiments showed that Bayesian optimization can further accelerate the process by efficiently identifying high-performing candidates. In summary, our integrated approach bridges atomistic simulation with data-driven optimization, enabling accelerated screening of polymer sorbents for DAC applications. Authors: A N M Nafiz Abeer (Texas A&M University); Junhe Chen (Georgia Institute of Technology); Alif Bin Abdul Qayyum (Texas A&M University); Zhihao Feng (Georgia Institute of Technology); Hyun-Myung Woo (Incheon National University); Seung Soon Jang (Georgia Institute of Technology); Byung-Jun Yoon (Texas A&M University)
ICLR 2025	FabAgent: An LLM-based Agentic Optimization Framework for Design of Sustainable Fabrics (Papers Track) Abstract: The fashion industry emits an estimated four billion tons of CO2 annually and nearly one-third of this is due to the choice of fibers used in clothing. Despite the critical role of fiber selection, limited research exists on the design of optimal fiber blends because of a lack of available datasets on fiber properties. This paper introduces FabAgent, the first large language model (LLM) based agentic optimization framework to discover novel sustainable fabric blends. FabAgent provides a scalable way to extract information from scientific publications and the Internet, compiling a structured data set of 101 fabric materials with 24 attributes each, making this one of the most comprehensive raw material data sets for sustainable clothing design. Next, FabAgent uses multi-objective evolutionary optimization to explore Pareto optimal solutions over a large design space of possible blends, balancing sustainability, durability, comfort, and cost, while accommodating constraints on allowable yarn compositions. The optimal blend found by FabAgent substantially outperforms many commercially available blends in leading fashion brands such as Banana Republic, Giorgio Armani, GAP, and Nike: a 30.46–52.71% improvement in environmental sustainability, 15.40–92.21% improvement in cost efficiency, and 68.29–83.49% improvement in comfort. Authors: Anusha Narayan (The Nueva School)
NeurIPS 2024	AI-Driven Predictive Modeling of PFAS Contamination in Aquatic Ecosystems: Exploring A Geospatial Approach (Papers Track) Abstract: Per- and polyfluoroalkyl substances (PFAS), a class of synthetic fluorinated compounds termed “forever chemicals”, have garnered significant attention due to their persistence, widespread environmental presence, bioaccumulative properties, and associated risks for human health. Their presence in aquatic ecosystems highlights the link between human activity and the hydrological cycle. They also disrupt aquatic life, interfere with gas exchange, and disturb the carbon cycle, contributing to greenhouse gas emissions and exacerbating climate change. Federal agencies, state governments and non-government research and public interest organizations have emphasized the need for documenting the sites and the extent of PFAS contamination. However, the time-consuming and expensive nature of data collection and analysis poses challenges. It hinders the rapid identification of locations at high risk of PFAS contamination, which may then require further sampling or remediation. To address this data limitation, our study leverages a novel geospatial dataset, machine learning models including frameworks such as Random Forest, IBM-NASA's Prithvi and UNet, and geospatial analysis to predict regions with high PFAS concentrations in surface water. Using fish data from the National Rivers and Streams Assessment (NRSA) dataset by the Environmental Protection Agency (EPA), our analysis suggests the potential value of machine learning based models for targeted deployment of sampling investigations and remediation efforts. Authors: Jowaria Khan (University of Michigan); David Andrews (Environmental Working Group); Kaley Beins (Environmental Working Group); Sydney Evans (Environmental Working Group); Alexa Friedman (Environmental Working Group); Elizabeth Bondi-Kelly (MIT)
NeurIPS 2024	Multimodal AI framework for predicting candidate high temperature superconductors (Proposals Track) Abstract: Materials science is at the forefront of addressing some of the most pressing challenges of our era, particularly in enhancing energy efficiency and sustainability. One of the most promising avenues in this field is the study of superconductors: materials that, when cooled below a critical temperature (Tc), exhibit zero electrical resistance. This unique property not only eliminates energy loss due to resistance but also enables a wide range of advanced technologies, such as MRI machines, magnetically levitating trains, and other high-efficiency systems. Superconductors can significantly reduce the carbon footprint of power transmission and other industrial applications. Given the complexity and importance of predicting candidate and practical high-temperature superconductors, we propose to develop a multimodal AI framework to predict new high-Tc superconducting materials. By integrating various material properties, including structural and compositional data, we seek to study patterns and relationships that could guide the discovery of new high-temperature superconductors. Success in this endeavor could significantly reduce energy losses in electrical systems, contributing to the fight against climate change. Authors: Nidhish Sagar (Massachusetts Institute of Technology); Eslam G. Al-Sakkari (Polytechnique Montréal); Ahmed Ragab (Polytechnique Montréal)
ICLR 2024	Analyzing the secondary wastewater-treatment process using Faster R-CNN and YOLOv5 object detection algorithms (Papers Track) Abstract: The activated sludge (AS) process is the most common type of secondary wastewater treatment, applied worldwide. Due to the complexity of microbial communities, imbalances between the different types of bacteria may occur and disturb the process, with pronounced economic and environmental consequences. Microscopic inspection of the morphology of flocs and microorganisms provides key information on AS properties and function. This is a time-consuming, highly skilled, and expensive process that is not readily available in all locations. Thus, most wastewater-treatment plants do not carry out this essential analysis, resulting in frequent operational faults. In this study, we develop a novel deep learning (DL) object detection algorithm to analyze and monitor the AS process based on a unique microscopic image database of flocs and microorganisms. Specifically, we applied YOLOv5 and Faster R-CNN algorithms as tools for segmentation and object detection to analyze the wastewater. The mean average precision (mAP) of the YOLOv5 was 0.67, outperforming the Faster R-CNN by 15%. Histogram equalization preprocessing of both bright-field and phase-contrast images significantly improved the results of the algorithm in all classes. In the case of YOLOv5, the mAP increased by 16.67%, to 0.77, where the AP of protozoa, filaments, and open floc classes outperformed the previous model by over 20%. These results demonstrate the potential of leveraging DL algorithms to enhance the analysis and monitoring of WWTPs in an affordable manner, consequently reducing environmental pollution caused by contaminated effluent. The fundamental challenge addressed herein has important global relevance, especially in an era in which the demand for high-quality wastewater reuse is expected to increase dramatically. Authors: Offir Inbar (Tel-Aviv University); Moni Shahar (Tel Aviv University); Jacob Gidron (Tel-Aviv University); Ido Cohen (Tel-Aviv University); Dror Avisar (Tel-Aviv University)
ICLR 2024	Explaining Zeolite Synthesis-Structure Relationships using Aggregated SHAP Analysis (Papers Track) Abstract: Zeolites, crystalline aluminosilicate materials with well-defined porous structures, have emerged as versatile materials with applications in carbon capture. Hydrothermal synthesis is a widely used method for zeolite production, offering control over crystallinity and pore size. However, the intricate interplay of synthesis parameters necessitates a comprehensive understanding to optimize the synthesis process. We train a supervised classification machine learning model on ZeoSyn (a dataset of zeolite synthesis routes) to predict the zeolite framework product given a synthesis route. Subsequently, we leverage SHapley Additive Explanations (SHAP) to reveal key synthesis-structure relationships in zeolites. To that end, we introduce an aggregation SHAP approach to extend such analysis to explain the formation of composite building units (CBUs) of zeolites. Analysis at this unprecedented scale sheds light on key synthesis parameters driving zeolite crystallization. Authors: Elton Pan (MIT)
ICLR 2024	Literature Mining with Large Language Models to Assist the Development of Sustainable Building Materials (Papers Track) Abstract: The concrete industry, as one of the significant sources of carbon emissions, drives the urgency for its decarbonization, which requires a shift to alternative materials. However, the absence of a systematic knowledge summary remains a challenge for further development of sustainable building materials. This work offers a cost-efficient strategy for information extraction tasks in complex terminology settings using small (2.8B) large language models (LLMs) with well-designed instruction-completion schemes and fine-tuning strategies, introducing a dataset cataloging civil engineering applications of alternative materials. The Multiple Choice instruction scheme significantly improves model accuracies in entity inference from non-Noun-Phrase sources, with supervised fine-tuning benefiting from straightforward tokenized representations of choices. We also demonstrate the utility of the dataset by extracting valuable insights into promising applications of alternative materials from knowledge graph representations. Authors: Yifei Duan (Massachusetts Institute of Technology); Yixi Tian (Massachusetts Institute of Technology); Soumya Ghosh (IBM Research); Richard Goodwin (IBM T.J. Watson Research Center); Vineeth Venugopal (Massachusetts Institute of Technology); Jeremy Gregory (Massachusetts Institute of Technology); Jie Chen (IBM Research); Elsa Olivetti (Massachusetts Institute of Technology)
NeurIPS 2023	Scaling Sodium-ion Battery Development with NLP (Papers Track) Abstract: Sodium-ion batteries (SIBs) have been gaining attention for applications like grid-scale energy storage, largely owing to the abundance of sodium and an expected favorable $/kWh figure. SIBs can leverage the well-established manufacturing knowledge of Lithium-ion Batteries (LIBs), but several materials synthesis and performance challenges for electrode materials need to be addressed. This work extracts a large database of challenges restricting the performance and synthesis of SIB cathode active materials (CAMs) and pairs them with corresponding mitigation strategies from the SIB literature by employing custom natural language processing (NLP) tools. The derived insights enable scientists in research and industry to navigate a large number of proposed strategies and focus on impactful scalability-informed mitigation strategies to accelerate the transition from lab to commercialization. Authors: Mrigi Munjal (Massachusetts Institute of Technology); Thorben Pein (TU Munich); Vineeth Venugopal (Massachusetts Institute of Technology); Kevin Huang (Massachusetts Institute of Technology); Elsa Olivetti (Massachusetts Institute of Technology)
NeurIPS 2023	Predicting Adsorption Energies for Catalyst Screening with Transfer Learning Using Crystal Hamiltonian Graph Neural Network (Proposals Track) Abstract: As the world moves towards a clean energy future to mitigate the risks of climate change, the discovery of new catalyst materials plays a significant role in enabling the sustainable production and transformation of energy [2]. The development and verification of fast, accurate, and efficient artificial intelligence and machine learning techniques is critical to shortening time-intensive calculations, reducing costs, and improving computational feasibility. We propose applying the Crystal Hamiltonian Graph Neural Network (CHGNet) on the OC20 dataset in order to iteratively perform structure-to-energy and forces calculations and identify the lowest energy across relaxed structures for a given adsorbate-surface combination. CHGNet's predictions will be compared and benchmarked to corresponding values calculated by density functional theory (DFT) [7] and other models to determine its efficacy. Authors: Angelina Chen (Foothill College/Lawrence Berkeley National Lab); Hui Zheng (Lawrence Berkeley National Lab); Paula Harder (Mila)
ICLR 2023	Graph Neural Network Generated Metal-Organic Frameworks for Carbon Capture (Proposals Track) Abstract: The level of carbon dioxide (CO2) in our atmosphere is rapidly rising and is projected to double today's levels to reach 1,000 ppm by 2100 under certain scenarios, primarily driven by anthropogenic sources. Technology that can capture CO2 from anthropogenic sources, remove it from the atmosphere and sequester it at the gigaton scale by 2050 is required to stop and reverse the impact of climate change. Metal-organic frameworks (MOFs) have been a promising technology in various applications including gas separation as well as CO2 capture from point-source flue gases or removal from the atmosphere. MOFs offer unmatched surface area through their highly porous crystalline structure and MOF technology has potential to become a leading adsorption-based CO2 separation technology providing high surface area, structure stability and chemical tunability. Due to its complex structure, MOF crystal structure (atoms and bonds) cannot be easily represented in tabular format for machine learning (ML) applications whereas graph neural networks (GNN) have already been explored in representation of simpler chemical molecules. In addition to difficulty in MOF data representation, an infinite number of combinations can be created for MOF crystals, which makes ML applications more suitable to alleviate dependency on subject matter experts (SME) than conventional computational methods. In this work, we propose training of GNNs in variational autoencoder (VAE) setting to create an end-to-end workflow for the generation of new MOF crystal structures directly from the data within the crystallographic information files (CIFs) and conditioned by additional CO2 performance values. Authors: Zikri Bayraktar (Schlumberger Doll Research); Shahnawaz Molla (Schlumberger Doll Research); Sharath Mahavadi (Schlumberger Doll Research)
NeurIPS 2022	AutoML for Climate Change: A Call to Action (Papers Track) Abstract: The challenge that climate change poses to humanity has spurred a rapidly developing field of artificial intelligence research focused on climate change applications. The climate change ML (CCML) community works on a diverse, challenging set of problems which often involve physics-constrained ML or heterogeneous spatiotemporal data. It would be desirable to use automated machine learning (AutoML) techniques to automatically find high-performing architectures and hyperparameters for a given dataset. In this work, we benchmark popular AutoML libraries on three high-leverage CCML applications: climate modeling, wind power forecasting, and catalyst discovery. We find that out-of-the-box AutoML libraries currently fail to meaningfully surpass the performance of human-designed CCML models. However, we also identify a few key weaknesses, which stem from the fact that most AutoML techniques are tailored to computer vision and NLP applications. For example, while dozens of search spaces have been designed for image and language data, none have been designed for spatiotemporal data. Addressing these key weaknesses can lead to the discovery of novel architectures that yield substantial performance gains across numerous CCML applications. Therefore, we present a call to action to the AutoML community, since there are a number of concrete, promising directions for future work in the space of AutoML for CCML. We release our code and a list of resources at https://github.com/climate-change-automl/climate-change-automl. Authors: Renbo Tu (University of Toronto); Nicholas Roberts (University of Wisconsin-Madison); Vishak Prasad C (Indian Institute Of Technology, Bombay); Sibasis Nayak (Indian Institute of Technology, Bombay); Paarth Jain (Indian Institute of Technology Bombay); Frederic Sala (University of Wisconsin-Madison); Ganesh Ramakrishnan (IIT Bombay); Ameet Talwalkar (CMU); Willie Neiswanger (Stanford University); Colin White (Abacus.AI)
ICML 2021	A multi-task learning approach to enhance sustainable biomolecule production in engineered microorganisms (Proposals Track) Abstract: A sustainable alternative to sourcing many materials humans need is metabolic engineering: a field that aims to engineer microorganisms into biological factories that convert renewable feedstocks into valuable biomolecules (e.g., jet fuel, medicine). Microorganism factories must be genetically optimized using predictable DNA sequence tools; however, for many organisms, the exact DNA sequence signals defining their genetic control systems are poorly understood. To better decipher these DNA signals, we propose a multi-task learning approach that uses deep learning and feature attribution methods to identify DNA sequence signals that control gene expression in the methanotroph M. buryatense. This bacterium consumes methane, a potent greenhouse gas. If successful, this work would enhance our ability to build gene expression tools to more effectively engineer M. buryatense into an efficient biomolecule factory that can divert methane pollution into valuable, everyday materials. Authors: Erin Wilson (University of Washington); Mary Lidstrom (University of Washington); David Beck (University of Washington)
