Open Catalyst Project: An Introduction to ML applied to Molecular Simulations (Tutorials Track) Spotlight

Muhammed Shuaibi (Carnegie Mellon University); Anuroop Sriram (Facebook); Abhishek Das (Facebook AI Research); Janice Lan (Facebook AI Research); Adeesh Kolluru (Carnegie Mellon University); Brandon Wood (NERSC); Zachary Ulissi (Carnegie Mellon University); Larry Zitnick (Facebook AI Research)

NeurIPS 2021


As the world continues to battle energy scarcity and climate change, the future of our energy infrastructure is a growing challenge. Renewable energy technologies offer the opportunity to drive efficient, carbon-neutral means of energy storage and generation. Doing so, however, requires the discovery of efficient and economical catalysts (materials) to accelerate the associated chemical processes. A common approach to discovering high-performance catalysts is to use molecular simulations. Specifically, each simulation models the interaction of a catalyst surface with molecules that are commonly seen in electrochemical reactions. By predicting these interactions accurately, the catalyst's impact on the overall rate of a chemical reaction may be estimated. The Open Catalyst Project (OCP) aims to develop new ML methods and models to accelerate the catalyst simulation process for renewable energy technologies and improve our ability to predict properties across catalyst compositions. The initial release of the Open Catalyst 2020 (OC20) dataset presented the largest open dataset of molecular combinations, spanning 55 unique elements and over 130M data points. We will present a comprehensive tutorial of the Open Catalyst Project repository, including (1) accessing and visualizing the dataset, (2) an overview of the various tasks, (3) training graph neural network (GNN) models, (4) developing your own model for OCP, (5) running ML-driven simulations, and (6) visualizing the results. Primary tools include PyTorch and PyTorch Geometric. No background in chemistry is assumed. With this tutorial, we hope to equip attendees with a basic understanding of the data and repository.