Reinforcement Learning


Workshop Papers

Venue Title
ICLR 2024 Time-Varying Constraint-Aware Reinforcement Learning for Energy Storage Control (Papers Track)

Abstract: Energy storage devices, such as batteries, thermal energy storages, and hydrogen systems, can help mitigate climate change by ensuring a more stable and sustainable power supply. To maximize the effectiveness of such energy storage, determining the appropriate charging and discharging amounts for each time period is crucial. Reinforcement learning is preferred over traditional optimization for the control of energy storage due to its ability to adapt to dynamic and complex environments. However, the continuous nature of charging and discharging levels in energy storage poses limitations for discrete reinforcement learning, and time-varying feasible charge-discharge range based on state of charge (SoC) variability also limits the conventional continuous reinforcement learning. In this paper, we propose a continuous reinforcement learning approach that takes into account the time-varying feasible charge-discharge range. An additional objective function was introduced for learning the feasible action range for each time period, supplementing the objectives of training the actor for policy learning and the critic for value learning. This actively promotes the utilization of energy storage by preventing them from getting stuck in suboptimal states, such as continuous full charging or discharging. This is achieved through the enforcement of the charging and discharging levels into the feasible action range. The experimental results demonstrated that the proposed method further maximized the effectiveness of energy storage by actively enhancing its utilization.

Authors: Jaeik Jeong (Electronics and Telecommunications Research Institute (ETRI)); Tai-Yeon Ku (Electronics and Telecommunications Research Institute (ETRI)); Wan-Ki Park (Electronics and Telecommunications Research Institute (ETRI))
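
To make the time-varying feasible charge-discharge range in the abstract above concrete, here is a minimal sketch that derives the range from the state of charge and clips a proposed action into it. It is not the authors' method: the paper learns the range through an additional objective, whereas this sketch computes it directly. The function names, the sign convention (positive means charging), and the lossless battery model are all assumptions made for illustration.

```python
def feasible_action_range(soc, capacity, max_rate, dt=1.0):
    """Return (low, high) feasible power for one time step.

    Assumed convention: positive = charge, negative = discharge.
    Assumes a lossless battery (no charging/discharging efficiency).
    """
    max_charge = min(max_rate, (capacity - soc) / dt)  # cannot exceed remaining headroom
    max_discharge = min(max_rate, soc / dt)            # cannot draw more than is stored
    return -max_discharge, max_charge


def project_action(action, soc, capacity, max_rate, dt=1.0):
    """Clip a raw continuous action into the feasible range for the current SoC."""
    low, high = feasible_action_range(soc, capacity, max_rate, dt)
    return min(max(action, low), high)


if __name__ == "__main__":
    # Nearly full battery: the charging side of the range shrinks.
    print(feasible_action_range(soc=8.0, capacity=10.0, max_rate=4.0))
    print(project_action(5.0, soc=8.0, capacity=10.0, max_rate=4.0))
```

Because the range depends on the state of charge, it changes at every step, which is why a fixed action bound is not enough for this kind of control problem.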

ICLR 2024 Generalized Policy Learning for Smart Grids: FL TRPO Approach (Papers Track)

Abstract: The smart grid domain requires bolstering the capabilities of existing energy management systems; Federated Learning (FL) aligns with this goal as it demonstrates a remarkable ability to train models on heterogeneous datasets while maintaining data privacy, making it suitable for smart grid applications, which often involve disparate data distributions and interdependencies among features that hinder the suitability of linear models. This paper introduces a framework that combines FL with a Trust Region Policy Optimization (FL TRPO) aiming to reduce energy-associated emissions and costs. Our approach reveals latent interconnections and employs personalized encoding methods to capture unique insights, understanding the relationships between features and optimal strategies, allowing our model to generalize to previously unseen data. Experimental results validate the robustness of our approach, affirming its proficiency in effectively learning policy models for smart grid challenges.

Authors: Yunxiang LI (MBZUAI); Nicolas M Cuadrado (MBZUAI); Samuel Horváth (MBZUAI); Martin Takac (Mohamed bin Zayed University of Artificial Intelligence)

ICLR 2024 Empowering Safe Reinforcement Learning in Power System Control with CommonPower (Tutorials Track)

Abstract: Reinforcement learning (RL) has become a valuable tool for addressing complex decision-making problems in power system control. However, the unique intricacies of this domain necessitate the development of specialized RL algorithms. While benchmarking problems have proven effective in advancing algorithm development in various domains, existing suites do not enable a systematic study of two key challenges in power system control: ensuring adherence to physical constraints and evaluating the impact of forecast accuracy on controller performance. This tutorial introduces the sophisticated capabilities of the CommonPower toolbox, designed to address these overlooked challenges. We guide participants in composing benchmark problems within CommonPower, leveraging predefined components, and demonstrate the creation of new components. We showcase the training of a safe RL agent to solve a benchmark problem, comparing its performance against a built-in MPC baseline. Notably, CommonPower's symbolic modeling approach enables the automatic derivation of safety shields for vanilla RL algorithms. We explain the theory behind this feature in a concise introduction to the field of safe RL. Furthermore, we present CommonPower's interface for seamlessly integrating diverse forecasting strategies into the system. The workshop emphasizes the significance of safeguarding vanilla RL algorithms and encourages researchers to systematically investigate the influence of forecast uncertainties in their experiments.

Authors: Hannah Markgraf (Technical University of Munich); Michael Eichelbeck (Technical University of Munich); Matthias Althoff (Technical University of Munich)

NeurIPS 2023 Ocean Wave Energy: Optimizing Reinforcement Learning Agents for Effective Deployment (Papers Track)

Abstract: Fossil fuel energy production is a leading cause of climate change. While wind and solar energy have made advancements, ocean waves, a more consistent clean energy source, remain underutilized. Wave Energy Converters (WEC) transform wave power into electric energy. To be economically viable, modern WECs need sophisticated real-time controllers that boost energy output and minimize mechanical stress, thus lowering the overall cost of energy (LCOE). This paper presents how a Reinforcement Learning (RL) controller can outperform the default spring damper controller for complex spread waves in the sea, enhancing wave energy's viability. Using the Proximal Policy Optimization (PPO) algorithm with Transformer variants as function approximators, the RL controllers optimize multi-generator Wave Energy Converters (WEC), leveraging wave sensor data for multiple cost-efficiency goals. After successful tests in the EuropeWave project's emulator tank, the platform is planned for deployment. We discuss the challenges of deployment at the BiMEP site and how we had to tune the RL controller to address that. The RL controller outperforms the default Spring Damper controller in the BiMEP conditions by 22.8% on energy capture. Enhancing wave energy's economic viability will expedite the transition to clean energy, reducing carbon emissions and fostering a healthier climate.

Authors: Vineet Gundecha (Hewlett Packard Enterprise); Sahand Ghorbanpour (Hewlett Packard Enterprise); Ashwin Ramesh Babu (Hewlett Packard Enterprise Labs); Avisek Naug (Hewlett Packard Enterprise); Alexandre Pichard (Carnegie Clean Energy); Mathieu Cocho (Carnegie Clean Energy); Soumyendu Sarkar (Hewlett Packard Enterprise)

NeurIPS 2023 Sustainable Data Center Modeling: A Multi-Agent Reinforcement Learning Benchmark (Papers Track)

Abstract: The rapid growth of machine learning (ML) has led to an increased demand for computational power, resulting in larger data centers (DCs) and higher energy consumption. To address this issue and reduce carbon emissions, intelligent control of DC components such as cooling, load shifting, and energy storage is essential. However, the complexity of managing these controls in tandem with external factors like weather and green energy availability presents a significant challenge. While some individual components like HVAC control have seen research in Reinforcement Learning (RL), there's a gap in holistic optimization covering all elements simultaneously. To tackle this, we've developed DCRL, a multi-agent RL environment that empowers the ML community to research, develop, and refine RL controllers for carbon footprint reduction in DCs. DCRL is a flexible, modular, scalable, and configurable platform that can handle large High Performance Computing (HPC) clusters. In its default setup, DCRL also provides a benchmark for evaluating multi-agent RL algorithms, facilitating collaboration and progress in green computing research.

Authors: Soumyendu Sarkar (Hewlett Packard Enterprise); Avisek Naug (Hewlett Packard Enterprise); Antonio Guillen (Hewlett Packard Enterprise); Ricardo Luna Gutierrez (Hewlett Packard Enterprise); Vineet Gundecha (Hewlett Packard Enterprise); Sahand Ghorbanpour (Hewlett Packard Enterprise); Sajad Mousavi (Hewlett Packard Enterprise); Ashwin Ramesh Babu (Hewlett Packard Enterprise Labs)

NeurIPS 2023 A Configurable Pythonic Data Center Model for Sustainable Cooling and ML Integration (Papers Track)

Abstract: There have been growing discussions on estimating and subsequently reducing the operational carbon footprint of enterprise data centers. The design and intelligent control for data centers have an important impact on data center carbon footprint. In this paper, we showcase PyDCM, a Python library that enables extremely fast prototyping of data center design and applies reinforcement learning-enabled control with the purpose of evaluating key sustainability metrics, including carbon footprint, energy consumption, and observing temperature hotspots. We demonstrate these capabilities of PyDCM and compare them to existing works in EnergyPlus for modeling data centers. PyDCM can also be used as a standalone Gymnasium environment for demonstrating sustainability-focused data center control.

Authors: Avisek Naug (Hewlett Packard Enterprise); Antonio Guillen (Hewlett Packard Enterprise); Ricardo Luna Gutierrez (Hewlett Packard Enterprise); Vineet Gundecha (Hewlett Packard Enterprise); Sahand Ghorbanpour (Hewlett Packard Enterprise); Sajad Mousavi (Hewlett Packard Enterprise); Ashwin Ramesh Babu (Hewlett Packard Enterprise Labs); Soumyendu Sarkar (Hewlett Packard Enterprise)

NeurIPS 2023 Reinforcement Learning control for Airborne Wind Energy production (Papers Track)

Abstract: Airborne Wind Energy (AWE) is an emerging technology that promises to harvest energy from strong high-altitude winds, while addressing some of the key critical issues of current wind turbines. AWE is based on flying devices (usually gliders or kites) that, tethered to a ground station, fly driven by the wind and convert the mechanical energy of wind into electrical energy by means of a generator. Such systems are usually controlled by adjusting the trajectory of the kite using optimal control techniques, such as model-predictive control. These methods are based upon a mathematical model of the system to control, and they produce results that are strongly dependent on the specific model in use and difficult to generalize. Our aim is to replace these classical techniques with an approach based on Reinforcement Learning (RL), which can be used even in the absence of a known model. Experimental results prove that RL is a viable method to control AWE systems in complex simulated environments, including turbulent flows.

Authors: Lorenzo Basile (University of Trieste); Maria Grazia Berni (University of Trieste); Antonio Celani (ICTP)

NeurIPS 2023 Real-time Carbon Footprint Minimization in Sustainable Data Centers with Reinforcement Learning (Papers Track) Best ML Innovation

Abstract: As machine learning workloads significantly increase energy consumption, sustainable data centers with low carbon emissions are becoming a top priority for governments and corporations worldwide. There is a pressing need to optimize energy usage in these centers, especially considering factors like cooling, balancing flexible load based on renewable energy availability, and battery storage utilization. The challenge arises due to the interdependencies of these strategies with fluctuating external factors such as weather and grid carbon intensity. Although there's currently no real-time solution that addresses all these aspects, our proposed Data Center Carbon Footprint Reduction (DC-CFR) framework, based on multi-agent Reinforcement Learning (MARL), targets carbon footprint reduction, energy optimization, and cost. Our findings reveal that DC-CFR's MARL agents efficiently navigate these complexities, optimizing the key metrics in real-time. DC-CFR reduced carbon emissions, energy consumption, and energy costs by over 13% with EnergyPlus simulation compared to the industry standard ASHRAE controller controlling HVAC for a year in various regions.

Authors: Soumyendu Sarkar (Hewlett Packard Enterprise); Avisek Naug (Hewlett Packard Enterprise); Ricardo Luna Gutierrez (Hewlett Packard Enterprise); Antonio Guillen (Hewlett Packard Enterprise); Vineet Gundecha (Hewlett Packard Enterprise); Ashwin Ramesh Babu (Hewlett Packard Enterprise Labs); Cullen Bash (HPE)

NeurIPS 2023 Reinforcement Learning for Wildfire Mitigation in Simulated Disaster Environments (Papers Track)

Abstract: Climate change has resulted in a year over year increase in adverse weather and weather conditions which contribute to increasingly severe fire seasons. Without effective mitigation, these fires pose a threat to life, property, ecology, cultural heritage, and critical infrastructure. To better prepare for and react to the increasing threat of wildfires, more accurate fire modelers and mitigation responses are necessary. In this paper, we introduce SimFire, a versatile wildland fire projection simulator designed to generate realistic wildfire scenarios, and SimHarness, a modular agent-based machine learning wrapper capable of automatically generating land management strategies within SimFire to reduce the overall damage to the area. Together, this publicly available system allows researchers and practitioners the ability to emulate and assess the effectiveness of firefighter interventions and formulate strategic plans that prioritize value preservation and resource allocation optimization. The repositories are available for download at

Authors: Alexander Tapley (The MITRE Corporation); Savanna O Smith (MITRE); Tim Welsh (The MITRE Corporation); Aidan Fennelly (The MITRE Corporation); Dhanuj M Gandikota (The MITRE Corporation); Marissa Dotter (MITRE Corporation); Michael Doyle (The MITRE Corporation); Michael Threet (MITRE)

NeurIPS 2023 Hybridizing Physics and Neural ODEs for Predicting Plasma Inductance Dynamics in Tokamak Fusion Reactors (Papers Track)

Abstract: While fusion reactors known as tokamaks hold promise as a firm energy source, advances in plasma control, and handling of events where control of plasmas is lost, are needed for them to be economical. A significant bottleneck towards applying more advanced control algorithms is the need for better plasma simulation, where both physics-based and data-driven approaches currently fall short. The former is bottle-necked by both computational cost and the difficulty of modelling plasmas, and the latter is bottle-necked by the relative paucity of data. To address this issue, this work applies the neural ordinary differential equations (ODE) framework to the problem of predicting a subset of plasma dynamics, namely the coupled plasma current and internal inductance dynamics. As the neural ODE framework allows for the natural inclusion of physics-based inductive biases, we train both physics-based and neural network models on data from the Alcator C-Mod fusion reactor and find that a model that combines physics-based equations with a neural ODE performs better than both existing physics-motivated ODEs and a pure neural ODE model.

Authors: Allen Wang (MIT); Cristina Rea (MIT); Darren Garnier (MIT)

NeurIPS 2023 Deploying Reinforcement Learning based Economizer Optimization at Scale (Papers Track)

Abstract: Building operations account for a significant portion of global emissions, contributing approximately 28% of global greenhouse gas emissions. With an anticipated increase in cooling demand due to rising global temperatures, the optimization of rooftop units (RTUs) in buildings becomes crucial for reducing emissions. We focus on the optimization of the economizer logic within RTUs, which balances the mix of indoor and outdoor air. By effectively utilizing free outside air, economizers can significantly decrease mechanical energy usage, leading to reduced energy costs and emissions. We introduce a reinforcement learning (RL) approach that adaptively controls the economizer based on the unique characteristics of individual facilities. We have trained and deployed our solution in the real world across a distributed building stock. We address the scaling challenges with our cloud-based RL deployment on 10K+ RTUs across 200+ sites.

Authors: Ivan Cui (Amazon); Wei Yih Yap (Amazon); Charles Prosper (Independent); Bharathan Balaji (Amazon); Jake Chen (Amazon)

NeurIPS 2023 Contextual Reinforcement Learning for Offshore Wind Farm Bidding (Papers Track)

Abstract: We propose a framework for applying reinforcement learning to contextual two-stage stochastic optimization and apply this framework to the problem of energy market bidding of an off-shore wind farm. Reinforcement learning could potentially be used to learn close to optimal solutions for first stage variables of a two-stage stochastic program under different contexts. Under the proposed framework, these solutions would be learned without having to solve the full two-stage stochastic program. We present initial results of training using the DDPG algorithm and present intended future steps to improve performance.

Authors: David Cole (University of Wisconsin-Madison); Himanshu Sharma (Pacific Northwest National Laboratory); Wei Wang (Pacific Northwest National Laboratory)

NeurIPS 2023 A Scalable Network-Aware Multi-Agent Reinforcement Learning Framework for Distributed Converter-based Microgrid Voltage Control (Papers Track)

Abstract: Renewable energy plays a crucial role in mitigating climate change. With the rising use of distributed energy resources (DERs), microgrids (MGs) have emerged as a solution to accommodate high DER penetration. However, controlling MGs' voltage during islanded operation is challenging due to the system's nonlinearity and stochasticity. Although multi-agent reinforcement learning (MARL) methods have been applied to distributed MG voltage control, they scale poorly and struggle to control an MG with a large number of DGs due to the well-known curse of dimensionality. To address this, we propose a scalable network-aware reinforcement learning framework which exploits network structure to truncate the critic's Q-function to achieve scalability. Our experiments show effective control of an MG with up to 84 DGs, surpassing the maximum of 40 agents in the existing literature. We also compare our framework with state-of-the-art MARL algorithms to show the superior scalability of our framework.

Authors: Han Xu (Tsinghua University); Guannan Qu (Carnegie Mellon University)
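
One common way to truncate a critic using network structure is to let each agent's Q-function see only the agents within a few hops of it. The sketch below illustrates that idea; whether the paper uses exactly this construction is an assumption. The helper names, the hop radius `k`, and the toy line network are hypothetical.

```python
from collections import deque


def k_hop_neighborhood(adjacency, agent, k):
    """Breadth-first search returning all agents within k hops of `agent`."""
    seen = {agent}
    frontier = deque([(agent, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond the hop radius
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return seen


def truncated_critic_input(adjacency, agent, k, states, actions):
    """Collect only the (state, action) pairs a truncated critic would see."""
    neighborhood = sorted(k_hop_neighborhood(adjacency, agent, k))
    return [(states[j], actions[j]) for j in neighborhood]


if __name__ == "__main__":
    # Toy line network of five agents: 0 - 1 - 2 - 3 - 4
    adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(k_hop_neighborhood(adjacency, agent=2, k=1))
```

With a fixed radius, the critic's input size depends on the neighborhood size and not on the total number of agents, which is the intuition behind the scalability claim.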

NeurIPS 2023 Reinforcement Learning in agent-based modeling to reduce carbon emissions in transportation (Papers Track)

Abstract: This paper explores the integration of reinforcement learning (RL) into transportation simulations to explore system interventions to reduce greenhouse gas emissions. The study leverages the Behavior, Energy, Automation, and Mobility (BEAM) transportation simulation framework in conjunction with the Berkeley Integrated System for Transportation Optimization (BISTRO) for scenario development. The main objective is to determine optimal parameters for transportation simulations to increase public transport usage and reduce individual vehicle reliance. Initial experiments were conducted on a simplified transportation scenario, and results indicate that RL can effectively find system interventions that increase public transit usage and decrease transportation emissions.

Authors: Yuhao Yuan (UC Berkeley); Felipe Leno da Silva (Lawrence Livermore National Laboratory); Ruben Glatt (Lawrence Livermore National Laboratory)

NeurIPS 2023 Breeding Programs Optimization with Reinforcement Learning (Papers Track)

Abstract: Crop breeding is crucial in improving agricultural productivity while potentially decreasing land usage, greenhouse gas emissions, and water consumption. However, breeding programs are challenging due to long turnover times, high-dimensional decision spaces, long-term objectives, and the need to adapt to rapid climate change. This paper introduces the use of Reinforcement Learning (RL) to optimize simulated crop breeding programs. RL agents are trained to make optimal crop selection and cross-breeding decisions based on genetic information. To benchmark RL-based breeding algorithms, we introduce a suite of Gym environments. The study demonstrates the superiority of RL techniques over standard practices in terms of genetic gain when simulated in silico using real-world genomic maize data.

Authors: Omar G. Younis (ETH Zurich); Luca Corinzia (ETH Zurich - Information Science & Engineering Group); Ioannis N Athanasiadis (Wageningen University and Research); Andreas Krause (ETH Zürich); Joachim Buhmann (ETH Zurich); Matteo Turchetta (ETH Zurich)

NeurIPS 2023 Cooperative Logistics: Can Artificial Intelligence Enable Trustworthy Cooperation at Scale? (Papers Track)

Abstract: Cooperative Logistics studies the setting where logistics companies pool their resources together to improve their individual performance. Prior literature suggests carbon savings of approximately 22%. If attained globally, this equates to 480,000,000 tonnes of CO2-eq. Whilst well-studied in operations research – industrial adoption remains limited due to a lack of trustworthy cooperation. A key remaining challenge is fair and scalable gain sharing (i.e., how much should each company be fairly paid?). We propose the use of deep reinforcement learning with a neural reward model for coalition structure generation and present early findings.

Authors: Stephen Mak (University of Cambridge); Tim Pearce (Microsoft Research); Matthew Macfarlane (University of Amsterdam); Liming Xu (University of Cambridge); Michael Ostroumov (Value Chain Lab); Alexandra Brintrup (University of Cambridge)

NeurIPS 2023 AI assisted Search for Atmospheric CO2 Capture (Papers Track)

Abstract: Carbon capture technologies are an important tool for mitigating climate change. In recent years, polymer membrane separation methods have emerged as a promising technology for separating CO2 and other greenhouse gases from the atmosphere. Designing new polymers for such tasks is quite difficult. In this work we look at machine learning based methods to search for new polymer designs optimized for CO2 separation. An ensemble of ML models is trained on a large database of molecules to predict permeabilities of CO2/N2 and CO2/O2 pairs. We then use search based optimization to discover new polymers that surpass existing polymer designs. Simulations are then done to verify the predicted performance of the new designs. Overall, the results suggest that ML based search can be used to discover new polymers optimized for carbon capture.

Authors: Shivashankar Shivashankar (Student)

NeurIPS 2023 Zero-Emission Vehicle Intelligence (ZEVi): Effectively Charging Electric Vehicles at Scale Without Breaking Power Systems (or the Bank) (Tutorials Track)

Abstract: Transportation contributes to 29% of all greenhouse gas (GHG) emissions in the US, of which 58% are from light-duty vehicles and 28% from medium-to-heavy duty vehicles (MHDVs) [1]. Battery electric vehicles (EVs) emit 90% less life cycle GHGs than their internal combustion engine (ICEV) counterparts [2], but currently only comprise 2% of all vehicles in the U.S. EVs thus represent a crucial step in decarbonizing road transportation. One major challenge in replacing ICEVs with EVs at scale is the ability to charge a large number of EVs within the constraints of power systems in a cost-effective way. This is an especially prominent problem for MHDVs used in commercial fleets such as shuttle buses and delivery trucks, as they generally require more energy to complete assigned trips compared to light-duty vehicles. In this tutorial, we describe the myriad challenges in charging EVs at scale and define common objectives such as minimizing total load on power systems, minimizing fleet operating costs, as well as maximizing vehicle state of charge and onsite photovoltaic energy usage. We discuss common constraints such as vehicle trip energy requirements, charging station power limits, and limits on vehicles’ time to charge between trips. We survey several different methods to formulate EV charging and energy dispatch as a mathematically solvable optimization problem, using tools such as convex optimization, Markov decision process (MDP), and reinforcement learning (RL). We introduce a commercial application of model-based predictive control (MPC) algorithm, ZEVi (Zero Emission Vehicle intelligence), which solves optimal energy dispatch strategies for charging sessions of commercial EV fleets. Using a synthetic dataset modeled after a real fleet of electric school buses, we engage the audience with a hands-on exercise applying ZEVi to find the optimal charging strategy for a commercial fleet. 
Lastly, we briefly discuss other contexts in which methods originating from process control and deep learning, like MPC and RL, can be applied to solve problems related to climate change mitigation and adaptation. With the examples provided in this tutorial, we hope to inspire the audience to come up with their own creative ways to apply these methods in different fields within the climate domain. References [1] EPA (2023). Inventory of U.S. Greenhouse Gas Emissions and Sinks: 1990-2021. U.S. Environmental Protection Agency, EPA 430-R-23-002. [2] Verma, S., Dwivedi, G., & Verma, P. (2022). Life cycle assessment of electric vehicles in comparison to combustion engine vehicles: A review. Materials Today: Proceedings, 49, 217-222.

Authors: Shasha Lin (NextEra Mobility); Jonathan Brophy (NextEra Mobility); Tamara Monge (NextEra Mobility); Jamie Hussman (NextEra Mobility); Michelle Lee (NextEra Mobility); Sam Penrose (NextEra Mobility)
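
The percentages quoted at the start of the ZEVi abstract can be combined into shares of total U.S. greenhouse gas emissions. The short calculation below uses only the figures stated in the abstract; the function and variable names are illustrative.

```python
def share_of_total(sector_share, segment_share):
    """Share of total emissions for a segment given as a fraction of a sector."""
    return sector_share * segment_share


TRANSPORT_SHARE = 0.29   # transportation as a fraction of U.S. GHG emissions
LIGHT_DUTY_SHARE = 0.58  # light-duty vehicles as a fraction of transportation
MHDV_SHARE = 0.28        # medium-to-heavy duty vehicles as a fraction of transportation

light_duty_total = share_of_total(TRANSPORT_SHARE, LIGHT_DUTY_SHARE)
mhdv_total = share_of_total(TRANSPORT_SHARE, MHDV_SHARE)

if __name__ == "__main__":
    print(f"Light-duty vehicles: {light_duty_total:.1%} of U.S. GHG emissions")
    print(f"MHDVs: {mhdv_total:.1%} of U.S. GHG emissions")
```

On these figures, light-duty vehicles account for roughly 17% and MHDVs for roughly 8% of total U.S. emissions.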

ICLR 2023 CityLearn: A Tutorial on Reinforcement Learning Control for Grid-Interactive Efficient Buildings and Communities (Tutorials Track)

Abstract: Buildings are responsible for up to 75% of electricity consumption in the United States. Grid-Interactive Efficient Buildings can provide flexibility to solve the issue of power supply-demand mismatch, particularly brought about by renewables. Their high energy efficiency and self-generating capabilities can reduce demand without affecting the building function. Additionally, load shedding and shifting through smart control of storage systems can further flatten the load curve and reduce grid ramping cost in response to rapid decrease in renewable power supply. The model-free nature of reinforcement learning control makes it a promising approach for smart control in grid-interactive efficient buildings, as it can adapt to unique building needs and functions. However, a major challenge for the adoption of reinforcement learning in buildings is the ability to benchmark different control algorithms to accelerate their deployment on live systems. CityLearn is an open source OpenAI Gym environment for the implementation and benchmarking of simple and advanced control algorithms (e.g., rule-based control, model predictive control, or deep reinforcement learning control), and thus provides a solution to this challenge. This tutorial leverages CityLearn to demonstrate different control strategies in grid-interactive efficient buildings. Participants will learn how to design three controllers of varying complexity for battery management using a real-world residential neighborhood dataset to provide load shifting flexibility. The algorithms will be evaluated using six energy flexibility, environmental and economic key performance indicators, and their benefits and shortcomings will be identified. By the end of the tutorial, participants will acquire enough familiarity with the CityLearn environment for extended use in new datasets or personal projects.

Authors: Kingsley E Nweye (The University of Texas at Austin); Allen Wu (The University of Texas at Austin); Hyun Park (The University of Texas at Austin); Yara Almilaify (The University of Texas at Austin); Zoltan Nagy (The University of Texas at Austin)

ICLR 2023 Safe Multi-Agent Reinforcement Learning for Price-Based Demand Response (Papers Track)

Abstract: Price-based demand response (DR) enables households to provide the flexibility required in power grids with a high share of volatile renewable energy sources. Multi-agent reinforcement learning (MARL) offers a powerful, decentralized decision-making tool for autonomous agents participating in DR programs. Unfortunately, MARL algorithms do not naturally allow one to incorporate safety guarantees, preventing their real-world deployment. To meet safety constraints, we propose a safety layer that minimally adjusts each agent's decisions. We investigate the influence of using a reward function that reflects these safety adjustments. Results show that considering safety aspects in the reward during training improves both convergence speed and performance of the MARL agents in the investigated numerical experiments.

Authors: Hannah Markgraf (Technical University of Munich); Matthias Althoff (Technical University of Munich)
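
A safety layer that "minimally adjusts" a decision can be read as projecting the proposed action onto the set of safe actions. The sketch below shows the simplest case, a single linear constraint, where the smallest Euclidean correction has a closed form. It is not the authors' implementation: the constraint form, the function name, and the example numbers are assumptions chosen for illustration.

```python
def minimal_adjustment(action, a, b):
    """Project `action` onto the half-space {x : a . x <= b}.

    Returns the closest point (in Euclidean distance) that satisfies the
    constraint. Assumes `a` is not the zero vector.
    """
    violation = sum(ai * xi for ai, xi in zip(a, action)) - b
    if violation <= 0:
        return list(action)  # already safe: leave the decision untouched
    norm_sq = sum(ai * ai for ai in a)
    scale = violation / norm_sq
    # Move along the constraint normal just far enough to reach the boundary.
    return [xi - scale * ai for xi, ai in zip(action, a)]


if __name__ == "__main__":
    # Hypothetical example: two power setpoints whose sum must not exceed 1.0.
    print(minimal_adjustment([1.0, 1.0], a=[1.0, 1.0], b=1.0))
    print(minimal_adjustment([0.2, 0.3], a=[1.0, 1.0], b=1.0))
```

The size of the correction (here, the distance between the proposed and adjusted action) is one natural quantity to feed back into the reward if safety adjustments are to be reflected during training.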

ICLR 2023 MAHTM: A Multi-Agent Framework for Hierarchical Transactive Microgrids (Papers Track)

Abstract: Integration of variable renewable energy into the grid has posed challenges to system operators in achieving optimal trade-offs among energy availability, cost affordability, and pollution controllability. This paper proposes a multi-agent reinforcement learning framework for managing energy transactions in microgrids. The framework addresses the challenges above: it seeks to optimize the usage of available resources by minimizing the carbon footprint while benefiting all stakeholders. The proposed architecture consists of three layers of agents, each pursuing different objectives. The first layer, comprised of prosumers and consumers, minimizes the total energy cost. The other two layers control the energy price to decrease the carbon impact while balancing the consumption and production of both renewable and conventional energy. This framework also takes into account fluctuations in energy demand and supply.

Authors: Nicolas M Cuadrado (MBZUAI); Roberto Alejandro Gutierrez Guillen (MBZUAI); Yongli Zhu (Texas A&M University); Martin Takac (Mohamed bin Zayed University of Artificial Intelligence)

ICLR 2023 Global-Local Policy Search and Its Application in Grid-Interactive Building Control (Papers Track)

Abstract: As the buildings sector represents over 70% of the total U.S. electricity consumption, it offers a great amount of untapped demand-side resources to tackle many critical grid-side problems and improve the overall energy system's efficiency. To help make buildings grid-interactive, this paper proposes a global-local policy search method to train a reinforcement learning (RL) based controller which optimizes building operation during both normal hours and demand response (DR) events. Experiments on a simulated five-zone commercial building demonstrate that by adding a local fine-tuning stage to the evolution strategy policy training process, the control costs can be further reduced by 7.55% in unseen testing scenarios. Baseline comparison also indicates that the learned RL controller outperforms a pragmatic linear model predictive controller (MPC), while not requiring intensive online computation.

Authors: Xiangyu Zhang (National Renewable Energy Laboratory); Yue Chen (National Renewable Energy Laboratory); Andrey Bernstein (NREL)

ICLR 2023 Learning to Communicate and Collaborate in a Competitive Multi-Agent Setup to Clean the Ocean from Macroplastics (Papers Track)
Abstract and authors: (click to expand)

Abstract: Finding a balance between collaboration and competition is crucial for artificial agents in many real-world applications. We investigate this using a Multi-Agent Reinforcement Learning (MARL) setup on the back of a high-impact problem. The accumulation and yearly growth of plastic in the ocean cause irreparable damage to many aspects of oceanic health and the marine system. To prevent further damage, we need to find ways to reduce macroplastics from known plastic patches in the ocean. Here we propose a Graph Neural Network (GNN) based communication mechanism that increases the agents' observation space. In our custom environment, agents control a plastic collecting vessel. The communication mechanism enables agents to develop a communication protocol using a binary signal. While the goal of the agent collective is to clean up as much as possible, agents are rewarded for the individual amount of macroplastics collected. Hence agents have to learn to communicate effectively while maintaining high individual performance. We compare our proposed communication mechanism with a multi-agent baseline without the ability to communicate. Results show communication enables collaboration and increases collective performance significantly. This means agents have learned the importance of communication and found a balance between collaboration and competition.

Authors: Philipp D Siedler (Aleph Alpha)

ICLR 2023 Distributed Reinforcement Learning for DC Open Energy Systems (Papers Track)
Abstract and authors: (click to expand)

Abstract: The direct current open energy system (DCOES) enables the production, storage, and exchange of renewable energy within local communities, which is helpful, especially in isolated villages and islands where centralized power supply is unavailable or unstable. As solar and wind energy production varies in time and space depending on the weather and the energy usage patterns differ for different households, how to store and exchange energy is an important research issue. In this work, we explore the use of deep reinforcement learning (DRL) for adaptive control of energy storage in local batteries and energy sharing through DC grids. We extend the Autonomous Power Interchange System (APIS) emulator from SonyCSL to combine it with reinforcement learning algorithms in each house. We implemented deep Q-network (DQN) and prioritized DQN to dynamically set the parameters of the real-time energy exchange protocol of APIS and tested it using the actual data collected from the DCOES in the faculty houses of Okinawa Institute of Science and Technology (OIST). The simulation results showed that RL agents outperformed the hand-tuned control strategy. Sharing average energy production, storage, and usage within the local community further improved efficiency. The implementation of DRL methods for adaptive energy storage and exchange can help reduce carbon emissions and positively impact the climate.

Authors: Qiong Huang (Okinawa Institute of Science and Technology Graduate University); Kenji Doya (Okinawa Institute of Science and Technology)

ICLR 2023 Efficient HVAC Control with Deep Reinforcement Learning and EnergyPlus (Papers Track)
Abstract and authors: (click to expand)

Abstract: Heating and cooling comprise a significant fraction of the energy consumed by buildings, which in turn account for a significant fraction of society’s energy use. Most building heating, ventilation, and air conditioning (HVAC) systems use standard control schemes that meet basic operating constraints and comfort requirements but with suboptimal efficiency. Deep reinforcement learning (DRL) has shown immense potential for high-performing control in a variety of simulated settings, but has not been widely deployed for real-world control. Here we provide two contributions toward increasing the viability of real-world, DRL-based HVAC control, leveraging the EnergyPlus building simulator. First, we use the new EnergyPlus Python API to implement a first-of-its-kind, purely Python-based EnergyPlus DRL learning framework capable of generalizing to a wide variety of building configurations and weather scenarios. Second, we demonstrate an approach to constrained learning for this setting, removing the requirement to tune reward functions in order to maximize energy efficiency given temperature constraints. We tested our framework on realistic building models of a data center, an office building, and a secondary school. In each case, trained agents maintained temperature control while achieving energy savings relative to standard approaches.

Authors: Jared Markowitz (Johns Hopkins University Applied Physics Laboratory); Nathan Drenkow (Johns Hopkins University Applied Physics Laboratory)

ICLR 2023 Decision-aware uncertainty-calibrated deep learning for robust energy system operation (Proposals Track)
Abstract and authors: (click to expand)

Abstract: Decision-making under uncertainty is an important problem that arises in many domains. Achieving robustness guarantees requires well-calibrated uncertainties, which can be difficult to achieve in high-capacity prediction models such as deep neural networks. This paper proposes an end-to-end approach for learning uncertainty-calibrated deep learning models that directly optimizes a downstream decision-making objective with provable robustness. We also propose two concrete applications in energy system operations, including a grid scheduling task as well as an energy storage arbitrage task. As renewable wind and solar generation increasingly proliferate and their variability penetrates the energy grid, learning uncertainty-aware predictive models becomes increasingly crucial for maintaining efficient and reliable grid operation.

Authors: Christopher Yeh (California Institute of Technology); Nicolas Christianson (California Institute of Technology); Steven Low (California Institute of Technology); Adam Wierman (California Institute of Technology); Yisong Yue (Caltech)

ICLR 2023 Multi-Agent Deep Reinforcement Learning for Solar-Battery System to Mitigate Solar Curtailment in Real-Time Electricity Market (Papers Track)
Abstract and authors: (click to expand)

Abstract: The increased uptake of solar energy in the energy transition towards decarbonization has caused the issue of solar photovoltaic (PV) curtailments, resulting in significant economic losses and hindering the energy transition. To overcome this issue, battery energy storage systems (BESS) can serve as onsite backup sources for solar farms. However, the backup role of the BESS significantly limits its economic value, disincentivizing the BESS deployment due to high investment costs. Hence, it is essential to effectively reduce solar curtailment while ensuring viable operations of the BESS.

Authors: Jinhao Li (Monash University); Changlong Wang (Monash University); Hao Wang (Monash University)

NeurIPS 2022 Function Approximations for Reinforcement Learning Controller for Wave Energy Converters (Papers Track)
Abstract and authors: (click to expand)

Abstract: Waves are a more consistent form of clean energy than wind and solar, and the latest Wave Energy Converter (WEC) platforms like CETO 6 have evolved into complex multi-generator designs with a high energy capture potential for financial viability. A Multi-Agent Reinforcement Learning (MARL) controller can handle these complexities and control the WEC optimally, unlike default engineering controllers like Spring Damper, which suffer from lower energy capture and mechanical stress from the spinning yaw motion. In this paper, we look beyond the normal hyper-parameter and MARL agent tuning and explore the most suitable architecture for the neural network function approximators for the policy and critic networks of MARL, which act as its brain. We found that, unlike the fully connected network (FCN) commonly used for MARL, sequential models like transformers and LSTMs can model the WEC system dynamics better. Our novel transformer architecture, Skip Transformer-XL (STrXL), with several gated residual connections in and around the transformer block, performed better than the state-of-the-art with faster training convergence. STrXL boosts energy efficiency by an average of 25% to 28% over the existing spring damper (SD) controller for waves at different angles and almost eliminates the mechanical stress from the rotational yaw motion, saving costly maintenance on open seas and thus reducing the Levelized Cost of wave energy (LCOE). Demo:

Authors: Soumyendu Sarkar (Hewlett Packard Enterprise); Vineet Gundecha (Hewlett Packard Enterprise); Alexander Shmakov (UC Irvine); Sahand Ghorbanpour (Hewlett Packard Enterprise); Ashwin Ramesh Babu (Hewlett Packard Enterprise Labs); Alexandre Pichard (Carnegie Clean Energy); Mathieu Cocho (Carnegie Clean Energy)

NeurIPS 2022 Robustifying machine-learned algorithms for efficient grid operation (Papers Track)
Abstract and authors: (click to expand)

Abstract: We propose a learning-augmented algorithm, RobustML, for operation of dispatchable generation that exploits the good performance of a machine-learned algorithm while providing worst-case guarantees on cost. We evaluate the algorithm on a realistic two-generator system, where it exhibits robustness to distribution shift while enabling improved efficiency as renewable penetration increases.

Authors: Nicolas Christianson (California Institute of Technology); Christopher Yeh (California Institute of Technology); Tongxin Li (The Chinese University of Hong Kong (Shenzhen)); Mahdi Torabi Rad (Beyond Limits); Azarang Golmohammadi (Beyond Limits, Inc.); Adam Wierman (California Institute of Technology)

NeurIPS 2022 A POMDP Model for Safe Geological Carbon Sequestration (Papers Track)
Abstract and authors: (click to expand)

Abstract: Geological carbon capture and sequestration (CCS), where CO2 is stored in subsurface formations, is a promising and scalable approach for reducing global emissions. However, if done incorrectly, it may lead to earthquakes and leakage of CO2 back to the surface, harming both humans and the environment. These risks are exacerbated by the large amount of uncertainty in the structure of the storage formation. For these reasons, we propose that CCS operations be modeled as a partially observable Markov decision process (POMDP) and decisions be informed using automated planning algorithms. To this end, we develop a simplified model of CCS operations based on a 2D spillpoint analysis that retains many of the challenges and safety considerations of the real-world problem. We show how off-the-shelf POMDP solvers outperform expert baselines for safe CCS planning. This POMDP model can be used as a test bed to drive the development of novel decision-making algorithms for CCS operations.

Authors: Anthony Corso (Stanford University); Yizheng Wang (Stanford University); Markus Zechner (Stanford University); Jef Caers (Stanford University); Mykel J Kochenderfer (Stanford University)

NeurIPS 2022 Stability Constrained Reinforcement Learning for Real-Time Voltage Control (Papers Track)
Abstract and authors: (click to expand)

Abstract: This paper is a summary of a recently submitted work. Deep Reinforcement Learning (DRL) has been recognized as a promising tool to address the challenges in real-time control of power systems. However, its deployment in real-world power systems has been hindered by a lack of explicit stability and safety guarantees. In this paper, we propose a stability constrained reinforcement learning method for real-time voltage control in both single-phase and three-phase distribution grids. The key idea underlying our approach is an explicitly constructed Lyapunov function that certifies stability. We demonstrate the effectiveness of our approach with IEEE test feeders, where the proposed method achieves the best overall performance, while always achieving voltage stability. In contrast, standard RL methods often fail to achieve voltage stability.

Authors: Jie Feng (UCSD); Yuanyuan Shi (University of California San Diego); Guannan Qu (Carnegie Mellon University); Steven Low (California Institute of Technology); Animashree Anandkumar (Caltech); Adam Wierman (California Institute of Technology)

NeurIPS 2022 SustainGym: A Benchmark Suite of Reinforcement Learning for Sustainability Applications (Papers Track)
Abstract and authors: (click to expand)

Abstract: The lack of standardized benchmarks for reinforcement learning (RL) in sustainability applications has made it difficult to both track progress on specific domains and identify bottlenecks for researchers to focus their efforts on. In this paper, we present SustainGym, a suite of two environments designed to test the performance of RL algorithms on realistic sustainability tasks. The first environment simulates the problem of scheduling decisions for a fleet of electric vehicle (EV) charging stations, and the second environment simulates decisions for a battery storage system bidding in an electricity market. We describe the structure and features of the environments and show that standard RL algorithms have significant room for improving performance. We discuss current challenges in introducing RL to real-world sustainability tasks, including physical constraints and distribution shift.

Authors: Christopher Yeh (California Institute of Technology); Victor Li (California Institute of Technology); Rajeev Datta (California Institute of Technology); Yisong Yue (Caltech); Adam Wierman (California Institute of Technology)

NeurIPS 2022 Learn to Bid: Deep Reinforcement Learning with Transformer for Energy Storage Bidding in Energy and Contingency Reserve Markets (Papers Track)
Abstract and authors: (click to expand)

Abstract: As part of efforts to tackle climate change, grid-scale battery energy storage systems (BESS) play an essential role in facilitating reliable and secure power system operation with variable renewable energy (VRE). BESS can balance time-varying electricity demand and supply in the spot market through energy arbitrage and in the frequency control ancillary services (FCAS) market through service enablement or delivery. Effective algorithms are needed for the optimal participation of BESS in multiple markets. Using deep reinforcement learning (DRL), we present a BESS bidding strategy in the joint spot and contingency FCAS markets, leveraging a transformer-based temporal feature extractor to exploit the temporal trends of volatile energy prices. We validate our strategy on real-world historical energy prices in the Australian National Electricity Market (NEM). We demonstrate that the novel DRL-based bidding strategy significantly outperforms benchmarks. The simulation also reveals that the joint bidding in both the spot and contingency FCAS markets can yield a much higher profit than in individual markets. Our work provides a viable use case for the BESS, contributing to the power system operation with high penetration of renewables.

Authors: Jinhao Li (Monash University); Changlong Wang (Monash University); Yanru Zhang (University of Electronic Science and Technology of China); Hao Wang (Monash University)

NeurIPS 2022 Curriculum Based Reinforcement Learning to Avert Cascading Failures in the Electric Grid (Papers Track)
Abstract and authors: (click to expand)

Abstract: We present an approach to integrate the domain knowledge of the electric power grid operations into reinforcement learning (RL) frameworks for effectively learning RL agents to prevent cascading failures. A curriculum-based approach with reward tuning is incorporated into the training procedure by modifying the environment using the network physics. Our procedure is tested on an actor-critic-based agent on the IEEE 14-bus test system using the RL environment developed by RTE, the French transmission system operator (TSO). We observed that naively training the RL agent without the curriculum approach failed to prevent cascading for most test scenarios, while the curriculum based RL agents succeeded in most test scenarios, illustrating the importance of properly integrating domain knowledge of physical systems for real-world RL applications.

Authors: Amarsagar Reddy Ramapuram Matavalam (Arizona State University); Kishan Guddanti (Pacific Northwest National Lab); Yang Weng (Arizona State University)

AAAI FSS 2022 Discovering Transition Pathways Towards Coviability with Machine Learning
Abstract and authors: (click to expand)

Abstract: This paper presents our ongoing French-Brazilian collaborative project which aims at: (1) establishing a diagnosis of socio-ecological coviability for several sites of interest in Nordeste, the North-East region of Brazil (in the states of Paraiba, Ceara, Pernambuco, and Rio Grande do Norte known for their biodiversity hotspots and vulnerabilities to climate change) using advanced data science techniques for multisource and multimodal data fusion and (2) finding transition pathways towards coviability equilibrium using machine learning techniques. Data collected in the field by scientists, ecologists, local actors combined with volunteered information, pictures from smart-phones, and data available on-line from satellite imagery, social media, surveys, etc. can be used to compute various coviability indicators of interest for the local actors. These indicators are useful to characterize and monitor the socio-ecological coviability status along various dimensions of anthropization, human welfare, ecological and biodiversity balance, and ecosystem intactness and vulnerabilities.

Authors: Laure Berti-Equille (IRD) and Rafael Raimundo (UFPB)

NeurIPS 2021 Being the Fire: A CNN-Based Reinforcement Learning Method to Learn How Fires Behave Beyond the Limits of Physics-Based Empirical Models (Papers Track)
Abstract and authors: (click to expand)

Abstract: Wildland fires pose an increasing threat in light of anthropogenic climate change. Fire-spread models play an underpinning role in many areas of research across this domain, from emergency evacuation to insurance analysis. We study paths towards advancing such models through deep reinforcement learning. Aggregating 21 fire perimeters from the Western United States in 2017, we construct 11-layer raster images representing the state of the fire area. A convolution neural network based agent is trained offline on one million sub-images to create a generalizable baseline for predicting the best action - burn or not burn - given the then-current state on a particular fire edge. A series of online, TD(0) Monte Carlo Q-Learning based improvements are made with final evaluation conducted on a subset of holdout fire perimeters. We examine the performance of the learned agent/model against the FARSITE fire-spread model. We also make available a novel data set and propose more informative evaluation metrics for future progress.

Authors: William L Ross (Stanford)

NeurIPS 2021 EcoLight: Reward Shaping in Deep Reinforcement Learning for Ergonomic Traffic Signal Control (Papers Track)
Abstract and authors: (click to expand)

Abstract: Mobility, the environment, and human health are all harmed by sub-optimal control policies in transportation systems. Intersection traffic signal controllers are a crucial part of today's transportation infrastructure, as sub-optimal policies may lead to traffic jams and, as a result, increased levels of air pollution and wasted time. Many adaptive traffic signal controllers have been proposed in the literature, but research on their relative performance differences is limited. On the other hand, to the best of our knowledge there has been no work that directly targets CO2 emission reduction, even though pollution is currently a critical issue. In this paper, we propose a reward shaping scheme for various RL algorithms that not only produces lower CO2 emissions, but also produces respectable outcomes in terms of other metrics such as travel time. We compare multiple RL algorithms (SARSA and A2C) as well as diverse scenarios with a mix of different road users emitting varied amounts of pollution.

Authors: Pedram Agand (Simon Fraser University); Alexey Iskrov (Breeze Labs Inc.); Mo Chen (Simon Fraser University)

NeurIPS 2021 Decentralized Safe Reinforcement Learning for Voltage Control (Papers Track)
Abstract and authors: (click to expand)

Abstract: Inverter-based distributed energy resources provide the possibility for fast time-scale voltage control by quickly adjusting their reactive power. The power-electronic interfaces allow these resources to realize almost arbitrary control laws, but designing these decentralized controllers is nontrivial. Reinforcement learning (RL) approaches are becoming increasingly popular to search for policies parameterized by neural networks. It is difficult, however, to ensure that the learned controllers are safe, since they may introduce instabilities into the system. This paper proposes a safe learning approach for voltage control. We prove that the system is guaranteed to be exponentially stable if each controller satisfies certain Lipschitz constraints. The set of Lipschitz bounds is optimized to enlarge the search space for neural network controllers. We explicitly engineer the structure of neural network controllers such that they satisfy the Lipschitz constraints by design. A decentralized RL framework is constructed to train a local neural network controller at each bus in a model-free setting.

Authors: Wenqi Cui (University of Washington); Jiayi Li (University of Washington); Baosen Zhang (University of Washington)

NeurIPS 2021 Learning to Dissipate Traffic Jams with Piecewise Constant Control (Papers Track)
Abstract and authors: (click to expand)

Abstract: Greenhouse gases (GHGs), particularly carbon dioxide, are a key contributor to climate change. The transportation sector makes up 35% of CO2 emissions in the US and more than 70% of it is due to land transport. Previous work shows that simple driving interventions have the ability to significantly improve traffic flow on the road. Recent work shows that 5% of vehicles using piecewise constant controllers, designed to be compatible to the reaction times of human drivers, can prevent the formation of stop-and-go traffic congestion on a single-lane circular track, thereby mitigating land transportation emissions. Our work extends these results to consider more extreme traffic settings, where traffic jams have already formed, and environments with limited cooperation. We show that even with the added realism of these challenges, piecewise constant controllers, trained using deep reinforcement learning, can essentially eliminate stop-and-go traffic when actions are held fixed for up to 5 seconds. Even up to 10-second action holds, such controllers show congestion benefits over a human driving baseline. These findings are a stepping-stone for near-term deployment of vehicle-based congestion mitigation.

Authors: Mayuri Sridhar (MIT); Cathy Wu

NeurIPS 2021 Multi-objective Reinforcement Learning Controller for Multi-Generator Industrial Wave Energy Converter (Papers Track)
Abstract and authors: (click to expand)

Abstract: Waves are one of the greatest sources of renewable energy and are a promising resource to tackle climate challenges by decarbonizing energy generation. Lowering the Levelized Cost of Energy (LCOE) for wave energy converters is key to competitiveness with other forms of clean energy like wind and solar. Also, the complexity of control has gone up significantly with the state-of-the-art multi-generator multi-legged industrial Wave Energy Converters (WEC). This paper introduces a Multi-Agent Reinforcement Learning controller (MARL) architecture that can handle these multiple objectives for LCOE, helping to increase energy capture efficiency, boost revenue, reduce structural stress to limit maintenance and operating costs, and adaptively and proactively protect the wave energy converter from catastrophic weather events, preserving investments and lowering effective capital cost. We use a MARL implementing proximal policy optimization (PPO) with various optimizations to help sustain the training convergence in the complex hyperplane. The MARL is able to better control the reactive forces of the generators on multiple tethers (legs) of the WEC than the commonly deployed spring damper controller. The design for trust is implemented to assure the operation of the WEC within a safe zone of mechanical compliance and guarantee mechanical integrity. This is achieved through reward shaping for multiple objectives of energy capture and penalty for harmful motions to minimize stress and lower the cost of maintenance. We achieved double-digit gains in energy capture efficiency across waves of different principal frequencies over the baseline Spring Damper controller with the proposed MARL controllers.

Authors: Soumyendu Sarkar (Hewlett Packard Enterprise); Vineet Gundecha (Hewlett Packard Enterprise); Alexander Shmakov (UC Irvine); Sahand Ghorbanpour (Hewlett Packard Enterprise); Ashwin Ramesh Babu (Hewlett Packard Enterprise Labs); Paolo Faraboschi (HPE); Mathieu Cocho (Carnegie Clean Energy); Alexandre Pichard (Carnegie Clean Energy); Jonathan Fievez (Carnegie Clean Energy)

NeurIPS 2021 Multi-agent reinforcement learning for renewable integration in the electric power grid (Proposals Track)
Abstract and authors: (click to expand)

Abstract: As part of the fight against climate change, the electric power system is transitioning from fuel-burning generators to renewable sources of power like wind and solar. To allow the grid to rely heavily on renewables, important operational changes must be made. For example, novel approaches for frequency regulation, i.e., for balancing demand and generation in real time, are required to ensure the stability of a renewable electric system. Demand response programs, in which loads adjust part of their power consumption for the grid's benefit, can be used to provide frequency regulation. In this proposal, we present and motivate a collaborative multi-agent reinforcement learning approach to meet the algorithmic requirements for providing real-time power balancing with demand response.

Authors: Vincent Mai (Mila, Université de Montréal); Tianyu Zhang (Mila, Université de Montréal); Antoine Lesage-Landry (Polytechnique Montréal & GERAD)

NeurIPS 2021 Optimization of Agricultural Management for Soil Carbon Sequestration based on Deep Reinforcement Learning and Large-Scale Simulations (Proposals Track)
Abstract and authors: (click to expand)

Abstract: Soil carbon sequestration in croplands has tremendous potential to help mitigate climate change; however, it is challenging to develop the optimal management practices for maximization of the sequestered carbon as well as the crop yield. This project aims to develop an intelligent agricultural management system using deep reinforcement learning (RL) and large-scale soil and crop simulations. To achieve this, we propose to build a simulator to model and simulate the complex soil-water-plant-atmosphere interaction. By formulating the management decision as an RL problem, we can leverage the state-of-the-art algorithms to train management policies through extensive interactions with the simulated environment. The trained policies are expected to maximize the stored organic carbon while maximizing the crop yield in the presence of uncertain weather conditions. The whole system will be tested using soil and crop data from both the Midwest of the United States and the central region of Portugal. The proposed research will impact food security and climate change, two of the most significant challenges currently facing humanity.

Authors: Jing Wu (University of Illinois Urbana-Champaign); Pan Zhao (University of Illinois Urbana-Champaign); Ran Tao (University of Illinois Urbana-Champaign); Naira Hovakimyan (UIUC); Guillermo Marcillo (University of Illinois at Urbana-Champaign); Nicolas Martin (University of Illinois at Urbana-Champaign); Carla Ferreira (Royal Institute of Technology); Zahra Kalantari (Royal Institute of Technology); Jennifer Hobbs (IntelinAir Inc.)

ICML 2021 Guided A* Search for Scheduling Power Generation Under Uncertainty (Papers Track)
Abstract and authors: (click to expand)

Abstract: Increasing renewables penetration motivates the development of new approaches to operating power systems under uncertainty. We apply a novel approach combining self-play reinforcement learning (RL) and traditional planning to solve the unit commitment problem, an essential power systems scheduling task. Applied to problems with stochastic demand and wind generation, our results show significant cost reductions and improvements to security of supply as compared with an industry-standard mixed-integer linear programming benchmark. Applying a carbon price of $50/tCO2 achieves carbon emissions reductions of up to 10%. Our results demonstrate scalability to larger problems than tackled in existing literature, and indicate the potential for RL to contribute to decarbonising power systems.

Authors: Patrick de Mars (UCL); Aidan O'Sullivan (UCL)

ICML 2021 A Reinforcement Learning Approach to Home Energy Management for Modulating Heat Pumps and Photovoltaic Systems (Papers Track)
Abstract and authors: (click to expand)

Abstract: Efficient sector coupling in residential buildings plays a key role in supporting the energy transition. In this study, we analyze the potential of using reinforcement learning (RL) to control a home energy management system. We conduct this study by modeling a representative building with a modulating air-sourced heat pump, a photovoltaic system, a battery, and thermal storage systems for floor heating and hot-water supply. In our numerical analysis, we benchmark our reinforcement learning results using DDPG with the optimal solution generated with model predictive control using a mixed-integer linear model under full information. Our input data, models, and the RL environment, developed using the Julia programming language, will be available in an open-source manner.

Authors: Lissy Langer (TU Berlin)

ICML 2021 Reinforcement Learning for Optimal Frequency Control: A Lyapunov Approach (Papers Track)
Abstract and authors: (click to expand)

Abstract: Renewable energy resources play a vital role in reducing carbon emissions and are becoming increasingly common in the grid. On one hand, they are challenging to integrate into a power system because the lack of rotating mechanical inertia can lead to frequency instabilities. On the other hand, these resources have power electronic interfaces that are capable of implementing almost arbitrary control laws. To design these controllers, reinforcement learning has emerged as a popular method to search for policies parameterized by neural networks. The key challenge with learning-based approaches is enforcing the constraint that the learned controller needs to be stabilizing. Through a Lyapunov function, we explicitly identify the structure of neural network-based controllers such that they guarantee system stability by design. A recurrent RL architecture is used to efficiently train the controllers, and they outperform other approaches as demonstrated by simulations.

Authors: Wenqi Cui (University of Washington); Baosen Zhang (University of Washington)

ICML 2021 Designing Bounded min-knapsack Bandits algorithm for Sustainable Demand Response (Papers Track)
Abstract and authors: (click to expand)

Abstract: Around 40% of global energy produced is consumed by buildings. By using renewable energy resources we can alleviate the dependence on electrical grids. Recent trends focus on incentivizing consumers to reduce their demand consumption during peak hours for sustainable demand response. To minimize the loss, the distributor companies should target the right set of consumers and demand the right amount of electricity reductions. This paper proposes a novel bounded integer min-knapsack algorithm and shows that the algorithm, while allowing for multiple unit reduction, also optimizes the loss to the distributor company within a factor of two (multiplicative) and a problem-dependent additive constant. Existing CMAB algorithms fail to work in this setting due to the non-monotonicity of the reward function and time-varying optimal sets. We propose a novel algorithm, Twin-MinKPDR-CB, to learn these compliance probabilities efficiently. Twin-MinKPDR-CB works for non-monotone reward functions, bounded min-knapsack constraints, and time-varying optimal sets. We find that Twin-MinKPDR-CB achieves sub-linear regret of O(log T), with T being the number of rounds demand response is run.

Authors: Akansha Singh (Indian Institute of Technology, Ropar); Meghana Reddy (Indian Institute of Technology, Ropar); Zoltan Nagy (University of Texas); Sujit P. Gujar (Machine Learning Laboratory, International Institute of Information Technology, Hyderabad); Shweta Jain (Indian Institute of Technology Ropar)

ICML 2021 A Set-Theoretic Approach to Safe Reinforcement Learning in Power Systems (Papers Track)

Abstract: Reducing the carbon footprint of the energy sector will be a vital part of the fight against climate change, and doing so will require the widespread adoption of renewable energy resources. Optimally integrating a large number of these resources requires new control techniques that can both compensate for the variability of renewables and satisfy hard engineering constraints. Reinforcement learning (RL) is a promising approach to data-driven control, but it is difficult to verify that the policies derived from data will be safe. In this paper, we combine RL with set-theoretic control to propose a computationally efficient approach to safe RL. We demonstrate the method on a simplified power system model and compare it with other RL techniques.

Authors: Daniel Tabas (University of Washington); Baosen Zhang (University of Washington)

ICML 2021 Technical support project and analysis of the dissemination of carbon dioxide and methane from Lake Kivu in nature and its impact on biodiversity in the Great Lakes region since 2012 (Proposals Track)

Abstract: Straddling the Democratic Republic of the Congo and Rwanda at an altitude of 1,460 m, Lake Kivu is one of the ten great lakes of Africa, alongside the main ones, Victoria and Tanganyika. Kivu contains very high concentrations of gases (carbon dioxide and methane in particular), produced by volcanic activity in the region and the decomposition of organic matter. The lake covers 2,700 km² and reaches a depth approaching 500 meters in places. It is estimated to contain 60 billion cubic meters of dissolved methane and about 300 billion cubic meters of carbon dioxide accumulated over time. Lake Kivu is located north of Lake Tanganyika and contains a very high amount of carbon dioxide and methane. Carbon dioxide (CO2) and methane (CH4) are both greenhouse gases that affect how well the planet works. The first stays in the atmosphere for a hundred years, while the second stays there for only a dozen years. The effect of the dissemination of these gases in nature prompts me to collect as much data as possible on their circulation and to suggest possible solutions that are consistent with the Paris Agreement. In addition, waste from households and/or small industries in the towns of Bukavu and Goma in the DRC and Cyangugu and Gisenyi in Rwanda constitutes a significant source of CH4 emissions, which also contribute to global warming. The exploitation of methane expected in the near future is an additional threat to the sustainable development of ecosystem resources. For various reasons, Lake Kivu constitutes an adequate model for studying the responses of large tropical lakes to changes linked to human activity: indeed, despite its physical and biogeochemical peculiarities, the limnological and ecological processes of its pelagic waters are subject to the same forcings as in other large lakes in the same region, as shown by recent studies.

Authors: Bulonze Chibaderhe (FEMAC Asbl)

ICML 2021 Preserving the integrity of the Canadian northern ecosystems through insights provided by reinforcement learning-based Arctic fox movement models (Proposals Track)

Abstract: Realistic modeling of the movement of the Arctic fox, one of the main predators of the circumpolar world, is crucial to understand the processes governing the distribution of the Canadian Arctic biodiversity. Current methods, however, are unable to adequately account for complex behaviors as well as intra- and interspecific relationships. We propose to harness the potential of reinforcement learning to develop innovative models that will address these shortcomings and provide the backbone to predict how vertebrate communities may be affected by environmental changes in the Arctic, an essential step towards the elaboration of rational conservation actions.

Authors: Catherine Villeneuve (Université Laval); Frédéric Dulude-De Broin (Université Laval); Pierre Legagneux (Université Laval); Dominique Berteaux (Université du Québec à Rimouski); Audrey Durand (Université Laval)

NeurIPS 2020 pymgrid: An Open-Source Python Microgrid Simulator for Applied Artificial Intelligence Research (Papers Track)

Abstract: Microgrids – self-contained electrical grids that are capable of disconnecting from the main grid – hold potential in both tackling climate change mitigation via reducing CO2 emissions and adaptation by increasing infrastructure resiliency. Due to their distributed nature, microgrids are often idiosyncratic; as a result, control of these systems is nontrivial. While microgrid simulators exist, many are limited in scope and in the variety of microgrids they can simulate. We propose pymgrid, an open-source Python package to generate and simulate a large number of microgrids, and the first open-source tool that can generate more than 600 different microgrids. pymgrid abstracts most of the domain expertise, allowing users to focus on control algorithms. In particular, pymgrid is built to be a reinforcement learning (RL) platform, and includes the ability to model microgrids as Markov decision processes. pymgrid also introduces two pre-computed lists of microgrids, intended to allow for research reproducibility in the microgrid setting.

Authors: Gonzague Henri (Total); Tanguy Levent (Ecole Polytechnique); Avishai Halev (Total, UC Davis); Reda ALAMI (Total R&D); Philippe Cordier (Total S.A.)

NeurIPS 2020 Towards Optimal District Heating Temperature Control in China with Deep Reinforcement Learning (Papers Track)

Abstract: Achieving efficiency gains in Chinese district heating networks, thereby reducing their carbon footprint, requires new optimal control methods going beyond current industry tools. Focusing on the secondary network, we propose a data-driven deep reinforcement learning (DRL) approach to address this task. We build a recurrent neural network, trained on simulated data, to predict the indoor temperatures. This model is then used to train two DRL agents, with or without expert guidance, for the optimal control of the supply water temperature. Our tests in a multi-apartment setting show that both agents can ensure a higher thermal comfort and at the same time a smaller energy cost, compared to an optimized baseline strategy.

Authors: Adrien Le Coz (EDF); Tahar Nabil (EDF); Francois Courtot (EDF)

NeurIPS 2020 Deep Reinforcement Learning in Electricity Generation Investment for the Minimization of Long-Term Carbon Emissions and Electricity Costs (Papers Track)

Abstract: A change from a high-carbon emitting electricity power system to one based on renewables would aid in the mitigation of climate change. Decarbonization of the electricity grid would allow for low-carbon heating, cooling and transport. Investments in renewable energy must be made over a long time horizon to maximise the return on investment of these long-life power generators. Over these long time horizons, there exist multiple uncertainties, for example in future electricity demand and costs to consumers and investors. To account for imperfect information about the future, we use the deep deterministic policy gradient (DDPG) deep reinforcement learning approach to optimize for a low-cost, low-carbon electricity supply using a modified version of the FTT:Power model. In this work, we model the UK and Ireland electricity markets. The DDPG algorithm is able to learn the optimum electricity mix through experience and achieves this between the years of 2017 and 2050. We find that a change from fossil fuels and nuclear power to renewables, based upon wind, solar and wave, would provide a cheap and low-carbon alternative to fossil fuels.

Authors: Alexander J. M. Kell (Newcastle University); Pablo Salas (University of Cambridge); Jean-Francois Mercure (University of Exeter); Matthew Forshaw (Newcastle University); A. Stephen McGough (Newcastle University)

NeurIPS 2020 Revealing the Oil Majors' Adaptive Capacity to the Energy Transition with Deep Multi-Agent Reinforcement Learning (Papers Track)

Abstract: A low-carbon energy transition is transpiring to combat climate change, posing an existential threat to oil and gas companies, particularly the Majors. Though Majors wield the resources and expertise to adapt to low-carbon business models, meaningful climate-aligned strategies have yet to be enacted. A 2-degrees pathways (2DP) wargame was developed to assess climate-compatible pathways for the oil Majors. Recent advances in deep multi-agent reinforcement learning (MARL) have achieved superhuman-level performance in solving high-dimensional continuous control problems. Modeling within a Markovian framework, we present the novel 2DP-MARL model, which applies deep MARL methods to solve the 2DP wargame across a multitude of transition scenarios. Designed to best mimic Majors in real-life competition, the model reveals all Majors quickly adapt to low-carbon business models to remain robust amidst energy transition uncertainty. The purpose of this work is to provide tangible metrics to support the call for oil Majors to diversify into low-carbon business models and, thus, accelerate the energy transition.

Authors: Dylan Radovic (Imperial College London); Lucas Kruitwagen (University of Oxford); Christian Schroeder de Witt (University of Oxford)

NeurIPS 2020 OfficeLearn: An OpenAI Gym Environment for Building Level Energy Demand Response (Papers Track)

Abstract: Energy Demand Response (DR) will play a crucial role in balancing renewable energy generation with demand as grids decarbonize. There is growing interest in developing Reinforcement Learning (RL) techniques to optimize DR pricing, as pricing set by electric utilities often cannot take behavioral irrationality into account. However, so far, attempts to standardize RL efforts in this area do not exist. In this paper, we present a first-of-its-kind OpenAI Gym environment for testing DR with occupant-level building dynamics. We demonstrate the variety of parameters built into our office environment, allowing the researcher to customize a building to meet their specifications of interest. We hope that this work enables future work in DR in buildings.

Authors: Lucas Spangher (U.C. Berkeley); Akash Gokul (University of California at Berkeley); Utkarsha Agwan (U.C. Berkeley); Joseph Palakapilly (UC Berkeley); Manan Khattar (University of California at Berkeley); Akaash Tawade (University of California at Berkeley); Costas J. Spanos (University of California at Berkeley)

NeurIPS 2019 Stripping off the implementation complexity of physics-based model predictive control for buildings via deep learning (Papers Track)

Abstract: Over the past decade, model predictive control (MPC) has been considered the most promising solution for intelligent building operation. Despite extensive effort, transfer of this technology into practice is hampered by the need to obtain an accurate controller model with minimum effort, the need for expert knowledge to set it up, and the need for increased computational power and dedicated software to run it. A promising direction that tackles the last two problems was proposed by approximate explicit MPC, where the optimal control policies are learned from MPC data via a suitable function approximator, e.g., a deep learning (DL) model. The main advantage of the proposed approach stems from simple evaluation at execution time, leading to low computational footprints and easy deployment on embedded hardware platforms. We present the energy savings potential of physics-based (also called 'white-box') MPC applied to an office building in Belgium. Moreover, we demonstrate how deep learning approximators can be used to cut the implementation and maintenance costs of MPC deployment without compromising performance. We also critically assess the presented approach by pointing out the major challenges and remaining open research questions.

Authors: Jan Drgona (Pacific Northwest National Laboratory); Lieve Helsen (KU Leuven); Draguna Vrabie (PNNL)