Holistic Sustainability for Geo-Distributed Data Centers using Hierarchical Optimization (Papers Track)
Antonio Guillen-Perez (Hewlett Packard Enterprise); Avisek Naug (Hewlett Packard Enterprise); Vineet Gundecha (Hewlett Packard Enterprise); Sahand Ghorbanpour (Hewlett Packard Enterprise); Ricardo Luna Gutierrez (Hewlett Packard Enterprise); Soumyendu Sarkar (Hewlett Packard Enterprise)
Abstract
The escalating energy demands and carbon footprint of large-scale AI necessitate intelligent workload management across globally distributed data centers. This challenge is complex, involving a dynamic interplay of time-varying grid carbon intensity, electricity prices, and data center cooling efficiency. In this work, we use a high-fidelity simulation to systematically investigate the distinct impacts of geographical and temporal scheduling decisions. Our results demonstrate that a multi-agent reinforcement learning approach, which jointly optimizes spatial and temporal task placement, significantly reduces carbon emissions and operational costs compared to common industry heuristics. This optimization exposes a critical trade-off: the agent strategically defers tasks to wait for favorable conditions, which in turn increases Service Level Agreement (SLA) violations. Building on this optimized global scheduler, we show that a hierarchical approach is essential for maximizing impact. Introducing a local reinforcement learning agent that dynamically controls HVAC systems unlocks an additional 11.5% reduction in carbon emissions, with further gains when a heat recovery unit is added to the simulation. These findings underscore that achieving holistic sustainability requires moving beyond isolated scheduling problems: logical workloads and physical infrastructure must be co-optimized in a tightly integrated, hierarchical framework.
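The trade-off between temporal deferral and SLA violations described in the abstract can be illustrated with a toy example. The sketch below is not the multi-agent reinforcement learning method of the paper; it is a simple greedy lookup, assuming a known carbon-intensity forecast per site. The names `Task`, `place_task`, `site_a`, `site_b`, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    arrival_hour: int   # hour the task enters the queue
    deadline_hour: int  # latest start hour that still meets the SLA (assumed definition)
    energy_kwh: float   # energy the task is assumed to consume


def place_task(task, carbon_forecast, max_defer_hours):
    """Greedily pick the (site, start hour) with the lowest forecast carbon intensity.

    carbon_forecast maps a site name to a list of gCO2/kWh values indexed by hour.
    Returns (site, start_hour, emissions_g, sla_violated).
    """
    horizon = min(len(series) for series in carbon_forecast.values())
    last_hour = min(task.arrival_hour + max_defer_hours, horizon - 1)
    # Enumerate every allowed (site, hour) pair: the spatial and temporal choices.
    candidates = [
        (series[hour], hour, site)
        for site, series in carbon_forecast.items()
        for hour in range(task.arrival_hour, last_hour + 1)
    ]
    intensity, hour, site = min(candidates)
    return site, hour, intensity * task.energy_kwh, hour > task.deadline_hour


# Hypothetical forecasts in gCO2/kWh for hours 0..3.
forecast = {
    "site_a": [400.0, 380.0, 350.0, 200.0],
    "site_b": [300.0, 310.0, 290.0, 250.0],
}
task = Task(arrival_hour=0, deadline_hour=1, energy_kwh=2.0)

no_deferral = place_task(task, forecast, max_defer_hours=0)
with_deferral = place_task(task, forecast, max_defer_hours=3)
print(no_deferral)    # spatial choice only
print(with_deferral)  # spatial and temporal choice
```

With these made-up numbers, allowing deferral lowers the task's emissions from 600 g to 400 g of CO2 but starts it after its deadline, which is the kind of tension between carbon savings and SLA compliance that the abstract reports.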