EcoEval: A Benchmark for Evaluating Large Language Model Handling of Climate Change Misinformation, False Beliefs, and Climate Policy Sentiment (Papers Track)
Nick Lechtenboerger (HPI); Pat Pataranutaporn (MIT Media Lab); Pattie Maes (MIT Media Lab)
Abstract
As Large Language Models (LLMs) become primary sources of factual knowledge, their ability to accurately communicate climate science, resist misinformation, and provide balanced policy guidance becomes critically important. However, existing evaluation frameworks lack comprehensive assessment of LLM performance across the multifaceted challenges of climate communication. We introduce EcoEval, an open-source benchmark evaluating LLM performance along three dimensions: (1) providing users with accurate information while correcting their misconceptions, (2) avoiding the generation of fabricated climate content, and (3) expressing balanced climate policy sentiment. Our evaluation spans 8 commercially deployed models, revealing substantial variation in policy sentiment, sycophancy, and willingness to generate misinformation.