Machine Learning Prediction of Soil Organic Carbon in Southeast Asia: Methods and Climate Implications (Proposals Track)

Tram Tran (Denison University)

Carbon Capture & Sequestration, Agriculture & Food, Interpretable ML

Abstract

Soil organic carbon (SOC) is a critical indicator of soil health and a key lever for climate mitigation, yet SOC monitoring remains limited by costly, labor-intensive soil sampling. Machine learning (ML) methods have been applied to SOC prediction, but most rely on global datasets or remote sensing proxies that overlook local management practices. This study develops ML models tailored to Southeast Asia, using a harmonized dataset of 2,600 field observations enriched with agricultural management metadata such as tillage, cropping system, and fertilizer use. We compare Random Forest (RF) and XGBoost models under cross-validation, achieving strong predictive performance (R² = 0.8) while emphasizing interpretability. Beyond variable importance, we identify how soil, climate, and management factors, such as soil thickness, precipitation, and crop type, shape SOC variability under regional conditions. Unlike process-based models (e.g., RothC), our approach captures localized effects of smallholder farming practices, offering a more context-sensitive tool. The ultimate application is to inform sustainable agriculture strategies and to help smallholder farmers access carbon finance by quantifying SOC gains from improved practices. By combining regional data with interpretable ML, this work contributes both methodological insights and practical pathways for climate mitigation in one of the world’s most vulnerable agricultural regions.
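The evaluation protocol named in the abstract, R² under cross-validation, can be illustrated with a minimal standard-library sketch. Everything here is an assumption made for illustration: the data are synthetic, the single "precipitation" feature and all function names are hypothetical, and the two fitted models (a mean baseline and a one-feature least-squares line) are simple stand-ins rather than the RF and XGBoost models of the study. Any model exposing the same `fit(X_train, y_train) -> predict` shape could be plugged in.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_r2(fit, X, y, k=5, seed=0):
    """Mean held-out R² over k folds.

    `fit(X_train, y_train)` must return a callable `predict(row)`.
    """
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in test_idx]
        scores.append(r2_score([y[i] for i in test_idx], y_pred))
    return mean(scores)


def fit_mean_baseline(X_train, y_train):
    """Stand-in model 1: always predict the training mean."""
    mu = mean(y_train)
    return lambda row: mu


def fit_simple_linear(X_train, y_train):
    """Stand-in model 2: ordinary least squares on the first feature."""
    xs = [row[0] for row in X_train]
    x_bar, y_bar = mean(xs), mean(y_train)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (t - y_bar) for x, t in zip(xs, y_train))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda row: intercept + slope * row[0]


# Synthetic data (hypothetical): SOC rises linearly with precipitation plus noise.
rng = random.Random(42)
X = [[rng.uniform(500, 3000)] for _ in range(200)]
y = [0.5 + 0.001 * row[0] + rng.gauss(0, 0.2) for row in X]

baseline_r2 = cross_val_r2(fit_mean_baseline, X, y)
linear_r2 = cross_val_r2(fit_simple_linear, X, y)
print(f"mean baseline R2: {baseline_r2:.3f}")
print(f"linear model  R2: {linear_r2:.3f}")
```

Scoring each model only on folds it was not fitted to is what makes the reported R² an estimate of out-of-sample performance; the mean baseline gives a reference point near zero against which any candidate model can be compared.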