ICLR 2024 Analyzing the secondary wastewater-treatment process using Faster R-CNN and YOLOv5 object detection algorithms (Papers Track)
Abstract: The activated sludge (AS) process is the most common type of secondary wastewater treatment, applied worldwide. Due to the complexity of microbial communities, imbalances between the different types of bacteria may occur and disturb the process, with pronounced economic and environmental consequences. Microscopic inspection of the morphology of flocs and microorganisms provides key information on AS properties and function. This inspection is time-consuming and expensive, requires highly skilled personnel, and is not readily available in all locations. Thus, most wastewater-treatment plants (WWTPs) do not carry out this essential analysis, resulting in frequent operational faults. In this study, we develop a novel deep learning (DL) object detection approach to analyze and monitor the AS process based on a unique microscopic image database of flocs and microorganisms. Specifically, we applied the YOLOv5 and Faster R-CNN algorithms as tools for segmentation and object detection to analyze the wastewater. The mean average precision (mAP) of YOLOv5 was 0.67, outperforming Faster R-CNN by 15%. Histogram equalization preprocessing of both bright-field and phase-contrast images significantly improved the results of the algorithms in all classes. In the case of YOLOv5, the mAP increased by 16.67%, to 0.77, and the AP of the protozoa, filaments, and open floc classes exceeded that of the previous model by over 20%. These results demonstrate the potential of leveraging DL algorithms to enhance the analysis and monitoring of WWTPs affordably, consequently reducing environmental pollution caused by contaminated effluent. The fundamental challenge addressed herein has important global relevance, especially in an era in which the demand for high-quality wastewater reuse is expected to increase dramatically.

Authors: Offir Inbar (Tel-Aviv University); Moni Shahar (Tel Aviv University); Jacob Gidron (Tel-Aviv University); Ido Cohen (Tel-Aviv University); Dror Avisar (Tel-Aviv University)
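Histogram equalization, which the abstract credits with the largest gains, remaps grey levels so that their cumulative distribution becomes approximately uniform, stretching the contrast of low-contrast micrographs. The abstract does not state which variant was used (global, adaptive/CLAHE, or per-channel on color images), so the sketch below assumes plain global equalization of a single-channel 8-bit image in NumPy. OpenCV's `cv2.equalizeHist` offers a library implementation of global equalization for 8-bit single-channel images.

```python
import numpy as np


def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization of a single-channel uint8 image.

    Sketch only: the paper's exact equalization variant is not stated.
    """
    if img.dtype != np.uint8:
        raise ValueError("expected a uint8 image")
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.flatnonzero(cdf)[0]]  # pixel count at the darkest occupied level
    if cdf_min == img.size:
        return img.copy()  # constant image: nothing to stretch
    # Map each grey level so the cumulative distribution becomes roughly uniform.
    lut = np.round((cdf - cdf_min) / (img.size - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]


# Usage: a tiny low-contrast patch is stretched to the full 0..255 range.
patch = np.array([[50, 50], [100, 200]], dtype=np.uint8)
print(equalize_histogram(patch))
```

Equalizing both bright-field and phase-contrast images before detection, as the abstract describes, would apply this (or an adaptive variant) to every image in the training and inference pipeline.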

ICLR 2024 Estimating the age of buildings from satellite and morphological features to create a pan-EU Digital Building Stock Model (Papers Track)
Abstract: The accelerating effects of global warming and the recent turbulence in the energy market further highlight the need to make quicker and smarter decisions on transitioning to greener energy and reducing our overall energy consumption. With buildings accounting for about 40% of energy consumption in Europe, it is crucial to have a comprehensive understanding of the building stock and its energy-related characteristics, including building age, in order to make informed decisions for energy savings. This study introduces a novel approach to estimating the age of buildings at scale, using a machine learning method that integrates satellite-based imagery with morphological features of buildings. The findings demonstrate the benefits of combining these data sources and underscore the importance of incorporating local data to enable accurate prediction across different cities.

Authors: Jeremias Wenzel (Universiteit Twente); Ana M. Martinez (European Commission - Joint Research Centre); Pietro Florio (European Commission - Joint Research Centre); Katarzyna Goch (Institute of Geography and Spatial Organization Polish Academy of Sciences)

NeurIPS 2023 Climate-sensitive Urban Planning through Optimization of Tree Placements (Papers Track)
Abstract: Climate change is increasing the intensity and frequency of many extreme weather events, including heatwaves, which result in increased thermal discomfort and mortality rates. While global mitigation action is undoubtedly necessary, so is climate adaptation, e.g., through climate-sensitive urban planning. Among the most promising strategies is harnessing the benefits of urban trees in shading and cooling pedestrian-level environments. Our work investigates the challenge of optimally placing such trees. Physical simulations can estimate the radiative and thermal impact of trees on human thermal comfort but incur high computational costs. This rules out optimizing tree placements over large areas and accounting for effects over longer time scales. Hence, we employ neural networks to simulate point-wise mean radiant temperatures (a driving factor of outdoor human thermal comfort) across various time scales, ranging from daily variations to the extended time scales of heatwave events and even decades. To optimize tree placements, we harness the innate local effect of trees within the iterated local search framework with tailored adaptations. We show the efficacy of our approach across a wide spectrum of study areas and time scales. We believe that our approach is a step towards empowering decision-makers, urban designers, and planners to proactively and effectively assess the potential of urban trees to mitigate heat stress.

Authors: Simon Schrodi (University of Freiburg); Ferdinand Briegel (University of Freiburg); Max J. Argus (University of Freiburg); Andreas Christen (University of Freiburg); Thomas Brox (University of Freiburg)
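Iterated local search (ILS), the optimizer named in the abstract, alternates three steps: run a local search to a local optimum, perturb that solution, rerun local search, and keep the new optimum only if it improves the objective. The sketch below illustrates that loop on a grid, and it is an assumption-laden toy. The paper uses neural-network surrogates of mean radiant temperature (T_mrt) plus tailored adaptations that are not reproduced here. Instead, the hypothetical `tmrt_after_planting` surrogate lets each tree shade a square neighbourhood, reducing T_mrt there by a fixed fraction, and local moves are single-cell steps to the four neighbouring cells.

```python
import numpy as np

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def tmrt_after_planting(base, trees, radius=1, shade_factor=0.3):
    """Hypothetical surrogate: summed T_mrt after shading a square around each tree."""
    shaded = np.zeros(base.shape, dtype=bool)
    for r, c in trees:
        shaded[max(0, r - radius):r + radius + 1, max(0, c - radius):c + radius + 1] = True
    return float(np.where(shaded, base * (1.0 - shade_factor), base).sum())


def local_search(base, trees, **kw):
    """First-improvement search: move one tree to a free neighbouring cell while that helps."""
    h, w = base.shape
    trees = list(trees)
    best = tmrt_after_planting(base, trees, **kw)
    improved = True
    while improved:
        improved = False
        for i in range(len(trees)):
            r, c = trees[i]
            for dr, dc in MOVES:
                cand = (r + dr, c + dc)
                if not (0 <= cand[0] < h and 0 <= cand[1] < w) or cand in trees:
                    continue
                new_trees = trees[:i] + [cand] + trees[i + 1:]
                value = tmrt_after_planting(base, new_trees, **kw)
                if value < best:
                    trees, best, improved = new_trees, value, True
                    break
    return trees, best


def iterated_local_search(base, n_trees, n_iter=30, seed=0, **kw):
    """ILS: perturb one tree to a random free cell, re-optimize, keep only improvements."""
    rng = np.random.default_rng(seed)
    h, w = base.shape
    start = [divmod(int(k), w) for k in rng.choice(h * w, size=n_trees, replace=False)]
    trees, best = local_search(base, start, **kw)
    for _ in range(n_iter):
        cand = list(trees)
        free = [divmod(k, w) for k in range(h * w) if divmod(k, w) not in cand]
        cand[int(rng.integers(n_trees))] = free[int(rng.integers(len(free)))]
        cand, value = local_search(base, cand, **kw)
        if value < best:
            trees, best = cand, value
    return trees, best
```

The perturbation step is what lets ILS escape local optima that plain local search would get stuck in. Exploiting the trees' local effect, as the abstract puts it, is mirrored here by only ever moving one tree by one cell at a time.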

NeurIPS 2023 A Causal Discovery Approach To Learn How Urban Form Shapes Sustainable Mobility Across Continents (Papers Track)
Abstract: For low-carbon transport planning, it is essential to grasp the location-specific causal effects of the built environment on travel. Yet current research falls short in representing causal relationships between the "6D" urban form variables and travel, in generalizing across different regions, and in modelling urban form effects at high spatial resolution. Here, we address these gaps by using a causal discovery and explainable machine learning framework to detect urban form effects on intra-city travel emissions based on high-resolution mobility data from six cities across three continents. We show that distance to the center, demographics, and density indirectly affect other urban form features, and that location-specific influences align across cities yet vary in magnitude. In addition, the spread of the city and the coverage of jobs across the city are the strongest determinants of travel-related emissions, highlighting the benefits of compact development. Our work is a starting point for location-specific analysis of urban form effects on mobility using causal discovery approaches, which is highly relevant to municipalities across continents.

Authors: Felix Wagner (TU Berlin, MCC Berlin); Florian Nachtigall (MCC Berlin); Lukas B Franken (University of Edinburgh); Nikola Milojevic-Dupont (Mercator Research Institute on Global Commons and Climate Change (MCC)); Marta C. González (Berkeley); Jakob Runge (TU Berlin); Rafael Pereira (IPEA); Felix Creutzig (Mercator Research Institute on Global Commons and Climate Change (MCC))
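Constraint-based causal discovery algorithms (for example, the PC algorithm and its time-series variants) remove the edge between two variables when they test as conditionally independent given some set of other variables. The abstract does not name the algorithm or the independence test used. With that caveat, the sketch below implements a linear partial-correlation test with a Fisher z-transform, one common choice of test. The variable names in the usage (`dist_center`, `density`, `emissions`) and the simulated chain between them are hypothetical.

```python
import math

import numpy as np


def partial_corr_test(x, y, z=None):
    """Linear test of x independent of y given z: residualize, correlate, Fisher z p-value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    cols = [np.ones(n)]
    if z is not None:
        cols.append(np.asarray(z, dtype=float).reshape(n, -1))
    design = np.column_stack(cols)
    # Remove the part of x and y explained linearly by the conditioning set.
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = float(np.corrcoef(rx, ry)[0, 1])
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    k = design.shape[1] - 1  # size of the conditioning set
    stat = math.atanh(r) * math.sqrt(n - k - 3)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal p-value
    return r, p_value


# Usage with a hypothetical chain: distance to center -> density -> emissions.
rng = np.random.default_rng(0)
dist_center = rng.normal(size=2000)
density = -dist_center + rng.normal(size=2000)
emissions = -density + rng.normal(size=2000)
print(partial_corr_test(dist_center, emissions))           # dependent marginally
print(partial_corr_test(dist_center, emissions, density))  # independent given density
```

In this toy chain, distance affects emissions only through density, so a PC-style search would drop the direct distance-emissions edge. That is the kind of "indirect effect" structure the abstract reports.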

NeurIPS 2023 SAM-CD: Change Detection in Remote Sensing Using Segment Anything Model (Papers Track)
Abstract: In remote sensing, Change Detection (CD) refers to locating surface changes in the same area over time. Changes can result from human or natural activities, and CD is important for analyzing climate change. Recent advances in satellite imagery and deep learning allow the development of affordable and powerful CD solutions. Breakthroughs in computer vision Foundation Models (FMs) bring new opportunities for better and more flexible remote sensing solutions. However, solving CD with FMs has not been explored before, and this work presents the first FM-based deep learning model for CD, SAM-CD. We propose a novel model that adapts the Segment Anything Model (SAM) for solving CD. The experimental results show that the proposed approach achieves state-of-the-art performance on two challenging public benchmark datasets, LEVIR-CD and DSIFN-CD.

Authors: Faroq ALTam (Elm Company); Thariq Khalid (Elm Company); Athul Mathew (Elm Company); Andrew Carnell (Elm Company); Riad Souissi (Elm Company)

NeurIPS 2023 The built environment and induced transport CO2 emissions: A double machine learning approach to account for residential self-selection (Papers Track)
Abstract: Understanding why travel behavior differs between residents of urban centers and suburbs is key to sustainable urban planning. Especially in light of rapid urban growth, identifying housing locations that minimize travel demand and induced CO2 emissions is crucial to mitigating climate change. While the built environment plays an important role, its precise impact on travel behavior is obscured by residential self-selection. To address this issue, we propose a double machine learning approach that controls for residential self-selection to obtain unbiased, spatially explicit estimates of the effect of the built environment on travel-related CO2 emissions for each neighborhood. We examine how socio-demographics and travel-related attitudes moderate the effect and how it decomposes across the 5Ds of the built environment. Based on a case study of Berlin and the travel diaries of 32,000 residents, we find that the built environment causes household travel-related CO2 emissions to differ by a factor of almost two between central and suburban neighborhoods in Berlin. To highlight the practical importance for urban climate mitigation, we evaluate current plans for 64,000 new residential units in terms of total induced transport CO2 emissions. Our findings underscore the significance of spatially differentiated compact development for decarbonizing the transport sector.

Authors: Florian Nachtigall (Technical University of Berlin); Felix Wagner (TU Berlin, MCC Berlin); Peter Berrill (Technical University of Berlin); Felix Creutzig (Mercator Research Institute on Global Commons and Climate Change (MCC))
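Double machine learning in its partialling-out form, introduced by Chernozhukov and co-authors, assumes a model Y = θD + g(X) + ε. Both the outcome Y and the treatment D are residualized on the confounders X using learners fitted on held-out folds (cross-fitting), and θ is then estimated by regressing the outcome residuals on the treatment residuals. The sketch below estimates a single average effect only; the paper's spatially explicit, neighborhood-level effects and its choice of learners are not reproduced. As a labeled assumption, a quadratic least-squares fit stands in for the ML nuisance learners so the example needs only NumPy. In practice, flexible learners such as gradient boosting or random forests would typically be used.

```python
import numpy as np


def _basis(X):
    """Hypothetical nuisance basis: intercept, linear and squared confounder terms."""
    return np.column_stack([np.ones(len(X)), X, X ** 2])


def _fit_predict(X_train, target, X_test):
    coef, *_ = np.linalg.lstsq(_basis(X_train), target, rcond=None)
    return _basis(X_test) @ coef


def dml_partialling_out(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta * d + g(X) + noise."""
    y, d, X = (np.asarray(a, dtype=float) for a in (y, d, X))
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    y_res, d_res = np.empty(n), np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        # Nuisance models are fitted on the other folds only (cross-fitting).
        y_res[test_idx] = y[test_idx] - _fit_predict(X[train_idx], y[train_idx], X[test_idx])
        d_res[test_idx] = d[test_idx] - _fit_predict(X[train_idx], d[train_idx], X[test_idx])
    # Final stage: regress outcome residuals on treatment residuals.
    return float(d_res @ y_res / (d_res @ d_res))


# Usage on synthetic, confounded data (all variables hypothetical).
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 2))                             # confounders, e.g. attitudes
d = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(size=2000)   # "treatment", e.g. a built-environment score
y = 2.0 * d + X[:, 0] ** 2 - X[:, 1] + rng.normal(size=2000)
naive = float(np.polyfit(d, y, 1)[0])                      # slope biased by the shared confounder
print(naive, dml_partialling_out(y, d, X))
```

The naive slope absorbs the confounder's influence on both variables, which is the analogue of residential self-selection. The residual-on-residual regression removes that influence when the nuisance models capture it.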

NeurIPS 2023 Towards autonomous large-scale monitoring the health of urban trees using mobile sensing (Papers Track)
Abstract: Healthy urban greenery is a fundamental asset for mitigating climate change phenomena such as extreme heat and air pollution. However, urban trees are often affected by abiotic and biotic stressors that hamper their functionality and, when not managed in time, even their survival. Current visual or instrumented inspection techniques often require substantial human labor, making frequent assessments infeasible at a city-wide scale. In this work, we present the GreenScan Project, a ground-based sensing system designed to provide low-cost health assessment of urban trees at high spatial and temporal resolution. The system utilises thermal and multi-spectral imaging sensors, fused using computer vision models, to estimate two tree health indices, namely the Normalized Difference Vegetation Index (NDVI) and Canopy Temperature Depression (CTD). Preliminary evaluation of the system was performed through data collection experiments in Cambridge, USA. Overall, this work illustrates the potential of autonomous, mobile, ground-based tree health monitoring on city-wide scales at high temporal resolution and low cost.

Authors: Akshit Gupta (Delft University of Technology); Martine Rutten (Delft University of Technology); RANGA RAO VENKATESHA PRASAD (TUDelft); Remko Uijlenhoet (Delft University of Technology)
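The two indices have standard definitions. NDVI = (NIR − Red) / (NIR + Red) is computed from near-infrared and red reflectance. CTD = T_air − T_canopy is positive when a transpiring canopy is cooler than the surrounding air. A per-pixel NumPy sketch follows. It assumes the thermal and multi-spectral images have already been co-registered; that fusion step, which the abstract attributes to computer vision models, is not shown.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized Difference Vegetation Index per pixel; NaN where NIR + Red == 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (nir - red) / denom, np.nan)


def ctd(air_temp_c, canopy_temp_c):
    """Canopy Temperature Depression: air minus canopy temperature (degrees C)."""
    return np.asarray(air_temp_c, dtype=float) - np.asarray(canopy_temp_c, dtype=float)


# Usage on hypothetical co-registered 2x2 pixel patches.
nir = np.array([[0.50, 0.40], [0.30, 0.0]])
red = np.array([[0.08, 0.10], [0.20, 0.0]])
canopy = np.array([[27.5, 28.0], [30.5, 31.0]])
print(ndvi(nir, red))     # the zero-reflectance pixel becomes NaN
print(ctd(30.0, canopy))  # positive where the canopy is cooler than the air
```

Low NDVI and low or negative CTD together are commonly read as signs of reduced vigor or water stress.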

ICLR 2023 CityLearn: A Tutorial on Reinforcement Learning Control for Grid-Interactive Efficient Buildings and Communities (Tutorials Track)
Abstract: Buildings are responsible for up to 75% of electricity consumption in the United States. Grid-interactive efficient buildings can provide flexibility to address the mismatch between power supply and demand, particularly the mismatch brought about by renewables. Their high energy efficiency and self-generating capabilities can reduce demand without affecting building function. Additionally, load shedding and shifting through smart control of storage systems can further flatten the load curve and reduce grid ramping costs in response to rapid decreases in renewable power supply. The model-free nature of reinforcement learning control makes it a promising approach for smart control in grid-interactive efficient buildings, as it can adapt to unique building needs and functions. However, a major challenge for the adoption of reinforcement learning in buildings is the ability to benchmark different control algorithms to accelerate their deployment on live systems. CityLearn, an open-source OpenAI Gym environment for implementing and benchmarking simple and advanced control algorithms (e.g., rule-based control, model predictive control, or deep reinforcement learning control), addresses this challenge. This tutorial leverages CityLearn to demonstrate different control strategies in grid-interactive efficient buildings. Participants will learn how to design three battery-management controllers of varying complexity that provide load-shifting flexibility, using a real-world residential neighborhood dataset. The algorithms will be evaluated using six energy flexibility, environmental, and economic key performance indicators, and their benefits and shortcomings will be identified. By the end of the tutorial, participants will be familiar enough with the CityLearn environment to use it with new datasets or in personal projects.

Authors: Kingsley E Nweye (The University of Texas at Austin); Allen Wu (The University of Texas at Austin); Hyun Park (The University of Texas at Austin); Yara Almilaify (The University of Texas at Austin); Zoltan Nagy (The University of Texas at Austin)
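The simplest controller family listed, rule-based control, can be sketched without the environment itself. The function below is a hypothetical time-of-day rule for a single home battery. It charges at a capped rate during night hours, discharges to cover household load during the evening peak, and never exports to the grid. It does not use the CityLearn API. The capacity, rate limit, and hour windows are illustrative assumptions, and battery losses are ignored.

```python
def rule_based_battery(load_kwh, capacity_kwh=4.0, max_rate_kwh=2.0,
                       charge_hours=range(0, 6), discharge_hours=range(17, 22)):
    """Hypothetical time-of-day rule for hourly steps; returns (net grid load, state of charge)."""
    soc = 0.0
    net, soc_trace = [], []
    for t, load in enumerate(load_kwh):
        hour = t % 24
        if hour in charge_hours:
            # Charge at a capped rate until the battery is full.
            energy = min(max_rate_kwh, capacity_kwh - soc)
            soc += energy
            net.append(load + energy)
        elif hour in discharge_hours:
            # Discharge to cover household load only (no export to the grid).
            energy = min(max_rate_kwh, soc, load)
            soc -= energy
            net.append(load - energy)
        else:
            net.append(load)
        soc_trace.append(soc)
    return net, soc_trace


# Usage: a flat 1 kWh/h day with a two-hour evening peak of 4 kWh/h.
load = [1.0] * 24
load[17] = load[18] = 4.0
net, soc = rule_based_battery(load)
print(max(load), max(net))  # peak before vs. after control
```

A rule this simple shifts load rather than removing it: charging creates a smaller night-time peak. Trade-offs like this are what the tutorial's KPIs are meant to expose when comparing rule-based control with MPC and reinforcement learning controllers.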

ICLR 2023 On the impact of small-data diversity on forecasts: evidence from meteorologically-driven electricity demand in Mediterranean zones. (Papers Track)
Abstract: In this paper, we compare the improvement in probabilistic electricity demand forecasts for three coastal and island regions when using raw meteorological features and pre-computed features based on empirically tested formulations drawn from the climate science literature. Typically, in time-series forecasting tasks with strong weather or climate drivers, go-to models such as the Autoregressive Integrated Moving Average (ARIMA) model are built on assumptions about how independent variables affect a dependent one and are at best encoded with a handful of exogenous features with known impact. Depending on the geographical region and/or the cultural practices of a population, such a selection process may yield a suboptimal feature set that has only a weak impact on the underlying demand forecasts. The aim of this work is to assess the impact of a documented set of meteorological features on electricity demand through comparative studies with deep learning models. Leveraging the architecture of the Temporal Fusion Transformer (TFT), we find that weather features are unimportant for improving probabilistic forecasts in the targeted regions. However, through experimentation, we find that the more stable electricity demand of Ceuta and Melilla, Spanish autonomous cities on the Mediterranean coast of North Africa bordering Morocco, improved forecast accuracy for the strongly tourism-driven electricity demand of the Balearic Islands (Spain) during the COVID-19 travel restrictions of 2020, reducing the root mean squared error (RMSE) from ~0.090 to ~0.012 with substantially improved 10th/90th quantile bounds.

Authors: Reginald Bryant (IBM Research - Africa); Julian Kuehnert (IBM Research)
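The abstract reports RMSE together with 10th/90th quantile bounds. Probabilistic forecasters such as the TFT are commonly trained and evaluated with the quantile (pinball) loss. The NumPy sketch below shows that loss, RMSE, and the empirical coverage of a 10th-90th percentile band. The paper's exact evaluation protocol (for example, how the demand series were scaled) is not stated in the abstract and is not assumed here.

```python
import numpy as np


def pinball_loss(y_true, y_pred_q, q):
    """Mean quantile (pinball) loss for predictions of the q-th quantile."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred_q, dtype=float)
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))


def rmse(y_true, y_pred):
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def interval_coverage(y_true, lower, upper):
    """Fraction of observations inside [lower, upper] (ideally 0.8 for a q10-q90 band)."""
    y = np.asarray(y_true, dtype=float)
    return float(np.mean((y >= np.asarray(lower)) & (y <= np.asarray(upper))))
```

The asymmetric weighting is what makes each output head estimate a specific quantile: for q = 0.9, under-forecasting costs nine times as much as over-forecasting by the same amount.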

NeurIPS 2022 Scene-to-Patch Earth Observation: Multiple Instance Learning for Land Cover Classification (Papers Track)
Abstract: Land cover classification (LCC), along with monitoring how land use changes over time, is an important process in climate change mitigation and adaptation. Existing approaches that apply machine learning to Earth observation data for LCC rely on fully annotated and segmented datasets. Creating these datasets requires a large amount of effort, and the lack of suitable datasets has become an obstacle to scaling the use of LCC. In this study, we propose Scene-to-Patch models: an alternative LCC approach utilising Multiple Instance Learning (MIL) that requires only high-level scene labels. This enables much faster development of new datasets whilst still providing segmentation through patch-level predictions, ultimately making LCC more accessible for different scenarios. On the DeepGlobe-LCC dataset, our approach outperforms non-MIL baselines on both scene- and patch-level prediction. This work provides the foundation for expanding the use of LCC in climate change mitigation methods for technology, government, and academia.

Authors: Joseph Early (University of Southampton); Ying-Jung C Deweese (Georgia Institute of Technology); Christine Evers (University of Southampton); Sarvapali Ramchurn (University of Southampton)
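In the Scene-to-Patch setting, a model predicts class scores for each patch of a scene. Only an aggregate of those predictions is compared with the scene-level label during training, yet the patch scores still give a segmentation at inference time. The sketch below shows that aggregation step under two labeled assumptions: the scene label is a vector of class-coverage proportions, and patch probabilities are aggregated by their mean. The paper's actual models and pooling functions may differ.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def scene_from_patches(patch_logits):
    """Aggregate (n_patches, n_classes) logits into a scene prediction and a patch map."""
    patch_probs = softmax(np.asarray(patch_logits, dtype=float), axis=1)
    scene_pred = patch_probs.mean(axis=0)      # assumed mean pooling over instances
    patch_labels = patch_probs.argmax(axis=1)  # per-patch segmentation "for free"
    return scene_pred, patch_labels


def scene_loss(patch_logits, scene_proportions):
    """Training signal uses only the scene-level label (assumed class proportions)."""
    scene_pred, _ = scene_from_patches(patch_logits)
    return float(np.mean((scene_pred - np.asarray(scene_proportions)) ** 2))
```

Because the loss only sees the pooled prediction, annotators need to supply scene-level labels rather than pixel masks, which is the labeling saving the abstract describes.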

NeurIPS 2022 Land Use Prediction using Electro-Optical to SAR Few-Shot Transfer Learning (Papers Track)
Abstract: Satellite image analysis has important implications for land use, urbanization, and ecosystem monitoring. Deep learning methods can facilitate the analysis of different satellite modalities, such as electro-optical (EO) and synthetic aperture radar (SAR) imagery, by supporting knowledge transfer between the modalities to compensate for individual shortcomings. Recent progress has shown how distributional alignment of neural network embeddings can produce powerful transfer learning models by employing a sliced Wasserstein distance (SWD) loss. We analyze how this method can be applied to Sentinel-1 and -2 satellite imagery and develop several extensions toward making it effective in practice. In an application to few-shot Local Climate Zone (LCZ) prediction, we show that these networks outperform multiple common baselines on datasets with a large number of classes. Further, we provide evidence that instance normalization can significantly stabilize the training process and that explicitly shaping the embedding space using supervised contrastive learning can lead to improved performance.

Authors: Marcel Hussing (University of Pennsylvania); Karen Li (University of Pennsylvania); Eric Eaton (University of Pennsylvania)
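The sliced Wasserstein distance compares two point clouds by projecting them onto random unit directions, where one-dimensional optimal transport reduces to sorting, and then averaging the per-direction costs. The NumPy sketch below estimates the order-2 sliced Wasserstein distance between two equally sized embedding batches. Using it as a training loss, as the abstract describes, would require an autodiff framework. The number of projections is an arbitrary choice here.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=128, seed=0):
    """Monte Carlo estimate of the order-2 sliced Wasserstein distance between (n, d) samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equally sized batches")
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # random unit directions
    # In 1-D, optimal transport between equal-size samples matches sorted values.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    return float(np.sqrt(np.mean((px - py) ** 2)))
```

Minimizing this quantity between EO and SAR embeddings pulls the two embedding distributions together without requiring paired samples, which is the distributional-alignment idea the abstract builds on.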

NeurIPS 2022 Estimating Chicago’s tree cover and canopy height using multi-spectral satellite imagery (Papers Track)
Abstract: Information on urban tree canopies is fundamental to mitigating climate change as well as improving quality of life. Urban tree planting initiatives face a lack of up-to-date data about the horizontal and vertical dimensions of the tree canopy in cities. We present a pipeline that utilizes LiDAR data as ground-truth and then trains a multi-task machine learning model to generate reliable estimates of tree cover and canopy height in urban areas using multi-source multi-spectral satellite imagery for the case study of Chicago.

Authors: John Francis (University College London)
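A common way to turn LiDAR into the two targets is a canopy height model (CHM), computed as the digital surface model minus the digital terrain model. Pixels above a height threshold count as tree canopy, and both quantities are then aggregated to the coarser satellite grid. The sketch below assumes co-registered rasters from which non-vegetation returns (e.g., buildings) have already been removed. The 2 m threshold and the block-mean aggregation are illustrative choices, not the paper's stated settings.

```python
import numpy as np


def lidar_targets(dsm, dtm, factor, min_tree_height_m=2.0):
    """Derive per-block tree cover fraction and mean canopy height from LiDAR rasters.

    Assumes buildings are already masked out; the threshold is a hypothetical default.
    """
    chm = np.clip(np.asarray(dsm, float) - np.asarray(dtm, float), 0.0, None)
    tree = chm >= min_tree_height_m
    h, w = (s // factor for s in chm.shape)
    # View each factor x factor block of fine pixels as one coarse pixel.
    chm_b = chm[:h * factor, :w * factor].reshape(h, factor, w, factor)
    tree_b = tree[:h * factor, :w * factor].reshape(h, factor, w, factor)
    cover = tree_b.mean(axis=(1, 3))
    n_tree = tree_b.sum(axis=(1, 3))
    height_sum = (chm_b * tree_b).sum(axis=(1, 3))
    height = np.divide(height_sum, n_tree, out=np.zeros_like(height_sum), where=n_tree > 0)
    return cover, height
```

The two outputs correspond to the two heads of a multi-task model: one regresses the canopy cover fraction and the other the canopy height of each satellite pixel.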

NeurIPS 2022 Heat Demand Forecasting with Multi-Resolutional Representation of Heterogeneous Temporal Ensemble (Papers Track)
Abstract: One of the primary challenges faced by utility companies is ensuring efficient supply with minimal greenhouse gas emissions. The advent of smart meters and smart grids provides an unprecedented opportunity to optimize the supply of thermal energy through proactive techniques such as load forecasting. In this paper, we propose a neural-network-based forecasting framework for heat demand in which the time series are encoded as scalograms that can also embed exogenous variables such as weather and holiday/non-holiday indicators. CNNs are then used to predict the heat load multiple steps ahead. Finally, the proposed framework is compared with other state-of-the-art methods, such as SARIMAX and LSTM. Quantitative results from retrospective experiments on real-world data from Denmark show that the proposed framework consistently outperforms the state-of-the-art baselines, achieving the lowest errors of all compared methods: a MAPE of 7.54% and an RMSE of 417 kW.

Authors: Satyaki Chatterjee (Pattern Recognition Lab, FAU Erlangen-Nuremberg); Adithya Ramachandran (Pattern Recognition Lab, Friedrich Alexander University, Erlangen); Thorkil Flensmark Neergaard (Brønderslev Forsyning A/S); Andreas K Maier (Pattern Recognition Lab, FAU Erlangen-Nuremberg); Siming Bayer (Pattern Recognition Lab, Friedrich-Alexander University)
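A scalogram is the magnitude of a continuous wavelet transform (CWT). The series is correlated with a wavelet stretched to several scales, giving a two-dimensional scale-by-time image that a CNN can consume. The abstract specifies neither the wavelet nor how exogenous variables are embedded, so the NumPy sketch below assumes a Ricker (Mexican hat) wavelet and encodes only the load series itself. Libraries such as PyWavelets (`pywt.cwt`) provide CWTs with other wavelets, e.g. Morlet.

```python
import numpy as np


def ricker(points, a):
    """Ricker (Mexican hat) wavelet sampled at `points` positions with width `a`."""
    t = np.arange(points) - (points - 1) / 2.0
    amplitude = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amplitude * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))


def scalogram(series, scales):
    """|CWT| of a 1-D series: one row per scale, one column per time step."""
    series = np.asarray(series, dtype=float)
    out = np.empty((len(scales), len(series)))
    for i, a in enumerate(scales):
        # Truncate the wavelet to ~10 widths, but never longer than the series.
        length = max(1, min(int(10 * a), len(series)))
        # The Ricker wavelet is symmetric, so convolution equals correlation here.
        out[i] = np.abs(np.convolve(series, ricker(length, a), mode="same"))
    return out


# Usage: two weeks of synthetic hourly heat load with daily and weekly cycles.
hours = np.arange(24 * 14)
load = 500 + 150 * np.sin(2 * np.pi * hours / 24) + 50 * np.sin(2 * np.pi * hours / 168)
image = scalogram(load, scales=[2, 4, 8, 16, 32])
print(image.shape)
```

The resulting array can be fed to a 2-D CNN like a single-channel image. Stacking further channels for exogenous variables would be one possible (assumed, not confirmed) way to embed them.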

NeurIPS 2022 Machine Learning for Activity-Based Road Transportation Emissions Estimation (Papers Track)
Abstract: Measuring and attributing greenhouse gas (GHG) emissions remains a challenging problem as the world strives to meet emissions reduction targets. Because it accounts for a significant portion of total global emissions, the road transportation sector represents an enormous challenge for estimating and tracking emissions at a global scale. To meet this challenge, we have developed a hybrid approach for estimating road transportation emissions that combines the strengths of machine learning and satellite imagery with localized emissions factor data to create an accurate, globally scalable, and easily configurable GHG monitoring framework.

Authors: Derek Rollend (JHU); Kevin Foster (JHU); Tomek Kott (JHU); Rohita Mocharla (JHU); Rodrigo Rene Rai Munoz Abujder (Johns Hopkins Applied Physics Laboratory); Neil Fendley (JHU/APL); Chace Ashcraft (JHU/APL); Frank Willard (JHU); Marisa Hughes (JHU)

NeurIPS 2022 Analyzing Micro-Level Rebound Effects of Energy Efficient Technologies (Papers Track)
Abstract: Energy conservation is central to preventing resource depletion, climate change, and environmental degradation. Investment in raising the efficiency of appliances is among the most significant attempts to save energy. Ironically, the introduction of many such energy-saving appliances increased total energy consumption instead of reducing it. In the literature, this effect is attributed to the Jevons paradox (JP) and optimism bias (OB) inherent in consumer behavior. However, the magnitude of these tendencies varies among people. Identifying this magnitude for each household can enable the development of appropriate policies that induce the desired energy-saving behaviour. Using the RECS 2015 dataset, the paper applies machine learning to each electrical appliance to determine how its total energy consumption depends on its Energy Star rating. The results show that only substitutable appliances register an increase in energy demand upon improved efficiency. Lastly, an index is introduced to indicate the varying influence of JP and OB on different households.

Authors: Mayank Jain (University College Dublin); Mukta Jain (Delhi School of Economics); Tarek T. Alskaif (Wageningen University); Soumyabrata Dev (University College Dublin)

NeurIPS 2022 A Global Classification Model for Cities using ML (Papers Track)
Abstract: This paper develops a novel dataset on the use of three key resources, namely food, water, and energy, for 9,000 cities globally. The dataset is then used to develop a clustering approach as a starting point towards a global classification model. This novel clustering approach aims to contribute to an inclusive view of resource efficiency for all urban centers globally. The proposed clustering algorithm comprises three steps: first, outlier detection to address specific city characteristics; then a Variational Autoencoder (VAE); and finally, Agglomerative Clustering (AC) to improve the classification results. Our results show that this approach is more robust than other baseline clustering methods and yields better-delimited clusters, with higher Calinski-Harabasz Index scores and Silhouette Coefficients.

Authors: Doron Hazan (MIT); Mohamed Habashy (Massachusetts Institute of Technology); Mohanned ElKholy (Massachusetts Institute of Technology); Omer Mousa (American University in Cairo); Norhan M Bayomi (MIT Environmental Solutions Initiative); Matias Williams (Massachusetts Institute of Technology); John Fernandez (Massachusetts Institute of Technology)
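The Calinski-Harabasz index used for evaluation is the ratio of between-cluster to within-cluster dispersion, each normalized by its degrees of freedom: CH = [B / (k − 1)] / [W / (n − k)]. Here B sums, over clusters, the cluster size times the squared distance from the cluster centroid to the overall mean, and W sums the squared distances from points to their own centroid. Higher values indicate denser, better-separated clusters. A NumPy sketch follows; scikit-learn provides the metric as `sklearn.metrics.calinski_harabasz_score`, alongside `silhouette_score`.

```python
import numpy as np


def calinski_harabasz(X, labels):
    """Calinski-Harabasz index: between- vs within-cluster dispersion (higher is better)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, k = len(X), len(clusters)
    if not 1 < k < n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")
    overall_mean = X.mean(axis=0)
    between = within = 0.0
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    if within == 0:
        return float("inf")  # perfectly compact clusters (degenerate case in this sketch)
    return float((between / (k - 1)) / (within / (n - k)))
```

In a pipeline like the one described, the index would be computed on the VAE latent features with the labels produced by agglomerative clustering, and compared against the labels from the baseline methods.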

NeurIPS 2022 Learning Surrogates for Diverse Emission Models (Papers Track)
Abstract: Transportation plays a major role in global CO2 emission levels, a factor directly connected to climate change. Roadway interventions that reduce CO2 emission levels have thus become a timely requirement. An integral need in assessing the impact of such interventions is access to industry-standard, programmatic, and instantaneous emission models covering various emission conditions such as fuel types, vehicle types, cities of interest, etc. However, well-calibrated emission models with all these properties are currently lacking. To address these limitations, this paper presents 1,100 programmatic and instantaneous vehicular CO2 emission models with varying fuel types, vehicle types, road grades, vehicle ages, and cities of interest. We hope the presented emission models will facilitate future research in tackling transportation-related climate impact. The emission models have been publicly released.

Authors: Edgar Ramirez Sanchez (MIT); Catherine H Tang (Massachusetts Institute of Technology); Vindula Jayawardana (MIT); Cathy Wu (MIT)

NeurIPS 2022 Modelling the performance of delivery vehicles across urban micro-regions to accelerate the transition to cargo-bike logistics (Proposals Track)
Abstract: Light goods vehicles (LGVs), used extensively for last-mile delivery, are among the leading polluters in cities. Cargo-bike logistics has been put forward as a high-impact candidate for replacing LGVs, with experts estimating that over half of urban van deliveries could be replaced by cargo bikes due to their faster speeds, shorter parking times, and more efficient routes across cities. By modelling the relative delivery performance of different vehicle types across urban micro-regions, machine learning can help operators evaluate the business and environmental impact of adding cargo bikes to their fleets. In this paper, we introduce two datasets and present initial progress in modelling urban delivery service time (e.g., cruising for parking, unloading, walking). Using Uber's H3 index to divide cities into hexagonal cells and aggregating OpenStreetMap tags for each cell, we show that urban context is a critical predictor of delivery performance.

Authors: Max C Schrader (University of Alabama); Navish Kumar (IIT Kharagpur); Nicolas Collignon (University of Edinburgh); Maria S Astefanoaei (IT University of Copenhagen); Esben Sørig (Kale Collective); Soonmyeong Yoon (Kale Collective); Kai Xu (University of Edinburgh); Akash Srivastava (MIT-IBM)

NeurIPS 2022 Urban Heat Island Detection and Causal Inference Using Convolutional Neural Networks (Proposals Track)
Abstract: Compared to rural areas, urban areas experience higher temperatures for longer periods of time because of the urban heat island (UHI) effect. This increased heat stress leads to greater mortality, increased energy demand, regional changes to precipitation patterns, and increased air pollution. Urban developers can minimize the UHI effect by incorporating features that promote air flow and heat dispersion (e.g., increasing green space). However, understanding which urban features to implement is complex, as local meteorology strongly dictates how the environment responds to changes in urban form. In this proposal, we describe a methodology for estimating the causal relationship between changes in urban form and changes in the UHI effect. Changes in urban form and in temperature are measured using convolutional neural networks, and a causal inference matching approach is proposed to estimate causal relationships. If successful, this methodology will enable urban developers to implement city-specific interventions that mitigate the impact of the warming planet on cities.

Authors: Zachary D Calhoun (Duke University); Ziyang Jiang (Duke University); Mike Bergin (Duke University); David Carlson (Duke University)
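A matching estimator compares each "treated" unit (here, a location whose urban form changed) with the most similar "control" unit on pre-treatment covariates (e.g., descriptors of local meteorology) and averages the outcome differences. This estimates the average treatment effect on the treated (ATT). The proposal does not specify its matching scheme. The sketch below assumes 1-nearest-neighbour matching with replacement on standardized covariates, and all inputs in the test data are hypothetical.

```python
import numpy as np


def att_nearest_neighbor(X, treated, y):
    """ATT via 1-nearest-neighbour matching (with replacement) on standardized covariates."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    treated = np.asarray(treated, dtype=bool)
    scale = X.std(axis=0)
    scale[scale == 0] = 1.0  # avoid dividing by a constant column
    Z = (X - X.mean(axis=0)) / scale
    Zt, Zc = Z[treated], Z[~treated]
    # Squared Euclidean distance from every treated unit to every control unit.
    dist = ((Zt[:, None, :] - Zc[None, :, :]) ** 2).sum(axis=2)
    matched_controls = y[~treated][dist.argmin(axis=1)]
    return float(np.mean(y[treated] - matched_controls))
```

Matching only removes bias from covariates that are included. This is why the abstract stresses local meteorology, which drives both where urban form changes matter and how temperatures respond.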

NeurIPS 2022 Estimating Heating Loads in Alaska using Remote Sensing and Machine Learning Methods (Proposals Track)
Abstract: Alaska and the wider Arctic region are in much greater need of decarbonization than the rest of the globe as a result of the accelerated consequences of climate change over the past ten years. Heating for homes and businesses accounts for over 75% of the energy used in the Arctic region. However, the lack of thorough and precise heating load estimates in these regions poses a significant obstacle to the transition to renewable energy. To accurately measure the massive heating demand in Alaska, this research pioneers a geospatial-first methodology that integrates remote sensing and machine learning techniques. Building characteristics such as height, size, and year of construction, together with climate variables such as thawing degree days and freezing degree days, are extracted from open-source geospatial information in Google Earth Engine (GEE). These variables, coupled with heating load forecasts from the AK Warm simulation program, are used to train models that forecast heating loads on Alaska's Railbelt utility grid. Our research greatly advances geospatial capability in this area and considerably informs the decarbonization activities currently in progress in Alaska.

Authors: Madelyn Gaumer (University of Washington); Nick Bolten (Paul G. Allen School of Computer Science and Engineering, University of Washington); Vidisha Chowdhury (Heinz College of Information Systems and Public Policy, Carnegie Mellon University); Philippe Schicker (Heinz College of Information Systems and Public Policy, Carnegie Mellon University); Shamsi Soltani (Department of Epidemiology and Population Health, Stanford University School of Medicine); Erin D Trochim (University of Alaska Fairbanks)
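Freezing and thawing degree days accumulate, over a period, how far the daily mean air temperature falls below or rises above a base temperature, conventionally 0 °C. The abstract extracts them in Google Earth Engine. The plain-Python sketch below only illustrates the arithmetic on a daily mean temperature series and does not use the GEE API.

```python
def degree_days(daily_mean_temp_c, base_c=0.0):
    """Return (freezing degree days, thawing degree days) for daily mean temperatures.

    base_c defaults to the usual 0 degrees C convention (an assumption here).
    """
    fdd = sum(max(0.0, base_c - t) for t in daily_mean_temp_c)
    tdd = sum(max(0.0, t - base_c) for t in daily_mean_temp_c)
    return fdd, tdd


# Usage: five days of hypothetical daily means in degrees C.
print(degree_days([-10.0, -5.0, 0.0, 3.0, 7.0]))
```

A location with more freezing degree days generally implies a larger heating load for a given building, which is why these climate variables complement the building characteristics as model inputs.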

NeurIPS 2022 Automating the creation of LULC datasets for semantic segmentation (Tutorials Track)
Abstract: High-resolution and accurate Land Use and Land Cover (LULC) mapping datasets are increasingly important and can be widely used to monitor climate change impacts in agriculture, deforestation, and the carbon cycle. These datasets represent physical classifications of land types and spatial information over the Earth's surface. They can be leveraged in a plethora of research topics and industries to mitigate and adapt to environmental changes. High-resolution urban mappings can be used to better monitor and estimate building albedo and urban heat island impacts, and accurate representations of forests and vegetation can even be leveraged to better monitor the carbon cycle and climate change through improved land surface modelling. The advent of machine learning (ML)-based computer vision (CV) techniques over the past decade provides a viable option for automating LULC mapping. One impediment has been the lack of large ML datasets. Large vector datasets for LULC are available but cannot be used directly by ML practitioners due to a knowledge gap in transforming the input into a dataset of paired satellite images and segmentation masks. We demonstrate a novel end-to-end pipeline for LULC dataset creation that takes vector land cover data and provides a training-ready dataset. We will use Sentinel-2 satellite imagery and the European Urban Atlas LULC data. The pipeline manages everything from downloading satellite data to creating and storing encoded segmentation masks and automating data checks. We then use the resulting dataset to train a semantic segmentation model. The aim of the pipeline is to provide a way for users to create their own custom datasets using various combinations of multispectral satellite and vector data. In addition to presenting the pipeline, we aim to provide an introduction to multispectral imagery, geospatial data, and some of the challenges in using them for ML.
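The core of turning vector land cover into segmentation masks is rasterization: every pixel whose centre falls inside a polygon receives that polygon's class ID. Geospatial pipelines usually delegate this step to a library such as `rasterio.features.rasterize`, together with reprojecting the vectors into the image's coordinate reference system. The dependency-light sketch below shows the underlying even-odd (ray-casting) point-in-polygon rule. It assumes that polygon vertices have already been converted to pixel coordinates (x = column, y = row) and that polygons have no holes.

```python
import numpy as np


def rasterize_polygons(polygons, shape):
    """Burn (class_id, vertices) polygons into a uint8 mask; later polygons overwrite earlier ones."""
    h, w = shape
    rows, cols = np.mgrid[0:h, 0:w]
    px, py = cols + 0.5, rows + 0.5  # sample at pixel centres
    mask = np.zeros(shape, dtype=np.uint8)
    for class_id, vertices in polygons:
        verts = np.asarray(vertices, dtype=float)
        inside = np.zeros(shape, dtype=bool)
        for i in range(len(verts)):
            x0, y0 = verts[i]
            x1, y1 = verts[(i + 1) % len(verts)]
            # Does this edge straddle the horizontal ray through each pixel centre?
            crosses = (y0 > py) != (y1 > py)
            with np.errstate(divide="ignore", invalid="ignore"):
                x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            inside ^= crosses & (px < x_cross)  # even-odd rule: toggle per crossing
        mask[inside] = class_id
    return mask
```

Paired with the matching Sentinel-2 image tile, each such mask forms one training example for a semantic segmentation model.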

Authors: Sambhav S Rohatgi (; Anthony Mucia (