Automatic data cleaning via tensor factorization for large urban environmental sensor networks (Papers Track)

Yue Hu (Vanderbilt University); Yanbing Wang (Vanderbilt University); Canwen Jiao (Vanderbilt University); Rajesh Sankaran (Argonne National Lab); Charles Catlett (Argonne National Lab); Daniel Work (Vanderbilt University)


The US Environmental Protection Agency reports that urban heat islands can negatively impact a community’s environment and quality of life. Low-cost urban sensing networks make it possible to measure the effects of mitigation strategies at a fine-grained scale, informing context-aware policies and infrastructure design. However, fine-grained city-scale data analysis is complicated by tedious data cleaning, including removing outliers and imputing missing data. To address this challenge, this article introduces a robust low-rank tensor factorization method that automatically corrects anomalies and imputes missing entries in high-dimensional urban environmental datasets. We validate the method on a synthetically degraded National Oceanic and Atmospheric Administration temperature dataset, achieving a recovery error of 4%, and apply it to the Array of Things city-scale sensor network in Chicago, IL.