Resource Efficient and Generalizable Representation Learning of High-Dimensional Weather and Climate Data (Papers Track)

Juan Nathaniel (Columbia University); Marcus Freitag (IBM); Patrick Curran (Environment and Climate Change Canada); Isabel Ruddick (Environment and Climate Change Canada); Johannes Schmude (IBM)

NeurIPS 2023 Poster
Unsupervised & Semi-Supervised Learning · Meta- and Transfer Learning

Abstract

We study self-supervised representation learning on high-dimensional data under resource constraints. Our work is motivated by applications of vision transformers to weather and climate data. Such data frequently comes in the form of tensors that are both higher dimensional and larger than the RGB imagery encountered in many computer vision experiments. This raises scaling issues and the need to leverage available compute resources efficiently. Motivated by results on masked autoencoders, we show that sampling of subtensors can serve as the sole augmentation strategy for contrastive learning, with a sampling ratio of $\sim$1\%. This compares to typical masking ratios of $75\%$ for image data and $90\%$ for video data. In an ablation study, we explore extreme sampling ratios and find comparable skill for ratios as low as $\sim$0.0625\%. Pursuing further efficiencies, we finally investigate whether it is possible to generate robust embeddings for dimension values that were not present at training time. We answer this question in the affirmative by using learnable position encoders that depend continuously on dimension values.
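The subtensor-sampling augmentation described above can be illustrated with a minimal sketch. This is not the paper's implementation: the helper `sample_subtensor` below is a hypothetical reconstruction in NumPy, assuming that each dimension is cropped by a factor of `ratio**(1/ndim)` so that the crop covers roughly `ratio` of the tensor's elements. Two independent crops of the same tensor then form a positive pair for a contrastive objective (e.g. InfoNCE), while crops from different tensors serve as negatives.

```python
import numpy as np

def sample_subtensor(x, ratio=0.01, rng=None):
    """Randomly crop a contiguous subtensor covering ~`ratio` of x's elements.

    Hypothetical helper: each dimension's side length is scaled by
    ratio**(1/ndim), so the total element count is approximately
    ratio * x.size.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = ratio ** (1.0 / x.ndim)
    sizes = [max(1, int(round(s * scale))) for s in x.shape]
    starts = [int(rng.integers(0, s - k + 1)) for s, k in zip(x.shape, sizes)]
    slices = tuple(slice(st, st + k) for st, k in zip(starts, sizes))
    return x[slices]

# e.g. a (levels, lat, lon) weather tensor; names are illustrative
field = np.random.default_rng(0).standard_normal((10, 64, 128))
view_a = sample_subtensor(field, ratio=0.01)  # one view of the positive pair
view_b = sample_subtensor(field, ratio=0.01)  # second, independent view
```

With `ratio=0.01` on a `(10, 64, 128)` tensor, each view has shape `(2, 14, 28)`, i.e. about 1% of the 81,920 elements, consistent with the $\sim$1\% sampling ratio quoted in the abstract.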
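The claim about embeddings for unseen dimension values rests on position encoders that are continuous functions of the raw coordinate, rather than lookup tables indexed by position. A minimal sketch, under the assumption (not stated in the abstract) that fixed Fourier features feed a learnable linear layer:

```python
import numpy as np

class ContinuousPositionEncoder:
    """Hypothetical sketch of a learnable position encoder with continuous
    dependence on coordinate values: raw coordinates (e.g. pressure levels,
    latitudes) are lifted to fixed Fourier features, then projected by a
    learnable linear layer. Because the map is continuous in the coordinate,
    values never seen during training still receive meaningful encodings.
    """

    def __init__(self, dim=32, n_freqs=8, seed=0):
        rng = np.random.default_rng(seed)
        self.freqs = 2.0 ** np.arange(n_freqs)            # fixed frequencies
        self.W = rng.standard_normal((2 * n_freqs, dim))  # learnable weights
        self.b = np.zeros(dim)                            # learnable bias

    def __call__(self, coords):
        coords = np.asarray(coords, dtype=float)[:, None]      # (n, 1)
        phases = coords * self.freqs[None, :]                  # (n, n_freqs)
        feats = np.concatenate([np.sin(phases), np.cos(phases)], axis=1)
        return feats @ self.W + self.b                         # (n, dim)

enc = ContinuousPositionEncoder(dim=32)
train_levels = enc([850.0, 500.0])  # coordinate values seen during training
new_level = enc([700.0])            # unseen value still maps to an embedding
```

A table-based (discrete) positional embedding would have no entry for `700.0`; here the encoding of an unseen coordinate is determined by the same learned projection, which is the property the abstract's final claim relies on.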