One Prompt Fits All: Visual Prompt-Tuning for Remote Sensing Segmentation (Tutorials Track) Spotlight

Xuekun Wang (Vector Institute); John Willes (Vector Institute); Deval Pandya (Vector Institute)

Computer Vision & Remote Sensing; Earth Observation & Monitoring; Forests

Abstract

Image segmentation is crucial in climate change research for analyzing satellite imagery: it underpins ecosystem mapping, natural disaster assessment, and urban and agricultural planning. The advent of vision foundation models like the Segment Anything Model (SAM) opens new avenues in climate research and remote sensing (RS). SAM can segment any object given manually crafted prompts, but its efficacy depends heavily on prompt quality. This issue is particularly pronounced with RS data, which is inherently complex. Using SAM for accurate segmentation at scale on RS imagery would require crafting a complex prompt for each image, typically by selecting dozens of points. To address this, we introduce Prompt-Tuned SAM (PT-SAM), a method that minimizes manual input through a trainable, lightweight prompt embedding. This embedding captures key semantic information about a specific object of interest and transfers to unseen images. Our approach combines the zero-shot generalization of the pre-trained SAM model with supervised learning. Importantly, training the prompt embedding has minimal hardware requirements, so it can be conducted on a CPU, and it requires only a small dataset. With PT-SAM, image segmentation on RS data can be performed at scale without human intervention, achieving accuracy comparable to that of SAM with human-designed prompts. For example, PT-SAM can be used to analyze forest cover across vast areas, a key factor in understanding the impact of human activities on forests. Its capacity to segment large numbers of images makes it well suited to monitoring widespread land-cover changes, providing deeper insights into urbanization. This tutorial will explore how to train and use PT-SAM for large-scale segmentation tasks, focusing on training embeddings that capture forests and buildings.
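The core idea of prompt tuning can be sketched in a few lines: keep the pre-trained model frozen and optimize only a small prompt vector against supervised masks. The sketch below uses a toy frozen linear "decoder" `W` as a stand-in for SAM's pre-trained mask decoder; all names, shapes, and the training loop are illustrative assumptions, not the actual SAM/PT-SAM API.

```python
# Toy sketch of prompt tuning: the "model" (W) is frozen; only a small
# prompt vector is learned. Shapes and names are illustrative, not SAM's API.
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_PROMPT, N_PIX = 8, 4, 16               # toy feature/prompt/mask sizes
W = rng.normal(size=(N_PIX, D_IMG + D_PROMPT))  # frozen "mask decoder" weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(img_feat, prompt):
    """Frozen decoder maps [image features ; prompt] to per-pixel mask probs."""
    return sigmoid(W @ np.concatenate([img_feat, prompt]))

def bce(p, y):
    """Binary cross-entropy between predicted probabilities and target mask."""
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))

# Toy "dataset": image features plus binary masks produced by a hidden prompt.
imgs = rng.normal(size=(10, D_IMG))
hidden = rng.normal(size=D_PROMPT)
masks = np.stack([(predict(x, hidden) > 0.5).astype(float) for x in imgs])

prompt = np.zeros(D_PROMPT)  # the ONLY trainable parameters
lr = 0.5
for _ in range(1000):
    grad = np.zeros(D_PROMPT)
    for x, y in zip(imgs, masks):
        p = predict(x, prompt)
        # BCE gradient w.r.t. the prompt flows only through W's prompt columns
        grad += W[:, D_IMG:].T @ (p - y)
    prompt -= lr * grad / (len(imgs) * N_PIX)

zero = np.zeros(D_PROMPT)
loss_before = np.mean([bce(predict(x, zero), y) for x, y in zip(imgs, masks)])
loss_after = np.mean([bce(predict(x, prompt), y) for x, y in zip(imgs, masks)])
print(f"mean BCE before/after prompt tuning: {loss_before:.3f} -> {loss_after:.3f}")
```

Because only `D_PROMPT` parameters receive gradients while everything else stays frozen, the optimization is cheap; this is, in miniature, why training the PT-SAM prompt embedding needs only a CPU and a small labeled dataset.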