ForestViT: A Vision Transformer Network for Convolution-free Multi-label Image Classification in Deforestation Analysis (Papers Track)

Maria Kaselimi (National Technical University of Athens); Athanasios Voulodimos (University of West Attica); Ioannis Daskalopoulos (University of West Attica); Nikolaos Doulamis (National Technical University of Athens); Anastasios Doulamis (Technical University of Crete)

Understanding the dynamics of deforestation and the land uses of neighboring areas is vital for designing effective forest conservation and management policies. In this paper, we treat deforestation analysis as a multi-label classification problem in order to capture the various relevant land uses from satellite images. To this end, we propose ForestViT, a multi-label vision transformer model that leverages the self-attention mechanism and thereby avoids the convolution operations found in the deep learning models commonly used for deforestation detection.
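
To make the two ideas in the abstract concrete, the following is a minimal sketch and not the ForestViT implementation. It shows (a) how an image can be cut into flattened, non-overlapping patches without any convolution, which is the usual input to a vision transformer, and (b) how a multi-label output differs from a single-label one: each label gets its own sigmoid score and threshold, so several land uses can be reported for the same image. The label names, the patch size, the 0.5 threshold, and all function names are assumptions made for illustration; none of them come from the paper.

```python
import math

# Hypothetical label set, for illustration only (not taken from the paper).
LABELS = ["primary_forest", "agriculture", "road", "water"]


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def split_into_patches(image, patch):
    """Split an H x W grid (list of rows) into flattened patch x patch vectors.

    Patches are non-overlapping and returned in row-major order. A vision
    transformer would then project each vector linearly into a token.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            patches.append(
                [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            )
    return patches


def predict_labels(logits, threshold=0.5):
    """Return every label whose independent sigmoid score reaches the threshold."""
    probs = [sigmoid(z) for z in logits]
    return [name for name, p in zip(LABELS, probs) if p >= threshold]


def binary_cross_entropy(logits, targets):
    """Mean per-label binary cross-entropy, a common multi-label training loss."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(logits)
```

For example, `predict_labels([2.0, 1.0, -3.0, -0.5])` returns both `"primary_forest"` and `"agriculture"`, whereas a softmax classifier would be forced to pick only one of them.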