Bioacoustic Multi-Step Attention: Underwater Ecosystem Monitoring in Climate Change Context (Papers Track)

Amine Razig (Institut Polytechnique de Paris); Youssef Soulaymani (Universite de Montreal); Loubna Benabbou (Universite du Quebec a Rimouski); Pierre Cauchy (Universite du Quebec a Rimouski)

Ecosystems & Biodiversity; Oceans & Marine Systems; Computer Vision & Remote Sensing

Abstract

Automated monitoring of marine mammals in the St. Lawrence Estuary is difficult: calls range from low-frequency moans to ultrasonic clicks, often overlap, and are embedded in variable anthropogenic and environmental noise. We introduce a multi-modal, attention-guided framework that first segments spectrograms to generate soft masks of biologically relevant energy and then fuses these masks with the raw inputs for multi-band, denoised classification. Image and mask embeddings are integrated via mid-level fusion, enabling the model to focus on salient spectrogram regions while preserving global context. Using real-world recordings from the Saguenay–St. Lawrence Marine Park Research Station in Canada, we demonstrate that segmentation-driven attention and mid-level fusion improve signal discrimination, reduce false positive detections, and produce reliable representations for operational marine mammal monitoring across diverse environmental conditions and signal-to-noise ratios. By combining attention-guided denoising with biodiversity-oriented evaluation metrics, our framework turns raw hydrophone data streams into presence signals that can be acted on operationally, supporting marine biodiversity conservation and climate-adaptation monitoring initiatives.
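
The two-stage idea in the abstract (a soft mask from a segmentation step, then mid-level fusion of image and mask embeddings) can be illustrated with a toy sketch. This is not the authors' model: the sigmoid threshold stands in for a learned segmentation network, the per-band mean stands in for learned encoders, and the names `soft_mask`, `band_embedding`, and `mid_level_fusion` are hypothetical.

```python
import math


def soft_mask(spectrogram, threshold, sharpness=10.0):
    """Assumed stand-in for the segmentation stage.

    Maps each time-frequency cell to a value in (0, 1) with a sigmoid
    centred on an energy threshold, instead of a learned network.
    """
    return [
        [1.0 / (1.0 + math.exp(-sharpness * (v - threshold))) for v in row]
        for row in spectrogram
    ]


def band_embedding(matrix, n_bands):
    """Toy encoder: mean value per frequency band (rows are frequency bins).

    Rows left over when the row count is not divisible by n_bands are ignored.
    """
    rows_per_band = len(matrix) // n_bands
    embedding = []
    for b in range(n_bands):
        band = matrix[b * rows_per_band:(b + 1) * rows_per_band]
        values = [v for row in band for v in row]
        embedding.append(sum(values) / len(values))
    return embedding


def mid_level_fusion(spectrogram, threshold=0.5, n_bands=2):
    """Embed the raw spectrogram and its soft mask separately, then concatenate.

    The concatenated vector is what a downstream classifier would consume.
    """
    mask = soft_mask(spectrogram, threshold)
    image_emb = band_embedding(spectrogram, n_bands)  # global context
    mask_emb = band_embedding(mask, n_bands)          # salient regions
    return image_emb + mask_emb


# Tiny made-up spectrogram: 4 frequency bins x 3 time frames,
# with energy concentrated in the first two bins.
spec = [
    [0.9, 0.8, 0.9],
    [0.7, 0.9, 0.8],
    [0.1, 0.2, 0.1],
    [0.0, 0.1, 0.2],
]
fused = mid_level_fusion(spec, threshold=0.5, n_bands=2)
print(len(fused))
```

Concatenating the two embeddings before the classifier, rather than multiplying the mask into the input, is one simple way to keep the unmasked global context available alongside the mask-derived features.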