Reducing the Barriers of Acquiring Ground-truth from Biodiversity Rich Audio Datasets Using Intelligent Sampling Techniques (Papers Track)

Jacob G Ayers (UC San Diego); Sean Perry (UC San Diego); Vaibhav Tiwari (UC San Diego); Mugen Blue (Cal Poly San Luis Obispo); Nishant Balaji (UC San Diego); Curt Schurgers (UC San Diego); Ryan Kastner (UC San Diego); Mathias Tobler (San Diego Zoo Wildlife Alliance); Ian Ingram (San Diego Zoo Wildlife Alliance)

Paper PDF · Recorded Talk · NeurIPS 2021 Poster · Cite
Ecosystems & Biodiversity · Computer Vision & Remote Sensing


The potential of passive acoustic monitoring (PAM) to reveal the consequences of climate change on the biodiversity that makes up natural soundscapes can be undermined by a discrepancy: acquiring large field audio datasets has a low barrier of entry, while acquiring reliable species-level training, validation, and test subsets from that audio has a much higher one. These subsets are often required to verify any machine learning models used to help researchers understand the local biodiversity, especially because models reporting promising results on other data sources may not translate to the collected field audio. Labeling such datasets is a resource-intensive process due to the scarcity of experts capable of identifying bioacoustic signals at the species level and the overwhelming size of many PAM audio datasets. To address this challenge, we tested different sampling techniques on an audio dataset collected during a two-week audio array deployment in August on the Scripps Coastal Reserve (SCR) Biodiversity Trail in La Jolla, California. These techniques produced four subsets by combining stratified random sampling, restriction of samples to the daily bird vocalization peaks, and a hybrid convolutional neural network (CNN) and recurrent neural network (RNN) trained for bird presence/absence audio classification. A stratified random sample baseline achieved a bird presence rate of only 44%, whereas a sample that randomly selected clips with high hybrid CNN-RNN predictions, restricted to the bird activity peaks at dawn and dusk, yielded a bird presence rate of 95%. This significantly higher bird presence rate demonstrates how intelligent, machine learning-assisted selection of audio data can greatly reduce the time domain experts spend listening to audio without vocalizations of interest while building a ground truth for machine learning models.
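The two sampling strategies compared in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the clip metadata, score values, peak hours, and function names are all hypothetical, standing in for recorder timestamps and hybrid CNN-RNN presence scores.

```python
import random

def stratified_random_sample(clips, n_per_stratum, key):
    """Baseline: draw clips uniformly at random from each stratum
    (here, each hour of the day)."""
    strata = {}
    for clip in clips:
        strata.setdefault(key(clip), []).append(clip)
    sample = []
    for group in strata.values():
        sample.extend(random.sample(group, min(n_per_stratum, len(group))))
    return sample

def peak_model_sample(clips, n, peak_hours, min_score):
    """Intelligent sampling: randomly select clips recorded during
    activity peaks whose presence/absence model score is high."""
    candidates = [c for c in clips
                  if c["hour"] in peak_hours and c["score"] >= min_score]
    return random.sample(candidates, min(n, len(candidates)))

# Synthetic clip metadata: hour of recording and a model confidence in [0, 1].
random.seed(0)
clips = [{"id": i, "hour": random.randrange(24), "score": random.random()}
         for i in range(1000)]

baseline = stratified_random_sample(clips, n_per_stratum=2,
                                    key=lambda c: c["hour"])
targeted = peak_model_sample(clips, n=20,
                             peak_hours={5, 6, 18, 19}, min_score=0.8)
print(len(baseline), len(targeted))
```

Under this scheme, every clip handed to an expert annotator from the targeted sample already passed both a time-of-day filter and a classifier-confidence filter, which is what drives the reported jump in bird presence rate from 44% to 95%.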
