Reducing the barriers of acquiring species-level ground-truth on passive acoustic monitoring data using neural network assisted intelligent sampling techniques (Papers Track)

Jacob G Ayers (UC San Diego); Ryan Kastner (University of California San Diego)



The potential of passive acoustic monitoring (PAM) to reveal the consequences of climate change for the biodiversity that makes up natural soundscapes can be undermined by the discrepancy between the low barrier to acquiring large field audio datasets and the much higher barrier to acquiring reliable species-level training, validation, and test subsets from that audio. These labeled subsets are often required to verify any machine learning models used to help researchers understand the local biodiversity, especially because models that report promising results on other datasets may not transfer to the collected field audio. Labeling such datasets is a resource-intensive process due to the scarcity of experts capable of identifying vocalizations at the species level and the overwhelming size of many PAM audio datasets. To address this challenge, we tested different sampling techniques on an audio dataset collected during a two-week August deployment of a recorder array on the Scripps Coastal Reserve (SCR) Biodiversity Trail in La Jolla, California. These techniques produced four subsets by combining stratified random sampling, restricting samples to the daily peaks of bird vocalization activity, and using a hybrid convolutional neural network (CNN) and recurrent neural network (RNN) trained for bird presence/absence audio classification. A stratified random sampling baseline achieved a bird presence rate of only 44\%, whereas randomly selecting clips with high hybrid CNN-RNN predictions from the dawn and dusk bird activity peaks yielded a bird presence rate of 95\%. This substantially higher presence rate demonstrates how intelligent, machine learning-assisted selection of audio data can greatly reduce the time domain experts spend listening to audio without vocalizations of interest while building ground truth for machine learning models.
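The intelligent sampling described above can be sketched as a two-stage filter followed by a random draw: keep only clips recorded during dawn/dusk activity peaks, keep only clips the presence/absence model scores highly, then sample uniformly from the remainder. The window boundaries, score threshold, and record layout below are illustrative assumptions, not the paper's exact parameters.

```python
import random
from datetime import time

# Assumed dawn/dusk activity windows and score cutoff (illustrative only).
DAWN = (time(5, 30), time(8, 0))
DUSK = (time(17, 30), time(20, 0))
SCORE_THRESHOLD = 0.8  # assumed cutoff on the CNN-RNN bird-presence score

def in_peak(t):
    """True if a clip's start time falls within a dawn or dusk window."""
    return any(lo <= t <= hi for lo, hi in (DAWN, DUSK))

def intelligent_sample(clips, n, seed=0):
    """Randomly draw up to n clips that are both inside an activity peak
    and scored as likely bird-positive by the presence/absence model."""
    candidates = [
        c for c in clips
        if in_peak(c["start"]) and c["score"] >= SCORE_THRESHOLD
    ]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))

# Synthetic example clips spread across the day:
clips = [
    {"start": time(6, 15), "score": 0.93},   # dawn, high score -> eligible
    {"start": time(12, 0), "score": 0.91},   # midday -> excluded
    {"start": time(18, 40), "score": 0.85},  # dusk, high score -> eligible
    {"start": time(6, 50), "score": 0.40},   # dawn, low score -> excluded
]
subset = intelligent_sample(clips, n=2)
```

Every clip an expert then labels has passed both filters, which is what drives the reported jump in bird presence rate relative to a stratified random baseline.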