Denoising of Ecological Bird Song Recordings Using Autoencoder Model


In bird song detection, it is crucial to account for environmental noise: both xeno-canto and Warblr accept user-uploaded recordings, many of which are made with smartphones in varied environments. These recordings may contain unpredictable environmental noise, such as white noise, footsteps, or sharp sounds at the beginning of a recording. As Denton et al. [1] describe, significant environmental noise and overlapping vocalizations in wildlife audio recordings can interfere with accurate classification. Apart from background white noise, another source of interference is the dawn chorus, when many species sing at the same time.
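To make the notion of background white noise concrete, the following minimal sketch mixes Gaussian white noise into a clean signal at a chosen signal-to-noise ratio (SNR). It is only an illustration: the synthetic 3 kHz tone, the 22050 Hz sample rate, the 10 dB target, and the helper names `add_white_noise` and `measured_snr_db` are all assumptions, not details taken from the project.

```python
import numpy as np


def add_white_noise(clean, snr_db, rng):
    """Return `clean` plus Gaussian white noise scaled to the requested SNR in dB."""
    signal_power = np.mean(clean ** 2)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB of `noisy` relative to the known `clean` signal."""
    noise = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


rng = np.random.default_rng(0)
sample_rate = 22050                      # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 3000.0 * t)   # 1 s synthetic tone standing in for a bird call
noisy = add_white_noise(clean, snr_db=10.0, rng=rng)
```

Pairs of `clean` and `noisy` signals built this way are one possible way to obtain training targets and inputs for a denoising model when real clean recordings are unavailable.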

The main objective of the autoencoder implemented for this coursework is the effective removal of background white noise. Two models, a denoising autoencoder and a CNN model, are used to evaluate the final classification task.
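The idea behind a denoising autoencoder can be sketched in a few lines: the network receives a noisy input and is trained to reproduce the clean version. The example below is a toy illustration in NumPy, not the architecture used in this project. The single tanh hidden layer, the layer sizes, the learning rate, and the synthetic sinusoid frames are all assumptions chosen to keep the sketch small and self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): short sinusoid frames stand in for clean audio frames.
n_frames, frame_len, hidden = 512, 32, 16
t = np.arange(frame_len)
freqs = rng.uniform(0.05, 0.15, size=(n_frames, 1))      # cycles per sample
phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_frames, 1))
clean = np.sin(2.0 * np.pi * freqs * t + phases)
noisy = clean + rng.normal(0.0, 0.3, size=clean.shape)   # additive white noise

# Encoder (W1, b1) and decoder (W2, b2) parameters.
W1 = rng.normal(0.0, 0.1, size=(frame_len, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.1, size=(hidden, frame_len))
b2 = np.zeros(frame_len)


def forward(x):
    """Encode with a tanh layer, decode with a linear layer."""
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2


def denoise(x):
    """Return the autoencoder's reconstruction of the input frames."""
    return forward(x)[1]


def mse(a, b):
    return float(np.mean((a - b) ** 2))


initial_loss = mse(denoise(noisy), clean)

lr = 0.2
for _ in range(2000):
    h, out = forward(noisy)
    # Gradient of the mean squared error between reconstruction and CLEAN target.
    grad_out = 2.0 * (out - clean) / out.size
    grad_W2 = h.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_pre = (grad_out @ W2.T) * (1.0 - h ** 2)   # backprop through tanh
    grad_W1 = noisy.T @ grad_pre
    grad_b1 = grad_pre.sum(axis=0)
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

final_loss = mse(denoise(noisy), clean)
print(f"reconstruction MSE: {initial_loss:.3f} -> {final_loss:.3f}")
```

The key design point is that the loss compares the output with the clean frames rather than with the noisy input, which is what distinguishes a denoising autoencoder from a plain one. A practical model for bird song would more likely operate on spectrograms with convolutional layers.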

The evaluation results suggest that denoising the audio input may benefit certain aspects of the classification task, such as reducing the impact of noise or artefacts, but does not necessarily improve every performance metric. Further analysis and experimentation are needed to understand how denoising affects this specific bird audio classification task and what trade-offs it involves.

[1] Tom Denton, Scott Wisdom, and John R. Hershey. Improving bird classification with unsupervised sound separation. 2021.