This study presents an automatic vocalization recognition system for giant pandas (GPs). Over 12800 vocal samples of GPs were recorded at the Chengdu Research Base of Giant Panda Breeding (CRBGPB) and labeled by CRBGPB animal husbandry staff. The samples were divided into 16 categories of 800 samples each. A novel deep neural network (DNN), named 3Fbank-GRU, is proposed to label GP vocalizations automatically. Unlike existing human vocalization recognition frameworks based on the Mel filter bank (Fbank), which use only the low-frequency features of the voice, we extract high-, medium- and low-frequency features using Fbank together with two self-derived filter banks, the Medium Mel Filter bank (MFbank) and the Reversed Mel Filter bank (RFbank). The three sets of frequency features are fed into 3Fbank-GRU for training and testing. Trained on the datasets labeled by CRBGPB animal husbandry staff and then tested on recognition tasks, the proposed method achieved a recognition accuracy of over 95%, which indicates that the system can be used to accurately label large data sets of GP vocalizations collected by camera traps or other recording methods.
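The contrast between a low-frequency-oriented Fbank and a high-frequency-oriented bank can be illustrated with a minimal NumPy sketch. The abstract does not give the definitions of MFbank or RFbank, so this is an assumption rather than the authors' construction: the "reversed" bank here is simply the Mel bank mirrored along the frequency axis, and the medium-frequency bank is omitted. The function names, the 26 filters, the 512-point FFT and the 16 kHz sample rate are all hypothetical choices for illustration.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to the Mel scale (common 2595*log10 form)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular Mel filter bank with shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    nyquist = sample_rate / 2.0
    # Edges are equally spaced in Mel, so filters are narrow at low frequencies.
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(nyquist), n_filters + 2))
    bin_hz = np.linspace(0.0, nyquist, n_bins)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - left) / (center - left)
        falling = (right - bin_hz) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def reversed_mel_filter_bank(n_filters, n_fft, sample_rate):
    """ASSUMPTION: a 'reversed' bank obtained by mirroring the Mel bank in
    frequency, so the narrow filters sit at high frequencies. Rows are also
    reversed so that filters stay ordered from low to high frequency."""
    return mel_filter_bank(n_filters, n_fft, sample_rate)[::-1, ::-1].copy()


def log_bank_energies(power_spectrum, bank, eps=1e-10):
    """Log filter-bank energies for frames of shape (n_frames, n_bins)."""
    return np.log(power_spectrum @ bank.T + eps)
```

Under these assumptions, applying `log_bank_energies` to the same power spectrogram with each bank yields one feature matrix per bank, which could then be passed to a recurrent model. How the three feature streams are combined inside 3Fbank-GRU is not stated in the abstract.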
Citation: Zhiwu Liao, Shaoxiang Hu, Rong Hou, Meiling Liu, Ping Xu, Zhihe Zhang, Peng Chen. Automatic recognition of giant panda vocalizations using wide spectrum features and deep neural network[J]. Mathematical Biosciences and Engineering, 2023, 20(8): 15456-15475. doi: 10.3934/mbe.2023690