Research article

Automatic recognition of giant panda vocalizations using wide spectrum features and deep neural network


  • Received: 06 May 2023; Revised: 19 June 2023; Accepted: 04 July 2023; Published: 24 July 2023
  • The goal of this study is to present an automatic vocalization recognition system for giant pandas (GPs). Over 12,800 vocal samples of GPs were recorded at the Chengdu Research Base of Giant Panda Breeding (CRBGPB) and labeled by CRBGPB animal husbandry staff. The samples were divided into 16 categories of 800 samples each. A novel deep neural network (DNN) named 3Fbank-GRU was proposed to label GP vocalizations automatically. Unlike existing human speech recognition frameworks based on the Mel filter bank (Fbank), which use only the low-frequency features of the voice, we extracted high-, medium- and low-frequency features with Fbank and two self-derived filter banks, named the Medium Mel Filter bank (MFbank) and the Reversed Mel Filter bank (RFbank); illustrative sketches of this feature extraction and of a minimal recurrent classifier follow the citation below. The three feature streams were fed into the 3Fbank-GRU for training and testing. Trained on datasets labeled by CRBGPB animal husbandry staff and then tested on recognition tasks, the proposed method achieved recognition accuracy above 95%, which means the automatic system can be used to accurately label large datasets of GP vocalizations collected by camera traps or other recording methods.

    Citation: Zhiwu Liao, Shaoxiang Hu, Rong Hou, Meiling Liu, Ping Xu, Zhihe Zhang, Peng Chen. Automatic recognition of giant panda vocalizations using wide spectrum features and deep neural network[J]. Mathematical Biosciences and Engineering, 2023, 20(8): 15456-15475. doi: 10.3934/mbe.2023690
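
The feature extraction described in the abstract lends itself to a short illustration. The sketch below is a minimal approximation, not the paper's method: it builds a standard Mel filter bank (Fbank) plus two stand-ins for MFbank and RFbank — a mid-band Mel bank and a frequency-flipped Mel bank — since the exact MFbank/RFbank definitions are given in the paper itself. The file name gp_call.wav, the 16 kHz sample rate and all filter parameters are illustrative assumptions.

```python
# Hedged sketch of three-bank feature extraction (the mid-band and
# "reversed" banks are stand-ins, not the paper's exact MFbank/RFbank).
import numpy as np
import librosa

def log_fbank_energies(y, fbank, n_fft=1024, hop_length=256):
    """Apply a filter bank to a power spectrogram and take log energies."""
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length)) ** 2
    return np.log(fbank @ spec + 1e-10)           # shape: (n_filters, n_frames)

y, sr = librosa.load("gp_call.wav", sr=16000)     # hypothetical sample clip

n_mels, n_fft = 40, 1024
fbank = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)  # standard Mel bank
rfbank = fbank[::-1, ::-1]   # flipped bank: filters dense at high frequencies
mfbank = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels,
                             fmin=1000, fmax=4000)              # mid-band bank

# Three feature streams, one per bank: (3, n_mels, n_frames).
features = np.stack([log_fbank_energies(y, b) for b in (fbank, mfbank, rfbank)])
```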


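The classifier side can be illustrated the same way. The paper's 3Fbank-GRU architecture (depth, hidden sizes, how the three streams are fused) is specified in the paper; the PyTorch sketch below is only a minimal GRU classifier under the assumption that the three 40-filter streams are concatenated frame by frame into a 120-dimensional input, with 16 output classes matching the 16 vocal categories.

```python
# Minimal GRU classifier sketch; layer sizes are illustrative assumptions,
# not the architecture of the paper's 3Fbank-GRU.
import torch
import torch.nn as nn

class FbankGRU(nn.Module):
    def __init__(self, n_filters=40, hidden=128, n_classes=16):
        super().__init__()
        # Assumed fusion: the three streams are concatenated per frame.
        self.gru = nn.GRU(3 * n_filters, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, n_frames, 3 * n_filters)
        out, _ = self.gru(x)
        return self.head(out[:, -1])      # class scores from the final frame state

model = FbankGRU()
dummy = torch.randn(8, 200, 120)          # 8 clips, 200 frames, 3 x 40 features
logits = model(dummy)                     # (8, 16): one score per call category
```

Separate per-stream GRUs with late fusion would be an equally plausible reading of the name 3Fbank-GRU; per-frame concatenation is simply the smallest sketch.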





  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
