As speech recognition technology grows more sophisticated and computing power increases, recognition technologies are being integrated into a growing range of software platforms, enabling intelligent speech processing. We build a comprehensive processing platform for multilingual resources used in the business and security fields, based on speech recognition and distributed processing technology. Using the federated learning model, this study develops speech recognition and its mathematical model for the languages of South China. It also creates a speech dataset for South China dialects, which currently covers Mandarin together with Cantonese, Chaoshan and Hakka, three dialects widely spoken in the Guangdong region. To address the unequal label distribution in the dataset, it applies two data enhancement (augmentation) techniques suited to the characteristics of speech signals: audio enhancement and spectrogram enhancement. This structure is combined with the hyperbolic tangent activation function and spatial domain attention to propose a dialect classification model based on hybrid domain attention. In experiments, and in comparison with earlier work in the field, the model achieves a macro-average F-value of 91.54%.
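As a reading aid for the reported metric, the sketch below shows how a macro-average F-value is conventionally computed: an F1 score is calculated for each class and the scores are averaged with equal weight, so rare dialects count as much as common ones. This is an illustration under stated assumptions, not the authors' evaluation code. The function name `macro_f1` and the toy labels are hypothetical, and the paper's exact averaging details are not given in the abstract.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the truth but was missed
    scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only (not data from the paper).
y_true = ["Mandarin", "Mandarin", "Cantonese", "Chaoshan", "Hakka", "Hakka"]
y_pred = ["Mandarin", "Cantonese", "Cantonese", "Chaoshan", "Hakka", "Mandarin"]
score = macro_f1(y_true, y_pred)
print(f"macro-average F1 = {score:.4f}")
```

Because every class contributes equally to the mean, this metric is a common choice when label distribution is unequal, which is the situation the abstract describes.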
Citation: Weiwei Lai, Yinglong Zheng. Speech recognition of south China languages based on federated learning and mathematical construction[J]. Electronic Research Archive, 2023, 31(8): 4985-5005. doi: 10.3934/era.2023255