Research article

Classification of dementia from spoken speech using feature selection and the bag of acoustic words model


  • Received: 20 June 2024 Revised: 19 July 2024 Accepted: 22 July 2024 Published: 29 July 2024
  • Memory disorders and dementia are a central factor in the decline of functioning and daily activities in older individuals. The workload related to standardized speech tests in clinical settings has led to a growing emphasis on developing automatic machine learning techniques for analyzing naturally spoken speech. This study presented a bag of acoustic words approach for distinguishing dementia patients from control individuals based on audio speech recordings. In this approach, each individual's speech was segmented into voiced periods, and these segments were characterized by acoustic features using the open-source openSMILE library. Word histogram representations were formed from the characterized speech segments of each speaker, which were used for classifying subjects. The formation of word histograms involved a clustering phase where feature vectors were quantized. It is well-known that partitional clustering involves instability in clustering results due to the selection of starting points, which can cause variability in classification outcomes. This study aimed to address instability by utilizing robust K-spatial-medians clustering, efficient K-means++ clustering initialization, and selecting the smallest clustering error from repeated clusterings. Additionally, the study employed feature selection based on the Wilcoxon signed-rank test to achieve computational efficiency in the methods. The results showed that it is possible to achieve a consistent 75% classification accuracy using only twenty-five features, both with the external ADReSS 2020 test data and through leave-one-subject-out cross-validation of the entire dataset. The results rank at the top compared to international research, where the same dataset and only acoustic features have been used to diagnose patients.

    Citation: Marko Niemelä, Mikaela von Bonsdorff, Sami Äyrämö, Tommi Kärkkäinen. Classification of dementia from spoken speech using feature selection and the bag of acoustic words model[J]. Applied Computing and Intelligence, 2024, 4(1): 45-65. doi: 10.3934/aci.2024004




    Memory disorders impair memory, information processing, reasoning, and other cognitive functions. Cognitive decline is associated with difficulties in finding and understanding words and with interruptions in thought [1]. Dementia is an advanced form of memory disorder in which the deterioration of memory and the decline in cognitive abilities impair the patient's ability to perform routine daily activities and hinder social and professional functioning. Consequently, as self-care becomes more difficult, the need for assistance increases even in the mild form of the disorder, which also significantly affects the patient's close ones. The most common cause of progressive memory disorders and dementia is Alzheimer's disease [2].

    According to the World Health Organization (WHO), every three seconds, someone in the world is diagnosed with a memory disorder [2]. Memory disorders affect over 50 million people worldwide, and the number is expected to triple by the year 2050. The aging of the population is the main driver of the increased prevalence of memory disorders. The increase is notable in countries such as Japan and many European nations, where life expectancy has risen and birth rates have declined; however, memory disorders impact nearly every country worldwide. In 2018, the global costs of memory disorders reached nearly a trillion dollars [2].

    Currently, there is no curative drug treatment for memory disorders, but early diagnosis, treatment, and rehabilitation can maintain and improve a patient's functional capacity. Studies have shown that memory disorder patients with healthier lifestyles have fewer risk factors for the worsening of the disorder and can lead good, fully functional lives [3]. The FINGER study was the first in the world to demonstrate that adhering to a comprehensive lifestyle program can improve cognitive functions and prevent the decline of memory functions [4]. Diagnosing memory disorders, especially at an early phase, has been challenging because diagnostics and treatment evaluation require specialized expertise and monitoring. In recent years, there has been increased investment and focus on developing diagnostics and rehabilitation.

    Traditionally, cognitive decline has been assessed using conventional pen-and-paper tests, including the Mini-Mental State Examination (MMSE) [5], the Montreal Cognitive Assessment (MoCA) [6], and the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) test, particularly in the early stages of decline [7]. Alongside the paper tests, cognitive impairment related to memory disorders and potential brain injuries has been studied in clinical settings by having experts analyze patients' speech production. However, this approach has proven impractical, as it is subjective and often difficult to replicate. Consequently, there has been a shift toward developing automatic natural language processing (NLP) machine learning techniques in recent years. These methods have demonstrated utility by being cost-effective, scalable, and providing rapid diagnoses [8]. However, the methods are often too complex, demand significant training data, are computationally intensive, or involve too many adjustable training parameters [9,10].

    In machine learning, there has often been a focus on diagnosing Alzheimer's disease or mild cognitive impairment [11]. Speech has been analyzed based on lexical, acoustic, or both lexical and acoustic content. Analyzing the lexical content of speech remains challenging because the content needs to be extracted from the recording either manually, which is error-prone and time-consuming, or by using speech-to-text methods that typically require large speech databases and are not language-independent. Further, lexical features can more easily compromise patient privacy. For instance, a transcript may include information about a patient's inner circle or address [11].

    The bag of words approach has been used for processing naturally spoken speech. This approach represents a set of objects as an unordered list containing a limited number of objects that best describe the set, together with their occurrence counts. The model was first used in text categorization [12], such as spam email filtering. The word histogram model has since been extended to visual words (bag of visual words, BoVW) for image categorization [13] and to acoustic words (bag of acoustic words, BoAW) for speech processing [14].

    Typically, in a BoAW model, a set of feature vectors is represented as a histogram of quantized vectors. Quantization is performed by clustering the data and associating the feature vectors with the nearest cluster prototypes. The method involves uncertainty caused by the randomness of the clustering result, which this study addresses by using a robust K-spatial-median clustering method, an efficient K-means++-type clustering initialization, and repeated clustering. In this study, feature selection based on statistical testing aims to reduce the computational complexity of clustering and classification. In addition, feature selection helps in understanding which information is most relevant for the research. Word histogram representations are created for the voiced segments derived from patients' speech recordings, based on the 25 most important acoustic features extracted using the openSMILE library*. The study shows that although the partitioning caused by clustering results in some variance, it is possible to distinguish dementia patients from control subjects with an average classification accuracy of 75.2% (SD: ±4.0%) on separate test data and with an accuracy of 75% using leave-one-subject-out (LOSO) cross-validation on the full dataset.

    *https://www.audeering.com/research/opensmile/

    Hernández-Domínguez et al. [15] conducted a diagnosis based on acoustic features and achieved a classification accuracy of 62.0% using low-level mel-frequency cepstral coefficient (MFCC 1–13) features. The features were extracted from 25-millisecond audio segments, and the following statistical measures were computed: averages, kurtoses, skewnesses, and variances. In [16], memory-impaired patients were diagnosed with an accuracy of 68.0% based on the frequencies of both speech and speech pauses, including averages, variances, minimum and maximum values, and entropies. In [17], classification accuracies ranging from 60.0% to 93.8% were achieved using speech duration, time- and frequency-domain features, and emotional states. However, the dataset was relatively small (40 individuals). Additionally, the control group included relatively young individuals (25% of subjects were 20–60 years old) compared to the Alzheimer's patients (all over 60 years old).

    In the development of machine learning methods, it is crucial to keep the training and test datasets separate to prevent biased classification results. In many studies on machine learning and diagnostics from naturally spoken speech, classification methods are not designed to be independent of patients: results have been published where recordings from the same patient were used both to train and to test the machine learning method. In these cases, classification methods have not learned to identify the disease itself but have instead overfit the speech production characteristics of individual patients, rarely generalizing well to new data. Consequently, classification results on the datasets used in the experiments may appear better than they would be on a different dataset, as underlined in [18].

    In [19], classification was performed based on a subset extracted from the Pitt Corpus audio database*, consisting of control subjects and dementia patients. The subset included a total of 156 individuals, half of whom had a diagnosis of dementia. The dataset was divided into training and testing sets based on the individuals' diagnoses, ages, and gender distributions. Noise reduction was applied to the audio recordings, and voiced segments were identified using a signal-energy-based threshold for speech. Acoustic features were computed for these segments using the functional emobase, computational paralinguistics challenge (ComParE) [20], and Geneva minimalistic acoustic parameter set (eGeMAPS) features from the openSMILE library [21]. These included low-level descriptors (LLD) and statistical measures of the LLD features [22]. Additionally, statistical measures were selected for multi-resolution cochleagram (MRCG) features, as well as the features related to pronunciation and speech pauses used in [16]. Correlated features were removed from the feature sets. The result for an individual was based on the mode of the classification results of all of the speaker's segments using different feature sets. The best classification accuracy based on acoustic features was achieved with the ComParE feature set: the individual-level prediction accuracy on the test dataset was 62.5%, and with the LOSO cross-validation used in the experiments, the overall classification accuracy for the entire dataset was 56.5%.

    *https://dementia.talkbank.org/access/English/Pitt.html

    Syed et al. [23] achieved a classification result of 76.9% using LOSO cross-validation on the training data presented in the ADReSS 2020 challenge† [19]. The result was based on acoustic IS10-Paraling features (1582 features sampled from the ComParE feature set), a bag of acoustic words (BoAW) model [24] provided by the openXBOW tool‡, and support vector machine (linear kernel) and logistic regression classifiers. In [18], a data division similar to that in [19] was utilized, but with a larger number of selected individuals (a total of 164). The work introduced a new active data representation method based on self-organizing map (SOM) clustering of feature vectors, association of vectors with their nearest clusters, computation of voiced segment durations, and histogram feature extraction. The proposed model methodologically resembles the BoAW model, where clustering is also used to associate feature vectors with their nearest clusters. The best classification result (77.4%) was achieved using non-correlated features from the eGeMAPS feature set (a total of 75 features) and a classifier based on linear discriminant analysis (LDA).

    †https://luzs.gitlab.io/adress/

    ‡https://github.com/openXBOW/openXBOW

    The results from studies [18,23] are promising. However, the papers do not address the challenges of data clustering, such as those of the SOM network and the K-means clustering used in the BoAW model. In K-means, the number of ways to partition $N$ observations into $K$ groups is given by the Stirling number of the second kind, $S(N,K)=\frac{1}{K!}\sum_{i=0}^{K}(-1)^{K-i}\binom{K}{i}i^{N}$, which makes an exhaustive search evidently infeasible [25]. Typically, clustering methods converge to a local minimum of the error function, which is why the methods should undergo multiple initializations (often 100 are used [26,27]), after which the initialization with the smallest clustering error is selected. Multiple initializations do not guarantee an optimal result, but they increase the probability of obtaining a cluster model that effectively describes the internal structure of the data.
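    As a concrete illustration (not part of the original studies), the following Python sketch computes the Stirling number above and selects the best of 100 k-means++ restarts with scikit-learn; the data matrix is a random placeholder.

```python
import numpy as np
from math import comb, factorial
from sklearn.cluster import KMeans

def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind via the explicit alternating sum."""
    return sum((-1) ** (k - i) * comb(k, i) * i ** n for i in range(k + 1)) // factorial(k)

print(stirling2(25, 3))  # 141197991025 ways to split just 25 points into 3 groups

# In practice: 100 k-means++ restarts, keeping the run with the smallest error.
X = np.random.rand(500, 25)  # placeholder feature vectors
km = KMeans(n_clusters=15, init="k-means++", n_init=100).fit(X)
print(km.inertia_)           # smallest clustering error over the 100 restarts
```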

    The Pitt Corpus audio database is one of the databases provided by DementiaBank*. The data for the database was collected between 1983 and 1988 as part of Alzheimer's research at the University of Pittsburgh [28]. The study included 282 individuals, out of which 101 were healthy control subjects, and 181 were Alzheimer's patients. Participants had to be over 44 years old, have at least seven years of education, no central nervous system abnormalities, and score at least ten out of thirty points on the Mini-Mental State Examination (MMSE) as a preliminary result [5].

    *https://dementia.talkbank.org/access/

    The selected participants performed oral tasks, and their performance in everyday tasks was also assessed. Among the oral tasks was a kitchen scene description task designed to measure speech disorders (the picture shows a mother washing dishes and children on a stool stealing cookies). A healthcare professional gave the patients instructions at the beginning of the picture description task, and the patients' responses were recorded with a microphone. For ADReSS 2020, a random sample of participants in the picture description task was selected from the Pitt Corpus database, ensuring that the age and gender distributions of control subjects and dementia patients matched; furthermore, only one recording was selected from each individual [19]. A total of 78 control subjects and 78 dementia patients were chosen. Approximately 69% of the individuals were assigned to the training dataset and approximately 31% to the test dataset (see Tables 1 and 2). The recordings were acoustically enhanced by removing stationary noise, and audio volume normalization was applied across the segmented speech to control for variations caused by recording conditions. It is important to note that age and gender distributions are not always available in all databases; in such cases, it would be useful to be able to estimate the ages and genders of the participants (see, for example, the study [29]).

    Table 1.  ADReSS 2020 training dataset (M = male, F = female, AD = Alzheimer's dementia, MMSE = mini-mental state examination).

                     AD                       non-AD
    Age         M   F   MMSE (SD)        M   F   MMSE (SD)
    [50, 55)    1   0   30.0 (n/a)       1   0   29.0 (n/a)
    [55, 60)    5   4   16.3 (4.9)       5   4   29.0 (1.3)
    [60, 65)    3   6   18.3 (6.1)       3   6   29.3 (1.3)
    [65, 70)    6   10  16.9 (5.8)       6   10  29.1 (0.9)
    [70, 75)    6   8   15.8 (4.5)       6   8   29.1 (0.8)
    [75, 80)    3   2   17.2 (5.4)       3   2   28.8 (0.4)
    Total       24  30  17.0 (5.5)       24  30  29.1 (1.0)

    Table 2.  ADReSS 2020 test dataset (M = male, F = female, AD = Alzheimer's dementia, MMSE = mini-mental state examination).

                     AD                       non-AD
    Age         M   F   MMSE (SD)        M   F   MMSE (SD)
    [50, 55)    1   0   23.0 (n/a)       1   0   28.0 (n/a)
    [55, 60)    2   2   18.7 (1.0)       2   2   28.5 (1.2)
    [60, 65)    1   3   14.7 (3.7)       1   3   28.7 (0.9)
    [65, 70)    3   4   23.2 (4.0)       3   4   29.4 (0.7)
    [70, 75)    3   3   17.3 (6.9)       3   3   28.0 (2.4)
    [75, 80)    1   1   21.5 (6.3)       1   1   30.0 (0.0)
    Total       11  13  19.5 (5.3)       11  13  28.8 (1.5)


    The audio recordings, recorded with a microphone in WAVE format (duration mean: 55.3 seconds, SD: 29.3 seconds), contained a significant amount of noise, which was removed using the adaptive noise reduction filter of Adobe Audition (version 23.6). The filter introduced short noisy segments at the beginning of the recordings, which were removed manually. Additionally, extraneous sounds not related to the participants, such as caregiver speech, overlapping speech, background noise, buzzer sounds, typing sounds, and emergency vehicle sounds, were removed from the recordings. After this step, the recordings contained only the participants' speech or silent periods during which the participants thought about the task. The recordings were resampled* from a 44 kHz sampling rate to a 16 kHz sampling rate, which covers most speech frequencies. The resampled audio recordings were normalized to standard audio volumes according to the EBU R128 standard [30] to minimize the effects of different recording conditions, such as the impact of microphone placement.

    *https://sourceforge.net/projects/sox/
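    A minimal preprocessing sketch, assuming the soundfile, soxr, and pyloudnorm Python packages (the study itself used the SoX tool for resampling); the file names are placeholders, and the −23 LUFS target follows the EBU R128 recommendation:

```python
import soundfile as sf     # WAV input/output
import soxr                # SoX resampler bindings
import pyloudnorm as pyln  # ITU-R BS.1770 loudness, the basis of EBU R128

data, rate = sf.read("speech.wav")           # placeholder cleaned recording
data_16k = soxr.resample(data, rate, 16000)  # 44 kHz -> 16 kHz

meter = pyln.Meter(16000)                    # BS.1770 loudness meter
loudness = meter.integrated_loudness(data_16k)
normalized = pyln.normalize.loudness(data_16k, loudness, -23.0)
sf.write("speech_16k_norm.wav", normalized, 16000)
```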

    Segments of voiced speech were identified from the recordings using the open-source Auditok library†. The energy threshold for speech detection (65 dB) and the maximum duration of speech segments (10 seconds) were the same as in [19]. Default values of the Auditok library were used as limits for the minimum duration of continuous speech (0.2 seconds) and the maximum duration of speech pauses (0.3 seconds). As a result, 2107 audio segments were obtained for the training dataset (duration mean: 1.27 seconds, SD: 0.93 seconds) and 959 audio segments for the test dataset (duration mean: 1.28 seconds, SD: 0.94 seconds). Finally, the audio segments were normalized to standard audio volumes according to the EBU R128 standard.

    †https://github.com/amsehili/auditok
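    A sketch of the segmentation step with the Auditok library, using the thresholds reported above; the input file name is a placeholder:

```python
import auditok

# 65 dB energy threshold and 10 s maximum segment duration as in [19];
# Auditok defaults for minimum speech (0.2 s) and maximum pause (0.3 s).
regions = auditok.split(
    "speech_16k_norm.wav",  # placeholder preprocessed recording
    min_dur=0.2,
    max_dur=10,
    max_silence=0.3,
    energy_threshold=65,
)
for i, region in enumerate(regions):
    region.save(f"segment_{i}.wav")  # one WAV file per voiced segment
```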

    eGeMAPS is a minimal feature set developed for automatic paralinguistic or clinical speech analysis [22]. The features are effective in identifying psychological changes in speech production [21]. They are statistical measures of low-level descriptors (LLD) of audio. The LLD features include, for example, the F0 fundamental frequency (pitch), speech intensity (loudness), spectral flux rate, MFCC features, pitch vibration frequency and amplitude (jitter and shimmer), the F1, F2, and F3 formant regions, the energy ratio between the upper and lower spectrum frequencies (the alpha ratio), harmonic differences, the harmonic-to-noise ratio (HNR), the Hammarberg index, and predicted spectral slopes. The openSMILE library (version 3.0.1) was used to form the LLD and statistical eGeMAPS features for the audio segments (a total of 88 features). The eGeMAPS features are computed from the LLDs using, for example, the arithmetic mean, standard deviation, coefficient of variation (standard deviation divided by the arithmetic mean), percentiles (25, 50, and 80), and the 25–80 percentile range. Additionally, the study measured relative speech pauses before the initiation of speech production: the pause was normalized both to the total recording duration and to the cumulative speech pause duration in the recording (a total of two features).
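    The extraction can be approximated with audEERING's opensmile Python package; using the wrapper instead of the openSMILE library directly is an assumption of this example, and the file name is a placeholder:

```python
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,      # the 88 eGeMAPS functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("segment_0.wav")  # one row of features per segment
print(features.shape)                           # (1, 88)
```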

    Normalization is essential, especially in distance-based applications, because otherwise the features with large scales dominate the distance computations. Unlike the z-score normalization method, min-max normalization is not based on the normal distribution assumption. Min-max normalization to the range [0,1] corresponds to the following linear transformation:

    $x' = \frac{1}{\max(x)-\min(x)}\, x + \frac{\min(x)}{\min(x)-\max(x)},$

    where $x$ is the original variable and $x'$ is the normalized variable. The issue with feature normalization lies in the presence of outliers, which distort the normalization parameters. In this study, a total of 159 outlier values, less than 0.1% of all of the data, were replaced with non-outlying values, ensuring that the test data also scaled to the range [0,1].
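    A minimal sketch of the normalization, in which the minima and maxima estimated from the training data are reused for the test data; the feature matrices are random placeholders:

```python
import numpy as np

X_train = np.random.rand(108, 90) * 10  # placeholder training features
X_test = np.random.rand(48, 90) * 10    # placeholder test features

lo = X_train.min(axis=0)  # normalization parameters are estimated
hi = X_train.max(axis=0)  # from the training data only

X_train_n = (X_train - lo) / (hi - lo)  # exactly within [0, 1]
X_test_n = (X_test - lo) / (hi - lo)    # approximately within [0, 1]
```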

    Random forest (RF) is a non-linear classification and regression model [31]. Random forest builds a meta-estimator that aims to improve the prediction accuracy of a single decision tree model and to prevent overfitting by training multiple decision tree estimators on subsets of the data and using their average for regression tasks or their mode for classification tasks. The leaf nodes of the random forest trees determine the values of the response variables given the inputs to the tree. In this study, the Gini index of the random forest classifier was used to describe the average impurity decrease, which can be computed from the probabilities of two or more classes at each tree node. The more a feature decreases the impurity, the more important it is in a classification task. The importance values of the dataset features, computed with the Gini index, sum to 100%.

    The index is computed as follows:

    $\mathrm{Gini}_n = 1 - \sum_{i=1}^{k} p_i^2,$

    where $p_i$ represents the probability that samples in node $n$ belong to class $i$ of the $k$ classes. In a binary classification task, the formula reads $\mathrm{Gini}_n = (p_1+p_2) - p_1^2 - p_2^2 = p_1(1-p_1) + p_2(1-p_2)$.
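    For concreteness, a small sketch of the node impurity computation:

```python
def gini_impurity(counts):
    """Gini impurity of a tree node from per-class sample counts."""
    total = sum(counts)
    ps = [c / total for c in counts]
    return 1.0 - sum(p * p for p in ps)

print(gini_impurity([5, 5]))   # 0.5: maximally impure binary node
print(gini_impurity([10, 0]))  # 0.0: pure node
```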

    The search for the best classification accuracy or for the simplest model may involve identifying and selecting only the essential features [32]. In this study, the feature selection process was conducted using RF, as detailed below:

    (1) The training dataset is normalized to the [0,1] scale.

    (2) The random forest classification model is trained 100 times on the training dataset. For each iteration, feature importance values are computed based on the Gini index of the RF.

    (3) The RF is trained 100 times on the training dataset using permuted response variables. For each iteration, permuted importance values are computed for the features.

    (4) The distributions of the 100 importance values of each feature and the corresponding importance values obtained through permutation are compared using the Wilcoxon signed-rank test. The null hypothesis is that the values of the two groups come from distributions with the same median, tested at the 5% statistical significance level ($p=0.05$).

    (5) The final features are selected as the $k$ highest-ranked features, in terms of average importance values, among those that rejected the null hypothesis of the Wilcoxon test (a condensed sketch of the procedure is given below).
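    A condensed sketch of steps (2)–(5), assuming scikit-learn and SciPy; the data, labels, and forest size are placeholders rather than the exact settings of the study:

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((108, 90))             # placeholder normalized features
y = np.r_[np.zeros(54), np.ones(54)]  # placeholder class labels

def gini_importances(X, y, repeats=100, permute=False):
    """Gini importances over repeated RF fits, optionally with permuted labels."""
    imp = np.empty((repeats, X.shape[1]))
    for r in range(repeats):
        labels = rng.permutation(y) if permute else y
        rf = RandomForestClassifier(n_estimators=50).fit(X, labels)
        imp[r] = rf.feature_importances_
    return imp

real = gini_importances(X, y)                # step (2)
null = gini_importances(X, y, permute=True)  # step (3)

selected = [j for j in range(X.shape[1])
            if wilcoxon(real[:, j], null[:, j]).pvalue < 0.05]  # step (4)

# Step (5): keep the 25 rejected-null features with the highest mean importance.
top25 = sorted(selected, key=lambda j: real[:, j].mean(), reverse=True)[:25]
```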

    The word histogram model represents an object as an unordered collection of words. Typically, the content of speech is not directly used in the model. Instead, compact feature vector representations are derived from the speech segments, for instance, by analyzing short-term frequency spectra. A histogram distribution of the feature vectors is then constructed and utilized in analyzing the speech. The word histogram method used here is illustrated in Figure 1. Initially, preprocessing, feature extraction, and feature selection are applied to the identified training and test segments, as described in Sections 3.2–3.6. Subsequently, a limited number of cluster centroids, or cluster prototypes, are formed from the feature vectors of the training data by clustering the data of each class separately into a specified number of clusters. K-means, one of the most well-known clustering methods [33], is not robust to outliers because the cluster prototypes are represented by the means of the data sets. The K-spatial-median clustering used in the experiments, combined with an appropriate initialization method, is robust because the prototypes are based on the multidimensional medians of the sets [34,35]. The K-means++ initialization of the clustering selects starting points randomly but favors points that are far apart from each other [36]. The chosen clustering result is the one that best minimizes the clustering error based on the Euclidean distance:

    $J(\{c_k\}) = \sum_{i=1}^{N} \min_{k=1,\dots,K} \|x_i - c_k\|_2, \quad (1)$

    where $X=\{x_i\}_{i=1}^{N}$ denotes the data and $\{c_k\}_{k=1}^{K}$ is the set of cluster prototypes that locally minimizes the error function (see Eq (1)). This minimization problem is non-smooth, and it is solved using the sequential over-relaxation (SOR) algorithm with step size factor $w=1.5$ [37,38].

    Figure 1.  Classification process over the bag of acoustic words model.

    The cluster centers form a collection of prototype vectors (a codebook), and the feature vectors of the speech segments are quantized by associating each vector with the nearest prototype vector (word) in the codebook based on the Euclidean distance. The entire content of the speech is described by a histogram of size $2 \times N_c$, where $N_c$ represents the number of clusters per class (dementia and control). Each position in the histogram represents the occurrences of one codebook word in the speech. After forming the histogram, it is normalized by the total number of words in the histogram. The normalized histograms are used as feature vectors in classification.
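    A sketch of the codebook formation and quantization; scikit-learn's K-means stands in here for the K-spatial-medians clustering of the study, and the segment feature matrices are random placeholders:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
F_ad = rng.random((1000, 25))    # placeholder: AD training segment features
F_ctrl = rng.random((1100, 25))  # placeholder: control training segment features
Nc = 15                          # clusters per class -> histogram of 2 * Nc bins

# One codebook per class, each with 100 k-means++ restarts.
cb_ad = KMeans(n_clusters=Nc, n_init=100).fit(F_ad).cluster_centers_
cb_ctrl = KMeans(n_clusters=Nc, n_init=100).fit(F_ctrl).cluster_centers_
codebook = np.vstack([cb_ad, cb_ctrl])

def boaw_histogram(segments, codebook):
    """Quantize one speaker's segment features to the nearest codewords."""
    d = np.linalg.norm(segments[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)  # nearest prototype (word) for each segment
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()  # normalize by the total word count

speaker_hist = boaw_histogram(rng.random((20, 25)), codebook)  # one speaker
```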

    Evaluation of the generalization capability of a dementia classifier is based on its accuracy on a test set. As described in Section 3.1, a balanced division into training and test data is provided a priori for the ADReSS 2020 dataset. More generally, however, for a set of speech audio samples labeled as AD or non-AD, our primary interest lies in whether a classifier can reliably identify the class of an unseen, new subject. This capability is measured using LOSO cross-validation, where, in each iteration over $N$ subjects, $N-1$ cases are used for training the model and the remaining case is used for testing it. The validation process is repeated $N$ times, ensuring that each subject is tested exactly once.
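    A sketch of the validation loop, with one histogram per subject and a random forest as an example classifier; the histogram matrix and labels are placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(2)
H = rng.random((156, 30))    # placeholder: one word histogram per subject
y = rng.integers(0, 2, 156)  # placeholder AD / non-AD labels

hits = 0
for train_idx, test_idx in LeaveOneOut().split(H):  # one subject out per fold
    clf = RandomForestClassifier(n_estimators=50).fit(H[train_idx], y[train_idx])
    hits += int(clf.predict(H[test_idx])[0] == y[test_idx][0])
print(hits / len(H))         # LOSO classification accuracy
```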

    Six different methods were compared for the binary classification problem of dementia: five-nearest neighbors (5-NN) [39], linear discriminant analysis (LDA) [40], random forest (RF, 50 decision trees and 5 leaf nodes) [31], extreme minimal learning machine (EMLM, with the whole training set as reference points) [41], linear support vector machine (L-SVM, with overfitting parameter $C=1.0$) [42], and support vector machine with a chi-squared kernel (Chi2-SVM, overfitting parameter $C=0.25$) [42]. Classifier parameters were determined through manual testing. Among the classifiers, 5-NN and L-SVM were based on MATLAB (version R2022b, 64-bit)*, LDA and RF were used from the scikit-learn library in Python (version 1.2.2)†, Chi2-SVM was based on the libsvm library (version 3.32)‡, and EMLM was based on the source code from GitLab§.

    *https://www.mathworks.com/help/pdf_doc/stats/index.html

    †https://scikit-learn.org/stable/user_guide.html#user-guide

    ‡https://www.csie.ntu.edu.tw/~cjlin/libsvm/

    §https://gitlab.jyu.fi/hnpai-public/extreme-minimal-learning-machine/

    Because we consider a binary classification problem, the support-vector-based technique that maximizes the margin between the two sets is, by construction, a natural approach. In particular, the SVM with a chi-squared kernel has been demonstrated to be effective in histogram classification [43]. This kernel is based on chi-squared distances, which are incorporated into the kernel function using an extended Gaussian kernel:

    $K(S_i,S_j) = \exp\left(-\frac{1}{A} D(S_i,S_j)\right),$

    where $D(S_i,S_j)$ represents the chi-squared distance between the word histograms $S_i$ and $S_j$, and $A$ denotes a scaling parameter set to the average of the chi-squared distances between all training histograms.
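    A sketch of the kernel computation combined with a precomputed-kernel SVM; the exact chi-squared distance convention and the histogram matrices are assumptions of this example:

```python
import numpy as np
from sklearn.svm import SVC

def chi2_distances(P, Q, eps=1e-10):
    """Pairwise chi-squared distances between rows of two histogram matrices."""
    diff = P[:, None, :] - Q[None, :, :]
    s = P[:, None, :] + Q[None, :, :] + eps
    return (diff * diff / s).sum(axis=2)

rng = np.random.default_rng(3)
H_train = rng.random((108, 30)); H_train /= H_train.sum(axis=1, keepdims=True)
H_test = rng.random((48, 30)); H_test /= H_test.sum(axis=1, keepdims=True)
y_train = rng.integers(0, 2, 108)

D = chi2_distances(H_train, H_train)
A = D[np.triu_indices_from(D, k=1)].mean()  # scaling: mean training distance
svm = SVC(kernel="precomputed", C=0.25).fit(np.exp(-D / A), y_train)
pred = svm.predict(np.exp(-chi2_distances(H_test, H_train) / A))
```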

    The performance of the classification models was evaluated using the ADReSS 2020 test dataset (48 individuals) and by conducting LOSO cross-validation on the entire dataset (156 individuals). The same classifier parameters were utilized in both experiments. Note that only SVMs and RF classifiers included adjustable hyperparameters.

    In the feature selection (described in Section 3.6), a total of 25 features were identified. Among the most important features were the functional features related to the MFCC 1–4 coefficients, the F0 fundamental frequency of speech, spectral slopes predicted by linear regression (over the 0–500 Hz and 500–1500 Hz frequency bands), and spectral harmonics (two different ratios). Additionally, important features included the relative speech pauses (pauseTotalDurationRatio and pauseTotalPausesRatio). The combined sum of the average importance scores of the selected features was 36.7%. Figure 2 illustrates the average importance scores of the selected and non-selected features. All features and scores are listed in Appendix A.

    Figure 2.  Average feature importances in sorted order for the eGeMAPS feature set and relative speech pauses.

    The performance of the classifiers was measured with the selected 25 features and different-sized codebooks. The numbers of cluster prototypes for control subjects and for dementia patients were both increased in increments of five, from five prototypes to fifty prototypes. Table 3 shows the average accuracies and standard deviations over 100 repeated clustering runs (each with 100 clustering reinitializations). From the table, it can be observed that Chi2-SVM performed the classification task best, with an average classification accuracy of 75.2% (SD: ±4.0%) on the test dataset. The RF model used for feature selection achieved a good classification accuracy of 71.9% (SD: ±4.8%). The best results were obtained with a codebook of 30 prototypes (15+15). The variability in the classification accuracies is explained partly by the randomness associated with the classification models and partly by the randomness in the selection of the clustering starting points. The variation in the 5-NN classification results (2.9%–5.3%) is entirely explained by the randomness of the starting points, since the classification model itself does not involve randomness. Table 3 indicates that the performance of the classifiers begins to decline as the codebook size grows beyond 30 prototypes.

    Table 3.  Average classification accuracies and standard deviations for the ADReSS 2020 test dataset over 100 repetitions of replicated clustering.

    NClust.   5-NN      RF        L-SVM     Chi2-SVM  LDA       EMLM
    5+5       68.9%     63.7%     64.3%     59.6%     60.6%     57.5%
              (±2.9%)   (±4.5%)   (±1.7%)   (±2.1%)   (±2.5%)   (±2.9%)
    10+10     70.2%     71.5%     66.9%     70.2%     67.7%     66.3%
              (±5.0%)   (±4.3%)   (±3.3%)   (±3.2%)   (±3.9%)   (±3.7%)
    15+15     66.9%     71.9%     68.4%     75.2%     64.5%     67.3%
              (±5.2%)   (±4.8%)   (±4.5%)   (±4.0%)   (±5.3%)   (±4.6%)
    20+20     64.9%     70.7%     66.3%     73.7%     62.0%     65.3%
              (±4.5%)   (±4.7%)   (±3.9%)   (±3.8%)   (±6.0%)   (±5.3%)
    25+25     64.7%     68.5%     64.4%     70.8%     61.0%     63.4%
              (±4.1%)   (±4.4%)   (±3.8%)   (±4.1%)   (±5.5%)   (±5.0%)
    30+30     63.7%     67.9%     64.3%     68.8%     59.8%     63.0%
              (±4.6%)   (±4.5%)   (±3.8%)   (±4.2%)   (±5.6%)   (±5.0%)
    35+35     63.3%     68.2%     63.7%     68.9%     59.3%     63.6%
              (±4.6%)   (±5.4%)   (±4.5%)   (±4.4%)   (±5.9%)   (±4.9%)
    40+40     62.8%     66.5%     63.1%     67.9%     57.2%     61.9%
              (±4.9%)   (±5.3%)   (±4.4%)   (±4.1%)   (±6.1%)   (±4.8%)
    45+45     61.8%     65.8%     62.5%     67.4%     55.0%     62.7%
              (±4.6%)   (±5.4%)   (±4.6%)   (±4.6%)   (±5.4%)   (±4.1%)
    50+50     62.2%     65.0%     61.9%     66.8%     54.3%     62.6%
              (±5.3%)   (±5.8%)   (±4.8%)   (±4.6%)   (±7.2%)   (±4.9%)


    The generalization performance of the classification models was measured by LOSO cross-validation on the full dataset of 156 individuals from ADReSS 2020. The size of the codebook was increased from four cluster prototypes to forty prototypes. The uncertainty associated with the clustering error was addressed by repeating the LOSO cross-validation a total of 25 times, with each clustering involving 100 reinitializations. The results were obtained by consistently using either the modes or the smallest clustering errors of the 25 binary classification outcomes. Figure 3 shows the classification results for the entire dataset using the different classifiers, always choosing the better result between the modes and the smallest clustering errors; the results are given separately in Appendix B. Figure 3 illustrates that the best results for most classifiers were achieved with a codebook size of 22 prototypes.

    The Chi2-SVM classifier performed the classification task best, with a classification accuracy of 72.4%. Additionally, RF and LDA produced classification results of over 70%. Table 4 provides the LOSO cross-validation accuracies and F1 scores with a codebook size of 22 prototypes and 75 clustering repetitions (each with 100 reinitializations). These classification results are based on the smallest-clustering-error outcome. Clearly, the Chi2-SVM classifier emerged as the top performer in LOSO cross-validation, with an accuracy of 75.0%. Among the other classifiers, RF and EMLM achieved classification accuracies of 71.8% and 71.2%, respectively.

    Figure 4 shows the receiver operating characteristic (ROC) curves for the Alzheimer's dementia and non-Alzheimer's dementia groups. The probability values required for forming the ROC curves of the groups were computed from the binary classification results over the 75 repetitions, and the final classes were based on the classification results obtained with the smallest clustering errors. The performance of the individual classifiers is represented by the area under the curve (AUC) values, which are provided in the legends of Figure 4. Clearly, control subjects are easier to distinguish from Alzheimer's patients.

    Figure 3.  LOSO cross-validation accuracies over 25 repetitions of replicated clustering.
    Table 4.  LOSO cross-validation results using word histograms with 22 bins and 75 repetitions of replicated clustering (AD = Alzheimer's dementia).

                         5-NN     RF       L-SVM    Chi2-SVM   LDA      EMLM
    Accuracy             64.1%    71.8%    66.0%    75.0%      69.9%    71.2%
    F1 Score (AD)        65.4%    69.9%    62.4%    72.3%      70.4%    71.3%
    F1 Score (non-AD)    62.7%    73.5%    69.0%    77.2%      69.3%    71.0%

    Figure 4.  Receiver operating characteristic curves for Alzheimer's dementia and non-Alzheimer's dementia groups, based on histograms with 22 bins and 75 repetitions of replicated clustering.

    In this study, diagnostic analysis related to dementia was conducted on a dataset extracted from the Pitt Corpus audio database, consisting of both control subjects and dementia patients. Statistical features for acoustic speech analysis were extracted from audio segments using the OpenSMILE library, and histograms were constructed based on the BoAW approach. Similar analyses, based on the same dataset and a comparable histogram representation, have been conducted in previous studies, with reported LOSO cross-validation accuracies of 76.9% and 77.4% [18,23]. However, these studies do not provide details on how they accounted for randomness in the clustering process employed for histogram formation.

    The experiments with the ADReSS 2020 test dataset reveal that, even with 100 clustering reinitializations, the variability in the classification outcomes of the deterministic 5-NN classifier typically falls within the range of 4.0% to 5.0%. Therefore, 25 repetitions of LOSO cross-validation were employed when searching for the number of clusters, and 75 repetitions were used in computing the final results. It is important to note that the LOSO cross-validation method itself repeats the clustering once for each subject to be validated, so the total number of clustering runs is the number of repetitions multiplied by the number of subjects (see the beginning of Section 3.8).

    In [23], a large set of acoustic features (a total of 1582) was used, which increases the computational complexity of the methods. In this study, average feature importance scores were computed for the relative speech pauses and the eGeMAPS features (a total of 90 features) using a random forest classifier based on the Gini index, and feature selection was conducted based on the null hypothesis of the Wilcoxon statistical test. Feature selection relying on statistical testing is a straightforward and relatively efficient method for selecting the final features [44].

    In previous literature, feature selection has been performed on the eGeMAPS feature set for emotion recognition from speech recordings [45]. The study utilized four different feature selection mechanisms. In the infinite latent feature selection (ILFS) method, feature importance values are based on all possible subsets of features, representing different paths in the feature graph. The ReliefF method uses weight vectors to represent the connections of features to the actual class labels. The generalized Fisher score method (Fisher) seeks a set of features that maximizes the lower bound of a credibility function based on the Fisher metric. The active feature selection method (AFS) clusters the individual features of the dataset and selects the final features based on the distinctiveness of the clusters. In [45], three datasets related to emotion recognition were combined into one, and the feature selection methods proposed final sets of 44–79 eGeMAPS features.

    Although K-spatial-median clustering is robust, it is important to note that it is computationally more complex than the K-means algorithm, which can become a bottleneck with larger datasets. One may use random swap (RS) clustering, which is computationally more efficient than K-means [46]. In RS clustering, it is possible to estimate the expected number of iterations needed to find the correct clustering. The expected processing time is known to be linearly dependent on the number of data vectors, quadratically dependent on the number of clusters, inversely dependent on the neighborhood size, and logarithmically dependent on the number of successful swaps needed.

    In this study, among the key features, the statistical features of the MFCCs, features related to the fundamental frequency (F0), spectral slopes predicted by linear regression, and harmonic features of the spectra were particularly prominent. In particular, spectral slope and harmonic features are known to be associated with cognitive load, that is, the capacity of our working memory to handle information at any given moment [47,48]. Excessive stress and high cognitive load are known to be harmful to brain structure and function, which increases the risk of cognitive decline and dementia [49].

    The fundamental frequency of speech is known to be associated with Parkinson's disease [50]. Therefore, the fundamental frequency of vowel sounds, along with its oscillation, has been investigated in the diagnosis of Parkinson's disease [51]. Schmitt et al. [14] used low-level MFCC features together with signal energy for emotion recognition. However, instead of speech activity detection, the features were extracted from unsegmented audio recordings in 25-millisecond frames with 10-millisecond sampling steps. Statistical features (means and standard deviations) were also computed for the low-level features. The final classification results were based on histograms created with the BoAW model. These previous research results support the broader scalability of the features and methods used in this study to other diseases identifiable from speech, or to different emotions, which could serve as indicators of illness or depression [52,53].

    In this research, individuals with dementia were distinguished from healthy controls based on naturally spoken speech. Features related to speech production are known to correlate with the speaker's cognitive ability and its changes. The study did not aim to use lexical features derived from challenging speech recognition tasks, such as converting speech to text or analyzing the content of the transcribed text. Instead, spoken speech was categorized based on its acoustic characteristics. Classifiers and histogram feature extraction based on the BoAW model demonstrated promise in distinguishing dementia patients from control subjects. The implementation of the BoAW model accounted for the variability in classification results, caused by the many possible ways to partition the data, by using repeated clusterings, each with multiple reinitializations, as part of the model. The final results were based either on the modes of the classification results or on the results with the smallest clustering errors.

    The generalization capabilities of the classifiers were tested using the ADReSS dataset, where subjects from the Pitt Corpus database were evenly selected based on age and gender distribution, ensuring that only one audio recording was selected from each individual to avoid overfitting by the classifiers. Furthermore, the classification results were measured with separate test data and by validating the entire dataset using LOSO cross-validation. In the experiments, the classifiers were trained for both tasks using the same classifier hyperparameters. The best accuracies with the test data and the entire dataset were consistent, indicating that overfitting did not occur. Feature selection in this work improved the computational efficiency of the methods. The results were based on only a small number of acoustic features, emphasizing spectral features (slope and harmonics), MFCCs, and statistical characteristics of the fundamental frequency. This minimal feature set provides researchers with opportunities to further develop feature extraction in a more reliable manner. Further, acoustic features are independent of the content of speech. Therefore, the developed diagnostic methods have the potential to be expanded to multiple languages. Overall, the obtained results are promising when fast, cost-effective, and scalable solutions are needed for the rapid diagnosis of cognitive decline or dementia.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The work of the first author (MN) was supported by the Finnish Cultural Foundation (Grant Number 30231766). The work of the second author (MvB) was supported by Samfundet Folkhälsan and the Research Council of Finland (Grant Number 349336).

    The authors declare that there is no conflict of interest.

    Table A.1.  Average importance scores for the eGeMAPS feature set and relative speech pauses based on the Gini index (the most important 25 features are selected for the study).
    Rank Feature Importance (%) Category (appearance)
    1 mfcc3V_sma3nz_amean 1.95 (Voiced) MFCC (1/16)
    2 logRelF0-H1-A3_sma3nz_amean 1.83 Harmonic ratio (1/4)
    3 mfcc3_sma3_amean 1.77 MFCC (2/16)
    4 slopeV0-500_sma3nz_amean 1.74 (Voiced) Spectral slope (1/6)
    5 F0semitoneFrom27.5Hz_sma3nz_percentile50.0 1.73 F0 fundamental frequency (1/11)
    6 F0semitoneFrom27.5Hz_sma3nz_percentile20.0 1.65 F0 fundamental frequency (2/11)
    7 loudness_sma3_amean 1.59 F0 fundamental frequency (3/11)
    8 slopeUV0-500_sma3nz_amean 1.56 (Unvoiced) Spectral slope (2/6)
    9 F3amplitudeLogRelF0_sma3nz_stddevNorm 1.55 Formant energy (1/6)
    10 loudness_sma3_percentile20.0 1.5 Loudness (1/10)
    11 F0semitoneFrom27.5Hz_sma3nz_percentile80.0 1.49 F0 fundamental frequency (4/11)
    12 F0semitoneFrom27.5Hz_sma3nz_amean 1.48 F0 fundamental frequency (5/11)
    13 mfcc4_sma3_amean 1.4 MFCC (3/16)
    14 logRelF0-H1-H2_sma3nz_amean 1.4 Harmonic ratio (2/4)
    15 mfcc2_sma3_amean 1.35 MFCC (4/16)
    16 mfcc4V_sma3nz_amean 1.35 (Voiced) MFCC (5/16)
    17 F3bandwidth_sma3nz_amean 1.34 Formant bandwidth (1/6)
    18 slopeV0-500_sma3nz_stddevNorm 1.33 (Voiced) Spectral slope (3/6)
    19 spectralFluxUV_sma3nz_amean 1.32 (Unvoiced) Spectral flux (1/5)
    20 pauseDurationRatio 1.28 Relative speech pauses (1/2)
    21 mfcc1V_sma3nz_amean 1.24 (Voiced) MFCC (6/16)
    22 slopeV500-1500_sma3nz_amean 1.24 (Voiced) Spectral slope (4/6)
    23 loudness_sma3_percentile50.0 1.21 Loudness (2/10)
    24 pauseTotalPausesRatio 1.19 Relative speech pauses (2/2)
    25 mfcc2V_sma3nz_amean 1.19 (Voiced) MFCC (7/16)
    26 slopeUV500-1500_sma3nz_amean 1.18 (Unvoiced) Spectral slope (5/6)
    27 mfcc2V_sma3nz_stddevNorm 1.18 MFCC (8/16)
    28 loudness_sma3_pctlrange0-2 1.16 Loudness (3/10)
    29 slopeV500-1500_sma3nz_stddevNorm 1.16 (Voiced) Spectral slope (6/6)
    30 loudness_sma3_stddevNorm 1.15 Loudness (4/10)
    31 HNRdBACF_sma3nz_amean 1.14 Harmonic noise ratio (1/2)
    32 mfcc1_sma3_amean 1.14 MFCC (9/16)
    33 mfcc3_sma3_stddevNorm 1.13 MFCC (10/16)
    34 loudness_sma3_percentile80.0 1.12 Loudness (5/10)
    35 hammarbergIndexV_sma3nz_amean 1.12 (Voiced) Hammarberg index (1/3)
    36 shimmerLocaldB_sma3nz_amean 1.12 Shimmer (1/2)
    37 alphaRatioUV_sma3nz_amean 1.11 Alpha ratio (1/3)
    38 hammarbergIndexUV_sma3nz_amean 1.11 (Unvoiced) Hammarberg index (2/3)
    39 logRelF0-H1-H2_sma3nz_stddevNorm 1.1 Harmonic ratio (3/4)
    40 mfcc4_sma3_stddevNorm 1.09 MFCC (11/16)
    41 F3bandwidth_sma3nz_stddevNorm 1.09 Formant bandwidth (2/6)
    42 shimmerLocaldB_sma3nz_stddevNorm 1.07 Shimmer (2/2)
    43 spectralFlux_sma3_amean 1.06 Spectral flux (2/5)
    44 mfcc4V_sma3nz_stddevNorm 1.05 (Voiced) MFCC (12/16)
    45 mfcc1_sma3_stddevNorm 1.05 MFCC (13/16)

    Table A.2.  Average importance scores for the eGeMAPS feature set and relative speech pauses based on the Gini index (continued).
    Rank Feature Importance (%) Category (appearance)
    46 mfcc3V_sma3nz_stddevNorm 1.05 (Voiced) MFCC (14/16)
    47 equivalentSoundLevel_dBp 1.05 Sound level (1/1)
    48 F2bandwidth_sma3nz_amean 1.04 Formant bandwidth (3/6)
    49 F0semitoneFrom27.5Hz_sma3nz_pctlrange0-2 1.03 F0 fundamental frequency (6/11)
    50 F3frequency_sma3nz_amean 1.01 Formant frequency (1/4)
    51 F2amplitudeLogRelF0_sma3nz_stddevNorm 1.01 Formant energy (2/6)
    52 mfcc1V_sma3nz_stddevNorm 1.00 (Voiced) MFCC (15/16)
    53 mfcc2_sma3_stddevNorm 1.00 MFCC (16/16)
    54 spectralFlux_sma3_stddevNorm 0.99 Spectral flux (3/5)
    55 F1frequency_sma3nz_amean 0.98 Formant frequency (2/4)
    56 F1frequency_sma3nz_stddevNorm 0.98 Formant frequency (3/4)
    57 HNRdBACF_sma3nz_stddevNorm 0.98 Harmonic noise ratio (2/2)
    58 logRelF0-H1-A3_sma3nz_stddevNorm 0.97 Harmonic ratio (4/4)
    59 jitterLocal_sma3nz_amean 0.97 Jitter (1/2)
    60 F0semitoneFrom27.5Hz_sma3nz_stddevNorm 0.97 F0 fundamental frequency (7/11)
    61 F2frequency_sma3nz_amean 0.97 Formant frequency (4/4)
    62 jitterLocal_sma3nz_stddevNorm 0.95 Jitter (2/2)
    63 loudness_sma3_meanFallingSlope 0.95 Loudness (6/10)
    64 F1bandwidth_sma3nz_stddevNorm 0.94 Formant bandwidth (4/6)
    65 loudness_sma3_stddevRisingSlope 0.94 Loudness (7/10)
    66 MeanVoicedSegmentLengthSec 0.93 Voiced segments (1/3)
    67 loudness_sma3_meanRisingSlope 0.93 Loudness (8/10)
    68 F1amplitudeLogRelF0_sma3nz_stddevNorm 0.92 Formant energy (3/6)
    69 spectralFluxV_sma3nz_amean 0.92 (Voiced) Spectral flux (4/5)
    70 alphaRatioV_sma3nz_amean 0.91 (Voiced) Alpha ratio (2/3)
    71 F3frequency_sma3nz_stddevNorm 0.91 Formant frequency (1/2)
    72 F0semitoneFrom27.5Hz_sma3nz_meanRisingSlope 0.91 F0 fundamental frequency (8/11)
    73 F1bandwidth_sma3nz_amean 0.91 Formant bandwidth (5/6)
    74 F2frequency_sma3nz_stddevNorm 0.91 Formant frequency (2/2)
    75 F2bandwidth_sma3nz_stddevNorm 0.9 Formant bandwidth (6/6)
    76 hammarbergIndexV_sma3nz_stddevNorm 0.9 Hammarberg index (3/3)
    77 F0semitoneFrom27.5Hz_sma3nz_meanFallingSlope 0.89 F0 fundamental frequency (9/11)
    78 alphaRatioV_sma3nz_stddevNorm 0.89 (Voiced) Alpha ratio (3/3)
    79 spectralFluxV_sma3nz_stddevNorm 0.88 (Voiced) Spectral flux (5/5)
    80 loudnessPeaksPerSec 0.87 Loudness (9/10)
    81 VoicedSegmentsPerSec 0.85 Voiced segments (2/3)
    82 F3amplitudeLogRelF0_sma3nz_amean 0.83 Formant energy (4/6)
    83 MeanUnvoicedSegmentLength 0.83 Unvoiced segments
    84 loudness_sma3_stddevFallingSlope 0.82 Loudness (10/10)
    85 F2amplitudeLogRelF0_sma3nz_amean 0.8 Formant energy (5/6)
    86 F1amplitudeLogRelF0_sma3nz_amean 0.76 Formant energy (6/6)
    87 StddevVoicedSegmentLengthSec 0.72 Voiced segments (3/3)
    88 F0semitoneFrom27.5Hz_sma3nz_stddevRisingSlope 0.67 F0 fundamental frequency (10/11)
    89 StddevUnvoicedSegmentLength 0.67 Unvoiced segments (1/1)
    90 F0semitoneFrom27.5Hz_sma3nz_stddevFallingSlope 0.65 F0 fundamental frequency (11/11)

    Figure B.1.  Mode-based LOSO cross-validation accuracies over 25 repetitions of replicated clustering.
    Figure B.2.  Minimum clustering error-based LOSO cross-validation accuracies over 25 repetitions of replicated clustering.


    [1] M. W. Bondi, D. P. Salmon, A. W. Kaszniak, The neuropsychology of dementia, In: Neuropsychological assessment of neuropsychiatric and neuromedical disorders, Oxford: Oxford University Press, 2009,159–198.
    [2] World Health Organization, Global action plan on the public health response to dementia 2017–2025, World Health Organization, 2017.
    [3] R. N. Kalaria, G. E. Maestre, R. Arizaga, R. P. Friedland, D. Galasko, K. Hall, et al., Alzheimer's disease and vascular dementia in developing countries: prevalence, management, and risk factors, Lancet Neurol., 7 (2008), 812–826. http://dx.doi.org/10.1016/S1474-4422(08)70169-8 doi: 10.1016/S1474-4422(08)70169-8
    [4] T. Ngandu, J. Lehtisalo, A. Solomon, E. Levälahti, S. Ahtiluoto, R. Antikainen, et al., A 2 year multidomain intervention of diet, exercise, cognitive training, and vascular risk monitoring versus control to prevent cognitive decline in at-risk elderly people (FINGER): a randomised controlled trial, Lancet, 385 (2015), 2255–2263. http://dx.doi.org/10.1016/S0140-6736(15)60461-5 doi: 10.1016/S0140-6736(15)60461-5
    [5] M. F. Folstein, S. E. Folstein, P. R. McHugh, "Mini-mental state": a practical method for grading the cognitive state of patients for the clinician, J. Psychiat. Res., 12 (1975), 189–198.
    [6] Z. S. Nasreddine, N. A. Phillips, V. Bédirian, S. Charbonneau, V. Whitehead, I. Collin, et al., The montreal cognitive assessment, MoCA: a brief screening tool for mild cognitive impairment, J. Am. Geriatr. Soc., 53 (2005), 695–699. http://dx.doi.org/10.1111/j.1532-5415.2005.53221.x doi: 10.1111/j.1532-5415.2005.53221.x
    [7] A. Heyman, G. Fillenbaum, F. Nash, Consortium to establish a registry for Alzheimer's disease: the CERAD experience, Neurology, 49 (1997), 1–26.
    [8] A. Konig, A. Satt, A. Sorin, R. Hoory, A. Derreumaux, R. David, et al., Use of speech analyses within a mobile application for the assessment of cognitive impairment in elderly people, Curr. Alzheimer Res., 15 (2018), 120–129. http://dx.doi.org/10.2174/1567205014666170829111942 doi: 10.2174/1567205014666170829111942
    [9] A. Roshanzamir, H. Aghajan, S. M. Soleymani, Transformer-based deep neural network language models for Alzheimer's disease risk assessment from targeted speech, BMC Med. Inform. Decis. Mak., 21 (2021), 92. http://dx.doi.org/10.1186/s12911-021-01456-3 doi: 10.1186/s12911-021-01456-3
    [10] C. Guo, G. Pleiss, Y. Sun, K. Q. Weinberger, On calibration of modern neural networks, Proceedings of the 34th International Conference on Machine Learning, 70 (2017), 1321–1330.
    [11] S. de la Fuente Garcia, C. W. Ritchie, S. Luz, Artificial intelligence, speech, and language processing approaches to monitoring Alzheimer's disease: a systematic review, Journal of Alzheimer's Disease, 78 (2020), 1547–1574. http://dx.doi.org/10.3233/JAD-200888 doi: 10.3233/JAD-200888
    [12] M. F. McTear, Z. Callejas, D. Griol, The conversational interface: talking to smart devices, Cham: Springer, 2016. http://dx.doi.org/10.1007/978-3-319-32967-3
    [13] G. Csurka, C. Dance, L. Fan, J. Willamowski, C. Bray, Visual categorization with bags of keypoints, ECCV, 1 (2004), 1–16.
    [14] M. Schmitt, F. Ringeval, B. Schuller, At the border of acoustics and linguistics: bag-of-audio-words for the recognition of emotions in speech, Proceedings of Interspeech, 2016, 495–499. http://dx.doi.org/10.21437/Interspeech.2016-1124
    [15] L. Hernández-Domínguez, S. Ratté, G. Sierra-Martínez, A. Roche-Bergua, Computer-based evaluation of Alzheimer's disease and mild cognitive impairment patients during a picture description task, Alzh. Dement.-DADM, 10 (2018), 260–268. http://dx.doi.org/10.1016/j.dadm.2018.02.004
    [16] S. Luz, Longitudinal monitoring and detection of Alzheimer's type dementia from spontaneous speech data, Proceedings of IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), 2017, 45–46. http://dx.doi.org/10.1109/CBMS.2017.41
    [17] K. Lopez-de Ipiña, J. B. Alonso, J. Solé-Casals, N. Barroso, P. Henriquez, M. Faundez-Zanuy, et al., On automatic diagnosis of Alzheimer's disease based on spontaneous speech analysis and emotional temperature, Cogn. Comput., 7 (2015), 44–55. http://dx.doi.org/10.1007/s12559-013-9229-9
    [18] F. Haider, S. De La Fuente, S. Luz, An assessment of paralinguistic acoustic features for detection of Alzheimer's dementia in spontaneous speech, IEEE J.-STSP, 14 (2020), 272–281. http://dx.doi.org/10.1109/JSTSP.2019.2955022
    [19] S. Luz, F. Haider, S. de la Fuente Garcia, D. Fromm, B. Macwhinney, Alzheimer's dementia recognition through spontaneous speech: the ADReSS challenge, Proceedings of Interspeech, 2020, 2172–2176. http://dx.doi.org/10.21437/Interspeech.2020-2571
    [20] F. Eyben, F. Weninger, F. Gross, B. Schuller, Recent developments in openSMILE, the Munich open-source multimedia feature extractor, Proceedings of the 21st ACM International Conference on Multimedia, 2013, 835–838. http://dx.doi.org/10.1145/2502081.2502224
    [21] F. Eyben, K. R. Scherer, B. W. Schuller, J. Sundberg, E. André, C. Busso, et al., The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing, IEEE T. Affect. Comput., 7 (2016), 190–202. http://dx.doi.org/10.1109/TAFFC.2015.2457417
    [22] F. Eyben, M. Wöllmer, B. Schuller, openSMILE: the Munich versatile and fast open-source audio feature extractor, Proceedings of the 18th ACM International Conference on Multimedia, 2010, 1459–1462. http://dx.doi.org/10.1145/1873951.1874246
    [23] M. S. S. Syed, Z. S. Syed, M. Lech, E. Pirogova, Automated screening for Alzheimer's dementia through spontaneous speech, Proceedings of Interspeech, 2020, 2222–2226. http://dx.doi.org/10.21437/Interspeech.2020-3158
    [24] M. Schmitt, B. Schuller, openXBOW – introducing the Passau open-source crossmodal bag-of-words toolkit, J. Mach. Learn. Res., 18 (2017), 1–5.
    [25] M. E. Celebi, H. A. Kingravi, P. A. Vela, A comparative study of efficient initialization methods for the k-means clustering algorithm, Expert Syst. Appl., 40 (2013), 200–210. http://dx.doi.org/10.1016/j.eswa.2012.07.021
    [26] J. Hämäläinen, S. Jauhiainen, T. Kärkkäinen, Comparison of internal clustering validation indices for prototype-based clustering, Algorithms, 10 (2017), 105. http://dx.doi.org/10.3390/a10030105
    [27] M. Niemelä, T. Kärkkäinen, Improving clustering and cluster validation with missing data using distance estimation methods, In: Computational sciences and artificial intelligence in industry, Cham: Springer, 2022, 123–133. http://dx.doi.org/10.1007/978-3-030-70787-3_9
    [28] J. T. Becker, F. Boller, O. L. Lopez, J. Saxton, K. L. McGonigle, The natural history of Alzheimer's disease: description of study cohort and accuracy of diagnosis, Arch. Neurol., 51 (1994), 585–594. http://dx.doi.org/10.1001/archneur.1994.00540180063015
    [29] K. Hechmi, T. N. Trong, V. Hautamäki, T. Kinnunen, VoxCeleb enrichment for age and gender recognition, Proceedings of 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021, 687–693. http://dx.doi.org/10.1109/ASRU51503.2021.9688085
    [30] European Broadcasting Union, Loudness normalisation and permitted maximum level of audio signals, EBU Recommendation, 2023.
    [31] L. Breiman, Random forests, Mach. Learn., 45 (2001), 5–32. http://dx.doi.org/10.1023/A:1010933404324
    [32] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, J. Mach. Learn. Res., 3 (2003), 1157–1182.
    [33] A. K. Jain, Data clustering: 50 years beyond k-means, Pattern Recogn. Lett., 31 (2010), 651–666. http://dx.doi.org/10.1016/j.patrec.2009.09.011
    [34] S. Äyrämö, Knowledge mining using robust clustering, Jyväskylä: University of Jyväskylä Printing, 2006.
    [35] S. Äyrämö, T. Kärkkäinen, K. Majava, Robust refinement of initial prototypes for partitioning-based clustering algorithms, In: Recent advances in stochastic modeling and data analysis, Chania: World Scientific, 2007, 473–482. http://dx.doi.org/10.1142/9789812709691_0056
    [36] D. Arthur, S. Vassilvitskii, k-means++: the advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, 1027–1035.
    [37] T. Kärkkäinen, S. Äyrämö, On computation of spatial median for robust data mining, Proceedings of Evolutionary and Deterministic Methods for Design, Optimization and Control with Applications to Industrial and Societal Problems, 2005, 1–14.
    [38] M. Niemelä, S. Äyrämö, T. Kärkkäinen, Toolbox for distance estimation and cluster validation on data with missing values, IEEE Access, 10 (2022), 352–367. http://dx.doi.org/10.1109/ACCESS.2021.3136435
    [39] T. Cover, P. Hart, Nearest neighbor pattern classification, IEEE T. Inform. Theory, 13 (1967), 21–27. http://dx.doi.org/10.1109/TIT.1967.1053964
    [40] Y. Guo, T. Hastie, R. Tibshirani, Regularized linear discriminant analysis and its application in microarrays, Biostatistics, 8 (2007), 86–100. http://dx.doi.org/10.1093/biostatistics/kxj035
    [41] T. Kärkkäinen, Extreme minimal learning machine: ridge regression with distance-based basis, Neurocomputing, 342 (2019), 33–48. http://dx.doi.org/10.1016/j.neucom.2018.12.078
    [42] N. Cristianini, J. Shawe-Taylor, An introduction to support vector machines and other kernel-based learning methods, Cambridge: Cambridge University Press, 2000. http://dx.doi.org/10.1017/CBO9780511801389
    [43] J. Zhang, M. Marszałek, S. Lazebnik, C. Schmid, Local features and kernels for classification of texture and object categories: a comprehensive study, Int. J. Comput. Vision, 73 (2007), 213–238. http://dx.doi.org/10.1007/s11263-006-9794-4
    [44] F. Wilcoxon, Individual comparisons by ranking methods, In: Breakthroughs in statistics, New York: Springer, 1992, 196–202. http://dx.doi.org/10.1007/978-1-4612-4380-9_16
    [45] F. Haider, S. Pollak, P. Albert, S. Luz, Emotion recognition in low-resource settings: an evaluation of automatic feature selection methods, Comput. Speech Lang., 65 (2021), 101119. http://dx.doi.org/10.1016/j.csl.2020.101119
    [46] P. Fränti, Efficiency of random swap clustering, J. Big Data, 5 (2018), 13. http://dx.doi.org/10.1186/s40537-018-0122-y
    [47] T. F. Yap, J. Epps, E. Ambikairajah, E. H. C. Choi, Formant frequencies under cognitive load: effects and classification, EURASIP J. Adv. Signal Process., 2011 (2011), 219253. http://dx.doi.org/10.1155/2011/219253
    [48] T. F. Yap, J. Epps, E. Ambikairajah, E. H. C. Choi, Voice source features for cognitive load classification, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, 5700–5703. http://dx.doi.org/10.1109/ICASSP.2011.5947654
    [49] S. B. Scott, J. E. Graham-Engeland, C. G. Engeland, J. M. Smyth, D. M. Almeida, M. J. Katz, et al., The effects of stress on cognitive aging, physiology and emotion (ESCAPE) project, BMC Psychiatry, 15 (2015), 146. http://dx.doi.org/10.1186/s12888-015-0497-7
    [50] D. V. L. Sidtis, W. Hanson, C. Jackson, A. Lanto, D. Kempler, E. J. Metter, Fundamental frequency (f0) measures comparing speech tasks in aphasia and Parkinson disease, J. Med. Speech-Lang. Pathol., 12 (2004), 207–213.
    [51] M. Little, P. McSharry, E. Hunter, J. Spielman, L. Ramig, Suitability of dysphonia measurements for telemonitoring of Parkinson's disease, Nat. Prec., 2008, 1–27. http://dx.doi.org/10.1038/npre.2008.2298.1
    [52] R. Alshammri, G. Alharbi, E. Alharbi, I. Almubark, Machine learning approaches to identify Parkinson's disease using voice signal features, Front. Artif. Intell., 6 (2023), 1084001. http://dx.doi.org/10.3389/frai.2023.1084001
    [53] D. Nickson, C. Meyer, L. Walasek, C. Toro, Prediction and diagnosis of depression using machine learning with electronic health records data: a systematic review, BMC Med. Inform. Decis. Mak., 23 (2023), 271. http://dx.doi.org/10.1186/s12911-023-02341-x
© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)