Disease prediction based on multi-type data fusion from Chinese electronic health record

Zhaoyu Liang; Zhichang Zhang; Haoyuan Chen; Ziqin Zhang

doi:10.3934/mbe.2022640

Mathematical Biosciences and Engineering

2022, Volume 19, Issue 12: 13732-13746.

Research article

College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, Lanzhou 730070, China

Academic Editor: Vladimir Mityushev

Received: 31 May 2022 Revised: 06 August 2022 Accepted: 21 August 2022 Published: 19 September 2022

Abstract: Disease prediction that draws on a variety of healthcare data to assist doctors in diagnosis has become an increasingly important research topic. This paper proposes a disease prediction model that fuses multiple types of encoded representations of Chinese electronic health records (EHRs). The model framework uses a multi-head self-attention mechanism that combines textual and numerical features to enhance the text representations. BiLSTM-CRF and TextCNN models are used, respectively, to extract entities and to obtain their embedding representations. The representations of the text and of the entities it contains are then combined to form the representation of each EHR. Experimental results on EHR data collected from a Grade III, Class B general hospital in Gansu Province, China, show that our model achieves an F1 score of 91.92%, outperforming the previous baseline methods.
Keywords: Chinese electronic health record; disease prediction; BERT; TextCNN; multi-type data
Citation: Zhaoyu Liang, Zhichang Zhang, Haoyuan Chen, Ziqin Zhang. Disease prediction based on multi-type data fusion from Chinese electronic health record[J]. Mathematical Biosciences and Engineering, 2022, 19(12): 13732-13746. doi: 10.3934/mbe.2022640
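
The fusion described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not give layer sizes, projection details, or pooling choices, so everything below is an assumption made for illustration. Random matrices stand in for learned weights, the function names (`multi_head_self_attention`, `text_cnn`, `fuse_record`) are hypothetical, the BiLSTM-CRF entity extraction step is skipped (entity embeddings are taken as given), and each numerical feature is assumed to enter the attention layer as one extra token.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Stand-in for a learned weight matrix (assumption: untrained, random)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x, num_heads, rng):
    """Scaled dot-product self-attention with several heads; x is seq_len x d_model."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    head_outputs = []
    for _ in range(num_heads):
        q = matmul(x, random_matrix(d_model, d_head, rng))
        k = matmul(x, random_matrix(d_model, d_head, rng))
        v = matmul(x, random_matrix(d_model, d_head, rng))
        scores = matmul(q, list(zip(*k)))  # q times k transposed
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
    # Concatenate the heads position by position, then apply an output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    return matmul(concat, random_matrix(d_model, d_model, rng))


def text_cnn(entity_embs, filters):
    """One ReLU convolution per filter followed by max-over-time pooling."""
    features = []
    for f in filters:
        width = len(f)
        activations = []
        for start in range(len(entity_embs) - width + 1):
            window = entity_embs[start:start + width]
            s = sum(w * e for f_row, e_row in zip(f, window) for w, e in zip(f_row, e_row))
            activations.append(max(0.0, s))
        features.append(max(activations))
    return features


def fuse_record(text_embs, numeric_values, entity_embs, num_heads=2, seed=0):
    """Build one record vector from text, numerical, and entity inputs."""
    rng = random.Random(seed)
    d_model = len(text_embs[0])
    # Assumption: each numerical value scales its own embedding vector.
    numeric_tokens = [[v * w for w in random_matrix(1, d_model, rng)[0]]
                      for v in numeric_values]
    attended = multi_head_self_attention(text_embs + numeric_tokens, num_heads, rng)
    text_part = attended[:len(text_embs)]
    # Mean-pool the attended text positions into a single text representation.
    text_repr = [sum(col) / len(text_part) for col in zip(*text_part)]
    filters = [random_matrix(2, len(entity_embs[0]), rng) for _ in range(3)]
    entity_repr = text_cnn(entity_embs, filters)
    return text_repr + entity_repr  # concatenation as the fused representation


# Toy inputs: 3 text tokens, 2 numerical values, 3 entity embeddings (all made up).
text_embs = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.1], [0.3, 0.3, -0.2, 0.0]]
lab_values = [0.7, -1.2]
entity_embs = [[0.2, 0.1, 0.0, 0.4], [0.1, -0.3, 0.2, 0.2], [0.0, 0.5, 0.1, -0.1]]
record_vector = fuse_record(text_embs, lab_values, entity_embs)
print(len(record_vector))  # 4 text dimensions + 3 entity features = 7
```

In a trained model the fused vector would feed a classifier over disease labels. Concatenation is used here only because it is the simplest way to combine the two representations; the abstract does not state which combination the authors chose.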

© 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)