Research article

Data governance and Gensini score automatic calculation for coronary angiography with deep-learning-based natural language extraction

  • Received: 09 December 2023 Revised: 24 January 2024 Accepted: 01 February 2024 Published: 23 February 2024
  • With the widespread adoption of electronic health records, the volume of stored medical data continues to grow. Clinical data, often in the form of semi-structured or unstructured electronic medical records (EMRs), contain rich patient information. However, because physicians write these records in natural language, traditional extraction methods such as dictionaries, rule matching, and conventional machine learning fall short of clinical standards on these unstructured texts. This paper proposes a deep-learning-based natural language extraction method to address current shortcomings in data governance and automatic Gensini score calculation for coronary angiography. Bidirectional encoder representations from transformers (BERT), a pre-trained model with strong text feature representation capabilities, serves as the feature representation layer. It is combined with bidirectional long short-term memory (BiLSTM) and conditional random field (CRF) models to extract both global and local features from the text. The model was evaluated on a dataset from a hospital in China and compared with another model to validate its practical advantages. Hence, the BiLSTM-CRF model was employed to automatically extract relevant coronary angiogram information from EMR texts. The achieved F1 score was 91.19, approximately 0.87 higher than that of the BERT-BiLSTM-CRF model.

    Citation: Feng Li, Mingfeng Jiang, Hongzeng Xu, Yi Chen, Feng Chen, Wei Nie, Li Wang. Data governance and Gensini score automatic calculation for coronary angiography with deep-learning-based natural language extraction[J]. Mathematical Biosciences and Engineering, 2024, 21(3): 4085-4103. doi: 10.3934/mbe.2024180
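
    The abstract describes computing a Gensini score from coronary angiogram findings extracted from EMR text. As a rough illustration of that final step only, the sketch below scores a list of already-extracted lesions. The severity bands and segment multipliers follow the commonly cited form of the Gensini score and are assumptions here, not values taken from this article; the names `Lesion`, `severity_points`, and `gensini_score` are hypothetical, and the segment table is deliberately a subset.

    ```python
    from dataclasses import dataclass

    # Severity points by percent diameter stenosis: (upper bound, points).
    # Commonly cited Gensini bands; an assumption, not taken from the article.
    SEVERITY_BANDS = [(25, 1), (50, 2), (75, 4), (90, 8), (99, 16), (100, 32)]

    # Segment multipliers: a subset of commonly cited values (assumption).
    SEGMENT_WEIGHTS = {
        "left_main": 5.0,
        "proximal_lad": 2.5,
        "proximal_lcx": 2.5,
        "mid_lad": 1.5,
        "distal_lad": 1.0,
        "first_diagonal": 1.0,
        "second_diagonal": 0.5,
        "rca": 1.0,
    }


    @dataclass
    class Lesion:
        """One extracted finding: a coronary segment and its stenosis in percent."""
        segment: str
        stenosis_percent: float


    def severity_points(stenosis_percent: float) -> int:
        """Map percent stenosis to severity points; 0 means no lesion."""
        if stenosis_percent <= 0:
            return 0
        for upper_bound, points in SEVERITY_BANDS:
            if stenosis_percent <= upper_bound:
                return points
        raise ValueError(f"stenosis above 100%: {stenosis_percent}")


    def gensini_score(lesions: list[Lesion]) -> float:
        """Sum severity points times segment multiplier over all lesions."""
        total = 0.0
        for lesion in lesions:
            if lesion.segment not in SEGMENT_WEIGHTS:
                raise KeyError(f"unknown segment: {lesion.segment}")
            total += severity_points(lesion.stenosis_percent) * SEGMENT_WEIGHTS[lesion.segment]
        return total


    # Hypothetical findings, as an entity-extraction step might emit them.
    example = [
        Lesion("left_main", 50),     # 2 points * 5.0
        Lesion("proximal_lad", 80),  # 8 points * 2.5
        Lesion("rca", 100),          # 32 points * 1.0
    ]
    print(gensini_score(example))
    ```

    In a full pipeline, the segment names and stenosis percentages would come from the entity-extraction model rather than being typed in by hand; mapping free-text segment mentions to the keys of `SEGMENT_WEIGHTS` is a separate normalization step not shown here.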

  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)