Research article Special Issues

Construction of cardiovascular information extraction corpus based on electronic medical records


  • Received: 31 March 2023; Revised: 12 May 2023; Accepted: 24 May 2023; Published: 09 June 2023
  • Cardiovascular disease has a significant impact on both patients and society, which motivates knowledge-based research such as knowledge graph construction and automated question answering. However, little work has been done on corpus construction for cardiovascular disease, and this has hindered further knowledge-based research on the disease. Electronic medical records contain patient data that span the entire diagnosis and treatment process and include a large amount of reliable medical information. We therefore collected electronic medical records related to cardiovascular disease and, drawing on relevant work experience, developed a standard for labeling entities and entity relations in cardiovascular electronic medical records. Using a rule-based semi-automatic method to build a sentence-level dictionary of labeling results, we constructed a cardiovascular electronic medical record entity and entity relation labeling corpus (CVDEMRC). The CVDEMRC contains 7,691 entities and 11,185 entity relation triples. Consistency checks yielded 93.51% for entity annotations and 84.02% for entity relation annotations, indicating good annotation consistency. The CVDEMRC is expected to serve as a data resource for information extraction research on cardiovascular diseases.
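
    The abstract does not state how the consistency figures were computed. A minimal sketch, assuming agreement is measured as pairwise F1 over exactly matching annotations (a common choice for entity and relation corpora), could look like the following. The `Entity` tuple layout, the function name `agreement_f1` and the example labels are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical annotation layout: (start offset, end offset, label).
Entity = Tuple[int, int, str]


def agreement_f1(reference: Set[Entity], other: Set[Entity]) -> float:
    """Pairwise F1 between two annotators, treating `reference` as the gold set.

    Only exact matches (same span and same label) count as agreement.
    """
    if not reference and not other:
        return 1.0  # both annotators marked nothing: trivially consistent
    matched = len(reference & other)
    if matched == 0:
        return 0.0
    precision = matched / len(other)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: the two annotators disagree on the third span boundary.
annotator_a = {(0, 3, "Disease"), (5, 8, "Symptom"), (10, 14, "Drug")}
annotator_b = {(0, 3, "Disease"), (5, 8, "Symptom"), (10, 15, "Drug")}
score = agreement_f1(annotator_a, annotator_b)
print(f"agreement F1: {score:.4f}")
```

    The same function applies to relation triples if each triple is encoded as a hashable tuple, for example (head entity, relation type, tail entity).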

    Citation: Hongyang Chang, Hongying Zan, Shuai Zhang, Bingfei Zhao, Kunli Zhang. Construction of cardiovascular information extraction corpus based on electronic medical records[J]. Mathematical Biosciences and Engineering, 2023, 20(7): 13379-13397. doi: 10.3934/mbe.2023596



  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)