Extracting entity relations from unstructured Chinese electronic medical records is an important task in medical information extraction. However, most Chinese electronic medical records are document-length texts, and existing models either cannot handle such long sequences or perform poorly on them. This paper proposes a neural network that combines feature augmentation with a cascade binary tagging framework. First, we use a pre-trained model to tokenize the original text and obtain word embedding vectors. Second, the word vectors are fed into the feature augmentation network, whose output is fused with the original features and position features. Finally, the cascade binary tagging decoder generates the results. For this work, we built a Chinese document-level electronic medical record dataset named VSCMeD, which contains 595 real electronic medical records from vascular surgery patients. On VSCMeD, the model achieves a precision of 87.82% and a recall of 88.47%. On a second Chinese medical dataset, CMeIE-V2, the model achieves a precision of 54.51% and a recall of 48.63%.
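To illustrate the final decoding step, the following is a minimal Python sketch of how a cascade binary tagging decoder can, in general, turn start/end tag probabilities into (subject, relation, object) triples. It is not the implementation used in this paper: the function names, the 0.5 threshold, the nearest-end pairing rule, the relation labels, and the toy probabilities are all assumptions made for illustration, and the encoder and feature augmentation network are replaced by a hypothetical stand-in callable.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices
TagProbs = Tuple[List[float], List[float]]  # (start probabilities, end probabilities)


def decode_spans(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5) -> List[Span]:
    """Pair every tagged start position with the nearest tagged end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


def decode_triples(subj_start: List[float], subj_end: List[float],
                   object_tagger: Callable[[Span], Dict[str, TagProbs]],
                   threshold: float = 0.5) -> List[Tuple[Span, str, Span]]:
    """Cascade: tag subjects first, then tag objects per relation for each subject."""
    triples = []
    for subj in decode_spans(subj_start, subj_end, threshold):
        # object_tagger is a hypothetical stand-in for the relation-specific
        # object taggers; it returns one (start, end) probability pair per relation.
        for relation, (obj_start, obj_end) in object_tagger(subj).items():
            for obj in decode_spans(obj_start, obj_end, threshold):
                triples.append((subj, relation, obj))
    return triples


def toy_object_tagger(subject: Span) -> Dict[str, TagProbs]:
    """Hand-written probabilities for a six-token toy input (illustrative only)."""
    if subject == (0, 1):
        return {
            "treated_by": ([0, 0, 0, 0, 0.9, 0], [0, 0, 0, 0, 0, 0.8]),
            "complication": ([0.0] * 6, [0.0] * 6),  # no object tagged for this relation
        }
    return {}


triples = decode_triples([0.9, 0.1, 0, 0, 0, 0], [0.1, 0.8, 0, 0, 0, 0],
                         toy_object_tagger)
print(triples)  # [((0, 1), 'treated_by', (4, 5))]
```

Because objects are tagged separately for each relation, the same subject span can appear in several triples, and a relation for which no object position exceeds the threshold simply yields nothing.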
Citation: Xiaoqing Lu, Jijun Tong, Shudong Xia. Entity relationship extraction from Chinese electronic medical records based on feature augmentation and cascade binary tagging framework[J]. Mathematical Biosciences and Engineering, 2024, 21(1): 1342-1355. doi: 10.3934/mbe.2024058