Research article

A study on pharmaceutical text relationship extraction based on heterogeneous graph neural networks

  • Received: 06 November 2023; Revised: 15 December 2023; Accepted: 19 December 2023; Published: 28 December 2023
  • Effective information extraction from pharmaceutical texts is of great significance for clinical research. Ancient Chinese medical texts use terse sentences with complex semantic relationships, and textual relationships may exist between heterogeneous entities. Current mainstream relationship extraction models do not account for the associations between entities and relationships during extraction, so the semantic information is insufficient to form an effective structured representation. In this paper, we propose a heterogeneous graph neural network relationship extraction model adapted to traditional Chinese medicine (TCM) text. First, the given sentence and predefined relationships are embedded with fine-tuned bidirectional encoder representations from transformers (BERT) word embeddings as model input. Second, a heterogeneous graph network is constructed that associates word, phrase, and relationship nodes to obtain the hidden-layer representation. Then, in the decoding stage, a two-stage subject-object entity identification method is adopted: binary classifiers locate the start and end positions of TCM entities, identifying all subject and object entities in the sentence and finally forming the TCM entity-relationship groups. Experiments on the TCM relationship extraction dataset show that the heterogeneous graph neural network embedded with BERT reaches a precision of 86.99% and an F1 value of 87.40%, improving precision by 8.83% and F1 by 10.21% over the compared relationship extraction models CNN, BERT-CNN, and Graph LSTM.

    Citation: Shuilong Zou, Zhaoyang Liu, Kaiqi Wang, Jun Cao, Shixiong Liu, Wangping Xiong, Shaoyi Li. A study on pharmaceutical text relationship extraction based on heterogeneous graph neural networks[J]. Mathematical Biosciences and Engineering, 2024, 21(1): 1489-1507. doi: 10.3934/mbe.2024064






  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
