The knowledge graph is a critical resource for medical intelligence. General medical knowledge graphs attempt to cover all diseases and contain a large amount of medical knowledge. However, it is challenging to review all of their triples manually, so their quality cannot yet support intelligent medical applications. Breast cancer is currently one of the cancers with the highest incidence. There is an urgent need to use artificial intelligence technology to improve the efficiency of breast cancer diagnosis and treatment, and to improve the postoperative health of breast cancer patients. In response to this demand, this paper proposes a framework for constructing a breast cancer knowledge graph from heterogeneous data resources. Specifically, it extracts knowledge triples from clinical guidelines, medical encyclopedias and electronic medical records. The triples from these different resources are then fused to build a breast cancer knowledge graph (BCKG). Experimental results demonstrate that BCKG can support knowledge-based question answering, breast cancer postoperative follow-up and healthcare, and can improve the quality and efficiency of breast cancer diagnosis, treatment and management.
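To make the idea of fusing triples from several resources concrete, the following is a minimal sketch, not the paper's actual fusion method. It assumes that fusion reduces to normalizing entity names through a synonym table and merging identical (head, relation, tail) triples while recording which sources asserted each one. The names `SYNONYMS`, `normalize` and `fuse`, the relation labels, and the sample triples are all hypothetical and purely illustrative; they are not taken from BCKG.

```python
from collections import defaultdict

# Hypothetical synonym table mapping surface forms to a canonical entity name.
# A real system would rely on a medical terminology or a learned normalizer.
SYNONYMS = {
    "breast carcinoma": "breast cancer",
    "mammary cancer": "breast cancer",
    "tamoxifen citrate": "tamoxifen",
}


def normalize(name: str) -> str:
    """Lower-case and trim a name, then map known synonyms to a canonical form."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)


def fuse(sources):
    """Merge (head, relation, tail) triples from several named sources.

    Returns a dict mapping each canonical triple to the sorted list of
    sources that asserted it, so provenance survives the merge.
    """
    provenance = defaultdict(set)
    for source_name, triples in sources.items():
        for head, relation, tail in triples:
            key = (normalize(head), relation.strip().lower(), normalize(tail))
            provenance[key].add(source_name)
    return {triple: sorted(srcs) for triple, srcs in provenance.items()}


# Illustrative input: one list of raw triples per (hypothetical) source.
sources = {
    "guideline": [("Breast carcinoma", "treated_with", "Tamoxifen")],
    "encyclopedia": [
        ("breast cancer", "treated_with", "tamoxifen citrate"),
        ("breast cancer", "has_symptom", "breast lump"),
    ],
    "emr": [("Mammary cancer", "has_symptom", "Breast lump")],
}

kg = fuse(sources)
for triple, srcs in sorted(kg.items()):
    print(triple, srcs)
```

Here five raw triples collapse into two canonical ones, each tagged with the sources that support it. Keeping provenance rather than discarding it is a design choice: under this sketch's assumptions, the number of independent sources behind a triple could serve as a rough signal when deciding which facts to trust or send for manual review.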
Citation: Bo An. Construction and application of Chinese breast cancer knowledge graph based on multi-source heterogeneous data[J]. Mathematical Biosciences and Engineering, 2023, 20(4): 6776-6799. doi: 10.3934/mbe.2023292