Goal: Medical resources remain scarce and unequally distributed, and prevailing diagnostic models require more data than many medical institutions can afford to collect. Our objective is to develop a general diagnosis framework that needs fewer electronic medical records (EMRs). Methods: The proposed framework consists of three stages: network construction, network expansion, and disease diagnosis. In the first two stages, knowledge extracted from EMRs is used to build and expand an EMR-based medical knowledge network (EMKN) that models and represents medical knowledge. In the third stage, a modified form of percolation theory is applied to the EMKN to make diagnoses. Result: When training data are limited, our framework outperforms naïve Bayes networks, neural networks, and logistic regression, especially in top-10 recall. Of the 207 test cases, 51.7% achieved a top-10 recall of 100%, 21% better than the result achieved in one of our previous studies. Conclusion: The experimental results show that the proposed framework may be useful for medical knowledge representation and diagnosis. By performing inference over the knowledge modeled in the EMKN, the framework effectively compensates for a lack of data. Significance: The proposed framework is applicable not only to diagnosis; it may also be extended to other domains to represent and model knowledge and to perform inference on that representation.
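The abstract does not say how the percolation theory is modified, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. The assumptions are as follows. Each EMR is reduced to a (symptoms, diagnoses) pair. The EMKN is approximated by a bipartite symptom-disease network whose edge weights estimate P(symptom | disease) from co-occurrence counts. Each edge is treated as independently open with probability equal to its weight (bond percolation). Diseases are ranked by the probability that they connect to at least one observed symptom. The network-expansion stage is omitted. Top-10 recall is assumed to mean the fraction of a case's gold diagnoses that appear among the ten highest-ranked diseases. The function names (`build_network`, `percolation_scores`, `rank`, `recall_at_k`) and the toy records are hypothetical.

```python
# Hypothetical sketch: percolation-style disease ranking on a toy symptom-disease
# network, plus top-k recall. NOT the PercolationDF algorithm; all names are illustrative.
from collections import Counter, defaultdict
from itertools import product
from math import prod


def build_network(records):
    """Build a bipartite symptom -> disease network from (symptoms, diseases) records.

    Edge weight w(s, d) = (# records listing both s and d) / (# records listing d),
    an empirical estimate of P(s | d) that lies in [0, 1].
    """
    disease_count = Counter()
    pair_count = Counter()
    for symptoms, diseases in records:
        disease_count.update(set(diseases))
        pair_count.update(product(set(symptoms), set(diseases)))
    network = defaultdict(dict)  # symptom -> {disease: weight}
    for (s, d), n in pair_count.items():
        network[s][d] = n / disease_count[d]
    return network


def percolation_scores(network, observed):
    """Bond percolation: each edge is open independently with probability w.

    Returns, for every disease adjacent to an observed symptom, the probability
    that at least one of those edges is open. On this one-hop bipartite graph that
    probability is exactly 1 - prod(1 - w), so no simulation is needed.
    """
    closed_probs = defaultdict(list)  # disease -> [1 - w for each edge from an observed symptom]
    for s in set(observed):
        for d, w in network.get(s, {}).items():
            closed_probs[d].append(1.0 - w)
    return {d: 1.0 - prod(qs) for d, qs in closed_probs.items()}


def rank(scores, k):
    """Return the k highest-scoring diseases, breaking ties alphabetically."""
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [d for d, _ in ordered[:k]]


def recall_at_k(ranked, gold, k=10):
    """Fraction of the gold diagnoses that appear in the top-k of the ranking."""
    gold = set(gold)
    if not gold:
        return 0.0
    return len(gold & set(ranked[:k])) / len(gold)


# Toy records (invented for illustration): (symptoms, diagnoses) per EMR.
records = [
    ({"fever", "cough"}, {"pneumonia"}),
    ({"fever"}, {"pneumonia"}),
    ({"cough", "wheeze"}, {"asthma"}),
    ({"wheeze"}, {"asthma"}),
    ({"fever", "dysuria"}, {"urinary tract infection"}),
    ({"dysuria"}, {"urinary tract infection"}),
]
network = build_network(records)
scores = percolation_scores(network, {"fever", "cough"})
top10 = rank(scores, k=10)
print(top10)  # ['pneumonia', 'asthma', 'urinary tract infection']
print(recall_at_k(top10, {"pneumonia"}))  # 1.0
```

Because every edge in this toy graph lies one hop from an observed symptom, the connection probability reduces to 1 − ∏(1 − w) and needs no simulation. A multi-hop network, such as an expanded EMKN, would generally require Monte Carlo sampling of open edges followed by a connectivity check.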
Citation: Jingchi Jiang, Xuehui Yu, Yi Lin, Yi Guan. PercolationDF: A percolation-based medical diagnosis framework[J]. Mathematical Biosciences and Engineering, 2022, 19(6): 5832-5849. doi: 10.3934/mbe.2022273