With the deep integration of "AI + medicine", AI-assisted technology has become a great help in the medical field, especially in predicting and diagnosing diseases from big data, where it is faster and more accurate. However, concerns about data security seriously hinder data sharing among medical institutions. To fully exploit the value of medical data and enable collaborative data sharing, we developed a secure medical data sharing scheme based on the client/server (C/S) communication mode and constructed a federated learning architecture that uses homomorphic encryption to protect the training parameters. We chose the Paillier algorithm, which is additively homomorphic, for this purpose. Clients do not need to share local data; they only upload their trained model parameters to the server. A distributed parameter update mechanism is introduced during training. The server is mainly responsible for issuing training commands and weights, aggregating the local model parameters from the clients and predicting the joint diagnostic results. Each client mainly uses the stochastic gradient descent algorithm to clip gradients, update the model and transmit the trained model parameters back to the server. To test the performance of this scheme, a series of experiments was conducted. The simulation results indicate that the model prediction accuracy depends on the number of global training rounds, the learning rate, the batch size and the privacy budget parameters, among other factors. The results show that the scheme achieves data sharing while protecting data privacy, predicts diseases accurately and performs well.
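The additive homomorphism that the scheme relies on can be illustrated with a minimal sketch. This is a toy Paillier implementation in plain Python, not the authors' code: the tiny primes, the fixed-point `SCALE`, the function names and the sample client weights are all assumptions made for illustration, and the key size is far too small to be secure. It also assumes that the clients hold the private key while the server sees only ciphertexts.

```python
import math
import random

SCALE = 1000  # assumed fixed-point scale for turning float weights into integers


def keygen(p=1789, q=1861):
    """Toy key pair from two small primes (illustration only, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (negative values are mapped into Z_n)."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Decrypt and map the result back to a signed integer."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    m = ((x - 1) // n * mu) % n
    return m - n if m > n // 2 else m


def aggregate(n, ciphertexts):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    total = 1
    for c in ciphertexts:
        total = (total * c) % (n * n)
    return total


def encode(weight):
    """Convert a float weight to a fixed-point integer."""
    return round(weight * SCALE)


# Hypothetical local model parameters from three clients.
n, priv = keygen()
client_weights = [[0.12, -0.5], [0.3, 0.25], [-0.06, 0.4]]

# Each client encrypts its parameters before uploading them.
encrypted = [[encrypt(n, encode(w)) for w in ws] for ws in client_weights]

# The server aggregates ciphertexts coordinate by coordinate without decrypting.
summed = [aggregate(n, column) for column in zip(*encrypted)]

# A key holder decrypts the sums and averages them.
averaged = [decrypt(n, priv, c) / SCALE / len(client_weights) for c in summed]
```

Here `averaged` is approximately the plain coordinate-wise mean of the three weight vectors, even though the aggregation step only ever handled ciphertexts. A practical deployment would use keys of at least 2048 bits and a vetted cryptographic library rather than hand-written arithmetic.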
Citation: Lihong Guo, Weilei Gao, Ye Cao, Xu Lai. Research on medical data security sharing scheme based on homomorphic encryption[J]. Mathematical Biosciences and Engineering, 2023, 20(2): 2261-2279. doi: 10.3934/mbe.2023106