User data usually exists as data islands, held by organizations or stored on users' own local devices. Laws such as the General Data Protection Regulation (GDPR) make it difficult to pool these data to train better machine learning models. Federated learning addresses this by enabling users to jointly train machine learning models without exposing their raw data. Because random forests train quickly and achieve high accuracy, they have already been applied to federated learning among several data institutions. However, in human activity recognition scenarios, a single unified model cannot provide users with personalized services. In this paper, we propose a privacy-preserving federated personalized random forest framework that addresses the personalized application of federated random forests to activity recognition. Based on the characteristics of activity recognition data, locality-sensitive hashing (LSH) is used to compute the similarity between users. Each user trains only with similar users rather than with all users, and the model is then selected incrementally by exploiting the properties of ensemble learning, so that each user obtains a personalized model. In addition, user privacy is protected through differential privacy during the training stage. We conduct experiments on commonly used human activity recognition datasets to evaluate the effectiveness of our model.
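The abstract states that user similarity is computed with locality-sensitive hashing, but not which hash family or which per-user representation is hashed. The sketch below is a minimal illustration under two assumptions: each user is summarized by one vector (here, hypothetically, the mean of their local feature vectors), and the Gaussian p-stable family of Datar et al. is used, h(v) = floor((a·v + b) / w) with a drawn from N(0, I) and b from U[0, w). Two users are scored by the fraction of hash functions on which their buckets collide. All function names are illustrative, not the authors' API.

```python
import numpy as np


def make_pstable_hashes(dim, num_hashes, bucket_width, seed=0):
    """Draw a Gaussian (2-stable) LSH family: h(v) = floor((a . v + b) / w)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((num_hashes, dim))           # one projection per hash
    b = rng.uniform(0.0, bucket_width, size=num_hashes)  # random offsets in [0, w)
    return a, b, bucket_width


def signature(vector, hashes):
    """Map a user's summary vector to one integer bucket id per hash function."""
    a, b, w = hashes
    return np.floor((a @ vector + b) / w).astype(np.int64)


def similarity(sig_u, sig_v):
    """Estimate similarity as the fraction of hash functions whose buckets collide."""
    return float(np.mean(sig_u == sig_v))


def most_similar_users(target, signatures, k):
    """Return the ids of the k users whose signatures best match the target's."""
    scores = {
        uid: similarity(signatures[target], sig)
        for uid, sig in signatures.items()
        if uid != target
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy usage (hypothetical data): each user is summarized by the mean of its rows.
local_data = {
    "u1": np.array([[1.0, 0.0, 0.5], [1.0, 0.0, 0.5]]),
    "u2": np.array([[1.1, 0.1, 0.4], [1.1, 0.1, 0.4]]),    # close to u1
    "u3": np.array([[-5.0, 8.0, 3.0], [-5.0, 8.0, 3.0]]),  # far from u1
}
hashes = make_pstable_hashes(dim=3, num_hashes=64, bucket_width=4.0)
sigs = {uid: signature(x.mean(axis=0), hashes) for uid, x in local_data.items()}
print(most_similar_users("u1", sigs, k=1))
```

More hash functions give a lower-variance similarity estimate, and the bucket width w controls how far apart two users can be and still collide. Only the integer signatures need to leave a device, but they still reveal coarse information about the summary vector, so this step is not differentially private on its own.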
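The abstract says the personalized model is selected incrementally using the properties of ensemble learning, without giving the selection rule. One common reading, assumed here rather than taken from the paper, is greedy forward selection: start from the user's own trees, then offer trees received from similar users one at a time and keep each one only if majority-vote accuracy on the user's local validation set does not drop. Trees are modeled as plain callables, and every name below is hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A "tree" here is any callable mapping a feature vector to a class label.
Tree = Callable[[Sequence[float]], int]


def vote(trees: List[Tree], x: Sequence[float]) -> int:
    """Majority vote of an ensemble on one sample; ties go to the smallest label."""
    counts = Counter(tree(x) for tree in trees)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def accuracy(trees: List[Tree], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of samples the ensemble's majority vote classifies correctly."""
    if not trees or len(y) == 0:
        return 0.0
    return sum(vote(trees, x) == label for x, label in zip(X, y)) / len(y)


def incremental_select(
    local_trees: List[Tree],
    candidate_trees: List[Tree],
    X_val: Sequence[Sequence[float]],
    y_val: Sequence[int],
) -> Tuple[List[Tree], float]:
    """Greedily add candidate trees, keeping each one only if local
    validation accuracy does not decrease."""
    selected = list(local_trees)
    best = accuracy(selected, X_val, y_val)
    for tree in candidate_trees:
        trial = selected + [tree]
        score = accuracy(trial, X_val, y_val)
        if score >= best:
            selected, best = trial, score
    return selected, best


# Toy usage: a useful local tree and an unhelpful candidate from another user.
def threshold_tree(x):
    return int(x[0] > 0.5)


def constant_tree(x):
    return 0


X_val, y_val = [[0.1], [0.9], [0.2], [0.8]], [0, 1, 0, 1]
chosen, acc = incremental_select([threshold_tree], [constant_tree], X_val, y_val)
print(len(chosen), acc)  # → 1 1.0
```

The candidate is rejected because adding it creates ties that the vote breaks toward label 0, halving validation accuracy. The order in which candidates are offered matters; sorting them by the similarity of the user they came from is one natural choice, again an assumption.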
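The abstract says user privacy is protected with differential privacy during training, but not which mechanism. A standard building block, assumed here, is the Laplace mechanism on each leaf's class histogram: adding or removing one record changes exactly one count by 1, so the L1 sensitivity is 1 and Laplace noise of scale 1/ε gives ε-differential privacy for that query. Since the trees of a forest are built from the same data, the sketch splits a total budget evenly across trees (sequential composition); the leaves of one tree partition its data, so they can share that per-tree budget (parallel composition). Function names are hypothetical, and choosing splits privately (for example with the exponential mechanism) would consume additional budget not accounted for here.

```python
import numpy as np


def per_tree_budget(total_epsilon: float, num_trees: int) -> float:
    """Sequential composition: trees built from the same data split the budget."""
    return total_epsilon / num_trees


def noisy_leaf_counts(labels, num_classes, epsilon, rng):
    """Laplace mechanism on one leaf's class histogram (L1 sensitivity 1).

    `labels` must be a non-empty array of non-negative integer class ids.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=num_classes)
    return counts + noise


def private_leaf_label(labels, num_classes, epsilon, rng):
    """Label a leaf with the class whose noisy count is largest."""
    return int(np.argmax(noisy_leaf_counts(labels, num_classes, epsilon, rng)))


# Toy usage: a 10-tree forest sharing a total budget of epsilon = 1.0.
rng = np.random.default_rng(0)
eps_tree = per_tree_budget(1.0, num_trees=10)  # 0.1 per tree
leaf_labels = np.array([2] * 40 + [0] * 3 + [1] * 2)
print(private_leaf_label(leaf_labels, num_classes=3, epsilon=eps_tree, rng=rng))
```

With ε = 0.1 the Laplace noise has standard deviation √2/ε ≈ 14, so leaves holding only a few records are labeled almost at random; this is the usual accuracy cost of spreading a fixed privacy budget over many trees.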
Citation: Songfeng Liu, Jinyan Wang, Wenliang Zhang. Federated personalized random forest for human activity recognition[J]. Mathematical Biosciences and Engineering, 2022, 19(1): 953-971. doi: 10.3934/mbe.2022044