Multiple organizations would benefit from collaborative learning models trained over aggregated datasets from various human activity recognition applications, without privacy leakage. However, the two prevailing privacy-preserving protocols, secure multi-party computation and differential privacy, still suffer serious shortcomings: the lack of a privacy guarantee for individual data, and insufficient protection against inference attacks on the resulting models. To mitigate these shortfalls, we propose a privacy-preserving architecture that combines the strengths of secure multi-party computation and differential privacy. Our differential privacy method builds on output perturbation and gradient perturbation, and extends both techniques to the distributed learning setting. In the output perturbation algorithm, data owners collaboratively aggregate their locally trained models inside a secure multi-party computation domain and inject calibrated statistical noise before exposing the classifier. In the gradient perturbation algorithm, the parties collaboratively train a global model, injecting noise at every iterative update. The utility guarantee of our gradient perturbation method is determined by an expected curvature rather than the minimum curvature. Using the expected curvature, we theoretically justify the advantage of gradient perturbation in our proposed algorithm, thereby closing the existing gap between theory and practice. Validation of our algorithms on real-world human activity recognition datasets establishes that our protocol incurs minimal computational overhead and provides substantial utility gains under typical security and privacy guarantees.
Citation: Kwabena Owusu-Agyemang, Zhen Qin, Appiah Benjamin, Hu Xiong, Zhiguang Qin. Insuring against the perils in distributed learning: privacy-preserving empirical risk minimization. Mathematical Biosciences and Engineering, 2021, 18(4): 3006-3033. doi: 10.3934/mbe.2021151
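To make the two perturbation strategies concrete, the following is a minimal Python sketch of both mechanisms. It is an illustration, not the paper's actual protocol: the secure multi-party aggregation step is abstracted to a plain average, and the function names, the Laplace/Gaussian noise choices, and parameters such as `sensitivity`, `clip`, and `sigma` are hypothetical placeholders that would need to be calibrated to a concrete privacy budget in a real deployment.

```python
import numpy as np

def output_perturbation(local_models, epsilon, sensitivity):
    """Sketch of output perturbation: the parties' locally trained
    model vectors are aggregated (the secure multi-party computation
    step is abstracted to a plain average here), then Laplace noise
    calibrated to the aggregate's sensitivity is injected before the
    classifier is exposed."""
    aggregate = np.mean(local_models, axis=0)
    noise = np.random.laplace(scale=sensitivity / epsilon, size=aggregate.shape)
    return aggregate + noise

def gradient_perturbation(grad_fn, theta, steps=100, lr=0.1, clip=1.0, sigma=1.0):
    """Sketch of gradient perturbation: at every iterative update the
    gradient is clipped to bound its sensitivity and Gaussian noise is
    injected, so the jointly trained global model carries a
    differential-privacy guarantee (budget accounting omitted)."""
    for _ in range(steps):
        g = grad_fn(theta)
        g = g / max(1.0, np.linalg.norm(g) / clip)                   # bound sensitivity
        g = g + np.random.normal(scale=sigma * clip, size=g.shape)   # inject noise
        theta = theta - lr * g
    return theta
```

The sketch makes the design trade-off visible: output perturbation pays the privacy cost once, after training, whereas gradient perturbation pays at every update, which is why its utility guarantee depends on the curvature of the loss.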