Set-valued data collection with local differential privacy based on category hierarchy

Jia Ouyang; Yinyin Xiao; Shaopeng Liu; Zhenghong Xiao; Xiuxiu Liao; Jia Ouyang; Yinyin Xiao; Shaopeng Liu; Zhenghong Xiao; Xiuxiu Liao

doi:10.3934/mbe.2021139

Mathematical Biosciences and Engineering

2021, Volume 18, Issue 3: 2733-2763. doi: 10.3934/mbe.2021139

Previous Article Next Article

Research article Special Issues

Set-valued data collection with local differential privacy based on category hierarchy

1.
School of Cyber Security, Guangdong Polytechnic Normal University, Guangzhou 510665, China
2.
College of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665, China

Received: 01 February 2021 Accepted: 09 March 2021 Published: 22 March 2021

Set-valued data is extremely important and widely used in sensor technology and application. Recently, privacy protection for set-valued data under differential privacy (DP) has become a research hotspot. However, the DP model assumes that the data center is trustworthy, consequently, increasingly attention has been paid to the application of the local differential privacy model (LDP) for set-valued data. Constrained by the local differential privacy model, most methods randomly respond to the subset of set-valued data, and the data collector conducts statistics on the received data. There are two main problems with this kind of method: one is that the utility function used in the random response loses too much information; the other is that the privacy protection of the set-valued data category is usually ignored. To solve these problems, this paper proposes a set-valued data collection method (SetLDP) based on the category hierarchy under the local differential privacy model. The core idea is to first make a random response to the existence of the category, continue to disturb the item count if the category exists, and finally randomly respond to a candidate itemset based on the new utility function. Theory analysis and experimental results show that the SetLDP can not only preserve more information, but also protect the category private information in set-valued data.
- privacy preservation,
- set-valued data,
- local differential privacy,
- utility function,
- data collection
Citation: Jia Ouyang, Yinyin Xiao, Shaopeng Liu, Zhenghong Xiao, Xiuxiu Liao. Set-valued data collection with local differential privacy based on category hierarchy[J]. Mathematical Biosciences and Engineering, 2021, 18(3): 2733-2763. doi: 10.3934/mbe.2021139

Related Papers:

Abstract

Set-valued data is extremely important and widely used in sensor technology and application. Recently, privacy protection for set-valued data under differential privacy (DP) has become a research hotspot. However, the DP model assumes that the data center is trustworthy, consequently, increasingly attention has been paid to the application of the local differential privacy model (LDP) for set-valued data. Constrained by the local differential privacy model, most methods randomly respond to the subset of set-valued data, and the data collector conducts statistics on the received data. There are two main problems with this kind of method: one is that the utility function used in the random response loses too much information; the other is that the privacy protection of the set-valued data category is usually ignored. To solve these problems, this paper proposes a set-valued data collection method (SetLDP) based on the category hierarchy under the local differential privacy model. The core idea is to first make a random response to the existence of the category, continue to disturb the item count if the category exists, and finally randomly respond to a candidate itemset based on the new utility function. Theory analysis and experimental results show that the SetLDP can not only preserve more information, but also protect the category private information in set-valued data.

References

[1]	Y. He, J. F. Naughton, Anonymization of set-valued data via top-down, local generalization, Pro. VLDB Endow., 2 (2009), 934-945. doi: 10.14778/1687627.1687733
[2]	S. L. Wang, Y. C. Tsai, T. P. Hong, Anonymizing set valued social data, in Proceedings of 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing, IEEE, (2010), 809-812.
[3]	R. Chen, B. C. M. Fung, N. Mohammed, B. C. Desai, K. Wang, Privacy-preserving trajectory data publishing by local suppression, Inform. Sci., 231 (2013), 83-97. doi: 10.1016/j.ins.2011.07.035
[4]	G. Ghinita, Y. Tao, P. Kalnis, On the Anonymization of sparse high-dimensional data, in Proceedings of the 24nd International Conference on Data Engineering, IEEE, (2008), 715-724.
[5]	J. Cao, P. Karras, C. Raïssi, K. L Tan, ρ-uncertainty: inference-proof transaction anonymization, Proc. VLDB Endowment, 3 (2010), 1033-1044.
[6]	M. Terrovitis, N. Mamoulis, P. Kalnis, Privacy-preserving anonymization of set-valued data, Pro. VLDB Endowment, 1 (2008), 115-125. doi: 10.14778/1453856.1453874
[7]	Y. Xu, K. Wang, W. C. Fu, P. S. Yu, Anonymizing transaction databases for publication, in Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, (2008), 767-775.
[8]	L. Sweeney, k-Anonymity: A model for protecting privacy, Int. J. Uncertainty Fuzziness Knowl. Based Syst., 5 (2002), 557-570.
[9]	A. Machanavajjhala, D. Kifer, J. Gehrke, M. Venkitasubramaniam, l-diversity: privacy beyond k-anonymity, ACMT. Knowl. D., 1 (2017), 1-3.
[10]	R. Chen, B. Fung, B. C. Desai, Differentially private trajectory data publication, preprint, arXiv: 1112.2020.
[11]	J. Ouyang, J. Yin, S. P. Liu, Y. B. Liu, An effective differential privacy transaction data publication strategy, J. Comput. Res. Dev., 51 (2014), 2195-2205.
[12]	J. Ouyang, J. Yin, S. P. Liu, Differential privacy publishing strategy for distributed transaction data, J. Software, 26 (2015), 1457-1472.
[13]	G. Fanti, V. Pihur, U. Erlingsson, Building a rappor with the unknown: Privacy-preserving learning of associations and data dictionaries, Proc. Privacy Enhancing Technol., 2016 (2016), 41-61. doi: 10.1515/popets-2016-0015
[14]	U. Erlingsson, V. Pihur, A. Korolova, RAPPOR: randomized aggregatable privacy-preserving Ordinal Response, in Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, (2014), 1054-1067.
[15]	S. L. Warner, Randomized response: a survey technique for eliminating evasive answer bias, J. Am. Stat. Assoc., 60 (1965), 63-69. doi: 10.1080/01621459.1965.10480775
[16]	C. Sun, Y. Fu, J. Zhou, H. Gao, Personalized privacy-preserving frequent itemset mining using randomized response, J. Sci. World, 2014 (2014), 1-10.
[17]	A. V. Evfimievski, R. Srikant, R. Agrawal, J. E. Gehrke, Privacy preserving mining of association rules, J. Inform. Syst., 29 (2004), 343-364.
[18]	Q. Ye, X. Meng, M. Zhu, Z. Huo, Survey on local differential privacy, J. Software, 7 (2018), 159-183.
[19]	B. Ding, J. Kulkarni, S. Yekhanin, Collecting telemetry data privately, preprint, arXiv: 1712.01524.
[20]	P. Kairouz, K. Bonawitz, D. Ramage, Discrete distribution estimation under local privacy, in International Conference on Machine Learning, (2016), 2436-2444.
[21]	J. C. Duchi, M. I. Jordan, M. J. Wainwright, Local privacy and minimax bounds: sharp rates for probability estimation, preprint, arXiv: 1305.6000.
[22]	M. Andrés, N. Bordenabe, K. Chatzikokolakis, C. Palamidessi, Geo-indistinguishability: differential privacy for location-based systems, in Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, ACM, (2013), 901-914.
[23]	N. Bordenabe, K. Chatzikokolakis, C. Palamidessi, Optimal geo-indistinguishable mechanisms for location privacy, in Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, ACM, (2014), 251-262.
[24]	S. W. Wang, L. S. Huang, Y. W. Nie, P. Wang, H. Xu, W. Yang, PrivSet: set-valued data analyses with local differential privacy, in Proceedings of the 2018 IEEE INFOCOM Conference on Computer Communications, IEEE, (2018), 1088-1096.
[25]	J. Hsu, S. Khanna, A. Roth, Distributed Private Heavy Hitters, in International Colloquium on Automata, Languages, and Programming, Springer, Berlin, Heidelberg, (2012), 461-472.
[26]	Z. Qin, Y. Yin, T. Yu, I. Khalil, X. K. Xiao, K. Ren, Heavy hitter estimation over set-valued data with local differential privacy, in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM, (2016), 192-203.
[27]	R. Bassily, A. Smith, Local, private, efficient protocols for succinct histograms, in Proceedings of the forty-seventh annual ACM symposium on Theory of computing, ACM, (2015), 127-135.
[28]	M. E. Gursoy, A. Tamersoy, S. Truex, W. Wei, L. Liu, Secure and utility-aware data collection with condensed local differential privacy, preprint, arXiv: 1905.06361.
[29]	W. Wang, M. A. Carreira-Perpiñán, Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application, preprint, arXiv: 1309.1541.
[30]	C. W. Lin, T. P Hong, H. C. Hsu, Reducing side effects of hiding sensitive itemsets in privacy preserving data mining, Sci. World J., 2014 (2014), 235837.
[31]	J. C. Lin, P. Fournier-Viger, L. Wu, W. Gan, Y. Djenouri, J. Zhang, PPSF: an open-source privacy-preserving and security mining framework, in 2018 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, (2018), 1459-1463.
[32]	X. T. Wang, H. F. Zhu, A novel two-party key agreement protocol with the environment of wearable device using chaotic maps, Data Sci. Pattern Recognit., 3 (2019), 12-23.
[33]	Q. Q. Ye, H. B. Hu, X. F. Meng, H. D. Zheng, PrivKV: key-value data collection with local differential privacy, in 2019 IEEE Symposium on Security and Privacy (SP), IEEE, (2019), 294-308.

mbe-18-03-139-supplementary.pdf

Reader Comments

Your name:*

Email:*
© 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)