Research article

Set-valued data collection with local differential privacy based on category hierarchy

  • Received: 01 February 2021 / Accepted: 09 March 2021 / Published: 22 March 2021
  • Set-valued data is extremely important and widely used in sensor technologies and applications. Recently, privacy protection for set-valued data under differential privacy (DP) has become a research hotspot. However, the DP model assumes that the data center is trustworthy; consequently, increasing attention has been paid to applying the local differential privacy (LDP) model to set-valued data. Constrained by the LDP model, most existing methods apply randomized response to a subset of the set-valued data, and the data collector computes statistics over the received reports. There are two main problems with this kind of method: first, the utility function used in the randomized response loses too much information; second, privacy protection for the categories of set-valued data is usually ignored. To address these problems, this paper proposes a set-valued data collection method (SetLDP) based on a category hierarchy under the local differential privacy model. The core idea is to first apply a randomized response to the existence of a category, then perturb the item count if the category exists, and finally apply a randomized response over a candidate itemset based on a new utility function. Theoretical analysis and experimental results show that SetLDP not only preserves more information, but also protects the private category information in set-valued data.
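    To make the three-stage perturbation described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation: it assumes a single category with a known item domain, uses binary and k-ary randomized response for the first two stages, and substitutes a simple overlap score for the paper's utility function in an exponential-mechanism-style third stage. The function names and the privacy-budget split eps1/eps2/eps3 are hypothetical.

```python
import math
import random
from itertools import combinations


def rr_bit(bit, eps):
    """Warner-style randomized response on one bit: keep the true bit w.p. e^eps / (e^eps + 1)."""
    p_true = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if random.random() < p_true else 1 - bit


def perturb_count(count, max_count, eps):
    """k-ary randomized response over {0, ..., max_count} for the item count."""
    k = max_count + 1
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    if random.random() < p_true:
        return count
    return random.choice([c for c in range(k) if c != count])


def report_itemset(true_items, category_items, size, eps):
    """Exponential-mechanism-style draw of a candidate itemset of the given size,
    scored by overlap with the true items (a stand-in for the paper's utility function;
    overlap has sensitivity 1, hence the eps/2 scaling)."""
    size = min(size, len(category_items))
    candidates = list(combinations(sorted(category_items), size))
    weights = [math.exp(eps * len(set(c) & true_items) / 2.0) for c in candidates]
    r, acc = random.random() * sum(weights), 0.0
    for cand, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return set(cand)
    return set(candidates[-1])


def setldp_report(user_items, category_items, max_count, eps1, eps2, eps3):
    """Three-stage local report: category presence, item count, then a candidate itemset.
    Assumes max_count does not exceed the number of items in the category."""
    category_items = set(category_items)
    user_items = set(user_items) & category_items
    has_category = rr_bit(1 if user_items else 0, eps1)
    if not has_category:
        return (0, 0, set())
    count = perturb_count(min(len(user_items), max_count), max_count, eps2)
    items = report_itemset(user_items, category_items, count, eps3) if count else set()
    return (1, count, items)
```

    Under these assumptions, each user would call setldp_report once per category; by sequential composition the per-category privacy cost is at most eps1 + eps2 + eps3, and the collector would aggregate and debias the reports to estimate category and item frequencies.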

    Citation: Jia Ouyang, Yinyin Xiao, Shaopeng Liu, Zhenghong Xiao, Xiuxiu Liao. Set-valued data collection with local differential privacy based on category hierarchy[J]. Mathematical Biosciences and Engineering, 2021, 18(3): 2733-2763. doi: 10.3934/mbe.2021139



  • mbe-18-03-139-supplementary.pdf
  • © 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)

