Research article Special Issues

The density-based clustering method for privacy-preserving data mining

  • Received: 14 December 2018 Accepted: 31 January 2019 Published: 27 February 2019
  • Privacy-preserving data mining has become an important and emerging issue in recent years because it can hide sensitive information while still allowing meaningful knowledge to be mined. Privacy-preserving data mining is a non-trivial task and is also considered an NP-hard problem, so several evolutionary algorithms have been presented to find optimized solutions. However, most of them consider a single-objective function with pre-defined weights for three side effects (hiding failure, missing cost, and artificial cost). In this paper, we design a multiple-objective particle swarm optimization method for hiding sensitive information based on a density clustering approach (named CMPSO). CMPSO is more flexible because it allows the most appropriate solution for hiding sensitive information to be selected according to the user's preference. Extensive experiments carried out on two datasets show that CMPSO performs better than traditional single-objective evolutionary approaches in terms of the three side effects.
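
The three side effects are commonly formalised in the itemset-hiding literature as set operations over the frequent itemsets mined before (L) and after (L') sanitization, given the set of sensitive itemsets S: hiding failure = |S ∩ L'|, missing cost = |(L \ S) \ L'|, and artificial cost = |L' \ L|. The minimal sketch below assumes that count-based formulation, an absolute minimum support count, and a brute-force miner that is only suitable for toy data; the function names and sample transactions are hypothetical and are not taken from CMPSO. It also shows the Pareto-dominance test that a multi-objective optimizer uses in place of a single weighted sum.

```python
from itertools import combinations

def frequent_itemsets(db, min_sup):
    """Brute-force miner: all itemsets with support count >= min_sup (toy data only)."""
    items = sorted({i for t in db for i in t})
    result = set()
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            c = frozenset(cand)
            if sum(1 for t in db if c <= t) >= min_sup:
                result.add(c)
                found = True
        if not found:  # Apriori property: no frequent k-itemset, so no frequent (k+1)-itemset
            break
    return result

def side_effects(original, sanitized, sensitive, min_sup):
    """Return (hiding failure, missing cost, artificial cost) as counts (assumed formulation)."""
    L = frequent_itemsets(original, min_sup)
    L_prime = frequent_itemsets(sanitized, min_sup)
    hiding_failure = len(sensitive & L_prime)           # sensitive itemsets still frequent
    missing_cost = len((L - sensitive) - L_prime)       # non-sensitive frequent itemsets lost
    artificial_cost = len(L_prime - L)                  # frequent itemsets that did not exist before
    return hiding_failure, missing_cost, artificial_cost

def dominates(a, b):
    """a Pareto-dominates b when it is no worse in every objective and strictly better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical toy database, absolute minimum support count of 3, sensitive itemset {a, b}.
db = [frozenset(t) for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"})]
sensitive = {frozenset({"a", "b"})}
candidates = {
    "no sanitization": db,
    "delete T2": [t for i, t in enumerate(db) if i != 1],
    "delete T1": db[1:],
}
scores = {name: side_effects(db, s, sensitive, 3) for name, s in candidates.items()}
print(scores)
print(non_dominated(list(scores.values())))  # [(0, 0, 0)]
```

Because no weights are fixed in advance, every non-dominated candidate is kept and the trade-off between the three side effects can be left to the user.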

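The abstract does not specify how the density-based clustering is used inside CMPSO. Purely as an illustration, the sketch below applies classic DBSCAN to the objective vectors (hiding failure, missing cost, artificial cost) of candidate solutions and picks one representative per dense region, which is one way a user could be offered a small, diverse set of choices. The `eps` and `min_pts` values, the hypothetical `representatives` helper, and the sample vectors are assumptions, not details of the paper's method.

```python
import math

def dbscan(points, eps, min_pts):
    """Classic DBSCAN (Euclidean): returns a cluster id per point, or -1 for noise.

    A point is a core point when at least min_pts points (itself included)
    lie within distance eps of it.
    """
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in nbrs if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == -1:
                labels[j] = cluster  # border point: density-reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # only core points expand the cluster
                seeds.extend(j_nbrs)
    return labels

def representatives(points, labels):
    """Hypothetical helper: for each cluster, the member closest to the cluster centroid."""
    reps = {}
    for c in sorted(set(labels) - {-1}):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        reps[c] = min(members, key=lambda p: math.dist(p, centroid))
    return reps

# Hypothetical objective vectors (hiding failure, missing cost, artificial cost).
candidates = [(0, 2, 1), (0, 2, 2), (1, 2, 1), (5, 0, 0), (5, 1, 0), (6, 0, 0), (20, 20, 20)]
labels = dbscan(candidates, eps=1.5, min_pts=2)
print(labels)                               # [0, 0, 0, 1, 1, 1, -1]
print(representatives(candidates, labels))  # {0: (0, 2, 1), 1: (5, 0, 0)}
```

DBSCAN is used here because it needs no preset number of clusters and flags isolated solutions as noise; in practice the objectives would usually be normalised before distances are computed.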
    Citation: Jimmy Ming-Tai Wu, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Youcef Djenouri, Chun-Hao Chen, Zhongcui Li. The density-based clustering method for privacy-preserving data mining[J]. Mathematical Biosciences and Engineering, 2019, 16(3): 1718-1728. doi: 10.3934/mbe.2019082





  • © 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)



