The density peak clustering algorithm (DPC) requires manual determination of cluster centers and performs poorly on complex datasets with varying densities or non-convex shapes. Hence, a novel density peak clustering algorithm for the automatic selection of clustering centers based on K-nearest neighbors (AKDPC) is proposed. First, AKDPC classifies samples into core and non-core points according to their mutual K-nearest-neighbor values. Second, AKDPC uses the average distance from a sample to its K nearest neighbors as its density: the smaller the average distance, the higher the density. It then selects the highest-density sample among the unclassified core points as the center of a new cluster and adds core points that satisfy the merging condition to that cluster until no remaining core point satisfies it. These steps are repeated until all core points are clustered. Finally, AKDPC assigns each unclassified non-core point the label of its nearest already-classified point. To verify the validity of AKDPC, experiments are conducted on synthetic and real datasets. Comparisons with classical clustering algorithms and strong DPC variants demonstrate that AKDPC achieves higher accuracy.
Citation: Zhihe Wang, Huan Wang, Hui Du, Shiyin Chen, Xinxin Shi. A novel density peaks clustering algorithm for automatic selection of clustering centers based on K-nearest neighbors[J]. Mathematical Biosciences and Engineering, 2023, 20(7): 11875-11894. doi: 10.3934/mbe.2023528
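As an illustration of the workflow described in the abstract, the following is a minimal Python sketch built on numpy and scikit-learn's NearestNeighbors. It is not the authors' implementation: the core-point threshold `min_mutual` and the merging condition used to grow a cluster (mutual K-nearest neighborship with an existing member) are assumptions introduced here for illustration, since the abstract does not specify them.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def akdpc_sketch(X, k=8, min_mutual=None):
    """Sketch of the AKDPC steps from the abstract (assumed details noted below)."""
    n = len(X)
    if min_mutual is None:
        min_mutual = k // 2  # assumed core-point threshold, not from the paper

    # K nearest neighbors of every sample (first column is the sample itself).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)
    dist, idx = dist[:, 1:], idx[:, 1:]

    # Mutual-KNN count: neighbors of i that also list i among their K neighbors.
    knn_sets = [set(row) for row in idx]
    mutual = np.array([sum(i in knn_sets[j] for j in knn_sets[i]) for i in range(n)])
    core = mutual >= min_mutual  # core vs. non-core points (assumed criterion)

    # Density: smaller average KNN distance means higher density.
    density = -dist.mean(axis=1)

    labels = np.full(n, -1)
    cluster = 0
    unassigned = set(np.where(core)[0])
    while unassigned:
        # New cluster center: highest-density unassigned core point.
        center = max(unassigned, key=lambda i: density[i])
        members = [center]
        labels[center] = cluster
        unassigned.remove(center)

        # Grow the cluster while some unassigned core point satisfies the
        # (assumed) merging condition: mutual KNN with a current member.
        grew = True
        while grew:
            grew = False
            for i in list(unassigned):
                if any(m in knn_sets[i] and i in knn_sets[m] for m in members):
                    labels[i] = cluster
                    members.append(i)
                    unassigned.remove(i)
                    grew = True
        cluster += 1

    # Non-core points take the label of the nearest already-classified point
    # (assumes at least one core point was found).
    classified = np.where(labels >= 0)[0]
    for i in np.where(labels < 0)[0]:
        nearest = classified[np.argmin(np.linalg.norm(X[classified] - X[i], axis=1))]
        labels[i] = labels[nearest]
    return labels


if __name__ == "__main__":
    # Tiny usage example on two synthetic Gaussian blobs.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
    print(akdpc_sketch(X, k=8))
```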