Research article

A fast granular ellipsoid-based density peaks clustering algorithm for large-scale data

  • Published: 25 March 2026
  • MSC : 62H30, 68T10, 68W27

  • As an effective clustering approach, density peaks clustering (DPC) has been studied extensively in recent years. However, the traditional DPC algorithm suffers not only from high computational complexity but also from a limited ability to identify non-spherical or anisotropic clusters. We therefore combine the concept of granular computing with ellipsoidal modeling and propose a novel algorithm termed granular-ellipsoid density peaks (GEDP). Specifically, we extend the granular ball model to a granular ellipsoid ($ \mathcal{GE} $) model through a hierarchical splitting and fitting process guided by compactness and shape, enabling adaptive modeling of local geometry. Furthermore, the Mahalanobis distance is utilized to capture feature correlations and anisotropy, providing a more faithful description of the data structure. On this basis, we define ellipsoid-level density and the $ \delta $-distance in an adaptive, parameter-free manner, without any manually tuned thresholds or kernel widths. We further redesign the automatic cluster-center identification and refinement processes using a normalized $ \gamma $ criterion, combined with robust label propagation and post-processing, to ensure reliable clustering performance. Comprehensive experiments on synthetic, real-world, and large-scale datasets demonstrate the effectiveness, scalability, and robustness of the proposed GEDP algorithm, and further confirm its strong adaptability across various data distributions, particularly on large-scale datasets.
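The abstract builds on the standard DPC quantities: a local density $\rho$, a $\delta$-distance to the nearest higher-density point, and a $\gamma = \rho \cdot \delta$ criterion for picking cluster centers, here computed under the Mahalanobis metric. The following is a minimal illustrative Python sketch of these generic DPC ingredients only; it is not the paper's GEDP implementation (which operates on granular ellipsoids rather than raw points), and the cutoff-distance heuristic and min-max normalization of $\gamma$ are our own assumptions.

```python
import numpy as np

def dpc_mahalanobis(X, k):
    """Sketch of density peaks clustering quantities under a global
    Mahalanobis metric. Returns (rho, delta, center_indices)."""
    n, d = X.shape
    # Inverse covariance for the Mahalanobis distance (ridge term for stability)
    VI = np.linalg.inv(np.cov(X.T) + 1e-8 * np.eye(d))
    diff = X[:, None, :] - X[None, :, :]
    # Pairwise Mahalanobis distances: sqrt((x_i - x_j)^T VI (x_i - x_j))
    D = np.sqrt(np.einsum('ijk,kl,ijl->ij', diff, VI, diff))
    # Cutoff distance: a small percentile of positive distances (heuristic,
    # in the spirit of Rodriguez & Laio's ~1-2% suggestion)
    dc = np.percentile(D[D > 0], 2)
    # Gaussian-kernel local density; subtract 1 to drop each point's self-term
    rho = np.sum(np.exp(-(D / dc) ** 2), axis=1) - 1
    # delta: distance to nearest point of higher density
    # (max distance for the globally densest point)
    order = np.argsort(-rho)
    delta = np.zeros(n)
    delta[order[0]] = D[order[0]].max()
    for i in range(1, n):
        delta[order[i]] = D[order[i], order[:i]].min()
    # Min-max normalized gamma criterion (our assumed normalization)
    rho_n = (rho - rho.min()) / (np.ptp(rho) + 1e-12)
    delta_n = (delta - delta.min()) / (np.ptp(delta) + 1e-12)
    centers = np.argsort(-(rho_n * delta_n))[:k]
    return rho, delta, centers
```

On two well-separated blobs, the top-$\gamma$ points land one per blob, which is the behavior the center-identification step relies on; the dense $O(n^2)$ distance matrix here is exactly the bottleneck that granule-level approaches such as GEDP are designed to avoid.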

    Citation: Shihu Liu, Shuang Li, Fusheng Yu. A fast granular ellipsoid-based density peaks clustering algorithm for large-scale data[J]. AIMS Mathematics, 2026, 11(3): 7871-7909. doi: 10.3934/math.2026325


  • © 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
