As an effective clustering approach, density peaks clustering (DPC) has been extensively studied in recent years. However, the traditional DPC algorithm suffers not only from high computational complexity but also from a limited capability to identify non-spherical or anisotropic clusters. Therefore, we combine the concept of granular computing with ellipsoidal modeling and propose a novel algorithm termed granular-ellipsoid density peaks (GEDP). Specifically, we extend the granular ball model into a granular ellipsoid ($ \mathcal{GE} $) model through a hierarchical splitting and fitting process guided by compactness and shape, enabling adaptive modeling of local geometry. Furthermore, the Mahalanobis distance is utilized to capture feature correlations and anisotropy, providing a more faithful description of the data structure. On this basis, we define ellipsoid-level density and $ \delta $-distance in an adaptive and parameter-free manner, without requiring any manually tuned thresholds or kernel widths. We further redesign the automatic cluster center identification and refinement processes using a normalized $ \gamma $ criterion, combined with robust label propagation and post-processing, to ensure reliable clustering performance. Comprehensive experiments on synthetic, real-world, and large-scale datasets demonstrate the effectiveness, scalability, and robustness of the proposed GEDP algorithm. The results further confirm its strong adaptability across various data distributions, particularly on large-scale datasets.
Citation: Shihu Liu, Shuang Li, Fusheng Yu. A fast granular ellipsoid-based density peaks clustering algorithm for large-scale data[J]. AIMS Mathematics, 2026, 11(3): 7871-7909. doi: 10.3934/math.2026325
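The abstract names two concrete computational ingredients: the Mahalanobis distance, which accounts for feature correlations when comparing points or ellipsoid centers, and a normalized $ \gamma $ criterion (the product of normalized density and $ \delta $-distance) for ranking cluster-center candidates, as in standard DPC. The sketch below illustrates both under simple assumptions; it is not the paper's implementation, and the function names and the min-max normalization scheme are illustrative choices only.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance between x and y under covariance matrix cov.

    With cov = I this reduces to the Euclidean distance; a non-identity
    cov stretches/shrinks directions according to feature correlations.
    """
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def normalized_gamma(rho, delta, eps=1e-12):
    """Normalized gamma criterion: min-max scale density rho and
    delta-distance to [0, 1], then take their elementwise product.

    Items with large gamma (both dense and far from any denser item)
    are the natural cluster-center candidates.
    """
    rho_n = (rho - rho.min()) / (rho.max() - rho.min() + eps)
    delta_n = (delta - delta.min()) / (delta.max() - delta.min() + eps)
    return rho_n * delta_n

# Example: with an identity covariance the Mahalanobis distance is Euclidean.
d = mahalanobis(np.array([0.0, 0.0]), np.array([3.0, 4.0]), np.eye(2))
# Example: rank three items by the gamma criterion.
gamma = normalized_gamma(np.array([1.0, 2.0, 3.0]), np.array([3.0, 1.0, 2.0]))
center_idx = int(np.argmax(gamma))
```

In the full GEDP pipeline, these quantities are computed at the ellipsoid level (one density and one $ \delta $-distance per granular ellipsoid) rather than per point, which is what yields the reported speedup on large-scale data.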