Research article

Clustering accuracy


  • Received: 06 June 2024 Revised: 13 June 2024 Accepted: 13 June 2024 Published: 17 June 2024
  • Clustering accuracy (ACC) is one of the most often used measures in literature to evaluate clustering quality. However, the measure is often used without any definition or reference to such a definition. In this paper, we identify the origin of the measure. We give a proper definition for the measure and provide a simple bug fix which allows it to be used also in the case of a mismatch in the number of clusters. We show that the measure belongs to a wider class of set-matching based measures. We compare its properties to centroid index (CI) and normalized mutual information (NMI).

    Citation: Pasi Fränti, Sami Sieranoja. Clustering accuracy[J]. Applied Computing and Intelligence, 2024, 4(1): 24-44. doi: 10.3934/aci.2024003

    Related Papers:

  • Clustering accuracy (ACC) is one of the most often used measures in literature to evaluate clustering quality. However, the measure is often used without any definition or reference to such a definition. In this paper, we identify the origin of the measure. We give a proper definition for the measure and provide a simple bug fix which allows it to be used also in the case of a mismatch in the number of clusters. We show that the measure belongs to a wider class of set-matching based measures. We compare its properties to centroid index (CI) and normalized mutual information (NMI).



    加载中


    [1] S. van Dongen, Performance criteria for graph clustering and Markov cluster experiments, Amsterdam: Centrum voor Wiskunde en Informatica, 2000.
    [2] M. Meila, D. Heckerman, An experimental comparison of model based clustering methods, Mach. Learn., 41 (2001), 9–29. https://doi.org/10.1023/A:1007648401407 doi: 10.1023/A:1007648401407
    [3] M. Rezaei, P. Fränti, Set matching measures for external cluster validity, IEEE Trans. Knowl. Data Eng., 28 (2016), 2173–2186. https://doi.org/10.1109/TKDE.2016.2551240 doi: 10.1109/TKDE.2016.2551240
    [4] P. Fränti, Genetic algorithm with deterministic crossover for vector quantization, Pattern Recogn. Lett., 21 (2000), 61–68. https://doi.org/10.1016/S0167-8655(99)00133-6 doi: 10.1016/S0167-8655(99)00133-6
    [5] P. Fränti, Efficiency of random swap clustering, J. Big Data, 5 (2018), 13. https://doi.org/10.1186/s40537-018-0122-y doi: 10.1186/s40537-018-0122-y
    [6] B. Fritzke, Breathing k-means, arXiv: 2006.15666.
    [7] C. Baldassi, Recombinator-k-means: an evolutionary algorithm that exploits k-means++ for recombination, IEEE Trans. Evolut. Comput., 20 (2022), 991–1003. https://doi.org/10.1109/TEVC.2022.3144134 doi: 10.1109/TEVC.2022.3144134
    [8] P. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math., 20 (1987), 53–65. https://doi.org/10.1016/0377-0427(87)90125-7 doi: 10.1016/0377-0427(87)90125-7
    [9] T. Calinski, J. Harabasz, A dendrite method for cluster analysis, Commun. Stat., 3 (1974), 1–27. https://doi.org/10.1080/03610927408827101 doi: 10.1080/03610927408827101
    [10] Q. Zhao, P. Fränti, WB-index: a sum-of-squares based index for cluster validity, Data Knowl. Eng., 92 (2014), 77–89. https://doi.org/10.1016/j.datak.2014.07.008 doi: 10.1016/j.datak.2014.07.008
    [11] J. Hämäläinen, S. Jauhiainen, T. Kärkkäinen, Comparison of internal clustering validation indices for prototype-based clustering, Algorithms, 10 (2017), 105. https://doi.org/10.3390/a10030105 doi: 10.3390/a10030105
    [12] M. Niemelä, S. Äyrämö, T. Kärkkäinen, Toolbox for distance estimation and cluster validation on data with missing values, IEEE Access, 10 (2021), 352–367. https://doi.org/10.1109/ACCESS.2021.3136435 doi: 10.1109/ACCESS.2021.3136435
    [13] L. Hubert, P. Arabie, Comparing partitions, J. Classif., 2 (1985), 193–218. https://doi.org/10.1007/BF01908075 doi: 10.1007/BF01908075
    [14] T. Kvålseth, Entropy and correlation: some comments, IEEE Trans. Syst. Man Cybern., 17 (1987), 517–519. https://doi.org/10.1109/TSMC.1987.4309069 doi: 10.1109/TSMC.1987.4309069
    [15] T. Kinnunen, I. Sidoroff, M. Tuononen, P. Fränti, Comparison of clustering methods: a case study of text-independent speaker modeling, Pattern Recogn. Lett., 32 (2011), 1604–1617. https://doi.org/10.1016/j.patrec.2011.06.023 doi: 10.1016/j.patrec.2011.06.023
    [16] C. Peng, Z. Kang, Q. Cheng, Integrating feature and graph learning with low-rank representation, Neurocomputing, 249 (2017), 106–116. https://doi.org/10.1016/j.neucom.2017.03.071 doi: 10.1016/j.neucom.2017.03.071
    [17] M. Wu, B. Schölkopf, A local learning approach for clustering, In: Advances in neural information processing systems 19, Cambridge: MIT Press, 2007, 1529–1536.
    [18] L. Lovasz, M. D. Plummer, Matching theory, North Holland: Akademiai Kiado, 1986.
    [19] H. W. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistics Quarterly, 2 (1955), 83–97. https://doi.org/10.1002/nav.3800020109 doi: 10.1002/nav.3800020109
    [20] J. Munkres, Algorithms for the assignment and transportation problems, J. Soc. Ind. Appl. Math., 5 (1957), 32–38. https://doi.org/10.1137/0105003 doi: 10.1137/0105003
    [21] X. Liu, M. Li, C. Tang, J. Xia, J. Xiong, L. Liu, et al., Efficient and effective regularized incomplete multi-view clustering, IEEE Trans. Pattern Anal., 43 (2021), 2634–2646. https://doi.org/10.1109/TPAMI.2020.2974828 doi: 10.1109/TPAMI.2020.2974828
    [22] M. Alshammari, J. Stavrakakis, M. Takatsuka, Refining a k-nearest neighbor graph for a computationally efficient spectral clustering, Pattern Recogn., 114 (2021), 107869. https://doi.org/10.1016/j.patcog.2021.107869 doi: 10.1016/j.patcog.2021.107869
    [23] J. Ma, Y. Zhang, L. Zhang, Discriminative subspace matrix factorization for multiview data clustering, Pattern Recogn., 111 (2021), 107676. https://doi.org/10.1016/j.patcog.2020.107676 doi: 10.1016/j.patcog.2020.107676
    [24] J. Liu, S. Teng, L. Fei, W. Zhang, X. Fang, Z. Zhang, et al., A novel consensus learning approach to incomplete multi-view clustering, Pattern Recogn., 115 (2021), 107890. https://doi.org/10.1016/j.patcog.2021.107890 doi: 10.1016/j.patcog.2021.107890
    [25] X. Yang, J. Shi, C. Wang, Y. Zhou, Z. Zhou, T. Chen, et al., Binary clustering for deep network trained by feature growth, IEEE Access, 9 (2021), 8354–8366. https://doi.org/10.1109/ACCESS.2020.3047467 doi: 10.1109/ACCESS.2020.3047467
    [26] Y. Xu, S. Chen, J. Li, L. Luo, J. Yang, Learnable low-rank latent dictionary for subspace clustering, Pattern Recogn., 120 (2021), 108142. https://doi.org/10.1016/j.patcog.2021.108142 doi: 10.1016/j.patcog.2021.108142
    [27] Y. Chen, L. Zhou, N. Bouguila, C. Wang, Y. Chen, J. Du, BLOCK-DBSCAN: fast clustering for large scale data, Pattern Recogn., 109 (2021), 107624. https://doi.org/10.1016/j.patcog.2020.107624 doi: 10.1016/j.patcog.2020.107624
    [28] R. Wang, F. Nie, Z. Wang, H. Hu, X. Li, Parameter-free weighted multi-view projected clustering with structured graph learning, IEEE Trans. Knowl. Data Eng., 32 (2020), 2014–2025. https://doi.org/10.1109/TKDE.2019.2913377 doi: 10.1109/TKDE.2019.2913377
    [29] C. Peng, Z. Kang, S. Cai, Q. Cheng, Integrate and conquer: double-sided two-dimensional k-means via integrating of projection and manifold construction, ACM Tran. Intel. Syst. Tech., 9 (2018), 57. https://doi.org/10.1145/3200488 doi: 10.1145/3200488
    [30] C. Peng, Z. Kang, Y. Hu, J. Cheng, Q. Cheng, Robust graph regularized nonnegative matrix factorization for clustering, ACM Trans. Knowl. Discov. Data, 11 (2017), 33. https://doi.org/10.1145/3003730 doi: 10.1145/3003730
    [31] D. Cai, X. He, X. Wu, J. Han, Non-negative matrix factorization on manifold, Proceedings of IEEE International Conference on Data Mining, 2008, 63–72. https://doi.org/10.1109/ICDM.2008.57 doi: 10.1109/ICDM.2008.57
    [32] D. Cai, X. He, J. Han, Locally consistent concept factorization for document clustering, IEEE Trans. Knowl. Data Eng., 23 (2011), 902–913. https://doi.org/10.1109/TKDE.2010.165 doi: 10.1109/TKDE.2010.165
    [33] W. Xu, X. Liu, Y. Gong, Document clustering based on non-negative matrix factorization, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, 2003,267–273. https://doi.org/10.1145/860435.860485 doi: 10.1145/860435.860485
    [34] M. Yin, J. Gao, Z. Lin, Laplacian regularized low-rank representation and its applications, IEEE Trans. Pattern Anal., 38 (2016), 504–517. https://doi.org/10.1109/TPAMI.2015.2462360 doi: 10.1109/TPAMI.2015.2462360
    [35] R. Liu, H. Wang, X. Yu, Shared-nearest-neighbor-based clustering by fast search and find of density peaks, Inform. Sciences, 450 (2018), 200–226. https://doi.org/10.1016/j.ins.2018.03.031 doi: 10.1016/j.ins.2018.03.031
    [36] Y. Chen, S. Tang, L. Zhou, C. Wang, J. Du, T. Wang, et al., Decentralized clustering by finding loose and distributed density cores, Inform. Sciences, 433–434 (2018), 510–526. https://doi.org/10.1016/j.ins.2016.08.009 doi: 10.1016/j.ins.2016.08.009
    [37] Y. Chen, S. Tang, S. Pei, C. Wang, J. Du, N. Xiong, DHeat: a density heat-based algorithm for clustering with effective radius, IEEE Trans. Syst. Man Cybern., 48 (2018), 649–660. https://doi.org/10.1109/TSMC.2017.2745493 doi: 10.1109/TSMC.2017.2745493
    [38] E. Müller, S. Günnemann, I. Assent, T. Seidl, Evaluating clustering in subspace projections of high dimensional data, Proc. VLDB Endow., 2 (2009), 1270–1281. https://doi.org/10.14778/1687627.1687770 doi: 10.14778/1687627.1687770
    [39] D. Cai, X. He, J. Han, Document clustering using locality preserving indexing, IEEE Trans. Knowl. Data Eng., 17 (2005), 1624–1637. https://doi.org/10.1109/TKDE.2005.198 doi: 10.1109/TKDE.2005.198
    [40] H. Zha, X. He, C. Ding, M. Gu, H. Simon, Spectral relaxation for k-means clustering, In: Advances in neural information processing systems 14, Cambridge: MIT Press, 2001, 1057–1064.
    [41] N. Slonim, N. Tishby, Document clustering using word clusters via the information bottleneck method, Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, 2000,208–215. https://doi.org/10.1145/345508.345578 doi: 10.1145/345508.345578
    [42] E. Rendón, I. Abundez, A. Arizmendi, E. M. Quiroz, Internal versus external cluster validation indexes, International Journal of Computer Networks and Communications, 5 (2011), 27–34.
    [43] M. C. P. de Souto, A. L. V. Coelho, K. Faceli, T. C. Sakata, V. Bonadia, I. G. Costa, A comparison of external clustering evaluation indices in the context of imbalanced data sets, Proceedings of Brazilian Symposium on Neural Networks, 2012, 49–54. https://doi.org/10.1109/SBRN.2012.25 doi: 10.1109/SBRN.2012.25
    [44] P. Fränti, M. Rezaei, Q. Zhao, Centroid index: cluster level similarity measure, Pattern Recogn., 47 (2014), 3034–3045. https://doi.org/10.1016/j.patcog.2014.03.017 doi: 10.1016/j.patcog.2014.03.017
    [45] Q. Zhao, P. Fränti, Centroid ratio for pairwise random swap clustering algorithm, IEEE Trans. Knowl. Data Eng., 26 (2014), 1090–1101. https://doi.org/10.1109/TKDE.2013.113 doi: 10.1109/TKDE.2013.113
    [46] P. Fränti, M. Rezaei, Generalized centroid index to different clustering models, Proceedings of Structural, Syntactic, and Statistical Pattern Recognition, 2016,285–296. https://doi.org/10.1007/978-3-319-49055-7_26 doi: 10.1007/978-3-319-49055-7_26
    [47] M. Rezaei, P. Fränti, Can the number of clusters be determined by external indices? IEEE Access, 8 (2020), 89239–89257. https://doi.org/10.1109/ACCESS.2020.2993295 doi: 10.1109/ACCESS.2020.2993295
    [48] R. Jonker, A. Volgenant, A shortest augmenting path algorithm for dense and sparse linear assignment problems, Computing, 38 (1987), 325–340. https://doi.org/10.1007/BF02278710 doi: 10.1007/BF02278710
  • Reader Comments
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(1202) PDF downloads(73) Cited by(5)

Article outline

Figures and Tables

Figures(8)  /  Tables(2)

Other Articles By Authors

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog