Research article

Optimal clustering by merge-based branch-and-bound


  • Received: 18 March 2022 Accepted: 22 March 2022 Published: 31 March 2022
  • We present a method to construct optimal clustering via a sequence of merge steps. We formulate the merge-based clustering as a minimum redundancy search tree, and then search the optimal clustering by a branch-and-bound technique. Optimal clustering is found regardless of the objective function used. We also consider two suboptimal polynomial time variants based on the proposed branch-and-bound technique. However, all variants are slow and has merely theoretical interest. We discuss the reasons for the results.

    Citation: Pasi Fränti, Olli Virmajoki. Optimal clustering by merge-based branch-and-bound[J]. Applied Computing and Intelligence, 2022, 2(1): 63-82. doi: 10.3934/aci.2022004

    Related Papers:

  • We present a method to construct optimal clustering via a sequence of merge steps. We formulate the merge-based clustering as a minimum redundancy search tree, and then search the optimal clustering by a branch-and-bound technique. Optimal clustering is found regardless of the objective function used. We also consider two suboptimal polynomial time variants based on the proposed branch-and-bound technique. However, all variants are slow and has merely theoretical interest. We discuss the reasons for the results.



    加载中


    [1] B. S. Everitt, Cluster Analysis, 3rd ed., Edward Arnold, London / Halsted Press, New York, 1993. ISBN 978-0340584798 / 978-0470220436
    [2] L. Kaufman, P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley Sons, New York, 1990. https://doi.org/10.1002/9780470316801
    [3] R. Dubes, A. Jain, Algorithms for Clustering Data, Prentice-Hall, Englewood Cliffs, NJ, 1988. ISBN 978-0-13-022278-7
    [4] A. Gersho, R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Dordrecht, 1992. https://doi.org/10.1007/978-1-4615-3626-0
    [5] M. R. Garey, D. S. Johnson, H. S. Witsenhausen, The complexity of the generalized Lloyd-Max problem, IEEE Trans. Inf. Theory, 28 (1982), 255-256. https://doi.org/10.1109/TIT.1982.1056488 doi: 10.1109/TIT.1982.1056488
    [6] J. H. Ward, Hierarchical grouping to optimize an objective function, J. Amer. Statist. Assoc., 58 (1963), 236-244. https://doi.org/10.1080/01621459.1963.10500845 doi: 10.1080/01621459.1963.10500845
    [7] W. H. Equitz, A new vector quantization clustering algorithm, IEEE Trans. Acoust. Speech, Signal Processing, 37 (1989), 1568-1575. https://doi.org/10.1109/29.35395 doi: 10.1109/29.35395
    [8] P. Fränti, O. Virmajoki, T. Kaukoranta, Branch-and-bound technique for solving optimal clustering, Int. Conf. on Pattern Recognition (ICPR'02), Québec, Canada, 2 (2002), 232-235. https://doi.org/10.1109/ICPR.2002.1048281
    [9] P. Fränti, O. Virmajoki, Polynomial-time clustering algorithms derived from branch-and-bound technique, Advanced Concepts for Intelligent Vision Systems (ACIVS'2002), Gent, Belgium, (2002), 118-123.
    [10] W. L. G. Koontz, P. M. Narendra, K. Fukunaga, A branch and bound clustering algorithm, IEEE Trans. Comput., 24 (1975), 908-915. https://doi.org/10.1109/T-C.1975.224336 doi: 10.1109/T-C.1975.224336
    [11] C.-H. Cheng, A branch and bound clustering algorithm, IEEE Trans. SMC, 25 (1995), 895-898. https://doi.org/10.1109/21.376504 doi: 10.1109/21.376504
    [12] G. Palubeckis, A branch-and-bound approach using polyhedral results for a clustering problem, INFORMS Journal of Computing, 9 (1997), 30-42. https://doi.org/10.1287/ijoc.9.1.30 doi: 10.1287/ijoc.9.1.30
    [13] L. S. Iyer, J. E. Aronson, A parallel branch-and-bound method for cluster analysis, Ann. Oper. Res., 90 (1999), 65-86. https://doi.org/10.1023/A:1018925018009 doi: 10.1023/A:1018925018009
    [14] M. J. Brusco, A repetitive branch-and-bound procedure for minimum within-cluster sums of squares partitioning, Psychometrika, 71 (2006), 347-363. https://doi.org/10.1007/s11336-004-1218-1 doi: 10.1007/s11336-004-1218-1
    [15] G. Kaminka, Repetitive branch-and-bound using constraint programming for constrained minimum sum-of-squares clustering, European Conf. on Artificial Intelligence, The Hague, The Netherlands - Including Prestigious Applications of Artificial Intelligence (PAIS 2016), 285 (2016), 462-470. IOS Press. https://doi.org/10.3233/978-1-61499-672-9-462
    [16] J. Bennell, G. Scheithauer, Y. Stoyan, T. Romanova, A. Pankratov, Optimal clustering of a pair of irregular objects, J. Glob. Optim., 61 (2015), 497-524. https://doi.org/10.1007/s10898-014-0192-0 doi: 10.1007/s10898-014-0192-0
    [17] Y. Han, L. Zhu, Z. Cheng, J. Li, X. Liu, Discrete optimal graph clustering, IEEE Trans. Cybern., 50 (2018), 1697-1710. https://doi.org/10.1109/TCYB.2018.2881539 doi: 10.1109/TCYB.2018.2881539
    [18] S. Boluki, S. Z. Dadaneh, X. Qian, E. R. Dougherty, Optimal clustering with missing values, BMC bioinformatics, 20 (2019), 1-10. https://doi.org/10.1186/s12859-019-2832-3 doi: 10.1186/s12859-019-2832-3
    [19] T. Feder, D. Greene, Optimal algorithms for approximate clustering, ACM Symposium on Theory of Computing, (1988), 434-444. https://doi.org/10.1145/62212.62255 doi: 10.1145/62212.62255
    [20] R. R. Mettu, C. G. Plaxton, Optimal time bounds for approximate clustering, Machine Learning, 56 (2004), 35-60. https://doi.org/10.1023/B:MACH.0000033114.18632.e0 doi: 10.1023/B:MACH.0000033114.18632.e0
    [21] Z. Wu, R. Leahy, An optimal graph theoretic approach to data clustering: theory and its application to image segmentation, IEEE T. Pattern Anal., 15 (1993), 1101-1113. https://doi.org/10.1109/34.244673 doi: 10.1109/34.244673
    [22] L. Gao, A. L. Rosenberg, R. K. Sitaraman, Optimal clustering of tree-sweep computations for high-latency parallel environments, IEEE T. Parall. Distr., 10 (1999), 813-824. https://doi.org/10.1109/71.790599 doi: 10.1109/71.790599
    [23] X. Wu, Optimal quantization by matrix searching, J. Algorithms, 12 (1991), 663-673. https://doi.org/10.1016/0196-6774(91)90039-2 doi: 10.1016/0196-6774(91)90039-2
    [24] P. Fränti, T. Kaukoranta, D.-F. Shen, K.-S. Chang, Fast and memory efficient implementation of the exact PNN, IEEE Trans. Image Process., 9 (2000), 773-777. https://doi.org/10.1109/83.841516 doi: 10.1109/83.841516
    [25] H. Späth, Cluster Analysis Algorithms for Data Reduction and Classification of Objects, Ellis Horwood Limited, West Sussex, 1980. ISBN 978-0853121411
    [26] R. L. Graham, D. E. Knuth, O. Patashnik, Concrete Mathematics – a Foundation for Computer Science, 2nd ed., Addison-Wesley, 1994. ISBN 978-0201558029
    [27] T. Kaukoranta, P. Fränti, O. Nevalainen, Vector quantization by lazy pairwise nearest neighbor method, Optical Engineering, 38 (1999), 1862-1868. https://doi.org/10.1117/1.602251 doi: 10.1117/1.602251
    [28] Y. Linde, A. Buzo, R. M. Gray, An algorithm for vector quantizer design, IEEE Trans. on Comm., 28 (1980), 84-95. https://doi.org/10.1109/TCOM.1980.1094577 doi: 10.1109/TCOM.1980.1094577
    [29] P. Fränti, O. Virmajoki, Iterative shrinking method for clustering problems, Pattern Recogn., 39 (2006), 761-765. https://doi.org/10.1016/j.patcog.2005.09.012 doi: 10.1016/j.patcog.2005.09.012
    [30] T. F. Gonzalez, Clustering to minimize the maximum intercluster distance, Theor. Comput. Sci., 38 (1985), 293-306. https://doi.org/10.1016/0304-3975(85)90224-5 doi: 10.1016/0304-3975(85)90224-5
    [31] P. Fränti, S. Sami, How much can k-means be improved by using better initialization and repeats? Pattern Recogn., 93 (2019), 95-112. https://doi.org/10.1016/j.patcog.2019.04.014 doi: 10.1016/j.patcog.2019.04.014
    [32] P. Fränti, N. Teemu, M. Yuan, Converting MST to TSP path by branch elimination, Applied Sciences, 11 (2020), 177. https://doi.org/10.3390/app11010177 doi: 10.3390/app11010177
    [33] P. Fränti, Efficiency of random swap clustering, Journal of Big Data, 5 (2018) 1-29. https://doi.org/10.1186/s40537-018-0122-y doi: 10.1186/s40537-018-0122-y
  • Reader Comments
  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(1996) PDF downloads(79) Cited by(1)

Article outline

Figures and Tables

Figures(14)  /  Tables(4)

Other Articles By Authors

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog