Clustering is an important and challenging research topic in many fields. Although various clustering algorithms have been developed over the years, traditional shallow algorithms cannot mine the underlying structural information of the data. Recent advances have shown that deep clustering can achieve excellent performance on clustering tasks. In this work, a novel variational autoencoder-based deep clustering algorithm is proposed. It takes a Gaussian mixture model as the prior over the latent space and uses an additional classifier to accurately distinguish the clusters in that space. A similarity-based loss function is proposed, consisting of the cross-entropy of the predicted cluster transition probabilities and the Wasserstein distance between the predicted posterior distributions. The new loss encourages the model to learn meaningful, cluster-oriented representations that facilitate clustering. Experimental results show that the method consistently achieves competitive results on a variety of data sets.
Citation: He Ma. Achieving deep clustering through the use of variational autoencoders and similarity-based loss[J]. Mathematical Biosciences and Engineering, 2022, 19(10): 10344-10360. doi: 10.3934/mbe.2022484
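To make the loss description above concrete, the following is a minimal sketch of how such a similarity-based objective could be assembled in PyTorch. It is an illustration only, not the paper's implementation: the pairing of samples, the function and parameter names (`gaussian_w2`, `similarity_loss`, the balancing weight `lam`), and the restriction to diagonal Gaussian posteriors are all assumptions, and in the paper this term would sit alongside the usual variational autoencoder evidence lower bound.

```python
import torch
import torch.nn.functional as F


def gaussian_w2(mu1, logvar1, mu2, logvar2):
    # Closed-form squared 2-Wasserstein distance between diagonal
    # Gaussians N(mu1, diag(sigma1^2)) and N(mu2, diag(sigma2^2)):
    # ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2.
    sigma1 = torch.exp(0.5 * logvar1)
    sigma2 = torch.exp(0.5 * logvar2)
    return ((mu1 - mu2) ** 2).sum(dim=-1) + ((sigma1 - sigma2) ** 2).sum(dim=-1)


def similarity_loss(logits_a, logits_b, mu_a, logvar_a, mu_b, logvar_b, lam=1.0):
    # Hypothetical similarity-based loss for a pair of related inputs
    # (a, b): cross-entropy between their predicted cluster distributions
    # plus a weighted Wasserstein term between their approximate
    # posteriors. `lam` is an assumed balancing weight.
    target = F.softmax(logits_b, dim=-1).detach()  # soft assignment of b
    ce = -(target * F.log_softmax(logits_a, dim=-1)).sum(dim=-1)
    w2 = gaussian_w2(mu_a, logvar_a, mu_b, logvar_b)
    return (ce + lam * w2).mean()


# Example usage with random tensors standing in for the classifier
# logits and the encoder's posterior parameters.
if __name__ == "__main__":
    B, K, D = 8, 10, 32  # batch size, number of clusters, latent dimension
    loss = similarity_loss(
        torch.randn(B, K), torch.randn(B, K),
        torch.randn(B, D), torch.randn(B, D),
        torch.randn(B, D), torch.randn(B, D),
    )
    print(loss.item())
```

One appeal of the closed-form expression in `gaussian_w2` is that comparing two predicted diagonal-Gaussian posteriors requires no sampling, so the similarity term stays cheap and fully differentiable.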