With the rapid development of network technology and small handheld devices, the amount of data has increased significantly, and many kinds of data are now available at the same time. Hashing has recently become popular for large-scale similarity search and image matching tasks. However, most prior hashing methods focus mainly on the choice of a high-dimensional feature descriptor for learning effective hash functions. In practice, real-world image data collected from multiple scenes cannot be described adequately by a single type of feature. Several unsupervised multi-view hashing methods have recently been proposed based on matrix factorization, anchor graphs and metric learning. However, these methods introduce large quantization error through the sign function and ignore the robustness of multi-view hashing. In this paper, we present a novel feature adaptive multi-view hashing (FAMVH) method based on a robust multi-view quantization framework. The proposed method is evaluated on three large-scale benchmarks, CIFAR-10, CIFAR-20 and Caltech-256, for the approximate nearest neighbor search task. The experimental results show that our approach achieves the best accuracy and efficiency on the three large-scale datasets.
Citation: Li Sun, Bing Song. Feature adaptive multi-view hash for image search[J]. Electronic Research Archive, 2023, 31(9): 5845-5865. doi: 10.3934/era.2023297
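The quantization error that the abstract attributes to the sign function can be illustrated with a minimal Python sketch. It is not the FAMVH method: the two projection vectors and the helper names `sign_binarize`, `quantization_error` and `hamming` are hypothetical and chosen only for illustration. The sketch binarizes real-valued projections with the sign function, measures the squared distance between each vector and its binary code, and compares the resulting codes by Hamming distance.

```python
def sign_binarize(v):
    """Map each real value to +1 or -1 with the sign function (0 maps to +1)."""
    return [1 if x >= 0 else -1 for x in v]


def quantization_error(v):
    """Squared Euclidean distance between a real-valued vector and its sign code."""
    code = sign_binarize(v)
    return sum((x - c) ** 2 for x, c in zip(v, code))


def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical 8-dimensional projections of two images (illustrative values only).
near_boundary = [0.05, -0.02, 0.01, -0.04, 0.03, -0.01, 0.02, -0.03]
well_separated = [0.9, -1.1, 1.0, -0.95, 1.05, -1.0, 0.98, -0.9]

err_near = quantization_error(near_boundary)    # large: values sit close to 0
err_far = quantization_error(well_separated)    # small: values sit close to +/-1

# Both vectors receive the same binary code even though they are far apart
# in the real-valued space, so the Hamming distance between them is 0.
same_code = hamming(sign_binarize(near_boundary), sign_binarize(well_separated))

print(err_near, err_far, same_code)
```

In this toy case the vector whose entries lie near zero loses much more information when binarized, and the two vectors become indistinguishable in Hamming space. This is the kind of loss that quantization-based hashing frameworks try to reduce.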