Sketch image retrieval is an important branch of image retrieval that uses sketches as queries for content search. Sketches are relatively easy to acquire, and in scenarios where photos of the real object cannot be obtained they offer unique practical value, which has attracted the attention of many researchers. However, traditional generalized sketch image retrieval has limitations in practical applications: merely retrieving images from the same category may not identify the specific target that the user wants. Fine-grained sketch image retrieval, which offers the potential for more precise and targeted retrieval, therefore merits further exploration and study. In this paper, we comprehensively review deep learning-based fine-grained sketch image retrieval technology and its applications, and we analyze and summarize the research literature of recent years. We also introduce in detail three fine-grained sketch image retrieval datasets, Queen Mary University of London (QMUL) ShoeV2, ChairV2 and PKU Sketch Re-ID, list common evaluation metrics in the sketch image retrieval field, and present the best performance achieved on these datasets. Finally, we discuss the existing challenges, unresolved issues and potential research directions in this field, aiming to provide guidance and inspiration for future research.
Citation: Qing Luo, Xiang Gao, Bo Jiang, Xueting Yan, Wanyuan Liu, Junchao Ge. A review of fine-grained sketch image retrieval based on deep learning[J]. Mathematical Biosciences and Engineering, 2023, 20(12): 21186-21210. doi: 10.3934/mbe.2023937