Research article

IFKD: Implicit field knowledge distillation for single view reconstruction


  • Received: 30 November 2022 Revised: 14 May 2023 Accepted: 21 May 2023 Published: 19 June 2023
  • In 3D reconstruction tasks, estimation of the camera parameter matrix is usually used to represent the single view of an object, yet it is not strictly necessary for mapping 3D points onto the 2D image. A single-view reconstruction task should care more about the quality of the reconstruction than about alignment. In this paper, we therefore propose an implicit field knowledge distillation model (IFKD) to reconstruct 3D objects from a single view. Transformations are performed on the 3D points rather than on the camera, keeping the camera coordinate system identical to the world coordinate system, so that the extrinsic matrix can be omitted. In addition, a knowledge distillation structure from the 3D voxel representation to the feature vector is established to further refine the feature description of 3D objects, allowing the proposed model to better capture the details of a 3D shape. The ShapeNet Core dataset is adopted to verify the effectiveness of the IFKD model. Experiments show that IFKD holds clear advantages in IoU and other core indicators over camera matrix estimation methods, which verifies the feasibility of the proposed mapping. Illustrative sketches of the two core ideas are given after the citation below.

    Citation: Jianyuan Wang, Huanqiang Xu, Xinrui Hu, Biao Leng. IFKD: Implicit field knowledge distillation for single view reconstruction[J]. Mathematical Biosciences and Engineering, 2023, 20(8): 13864-13880. doi: 10.3934/mbe.2023617
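
To make the first idea concrete, here is a minimal sketch, not the authors' code, of projecting 3D query points to the image plane when the camera coordinate system is identified with the world coordinate system: the pose transform is applied to the points themselves, so only the intrinsic matrix K is needed and the extrinsic matrix drops out. The intrinsics fx, fy, cx, cy and the pose R, t below are illustrative placeholders, not values from the paper.

```python
import numpy as np

def transform_points(points, R, t):
    """Apply the pose to the 3D points themselves, instead of moving
    the camera (the camera frame stays identified with the world frame)."""
    return points @ R.T + t            # (N, 3)

def project_points(points, K):
    """Project camera-frame 3D points to 2D pixels using only the
    intrinsic matrix K; no extrinsic [R|t] matrix is involved."""
    uvw = points @ K.T                 # (N, 3) homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide -> (N, 2)

# Illustrative placeholder camera and pose.
fx = fy = 500.0
cx = cy = 112.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # identity rotation for the demo
t = np.array([0.0, 0.0, 2.0])          # push points in front of the camera

pts = np.random.rand(1024, 3) - 0.5    # query points in a unit cube
pix = project_points(transform_points(pts, R, t), K)
```

Because the pose acts on the points rather than on the camera, the camera never moves, which is exactly what allows the extrinsic matrix to be omitted from the projection.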

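For the second idea, the sketch below shows generic voxel-to-image feature distillation of the kind the abstract describes: a frozen 3D-CNN teacher encodes the ground-truth voxel grid into a feature vector, and a 2D-CNN student encoding the single view is pulled toward it with an MSE loss. The architectures, layer sizes, and feature dimension are invented for illustration and are not the IFKD network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VoxelTeacher(nn.Module):
    """Toy 3D-CNN: encodes a voxel grid into a feature vector."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, dim))

    def forward(self, vox):              # vox: (B, 1, D, H, W)
        return self.net(vox)             # (B, dim)

class ImageStudent(nn.Module):
    """Toy 2D-CNN: encodes a single view into the same feature space."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, dim))

    def forward(self, img):              # img: (B, 3, H, W)
        return self.net(img)             # (B, dim)

teacher, student = VoxelTeacher(), ImageStudent()
teacher.eval()                           # the teacher is frozen

vox = torch.rand(2, 1, 32, 32, 32)       # dummy ground-truth voxel grids
img = torch.rand(2, 3, 224, 224)         # dummy single-view images

with torch.no_grad():
    t_feat = teacher(vox)                # teacher feature, no gradients
s_feat = student(img)

# Feature-level distillation: pull the image feature toward the voxel feature.
loss_kd = F.mse_loss(s_feat, t_feat)
loss_kd.backward()
```

In a full pipeline this distillation term would be combined with the implicit-field reconstruction loss; only the feature-mimicking step is shown here.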



  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
