Review Special Issues

Survey on low-level controllable image synthesis with deep learning

  • Received: 03 September 2023 Revised: 29 October 2023 Accepted: 01 November 2023 Published: 21 November 2023
  • Deep learning, particularly generative models, has inspired controllable image synthesis methods and applications. These approaches aim to generate specific visual content using latent prompts. To explore low-level controllable image synthesis for precise rendering and editing tasks, we present a survey of recent works in this field using deep learning. We begin by discussing data sets and evaluation indicators for low-level controllable image synthesis. Then, we review the state-of-the-art research on geometrically controllable image synthesis, focusing on viewpoint/pose and structure/shape controllability. Additionally, we cover photometrically controllable image synthesis methods for 3D re-lighting studies. While our focus is on algorithms, we also provide a brief overview of related applications, products and resources for practitioners.

    Citation: Shixiong Zhang, Jiao Li, Lu Yang. Survey on low-level controllable image synthesis with deep learning[J]. Electronic Research Archive, 2023, 31(12): 7385-7426. doi: 10.3934/era.2023374

    Related Papers:

  • Deep learning, particularly generative models, has inspired controllable image synthesis methods and applications. These approaches aim to generate specific visual content using latent prompts. To explore low-level controllable image synthesis for precise rendering and editing tasks, we present a survey of recent works in this field using deep learning. We begin by discussing data sets and evaluation indicators for low-level controllable image synthesis. Then, we review the state-of-the-art research on geometrically controllable image synthesis, focusing on viewpoint/pose and structure/shape controllability. Additionally, we cover photometrically controllable image synthesis methods for 3D re-lighting studies. While our focus is on algorithms, we also provide a brief overview of related applications, products and resources for practitioners.



    加载中


    [1] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 10684–10695.
    [2] Y. Cao, S. Li, Y. Liu, Z. Yan, Y. Dai, P. S. Yu, et al., A comprehensive survey of AI-generated content (aigc): A history of generative AI from GAN to ChatGPT, preprint, arXiv: 2303.04226.
    [3] R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. V. Arx, et al., On the opportunities and risks of foundation models, preprint, arXiv: 2108.07258.
    [4] L. Zhang, A. Rao, M. Agrawala, Adding conditional control to text-to-image diffusion models, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2023), 3836–3847.
    [5] X. Wang, L. Xie, C. Dong, Y. Shan, Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data, in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), IEEE, (2021), 1905–1914.
    [6] H. Jonathan, J. Ajay, A. Pieter, Denoising diffusion probabilistic models, in Advances in Neural Information Processing Systems, Curran Associates, Inc., 33 (2020), 6840–6851.
    [7] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, et al., Generative adversarial nets, in Advances in Neural Information Processing Systems, Curran Associates, Inc., 27 (2014), 1–9.
    [8] Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, F. Huang, A tutorial on energy-based learning, Predict. Struct. Data, 1 (2006), 1–59.
    [9] J. Zhou, Z. Wu, Z. Jiang, K. Huang, K. Guo, S. Zhao, Background selection schema on deep learning-based classification of dermatological disease, Comput. Biol. Med., 149 (2022), 105966. https://doi.org/10.1016/j.compbiomed.2022.105966 doi: 10.1016/j.compbiomed.2022.105966
    [10] Q. Su, F. Wang, D. Chen, G. Chen, C. Li, L. Wei, Deep convolutional neural networks with ensemble learning and transfer learning for automated detection of gastrointestinal diseases, Comput. Biol. Med., 150 (2022), 106054. https://doi.org/10.1016/j.compbiomed.2022.106054 doi: 10.1016/j.compbiomed.2022.106054
    [11] G. Liu, Q. Ding, H. Luo, M. Sha, X. Li, M. Ju, Cx22: A new publicly available dataset for deep learning-based segmentation of cervical cytology images, Comput. Biol. Med., 150 (2022), 106194. https://doi.org/10.1016/j.compbiomed.2022.106194 doi: 10.1016/j.compbiomed.2022.106194
    [12] L. Xu, R. Magar, A. B. Farimani, Forecasting COVID-19 new cases using deep learning methods, Comput. Biol. Med., 144 (2022), 105342. https://doi.org/10.1016/j.compbiomed.2022.105342 doi: 10.1016/j.compbiomed.2022.105342
    [13] D. P. Kingma, M. Welling, Auto-encoding variational bayes, preprint, arXiv: 1312.6114.
    [14] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language models are unsupervised multitask learners, OpenAI Blog, 1 (2019), 9.
    [15] H. Huang, P. S. Yu, C. Wang, An introduction to image synthesis with generative adversarial nets, preprint, arXiv: 1803.04469.
    [16] M. Mirza, S. Osindero, Conditional generative adversarial nets, preprint, arXiv: 1411.1784.
    [17] L. A. Gatys, A. S. Ecker, M. Bethge, Image style transfer using convolutional neural networks, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2016), 2414–2423. https://doi.org/10.1109/CVPR.2016.265
    [18] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, R. Szeliski, Building rome in a day, in 2009 IEEE 12th International Conference on Computer Vision, IEEE, (2009), 72–79. https://doi.org/10.1109/ICCV.2009.5459148
    [19] L. Yang, T. Yendo, M. P. Tehrani, T. Fujii, M. Tanimoto, Probabilistic reliability based view synthesis for FTV, in 2010 IEEE International Conference on Image Processing, IEEE, (2010), 1785–1788. https://doi.org/10.1109/ICIP.2010.5650222
    [20] Y. Zheng, G. Zeng, H. Li, Q. Cai, J. Du, Colorful 3D reconstruction at high resolution using multi-view representation, J. Visual Commun. Image Represent., 85 (2022), 103486. https://doi.org/10.1016/j.jvcir.2022.103486 doi: 10.1016/j.jvcir.2022.103486
    [21] J. Deng, W. Dong, R. Socher, L. Li, L. Kai, F. Li, ImageNet: A large-scale hierarchical image database, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, (2009), 248–255. https://doi.org/10.1109/CVPR.2009.5206848
    [22] S. Christoph, B. Romain, V. Richard, G. Cade, W. Ross, C. Mehdi, et al., Laion-5b: An open large-scale dataset for training next generation image-text models, in Advances in Neural Information Processing Systems, Curran Associates, Inc., 35 (2022), 25278–25294.
    [23] S. M. Mohammad, S. Kiritchenko, Wikiart emotions: An annotated dataset of emotions evoked by art, in Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018), (2018), 1–14.
    [24] M. Ben, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, R. Ng, NeRF: Representing scenes as Neural Radiance Fields for view synthesis, in European Conference on Computer Vision, Springer, (2020), 405–421. https://doi.org/10.1007/978-3-030-58452-8_24
    [25] S. Huang, Q. Li, J. Liao, L. Liu, L. Li, An overview of controllable image synthesis: Current challenges and future trends, SSRN, 2022.
    [26] A. Tsirikoglou, G. Eilertsen, J. Unger, A survey of image synthesis methods for visual machine learning, Comput. Graphics Forum, 39 (2020), 426–451. https://doi.org/10.1111/cgf.14047 doi: 10.1111/cgf.14047
    [27] H. Ren, G. Stella, B. S. Sami, Controllable GAN synthesis using non-rigid structure-from-motion, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2023), 678–687.
    [28] J. Zhang, A. Siarohin, Y. Liu, H. Tang, N. Sebe, W. Wang, Training and tuning generative neural radiance fields for attribute-conditional 3D-aware face generation, preprint, arXiv: 2208.12550.
    [29] J. Ko, K. Cho, D. Choi, K. Ryoo, S. Kim, 3D GAN inversion with pose optimization, in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), IEEE, (2023), 2967–2976.
    [30] S. Yang, W. Wang, B. Peng, J. Dong, Designing a 3D-aware StyleNeRF encoder for face editing, in ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, (2023), 1–5. https://doi.org/10.1109/ICASSP49357.2023.10094932
    [31] J. Collins, S. Goel, K. Deng, A. Luthra, L. Xu, E. Gundogdu, et al., ABO: Dataset and benchmarks for real-world 3D object understanding, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2022), 21126–21136.
    [32] B. Yang, Y. Zhang, Y. Xu, Y. Li, H. Zhou, H. Bao, et al., Learning object-compositional Neural Radiance Field for editable scene rendering, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2021), 13759–13768. https://doi.org/10.1109/ICCV48922.2021.01352
    [33] M. Niemeyer, A. Geiger, GIRAFFE: Representing scenes as compositional generative neural feature fields, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2021), 11453–11464.
    [34] J. Zhu, C. Yang, Y. Shen, Z. Shi, B. Dai, D. Zhao, et al., LinkGAN: Linking GAN latents to pixels for controllable image synthesis, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2023), 7656–7666.
    [35] R. Gross, I. Matthews, J. Cohn, T. Kanade, S. Baker, Multi-PIE, in 2008 8th IEEE International Conference on Automatic Face and Gesture Recognition, IEEE, (2008), 1–8. https://doi.org/10.1109/AFGR.2008.4813399
    [36] M. Boss, R. Braun, V. Jampani, J. T. Barron, C. Liu, H. P. A. Lensch, NeRD: Neural reflectance decomposition from image collections, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2021), 12664–12674. https://doi.org/10.1109/ICCV48922.2021.01245
    [37] X. Yan, Z. Yuan, Y. Du, Y. Liao, Y. Guo, Z. Li, et al., CLEVR3D: Compositional language and elementary visual reasoning for question answering in 3D real-world scenes, preprint, arXiv: 2112.11691.
    [38] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, M. Nießner, ScanNet: Richly-annotated 3D reconstructions of indoor scenes, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2017), 5828–5839.
    [39] T. Zhou, T. Richard, F. John, F. Graham, S. Noah, Stereo magnification: Learning view synthesis using multiplane images, ACM Trans. Graphics, 37 (2018), 1–12. https://doi.org/10.1145/3197517.3201323 doi: 10.1145/3197517.3201323
    [40] A. X. Chang, T. A. Funkhouser, L. J. Guibas, P. Hanrahan, Q. Huang, Z. Li, et al., ShapeNet: An information-rich 3D model repository, preprint, arXiv: 1512.03012.
    [41] A. Geiger, P. Lenz, R. Urtasun, Are we ready for autonomous driving? the KITTI vision benchmark suite, in 2012 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, (2012), 3354–3361. https://doi.org/10.1109/CVPR.2012.6248074
    [42] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, et al., NuScenes: A multimodal dataset for autonomous driving, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2020), 11621–11631.
    [43] S. K. Ramakrishnan, A. Gokaslan, E. Wijmans, O. Maksymets, A. Clegg, J. M. Turner, et al., Habitat-Matterport 3D dataset (HM3D): 1000 large-scale 3D environments for embodied AI, in Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), (2021), 1–12.
    [44] D. Scharstein, R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, Int. J. Comput. Vision, 47 (2002), 7–42. https://doi.org/10.1023/A:1014573219977 doi: 10.1023/A:1014573219977
    [45] D. Scharstein, R. Szeliski, High-accuracy stereo depth maps using structured light, in 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, (2003), 1. https://doi.org/10.1109/CVPR.2003.1211354
    [46] D. Scharstein, C. Pal, Learning conditional random fields for stereo, in 2007 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, (2007), 1–8. https://doi.org/10.1109/CVPR.2007.383191
    [47] H. Hirschmuller, D. Scharstein, Evaluation of cost functions for stereo matching, in 2007 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, (2007), 1–8. https://doi.org/10.1109/CVPR.2007.383248
    [48] D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nesic, X. Wang, et al., High-resolution stereo datasets with subpixel-accurate ground truth, in 36th German Conference on Pattern Recognition, Springer, (2014), 31–42. https://doi.org/10.1007/978-3-319-11752-2_3
    [49] N. Silberman, D. Hoiem, K. Pushmeet, R. Fergus, Indoor segmentation and support inference from rgbd images, in Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Springer, (2012), 746–760. https://doi.org/https://doi.org/10.1007/978-3-642-33715-4_54
    [50] K. Guo, P. Lincoln, P. Davidson, J. Busch, X. Yu, M. Whalen, et al., The Relightables: Volumetric performance capture of humans with realistic relighting, ACM Trans. Graphics, 38 (2019), 1–19. https://doi.org/10.1145/3355089.3356571 doi: 10.1145/3355089.3356571
    [51] A. Horé, D. Ziou, Image quality metrics: PSNR vs. SSIM, in 2010 20th International Conference on Pattern Recognition, IEEE, (2010), 2366–2369. https://doi.org/10.1109/ICPR.2010.579
    [52] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: From error visibility to structural similarity, IEEE Trans. Image Process., 13 (2004), 600–612. https://doi.org/10.1109/TIP.2003.819861 doi: 10.1109/TIP.2003.819861
    [53] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, O. Wang, The unreasonable effectiveness of deep features as a perceptual metric, in Proceedings of the IEEE conference on computer vision and pattern recognition, IEEE, (2018), 586–595.
    [54] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, in Advances in Neural Information Processing Systems, Curran Associates, Inc., 29 (2016), 1–9.
    [55] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local nash equilibrium, in Advances in Neural Information Processing Systems, Curran Associates Inc., 30 (2017), 1–12.
    [56] M. Bińkowski, D. J. Sutherland, M. Arbel, A. Gretton, Demystifying MMD GANs, in International Conference on Learning Representations, 2018.
    [57] Z. Shi, S. Peng, Y. Xu, Y. Liao, Y. Shen, Deep generative models on 3D representations: A survey, preprint, arXiv: 2210.15663.
    [58] R. Huang, S. Zhang, T. Li, R. He, Beyond face rotation: Global and local perception GAN for photorealistic and identity preserving frontal view synthesis, in Proceedings of the IEEE International Conference on Computer Vision (ICCV), IEEE, (2017), 2439–2448.
    [59] B. Zhao, X. Wu, Z. Cheng, H. Liu, Z. Jie, J. Feng, Multi-view image generation from a single-view, in Proceedings of the 26th ACM International Conference on Multimedia, ACM, (2018), 383–391. https://doi.org/10.1145/3240508.3240536
    [60] K. Regmi, A. Borji, Cross-view image synthesis using conditional GANs, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2018), 3501–3510. https://doi.org/10.1109/CVPR.2018.00369
    [61] K. Regmi, A. Borji, Cross-view image synthesis using geometry-guided conditional GANs, Comput. Vision Image Understanding, 187 (2019), 102788. https://doi.org/10.1016/j.cviu.2019.07.008 doi: 10.1016/j.cviu.2019.07.008
    [62] F. Mokhayeri, K. Kamali, E. Granger, Cross-domain face synthesis using a controllable GAN, in 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, (2020), 241–249. https://doi.org/10.1109/WACV45572.2020.9093275
    [63] X. Zhu, Z. Yin, J. Shi, H. Li, D. Lin, Generative adversarial frontal view to bird view synthesis, in 2018 International Conference on 3D Vision (3DV), IEEE, (2018), 454–463. https://doi.org/10.1109/3DV.2018.00059
    [64] H. Ding, S. Wu, H. Tang, F. Wu, G. Gao, X. Jing, Cross-view image synthesis with deformable convolution and attention mechanism, in Pattern Recognition and Computer Vision, Springer, (2020), 386–397. https://doi.org/10.1007/978-3-030-60633-6_32
    [65] B. Ren, H. Tang, N. Sebe, Cascaded cross MLP-Mixer GANs for cross-view image translation, in British Machine Vision Conference, (2021), 1–14.
    [66] J. Zhu, T. Park, P. Isola, A. A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks, in 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, (2017), 2242–2251. https://doi.org/10.1109/ICCV.2017.244
    [67] M. Yin, L. Sun, Q. Li, Novel view synthesis on unpaired data by conditional deformable variational auto-encoder, in Computer Vision–ECCV 2020, Springer, (2020), 87–103. https://doi.org/10.1007/978-3-030-58604-1_6
    [68] X. Shen, J. Plested, Y. Yao, T. Gedeon, Pairwise-GAN: Pose-based view synthesis through pair-wise training, in Neural Information Processing, Springer, (2020), 507–515. https://doi.org/10.1007/978-3-030-63820-7_58
    [69] E. R. Chan, M. Monteiro, P. Kellnhofer, J. Wu, G. Wetzstein, pi-GAN: Periodic implicit generative adversarial networks for 3D-aware image synthesis, in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2021), 5795–5805. https://doi.org/10.1109/CVPR46437.2021.00574
    [70] S. Cai, A. Obukhov, D. Dai, L. V. Gool, Pix2NeRF: Unsupervised conditional $\pi$-GAN for single image to Neural Radiance Fields translation, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 3971–3980. https://doi.org/10.1109/CVPR52688.2022.00395
    [71] T. Leimkhler, G. Drettakis, FreeStyleGAN, ACM Trans. Graphics, 40 (2021), 1–15. https://doi.org/10.1145/3478513.3480538 doi: 10.1145/3478513.3480538
    [72] S. C. Medin, B. Egger, A. Cherian, Y. Wang, J. B. Tenenbaum, X. Liu, et al., MOST-GAN: 3D morphable StyleGAN for disentangled face image manipulation, in Proceedings of the AAAI Conference on Artificial Intelligence, AAAI Press, 36 (2022), 1962–1971. https://doi.org/10.1609/aaai.v36i2.20091
    [73] R. Or-El, X. Luo, M. Shan, E. Shechtman, J. J. Park, I. Kemelmacher-Shlizerman, StyleSDF: High-resolution 3D-consistent image and geometry generation, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 13503–13513.
    [74] X. Zheng, Y. Liu, P. Wang, X. Tong, SDF-StyleGAN: Implicit SDF-based StyleGAN for 3D shape generation, Comput. Graphics Forum, 41 (2022), 52–63. https://doi.org/10.1111/cgf.14602 doi: 10.1111/cgf.14602
    [75] Y. Deng, J. Yang, J. Xiang, X. Tong, GRAM: Generative radiance manifolds for 3D-aware image generation, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2022), 10663–10673. https://doi.org/10.1109/CVPR52688.2022.01041
    [76] J. Xiang, J. Yang, Y. Deng, X. Tong, GRAM-HD: 3D-consistent image generation at high resolution with generative radiance manifolds, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2023), 2195–2205.
    [77] E. R. Chan, C. Z. Lin, M. A. Chan, K. Nagano, B. Pan, S. D. Mello, et al., Efficient geometry-aware 3D generative adversarial networks, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 16123–16133.
    [78] X. Zhao, F. Ma, D. Güera, Z. Ren, A. G. Schwing, A. Colburn, Generative multiplane images: Making a 2D GAN 3D-aware, in Computer Vision–ECCV 2022, Springer, (2022), 18–35. https://doi.org/10.1007/978-3-031-20065-6_2
    [79] H. A. Alhaija, A. Dirik, A. Knrig, S. Fidler, M. Shugrina, XDGAN: Multi-modal 3D shape generation in 2D space, in British Machine Vision Conference, (2022), 1–14.
    [80] K. Zhang, G. Riegler, N. Snavely, V. Koltun, NeRF++: Analyzing and improving Neural Radiance Fields, preprint, arXiv: 2010.07492.
    [81] D. Rebain, W. Jiang, S. Yazdani, K. Li, K. M. Yi, A. Tagliasacchi, DeRF: Decomposed radiance fields, in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2021), 14148–14156. https://doi.org/10.1109/CVPR46437.2021.01393
    [82] K. Park, U. Sinha, J. T. Barron, S. Bouaziz, D. B. Goldman, S. M. Seitz, et al., Nerfies: Deformable Neural Radiance Fields, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2021), 5845–5854. https://doi.org/10.1109/ICCV48922.2021.00581
    [83] J. Li, Z. Feng, Q. She, H. Ding, C. Wang, G. H. Lee, MINE: Towards continuous depth MPI with NeRF for novel view synthesis, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2021), 12558–12568. https://doi.org/10.1109/ICCV48922.2021.01235
    [84] K. Park, U. Sinha, P. Hedman, J. T. Barron, S. Bouaziz, D. B. Goldman, et al., HyperNeRF: A higher-dimensional representation for topologically varying Neural Radiance Fields, ACM Trans. Graphics, 40 (2021), 1–12. https://doi.org/10.1145/3478513.3480487 doi: 10.1145/3478513.3480487
    [85] T. Chen, P. Wang, Z. Fan, Z. Wang, Aug-NeRF: Training stronger Neural Radiance Fields with triple-level physically-grounded augmentations, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 15170–15181. https://doi.org/10.1109/CVPR52688.2022.01476
    [86] T. Kaneko, AR-NeRF: Unsupervised learning of depth and defocus effects from natural images with aperture rendering Neural Radiance Fields, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 18387–18397.
    [87] X. Li, C. Hong, Y. Wang, Z. Cao, K. Xian, G. Lin, SymmNeRF: Learning to explore symmetry prior for single-view view synthesis, in Proceedings of the Asian Conference on Computer Vision (ACCV), (2022), 1726–1742.
    [88] K. Zhou, W. Li, Y. Wang, T. Hu, N. Jiang, X. Han, et al., NeRFLix: High-quality neural view synthesis by learning a degradation-driven inter-viewpoint mixer, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 12363–12374.
    [89] Z. Wang, S. Wu, W. Xie, M. Chen, V. A. Prisacariu, NeRF–: Neural Radiance Fields without known camera parameters, preprint, arXiv: 2102.07064.
    [90] B. Mildenhall, P. P. Srinivasan, R. Ortiz-Cayon, N. K. Kalantari, R. Ramamoorthi, R. Ng, et al., Local light field fusion: Practical view synthesis with prescriptive sampling guidelines, ACM Trans. Graphics, 38 (2019), 1–14. https://doi.org/10.1145/3306346.3322980 doi: 10.1145/3306346.3322980
    [91] Q. Meng, A. Chen, H. Luo, M. Wu, H. Su, L. Xu, et al., GNeRF: GAN-based Neural Radiance Field without posed camera, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2021), 6351–6361.
    [92] R. Jensen, A. Dahl, G. Vogiatzis, E. Tola, H. Aanæs, Large scale multi-view stereopsis evaluation, in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, (2014), 406–413. https://doi.org/10.1109/CVPR.2014.59
    [93] Y. Jeong, S. Ahn, C. Choy, A. Anandkumar, M. Cho, J. Park, Self-calibrating Neural Radiance Fields, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2021), 5826–5834. https://doi.org/10.1109/ICCV48922.2021.00579
    [94] A. Knapitsch, J. Park, Q. Zhou, V. Koltun, Tanks and temples: Benchmarking large-scale scene reconstruction, ACM Trans. Graphics, 36 (2017), 1–13. https://doi.org/10.1145/3072959.3073599 doi: 10.1145/3072959.3073599
    [95] W. Bian, Z. Wang, K. Li, J. Bian, V. A. Prisacariu, NoPe-NeRF: Optimising Neural Radiance Field with no pose prior, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 4160–4169.
    [96] P. Truong, M. Rakotosaona, F. Manhardt, F. Tombari, SPARF: Neural Radiance Fields from sparse and noisy poses, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 4190–4200.
    [97] J. Straub, T. Whelan, L. Ma, Y. Chen, E. Wijmans, S. Green, et al., The replica dataset: A digital replica of indoor spaces, preprint, arXiv: 1906.05797.
    [98] J. Y. Zhang, G. Yang, S. Tulsiani, D. Ramanan, NeRS: Neural reflectance surfaces for sparse-view 3D reconstruction in the wild, in Conference on Neural Information Processing Systems, Curran Associates, Inc., 34 (2021), 29835–29847.
    [99] S. Seo, D. Han, Y. Chang, N. Kwak, MixNeRF: Modeling a ray with mixture density for novel view synthesis from sparse inputs, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 20659–20668.
    [100] A. Cao, R. D. Charette, SceneRF: Self-supervised monocular 3D scene reconstruction with radiance fields, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2023), 9387–9398.
    [101] J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, et al., SemanticKITTI: A dataset for semantic scene understanding of lidar sequences, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2019), 9296–9306. https://doi.org/10.1109/ICCV.2019.00939
    [102] J. Chen, W. Yi, L. Ma, X. Jia, H. Lu, GM-NeRF: Learning generalizable model-based Neural Radiance Fields from multi-view images, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 20648–20658.
    [103] T. Yu, Z. Zheng, K. Guo, P. Liu, Q. Dai, Y. Liu, Function4D: Real-time human volumetric capture from very sparse consumer rgbd sensors, in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2021), 5742–5752. https://doi.org/10.1109/CVPR46437.2021.00569
    [104] B. Bhatnagar, G. Tiwari, C. Theobalt, G. Pons-Moll, Multi-Garment net: Learning to dress 3D people from images, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2019), 5419–5429. https://doi.org/10.1109/ICCV.2019.00552
    [105] W. Cheng, S. Xu, J. Piao, C. Qian, W. Wu, K. Lin, et al., Generalizable neural performer: Learning robust radiance fields for human novel view synthesis, preprint, arXiv: 2204.11798.
    [106] S. Peng, Y. Zhang, Y. Xu, Q. Wang, Q. Shuai, H. Bao, et al., Neural Body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans, in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2021), 9050–9059. https://doi.org/10.1109/CVPR46437.2021.00894
    [107] B. Mildenhall, P. Hedman, R. Martin-Brualla, P. P. Srinivasan, J. T. Barron, NeRF in the dark: High dynamic range view synthesis from noisy raw images, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 16190–16199.
    [108] L. Ma, X. Li, J. Liao, Q. Zhang, X. Wang, J. Wang, et al., Deblur-NeRF: Neural Radiance Fields from blurry images, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 12861–12870.
    [109] X. Huang, Q. Zhang, Y. Feng, H. Li, X. Wang, Q. Wang, Hdr-NeRF: High dynamic range Neural Radiance Fields, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2022), 18398–18408.
    [110] P. Naama, T. Tali, K. Simon, NAN: Noise-aware NeRFs for burst-denoising, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 12672–12681.
    [111] J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, P. Hedman, Mip-NeRF 360: Unbounded anti-aliased Neural Radiance Fields, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 5470–5479.
    [112] Y. Xiangli, L. Xu, X. Pan, N. Zhao, A. Rao, C. Theobalt, et al., BungeeNeRF: Progressive Neural Radiance Field for extreme multi-scale scene rendering, in Computer Vision–ECCV 2022, Springer, (2022), 106–122. https://doi.org/10.1007/978-3-031-19824-3_7
    [113] Google, Google earth studio, 2018. Available from: https://www.google.com/earth/studio/.
    [114] M. Tancik, V. Casser, X. Yan, S. Pradhan, B. P. Mildenhall, P. Srinivasan, et al., Block-NeRF: Scalable large scene neural view synthesis, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 8238–8248. https://doi.org/10.1109/CVPR52688.2022.00807
    [115] L. Xu, Y. Xiangli, S. Peng, X. Pan, N. Zhao, C. Theobalt, et al., Grid-guided Neural Radiance Fields for large urban scenes, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 8296–8306.
    [116] T. Haithem, R. Deva, S. Mahadev, Mega-NERF: Scalable construction of large-scale NeRFs for virtual fly-throughs, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 12922–12931.
    [117] C. Choi, S. M. Kim, Y. M. Kim, Balanced spherical grid for egocentric view synthesis, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 16590–16599.
    [118] A. Yu, R. Li, M. Tancik, H. Li, R. Ng, A. Kanazawa, PlenOctrees for real-time rendering of Neural Radiance Fields, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2021), 5732–5741. https://doi.org/10.1109/ICCV48922.2021.00570
    [119] C. Sun, M. Sun, H. Chen, Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 5449–5459. https://doi.org/10.1109/CVPR52688.2022.00538
    [120] L. Liu, J. Gu, K. Z. Lin, T. Chua, C. Theobalt, Neural sparse voxel fields, in Advances in Neural Information Processing Systems, Curran Associates, Inc., 33 (2020), 15651–15663.
    [121] Y. Yao, Z. Luo, S. Li, J. Zhang, Y. Ren, L. Zhou, et al., BlendedMVS: A large-scale dataset for generalized multi-view stereo networks, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2020), 1790–1799.
    [122] V. Sitzmann, J. Thies, F. Heide, M. Nießner, G. Wetzstein, M. Zollhöfer, DeepVoxels: Learning persistent 3D feature embeddings, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2019), 2437–2446.
    [123] H. Wang, J. Ren, Z. Huang, K. Olszewski, M. Chai, Y. Fu, et al., R2L: Distilling Neural Radiance Field to neural light field for efficient novel view synthesis, in Computer Vision–ECCV 2022, Springer, (2022), 612–629. https://doi.org/10.1007/978-3-031-19821-2_35
    [124] T. Neff, P. Stadlbauer, M. Parger, A. Kurz, J. H. Mueller, C. R. A. Chaitanya, et al., DONeRF: Towards real-time rendering of compact Neural Radiance Fields using depth oracle networks, Comput. Graphics Forum, 40 (2021), 45–59. https://doi.org/10.1111/cgf.14340 doi: 10.1111/cgf.14340
    [125] K. Wadhwani, T. Kojima, SqueezeNeRF: Further factorized FastNeRF for memory-efficient inference, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, (2022), 2716–2724. https://doi.org/10.1109/CVPRW56347.2022.00307
    [126] Z. Chen, T. Funkhouser, P. Hedman, A. Tagliasacchi, MobileNeRF: Exploiting the polygon rasterization pipeline for efficient neural field rendering on mobile architectures, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2023), 16569–16578.
    [127] Y. Chen, X. Chen, X. Wang, Q. Zhang, Y. Guo, Y. Shan, et al., Local-to-global registration for bundle-adjusting Neural Radiance Fields, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2023), 8264–8273.
    [128] C. Sbrolli, P. Cudrano, M. Frosi, M. Matteucci, IC3D: Image-conditioned 3D diffusion for shape generation, preprint, arXiv: 2211.10865.
    [129] J. Gu, Q. Gao, S. Zhai, B. Chen, L. Liu, J. Susskind, Learning controllable 3D diffusion models from single-view images, preprint, arXiv: 2304.06700.
    [130] T. Anciukevičius, Z. Xu, M. Fisher, P. Henderson, H. Bilen, N. J. Mitra, et al., RenderDiffusion: Image diffusion for 3D reconstruction, inpainting and generation, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 12608–12618.
    [131] J. Xiang, J. Yang, B. Huang, X. Tong, 3D-aware image generation using 2D diffusion models, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2023), 2383–2393.
    [132] R. Liu, R. Wu, B. V. Hoorick, P. Tokmakov, S. Zakharov, C. Vondrick, Zero-1-to-3: Zero-shot one image to 3D object, preprint, arXiv: 2303.11328.
    [133] E. R. Chan, K. Nagano, M. A. Chan, A. W. Bergman, J. J. Park, A. Levy, et al., Generative novel view synthesis with 3D-aware diffusion models, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2023), 4217–4229.
    [134] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, et al., An image is worth 16 x 16 words: Transformers for image recognition at scale, in International Conference on Learning Representations, 2021.
    [135] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, in Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., (2017), 6000–6010.
    [136] P. Nguyen-Ha, L. Huynh, E. Rahtu, J. Heikkila, Sequential view synthesis with transformer, in Proceedings of the Asian Conference on Computer Vision (ACCV), 2020.
    [137] J. Yang, Y. Li, L. Yang, Shape transformer nets: Generating viewpoint-invariant 3D shapes from a single image, J. Visual Commun. Image Represent., 81 (2021), 103345. https://doi.org/10.1016/j.jvcir.2021.103345 doi: 10.1016/j.jvcir.2021.103345
    [138] J. Kulhánek, E. Derner, T. Sattler, R. Babuška, ViewFormer: NeRF-free neural rendering from few images using transformers, in Computer Vision–ECCV 2022, Springer, (2022), 198–216. https://doi.org/10.1007/978-3-031-19784-0_12
    [139] P. Zhou, L. Xie, B. Ni, Q. Tian, CIPS-3D: A 3D-aware generator of GANs based on conditionally-independent pixel synthesis, preprint, arXiv: 2110.09788.
    [140] X. Xu, X. Pan, D. Lin, B. Dai, Generative occupancy fields for 3D surface-aware image synthesis, in Advances in Neural Information Processing Systems, Curran Associates, Inc., 34 (2021), 20683–20695.
    [141] Y. Lan, X. Meng, S. Yang, C. C. Loy, B. Dai, Self-supervised geometry-aware encoder for style-based 3D GAN inversion, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 20940–20949.
    [142] S. Li, J. van de Weijer, Y. Wang, F. S. Khan, M. Liu, J. Yang, 3D-aware multi-class image-to-image translation with NeRFs, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 12652–12662.
    [143] M. Shahbazi, E. Ntavelis, A. Tonioni, E. Collins, D. P. Paudel, M. Danelljan, et al., NeRF-GAN distillation for efficient 3D-aware generation with convolutions, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, IEEE, (2023), 2888–2898.
    [144] A. Kania, A. Kasymov, M. Ziba, P. Spurek, HyperNeRFGAN: Hypernetwork approach to 3D NeRF GAN, preprint, arXiv: 2301.11631.
    [145] A. R. Bhattarai, M. Nießner, A. Sevastopolsky, TriPlaneNet: An encoder for EG3D inversion, preprint, arXiv: 2303.13497.
    [146] N. Müller, Y. Siddiqui, L. Porzi, S. R. Bulo, P. Kontschieder, M. Nießner, Diffrf: Rendering-guided 3D radiance field diffusion, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2023), 4328–4338.
    [147] D. Xu, Y. Jiang, P. Wang, Z. Fan, Y. Wang, Z. Wang, NeuralLift-360: Lifting an in-the-wild 2D photo to a 3D object with 360deg views, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 4479–4489.
    [148] H. Chen, J. Gu, A. Chen, W. Tian, Z. Tu, L. Liu, et al., Single-stage diffusion NeRF: A unified approach to 3D generation and reconstruction, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2023), 2416–2425.
    [149] J. Gu, A. Trevithick, K. Lin, J. Susskind, C. Theobalt, L. Liu, et al., NerfDiff: Single-image view synthesis with NeRF-guided distillation from 3D-aware diffusion, in International Conference on Machine Learning, PMLR, (2023), 11808–11826.
    [150] D. Wang, X. Cui, S. Salcudean, Z. J. Wang, Generalizable Neural Radiance Fields for novel view synthesis with transformer, preprint, arXiv: 2206.05375.
    [151] K. Lin, L. Yen-Chen, W. Lai, T. Lin, Y. Shih, R. Ramamoorthi, Vision transformer for NeRF-based view synthesis from a single input image, in 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), IEEE, (2023), 806–815. https://doi.org/10.1109/WACV56688.2023.00087
    [152] J. Liu, Q. Nie, Y. Liu, C. Wang, NeRF-Loc: Visual localization with conditional Neural Radiance Field, preprint, arXiv: 2304.07979.
    [153] Y. Liao, K. Schwarz, L. Mescheder, A. Geiger, Towards unsupervised learning of generative models for 3D controllable image synthesis, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2020), 5871–5880.
    [154] T. Nguyen-Phuoc, C. Richardt, L. Mai, Y. Yang, N. Mitra, BlockGAN: Learning 3D object-aware scene representations from unlabelled images, in Advances in Neural Information Processing Systems, Curran Associates, Inc., 33 (2020), 6767–6778.
    [155] X. Pan, B. Dai, Z. Liu, C. C. Loy, P. Luo, Do 2D GANs know 3D shape? Unsupervised 3D shape reconstruction from 2D image GANs, in International Conference on Learning Representations, 2021.
    [156] A. Tewari, M. B. R, X. Pan, O. Fried, M. Agrawala, C. Theobalt, Disentangled3D: Learning a 3D generative model with disentangled geometry and appearance from monocular images, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 1506–1515. https://doi.org/10.1109/CVPR52688.2022.00157
    [157] S. Kobayashi, E. Matsumoto, V. Sitzmann, Decomposing NeRF for editing via feature field distillation, in Advances in Neural Information Processing Systems, Curran Associates, Inc., 35 (2022), 23311–23330.
    [158] X. Zhang, K. Abhijit, F. Thomas, G. Leonidas, S. Hao, G. Kyle, Nerflets: Local radiance fields for efficient structure-aware 3D scene representation from 2D supervision, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 8274–8284.
    [159] C. Zheng, W. Lin, F. Xu, EditableNeRF: Editing topologically varying Neural Radiance Fields by key points, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 8317–8327.
    [160] J. Zhang, L. Yang, MonodepthPlus: Self-supervised monocular depth estimation using soft-attention and learnable outlier-masking, J. Electron. Imaging, 30 (2021), 023017. https://doi.org/10.1117/1.JEI.30.2.023017 doi: 10.1117/1.JEI.30.2.023017
    [161] R. Liang, J. Zhang, H. Li, C. Yang, Y. Guan, N. Vijaykumar, SPIDR: SDF-based neural point fields for illumination and deformation, preprint, arXiv: 2210.08398.
    [162] Y. Zhang, X. Huang, B. Ni, T. Li, W. Zhang, Frequency-modulated point cloud rendering with easy editing, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 119–129.
    [163] J. Chen, J. Lyu, Y. Wang, NeuralEditor: Editing Neural Radiance Fields via manipulating point clouds, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2023), 12439–12448.
    [164] J. Zhu, Z. Zhang, C. Zhang, J. Wu, A. Torralba, J. B. Tenenbaum, et al., Visual object networks: Image generation with disentangled 3D representations, in Advances in Neural Information Processing Systems, Curran Associates, Inc., 31 (2018).
    [165] A. Mirzaei, T. Aumentado-Armstrong, M. A. Brubaker, J. Kelly, A. Levinshtein, K. G. Derpanis, et al., Reference-guided controllable inpainting of Neural Radiance Fields, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2023), 17815–17825.
    [166] Y. Yin, Z. Fu, F. Yang, G. Lin, OR-NeRF: Object removing from 3D scenes guided by multiview segmentation with Neural Radiance Fields, preprint, arXiv: 2305.10503.
    [167] H. G. Kim, M. Park, S. Lee, S. Kim, Y. M. Ro, Visual comfort aware-reinforcement learning for depth adjustment of stereoscopic 3D images, in Proceedings of the AAAI Conference on Artificial Intelligence, AAAI Press, 35 (2021), 1762–1770. https://doi.org/10.1609/aaai.v35i2.16270
    [168] R. Jheng, T. Wu, J. Yeh, W. H. Hsu, Free-form 3D scene inpainting with dual-stream GAN, in British Machine Vision Conference, 2022.
    [169] Q. Wang, Y. Wang, M. Birsak, P. Wonka, BlobGAN-3D: A spatially-disentangled 3D-aware generative model for indoor scenes, preprint, arXiv: 2303.14706.
    [170] J. Gu, L. Liu, P. Wang, C. Theobalt, StyleNeRF: A style-based 3D aware generator for high-resolution image synthesis, in Tenth International Conference on Learning Representations, (2022), 1–25.
    [171] C. Wang, M. Chai, M. He, D. Chen, J. Liao, CLIP-NeRF: Text-and-image driven manipulation of Neural Radiance Fields, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 3825–3834. https://doi.org/10.1109/CVPR52688.2022.00381
    [172] K. Kania, K. M. Yi, M. Kowalski, T. Trzciński, A. Tagliasacchi, CoNeRF: Controllable Neural Radiance Fields, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, (2022), 18623–18632.
    [173] V. Lazova, V. Guzov, K. Olszewski, S. Tulyakov, G. Pons-Moll, Control-NeRF: Editable feature volumes for scene rendering and manipulation, in 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), IEEE, (2023), 4329–4339. https://doi.org/10.1109/WACV56688.2023.00432
    [174] Y. Yuan, Y. Sun, Y. La, Y. Ma, R. Jia, L. Gao, NeRF-Editing: Geometry editing of Neural Radiance Fields, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 18332–18343. https://doi.org/10.1109/CVPR52688.2022.01781
    [175] C. Sun, Y. Liu, J. Han, S. Gould, NeRFEditor: Differentiable style decomposition for full 3D scene editing, preprint, arXiv: 2212.03848.
    [176] Z. Wang, Y. Deng, J. Yang, J. Yu, X. Tong, Generative deformable radiance fields for disentangled image synthesis of topology-varying objects, Comput. Graphics Forum, 41 (2022), 431–442. https://doi.org/10.1111/cgf.14689 doi: 10.1111/cgf.14689
    [177] K. Tertikas, D. Paschalidou, B. Pan, J. J. Park, M. A. Uy, I. Emiris, et al., PartNeRF: Generating part-aware editable 3D shapes without 3D supervision, preprint, arXiv: 2303.09554.
    [178] C. Bao, Y. Zhang, B. Yang, T. Fan, Z. Yang, H. Bao, et al., SINE: Semantic-driven image-based NeRF editing with prior-guided editing field, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2023), 20919–20929.
    [179] D. Cohen-Bar, E. Richardson, G. Metzer, R. Giryes, D. Cohen-Or, Set-the-Scene: Global-local training for generating controllable NeRF scenes, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, IEEE, (2023), 2920–2929.
    [180] A. Mirzaei, T. Aumentado-Armstrong, K. G. Derpanis, J. Kelly, M. A. Brubaker, I. Gilitschenski, et al., SPIn-NeRF: Multiview segmentation and perceptual inpainting with Neural Radiance Fields, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, (2023), 20669–20679.
    [181] O. Avrahami, D. Lischinski, O. Fried, Blended diffusion for text-driven editing of natural images, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 18208–18218.
    [182] A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, et al., GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models, in Proceedings of the 39th International Conference on Machine Learning, PMLR, (2022), 16784–16804.
    [183] G. Couairon, J. Verbeek, H. Schwenk, M. Cord, DiffEdit: Diffusion-based semantic image editing with mask guidance, in the Eleventh International Conference on Learning Representations, 2023.
    [184] E. Sella, G. Fiebelman, P. Hedman, H. Averbuch-Elor, Vox-E: Text-guided voxel editing of 3D objects, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2023), 430–440.
    [185] A. Haque, M. Tancik, A. A. Efros, A. Holynski, A. Kanazawa, Instruct-NeRF2NeRF: Editing 3D scenes with instructions, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2023), 19740–19750.
    [186] Y. Lin, H. Bai, S. Li, H. Lu, X. Lin, H. Xiong, et al., CompoNeRF: Text-guided multi-object compositional NeRF with editable 3D scene layout, preprint, arXiv: 2303.13843.
    [187] R. Martin-Brualla, N. Radwan, M. S. M. Sajjadi, J. T. Barron, A. Dosovitskiy, D. Duckworth, NeRF in the wild: Neural Radiance Fields for unconstrained photo collections, in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2021), 7206–7215. https://doi.org/10.1109/CVPR46437.2021.00713
    [188] M. Boss, A. Engelhardt, A. Kar, Y. Li, D. Sun, J. T. Barron, et al., SAMURAI: Shape and material from unconstrained real-world arbitrary image collections, in Advances in Neural Information Processing Systems, Curran Associates, Inc., 35 (2022), 26389–26403.
    [189] C. Choi, J. Kim, Y. M. Kim, IBL-NeRF: Image-based lighting formulation of Neural Radiance Fields, preprint, arXiv: 2210.08202.
    [190] Z. Yan, C. Li, G. H. Lee, NeRF-DS: Neural Radiance Fields for dynamic specular objects, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 8285–8295.
    [191] D. Guo, L. Zhu, S. Ling, T. Li, G. Zhang, Q. Yang, et al., Face illumination normalization based on Generative Adversarial Network, Nat. Comput., 22 (2022), 105–117. https://doi.org/10.1007/s11047-022-09892-4 doi: 10.1007/s11047-022-09892-4
    [192] Z. Cui, L. Gu, X. Sun, Y. Qiao, T. Harada, Aleth-NeRF: Low-light condition view synthesis with concealing fields, preprint, arXiv: 2303.05807.
    [193] A. R. Nandhini, V. P. D. Raj, Low-light image enhancement based on generative adversarial network, Front. Genet., 12 (2021), 799777. https://doi.org/10.3389/fgene.2021.799777 doi: 10.3389/fgene.2021.799777
    [194] W. Kim, R. Lee, M. Park, S. Lee, Low-light image enhancement based on maximal diffusion values, IEEE Access, 7 (2019), 129150–129163. https://doi.org/10.1109/ACCESS.2019.2940452 doi: 10.1109/ACCESS.2019.2940452
    [195] P. Ponglertnapakorn, N. Tritrong, S. Suwajanakorn, DiFaReli: Diffusion face relighting, preprint, arXiv: 2304.09479.
    [196] M. Guo, A. Fathi, J. Wu, T. Funkhouser, Object-centric neural scene rendering, preprint, arXiv: 2012.08503.
    [197] Y. Wang, W. Zhou, Z. Lu, H. Li, UDoc-GAN: Unpaired document illumination correction with background light prior, in Proceedings of the 30th ACM International Conference on Multimedia, ACM, (2022), 5074–5082. https://doi.org/10.1145/3503161.3547916
    [198] J. Ling, Z. Wang, F. Xu, ShadowNeuS: Neural SDF reconstruction by shadow ray supervision, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2023), 175–185.
    [199] V. Rudnev, M. Elgharib, W. Smith, L. Liu, V. Golyanik, C. Theobalt, NeRF for outdoor scene relighting, in Computer Vision–ECCV 2022, Springer, (2022), 615–631. https://doi.org/10.1007/978-3-031-19787-1_35
    [200] C. Higuera, B. Boots, M. Mukadam, Learning to read braille: Bridging the tactile reality gap with diffusion models, preprint, arXiv: 2304.01182.
    [201] T. Guo, D. Kang, L. Bao, Y. He, S. Zhang, NeRFReN: Neural Radiance Fields with reflections, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2022), 18409–18418.
    [202] C. LeGendre, W. Ma, G. Fyffe, J. Flynn, L. Charbonnel, J. Busch, et al., DeepLight: Learning illumination for unconstrained mobile mixed reality, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2019), 5911–5921. https://doi.org/10.1109/CVPR.2019.00607
    [203] W. Ye, S. Chen, C. Bao, H. Bao, M. Pollefeys, Z. Cui, et al., IntrinsicNeRF: Learning intrinsic Neural Radiance Fields for editable novel view synthesis, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, (2023), 339–351.
    [204] M. Boss, V. Jampani, R. Braun, C. Liu, J. T. Barron, H. P. A. Lensch, Neural-PIL: Neural pre-integrated lighting for reflectance decomposition, in Advances in Neural Information Processing Systems, Curran Associates, Inc., 34 (2021), 10691–10704.
    [205] S. Saito, T. Simon, J. Saragih, H. Joo, PIFuHD: Multi-level pixel-aligned implicit function for high-resolution 3D human digitization, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2020), 81–90. https://doi.org/10.1109/CVPR42600.2020.00016
    [206] H. Tang, S. Bai, L. Zhang, P. H. Torr, N. Sebe, XingGAN for person image generation, in Computer Vision–ECCV 2020, Springer, (2020), 717–734. https://doi.org/10.1007/978-3-030-58595-2_43
    [207] Y. Ren, X. Yu, J. Chen, T. H. Li, G. Li, Deep image spatial transformation for person image generation, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, (2020), 7687–7696. https://doi.org/10.1109/CVPR42600.2020.00771
    [208] Y. Liu, Z. Qin, T. Wan, Z. Luo, Auto-painter: Cartoon image generation from sketch by using conditional wasserstein generative adversarial networks, Neurocomputing, 311 (2018), 78–87. https://doi.org/10.1016/j.neucom.2018.05.045 doi: 10.1016/j.neucom.2018.05.045
    [209] H. Li, AI synthesis for the metaverse: From avatars to 3D scenes, Stanford University, Stanford Talks, 2022. Available from: https://talks.stanford.edu/hao-li-pinscreen-on-ai-synthesis-for-the-metaverse-from-avatars-to-3d-scenes/.
    [210] S. Murray, A. Tallon, Mapping gothic france, Columbia University, Media Center for Art History, 2023. Available from: https://mcid.mcah.columbia.edu/art-atlas/mapping-gothic.
    [211] Y. Xiang, C. Lv, Q. Liu, X. Yang, B. Liu, M. Ju, A creative industry image generation dataset based on captions, preprint, arXiv: 2211.09035.
    [212] C. Tatsch, J. A. Bredu, D. Covell, I. B. Tulu, Y. Gu, Rhino: An autonomous robot for mapping underground mine environments, in 2023 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), IEEE, (2023), 1166–1173. https://doi.org/10.1109/AIM46323.2023.10196202
    [213] Y. Tian, L. Li, A. Fumagalli, Y. Tadesse, B. Prabhakaran, Haptic-enabled mixed reality system for mixed-initiative remote robot control, preprint, arXiv: 2102.03521.
    [214] G. Pu, Y. Men, Y. Mao, Y. Jiang, W. Ma, Z. Lian, Controllable image synthesis with attribute-decomposed GAN, IEEE Trans. Pattern Anal. Mach. Intell., 45 (2023), 1514–1532. https://doi.org/10.1109/TPAMI.2022.3161985 doi: 10.1109/TPAMI.2022.3161985
    [215] X. Wu, Y. Zhang, Q. Li, Y. Qi, J. Wang, Y. Guo, Face aging with pixel-level alignment GAN, Appl. Intell., 52 (2022), 14665–14678. https://doi.org/10.1007/s10489-022-03541-0 doi: 10.1007/s10489-022-03541-0
    [216] D. Sero, A. Zaidi, J. Li, J. D. White, T. B. G. Zarzar, M. L. Marazita, et al., Facial recognition from dna using face-to-dna classifiers, Nat. Commun., 10 (2019), 1. https://doi.org/10.1038/s41467-018-07882-8 doi: 10.1038/s41467-018-07882-8
    [217] M. Nicolae, M. Sinn, M. Tran, B. Buesser, A. Rawat, M. Wistuba, et al., Adversarial robustness toolbox v1.0.0, 2018. Available from: https://github.com/Trusted-AI/adversarial-robustness-toolbox.
  • Reader Comments
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(1663) PDF downloads(124) Cited by(1)

Article outline

Figures and Tables

Figures(9)  /  Tables(2)

Other Articles By Authors

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog