Research article

TBRAFusion: Infrared and visible image fusion based on two-branch residual attention Transformer

  • Received: 28 October 2024 / Revised: 26 December 2024 / Accepted: 03 January 2025 / Published: 17 January 2025
  • The fusion of infrared and visible images highlights salient targets while preserving detailed information, which helps to capture scene information comprehensively. However, existing methods still struggle to integrate global and local information and to extract fine image details, which ultimately constrains the fusion results. To enhance the fusion effect, this paper proposes a two-branch residual attention-based infrared and visible image fusion network (TBRAFusion). The network employs two key modules, TransNext and the dual-branch residual attention (DBRA) module, which process the input images in parallel to extract contrast and detail information. Additionally, an auxiliary function is incorporated into the loss function. Through these improvements, TBRAFusion achieves better fusion results and metrics than mainstream fusion models. Experimental results on the TNO dataset show that TBRAFusion improves entropy (EN), spatial frequency (SF), the sum of correlation differences (SCD), and visual information fidelity (VIF) by 0.42%, 4%, 3.9%, and 1.2%, respectively. Tests on the MSRDS dataset show improvements of 1.7%, 5.4%, 9.6%, and 4.9% in EN, standard deviation (SD), SF, and SCD, respectively.

    Citation: Wangwei Zhang, Hao Sun, Bin Zhou. TBRAFusion: Infrared and visible image fusion based on two-branch residual attention Transformer[J]. Electronic Research Archive, 2025, 33(1): 158-180. doi: 10.3934/era.2025009
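    The abstract reports gains in EN, SD, SF, SCD, and VIF. For readers unfamiliar with these fusion metrics, the sketch below shows the standard definitions of EN, SD, SF, and SCD; it is not the authors' evaluation code, and the function and variable names (`entropy`, `ir`, `vis`, `fused`) are illustrative. VIF is omitted because it requires a multi-scale natural-scene-statistics model.

    ```python
    # Minimal sketch of common infrared/visible fusion metrics, assuming
    # 8-bit grayscale images supplied as NumPy arrays.
    import numpy as np

    def entropy(img: np.ndarray) -> float:
        """EN: Shannon entropy of the 256-bin gray-level histogram."""
        hist, _ = np.histogram(img, bins=256, range=(0, 256))
        p = hist / hist.sum()
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    def standard_deviation(img: np.ndarray) -> float:
        """SD: standard deviation of pixel intensities (a contrast proxy)."""
        return float(np.std(img.astype(np.float64)))

    def spatial_frequency(img: np.ndarray) -> float:
        """SF: sqrt(RF^2 + CF^2) from row/column first-order differences."""
        x = img.astype(np.float64)
        rf = np.sqrt(np.mean(np.diff(x, axis=1) ** 2))  # row frequency (horizontal gradients)
        cf = np.sqrt(np.mean(np.diff(x, axis=0) ** 2))  # column frequency (vertical gradients)
        return float(np.sqrt(rf ** 2 + cf ** 2))

    def scd(fused: np.ndarray, ir: np.ndarray, vis: np.ndarray) -> float:
        """SCD: corr(F - VIS, IR) + corr(F - IR, VIS) (Aslantas & Bendes, 2015)."""
        f = fused.astype(np.float64)
        a, b = ir.astype(np.float64), vis.astype(np.float64)

        def corr(u, v):
            u, v = u - u.mean(), v - v.mean()
            return float(np.sum(u * v) / (np.sqrt(np.sum(u ** 2) * np.sum(v ** 2)) + 1e-12))

        return corr(f - b, a) + corr(f - a, b)
    ```

    Under these definitions, higher EN, SD, SF, and SCD values for the fused image indicate richer information content, stronger contrast, sharper detail, and better complementarity with the two source images, which is the sense in which the percentage improvements above are reported.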


  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
