Research article

TSTBFuse: a two-stage three-branch feature extraction method for infrared and visible image fusion

  • Received: 17 April 2025; Revised: 09 June 2025; Accepted: 13 June 2025; Published: 27 June 2025
  • The purpose of image fusion is to combine information from different source images to produce a single, comprehensively representative image. Traditional autoencoder architectures often struggle to effectively extract both the unique and the shared features of infrared and visible images. This study proposed a novel two-stage three-branch feature extraction method (TSTBFuse) specialized for the fusion of infrared and visible images. The proposed architecture employed a three-branch encoder that separately captured infrared-specific thermal radiation features, visible-specific texture details, and shared structural information. A two-stage end-to-end training strategy was introduced: the first stage focused on reconstructing the original input images to preserve modality-specific information, while the second stage leveraged the learned representations to generate high-quality fused images. We designed a comprehensive loss function combining mean squared error (MSE), structural similarity index (SSIM), and gradient loss, ensuring both pixel-level accuracy and structural integrity. Extensive experiments on public datasets (TNO, MSRS, and RoadScene) demonstrated that TSTBFuse consistently outperformed seven state-of-the-art methods in both subjective and objective evaluations. Furthermore, the method exhibited strong generalization, extending successfully to challenging tasks such as magnetic resonance imaging-computed tomography (MRI-CT) medical image fusion and red-green-blue (RGB)-infrared image fusion without retraining. The code is publicly available at: https://github.com/QXinYue/TSTBFuse.

    Citation: Wangwei Zhang, Xinyue Qin, Menghao Dai, Bin Zhou, Changhai Wang, ZhiHeng Wang, SongZe Li. TSTBFuse: a two-stage three-branch feature extraction method for infrared and visible image fusion[J]. Electronic Research Archive, 2025, 33(6): 4045-4073. doi: 10.3934/era.2025180
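
    To make the loss described in the abstract concrete, the following is a minimal, illustrative PyTorch sketch of a composite fusion loss that combines MSE, a simplified single-scale SSIM term, and a Sobel gradient term. It is not the authors' released implementation (see the linked repository for that); the weighting factors alpha, beta, and gamma, the uniform SSIM window, and the choice of matching the fused gradient to the element-wise maximum of the source gradients are assumptions made for illustration only.

    # Illustrative sketch only: a composite fusion loss (MSE + simplified SSIM + Sobel gradient),
    # following the description in the abstract. Weights and design details are assumptions.
    import torch
    import torch.nn.functional as F

    def ssim_simplified(x, y, window_size=11, c1=0.01 ** 2, c2=0.03 ** 2):
        """Single-scale SSIM using a uniform (average-pool) window; returns the mean SSIM."""
        pad = window_size // 2
        mu_x = F.avg_pool2d(x, window_size, stride=1, padding=pad)
        mu_y = F.avg_pool2d(y, window_size, stride=1, padding=pad)
        sigma_x = F.avg_pool2d(x * x, window_size, stride=1, padding=pad) - mu_x ** 2
        sigma_y = F.avg_pool2d(y * y, window_size, stride=1, padding=pad) - mu_y ** 2
        sigma_xy = F.avg_pool2d(x * y, window_size, stride=1, padding=pad) - mu_x * mu_y
        ssim_map = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / \
                   ((mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))
        return ssim_map.mean()

    def sobel_gradient(img):
        """Per-channel Sobel gradient magnitude of a (B, C, H, W) tensor."""
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                          device=img.device).view(1, 1, 3, 3)
        ky = kx.transpose(2, 3)
        c = img.shape[1]
        gx = F.conv2d(img, kx.repeat(c, 1, 1, 1), padding=1, groups=c)
        gy = F.conv2d(img, ky.repeat(c, 1, 1, 1), padding=1, groups=c)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

    def fusion_loss(fused, ir, vis, alpha=1.0, beta=1.0, gamma=1.0):
        """Pixel fidelity (MSE) + structural similarity (SSIM) + texture preservation (gradient)."""
        loss_mse = F.mse_loss(fused, ir) + F.mse_loss(fused, vis)
        loss_ssim = (1 - ssim_simplified(fused, ir)) + (1 - ssim_simplified(fused, vis))
        # Encourage the fused gradient to follow the stronger of the two source gradients.
        grad_target = torch.maximum(sobel_gradient(ir), sobel_gradient(vis))
        loss_grad = F.l1_loss(sobel_gradient(fused), grad_target)
        return alpha * loss_mse + beta * loss_ssim + gamma * loss_grad

    In a two-stage scheme such as the one described, the relative weights of these terms would presumably be tuned separately for the reconstruction stage and the fusion stage; the values above are placeholders.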

  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)