This paper presents MCADFusion, a feature decomposition method designed for the fusion of infrared and visible images that preserves both target radiance and detailed texture. MCADFusion employs an innovative two-branch architecture that effectively extracts and decomposes local and global features from the different source images, thereby enhancing the processing of image feature information. The method first applies a multi-scale feature extraction module and a reconstructor module to obtain rich local and global feature information from the source images. The local and global features of the different source images are then decomposed by the channel attention module (CAM) and the spatial attention module (SAM), and feature fusion is performed through a two-channel attention merging scheme. Finally, image reconstruction is carried out by the Restormer module. During training, MCADFusion adopts a two-stage strategy to optimize the network parameters, yielding high-quality fused images. Experimental results on the publicly available TNO and MSRS datasets demonstrate that MCADFusion surpasses existing techniques in both subjective visual evaluation and objective assessment.
Citation: Wangwei Zhang, Menghao Dai, Bin Zhou, Changhai Wang. MCADFusion: a novel multi-scale convolutional attention decomposition method for enhanced infrared and visible light image fusion[J]. Electronic Research Archive, 2024, 32(8): 5067-5089. doi: 10.3934/era.2024233
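The abstract names a channel attention module (CAM) and a spatial attention module (SAM) but does not describe their internal structure. The following is a minimal, hypothetical PyTorch sketch of a generic channel-/spatial-attention decomposition of the kind described; the module designs, channel counts, and the simple additive merge are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: generic channel/spatial attention for two-branch feature
# decomposition. Not the MCADFusion architecture, whose details are not in the abstract.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Re-weights feature channels using globally pooled statistics."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                                   # (B, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.mlp(x)  # broadcast channel weights over H x W

class SpatialAttention(nn.Module):
    """Re-weights spatial locations using channel-pooled maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)          # (B, 1, H, W)
        max_map, _ = x.max(dim=1, keepdim=True)        # (B, 1, H, W)
        weight = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * weight

if __name__ == "__main__":
    # Hypothetical feature maps from an infrared branch and a visible branch.
    feats_ir = torch.randn(1, 64, 128, 128)
    feats_vis = torch.randn(1, 64, 128, 128)
    cam, sam = ChannelAttention(64), SpatialAttention()
    # A simple additive merge of the attended features, purely for illustration;
    # the fused features would then be passed to a reconstruction module.
    fused = cam(feats_ir) + sam(feats_vis)
    print(fused.shape)  # torch.Size([1, 64, 128, 128])
```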