Security systems place great emphasis on the safety of stored cargo, as any loss or tampering can result in significant economic damage. The cargo identification module within a security system faces the challenge of achieving 99.99% recognition accuracy, yet current identification methods are limited by scarce cargo data, insufficient utilization of image features and minimal visual differences between cargo classes. First, we collected and built a cargo identification dataset named "Cargo" using industrial cameras. We then propose an Attention-guided Multi-granularity feature fusion model (AGMG-Net) for cargo identification. The model extracts both coarse-grained and fine-grained features of the cargo through two branch networks and fuses them to fully exploit the information each contains. Furthermore, an Attention-guided Multi-stage Attention Accumulation (AMAA) module is introduced for target localization, and a Multi-region Optimal Selection method Based on Confidence (MOSBC) module is used for target cropping. The features from the two branches are merged by a fusion branch in a Concat manner for multi-granularity feature fusion. Experimental results show that the proposed model achieves average recognition rates of 99.58%, 92.73% and 88.57% on the self-built Cargo dataset and the publicly available Flower and Butterfly20 datasets, respectively, outperforming state-of-the-art models. The proposed method therefore identifies cargo categories accurately and provides valuable support to security systems.
Citation: Aigou Li, Chen Yang. AGMG-Net: Leveraging multiscale and fine-grained features for improved cargo recognition[J]. Mathematical Biosciences and Engineering, 2023, 20(9): 16744-16761. doi: 10.3934/mbe.2023746
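To make the two-branch, Concat-style fusion concrete, the sketch below shows one plausible reading of the architecture in PyTorch: a coarse-grained branch processes the full image, a fine-grained branch processes an attention-cropped region (the crop itself would come from the AMAA localization and MOSBC selection steps), and a fusion branch concatenates the pooled features before classification. This is a minimal illustration under stated assumptions, not the authors' released implementation; the ResNet-18 backbones, feature dimensions and class count are placeholders.

```python
# Minimal sketch (not the authors' code) of the two-branch, Concat-style
# multi-granularity fusion described in the abstract. Backbones, feature
# sizes and num_classes are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class TwoBranchConcatFusion(nn.Module):
    def __init__(self, num_classes: int = 20):
        super().__init__()
        # Coarse-grained branch: sees the full image.
        coarse = models.resnet18(weights=None)
        self.coarse = nn.Sequential(*list(coarse.children())[:-1])  # -> (B, 512, 1, 1)
        # Fine-grained branch: sees an attention-cropped region
        # (in AGMG-Net this crop would be produced by AMAA + MOSBC).
        fine = models.resnet18(weights=None)
        self.fine = nn.Sequential(*list(fine.children())[:-1])
        # Fusion branch: concatenate pooled features, then classify.
        self.classifier = nn.Linear(512 + 512, num_classes)

    def forward(self, full_image: torch.Tensor, cropped_region: torch.Tensor):
        f_coarse = self.coarse(full_image).flatten(1)  # (B, 512)
        f_fine = self.fine(cropped_region).flatten(1)  # (B, 512)
        fused = torch.cat([f_coarse, f_fine], dim=1)   # Concat fusion
        return self.classifier(fused)

# Usage: both inputs are 224x224 tensors here purely for illustration.
model = TwoBranchConcatFusion(num_classes=20)
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
```

Concatenation keeps the coarse and fine feature vectors intact rather than averaging them, leaving the fusion classifier free to weight global context against local detail.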