Object detection in drone-captured scenarios has recently become a popular task. It is challenging because of the high flight altitude of unmanned aerial vehicles (UAVs), the large variation in target scales and the dense occlusion among targets, in addition to the demand for real-time detection. To address these problems, we propose a real-time UAV small target detection algorithm based on an improved ASFF-YOLOv5s. Building on the original YOLOv5s algorithm, a new shallow feature map is passed into the feature fusion network through multi-scale feature fusion to strengthen the extraction of small target features, and Adaptively Spatial Feature Fusion (ASFF) is improved to enhance multi-scale information fusion. To obtain anchor boxes suited to the VisDrone2021 dataset, we improve the K-means algorithm so that each prediction layer receives anchor boxes at four different scales. The Convolutional Block Attention Module (CBAM) is added in front of the backbone network and each prediction layer to better capture important features and suppress redundant ones. Finally, to address the shortcomings of the original GIoU loss function, the SIoU loss function is adopted to accelerate model convergence and improve accuracy. Extensive experiments on the VisDrone2021 dataset show that the proposed model can detect a wide range of small targets in various challenging environments. At a detection rate of 70.4 FPS, the proposed model achieves a precision of 32.55%, an F1-score of 39.62% and a mAP of 38.03%, improvements of 2.77, 3.98 and 5.1%, respectively, over the original algorithm; this enhances small target detection performance and meets the requirements of real-time detection in UAV aerial imagery. This work provides an effective method for real-time detection of small targets in UAV aerial photography of complex scenes and can be extended to the detection of pedestrians, cars, etc. in urban security surveillance.
Citation: Siyuan Shen, Xing Zhang, Wenjing Yan, Shuqian Xie, Bingjia Yu, Shizhi Wang. An improved UAV target detection algorithm based on ASFF-YOLOv5s[J]. Mathematical Biosciences and Engineering, 2023, 20(6): 10773-10789. doi: 10.3934/mbe.2023478
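The abstract states that the K-means algorithm is modified to supply anchor boxes at four scales to each prediction layer for VisDrone2021. As a point of reference, the snippet below is a minimal sketch of IoU-distance k-means anchor clustering in the style commonly used for YOLO-family detectors; the function names, the median update, the choice of k = 16 (four prediction layers x four anchors) and the data-loading step are illustrative assumptions, not the authors' exact modified procedure.

import numpy as np

def wh_iou(box_wh, cluster_wh):
    """IoU between (w, h) pairs, treating boxes as if they share a top-left corner."""
    inter = np.minimum(box_wh[:, None, 0], cluster_wh[None, :, 0]) * \
            np.minimum(box_wh[:, None, 1], cluster_wh[None, :, 1])
    union = box_wh[:, 0:1] * box_wh[:, 1:2] + \
            (cluster_wh[:, 0] * cluster_wh[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(box_wh, k=16, iters=300, seed=0):
    """Cluster ground-truth (w, h) pairs with a 1 - IoU distance; returns k anchors sorted by area."""
    rng = np.random.default_rng(seed)
    clusters = box_wh[rng.choice(len(box_wh), k, replace=False)].astype(float)
    assignment = np.zeros(len(box_wh), dtype=int)
    for _ in range(iters):
        dist = 1.0 - wh_iou(box_wh, clusters)      # shape (n_boxes, k)
        new_assignment = dist.argmin(axis=1)
        if (new_assignment == assignment).all():   # converged
            break
        assignment = new_assignment
        for j in range(k):                         # median update is more robust than mean here
            members = box_wh[assignment == j]
            if len(members):
                clusters[j] = np.median(members, axis=0)
    return clusters[np.argsort(clusters.prod(axis=1))]

# Hypothetical usage: box_wh would hold the pixel (w, h) of every ground-truth box
# in VisDrone2021; with 4 prediction layers and 4 anchors per layer, k = 16.
# anchors = kmeans_anchors(box_wh, k=16)

Splitting the sorted anchors into consecutive groups of four then assigns the smallest anchors to the shallowest (highest-resolution) prediction layer, which is the usual convention for YOLO-style heads.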
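Since the abstract replaces the GIoU loss with the SIoU loss, a brief sketch of SIoU as formulated in Gevorgyan's preprint (arXiv: 2205.12740) may help. The equations below summarize that formulation rather than the authors' exact implementation, and minor details such as the value of the shape exponent \theta may differ.

\[
\Lambda = 1 - 2\sin^2\!\left(\arcsin\frac{|b^{gt}_{cy} - b_{cy}|}{\sigma} - \frac{\pi}{4}\right),
\qquad
\sigma = \sqrt{(b^{gt}_{cx} - b_{cx})^2 + (b^{gt}_{cy} - b_{cy})^2},
\]
\[
\Delta = \sum_{t \in \{x,\,y\}} \left(1 - e^{-\gamma \rho_t}\right),
\qquad
\rho_x = \left(\frac{b^{gt}_{cx} - b_{cx}}{c_w}\right)^2,\quad
\rho_y = \left(\frac{b^{gt}_{cy} - b_{cy}}{c_h}\right)^2,\quad
\gamma = 2 - \Lambda,
\]
\[
\Omega = \sum_{t \in \{w,\,h\}} \left(1 - e^{-\omega_t}\right)^{\theta},
\qquad
\omega_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})},\quad
\omega_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})},
\]
\[
L_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2},
\]

where \((b_{cx}, b_{cy})\) and \((b^{gt}_{cx}, b^{gt}_{cy})\) are the predicted and ground-truth box centers, \(c_w\) and \(c_h\) are the width and height of the smallest box enclosing both, \((w, h)\) and \((w^{gt}, h^{gt})\) are the box sizes, and \(\theta\) weights the shape cost. The angle cost \(\Lambda\) steers the regression toward the nearer coordinate axis first, and the distance and shape costs then penalize the center offset and aspect mismatch; this direction-aware penalty is the mechanism usually credited for faster convergence than offset-agnostic variants such as GIoU.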