
Aerial remote sensing images have complex backgrounds and numerous small targets compared to natural images, so detecting targets in aerial images is more difficult. Resource exploration and urban construction planning need targets in aerial images to be detected quickly and accurately. High accuracy is undoubtedly an advantage for detection models, but it usually requires more complex models with larger computational and parameter budgets. Lightweight models detect quickly, but their accuracy is much lower than that of conventional models. Balancing the accuracy and speed of a model is therefore challenging in remote sensing image detection. In this paper, we proposed a new YOLO model. We incorporated the structures of YOLOX-Nano and slim-neck, then used the SPPF module and the SIoU function. In addition, we designed a new upsampling paradigm that combines linear interpolation and an attention mechanism, which can effectively improve the model's accuracy. Compared with the original YOLOX-Nano, our model achieved a better balance of accuracy and speed while remaining lightweight. The experimental results showed that our model achieved high accuracy and speed on the NWPU VHR-10, RSOD, TGRS-HRRSD and DOTA datasets.
Citation: Lei Yang, Guowu Yuan, Hao Wu, Wenhua Qian. An ultra-lightweight detector with high accuracy and speed for aerial images[J]. Mathematical Biosciences and Engineering, 2023, 20(8): 13947-13973. doi: 10.3934/mbe.2023621
Along with the rapid development of aerial photography technology, aerial remote sensing data are becoming increasingly diversified: data acquisition is accelerating, the update cycle is shortening and timeliness is improving. Automatic target detection technology for aerial images has therefore emerged. The technology is widely used in the fields of urban traffic planning [1], water conservancy construction [2], earth resource exploration [3] and military information processing [4]. UAVs can already capture large numbers of aerial images and even monitor target areas in real time. To achieve real-time detection of ground targets, a UAV needs to carry a lightweight model with the highest possible accuracy.
Currently, the mainstream target detection models include two major categories: the two-stage algorithms represented by the R-CNN series [5,6,7], and the one-stage algorithms represented by the YOLO series [8,9,10,11,12,13]. The two-stage algorithms first extract the candidate boxes for the input images and then classify and regress the candidate boxes with high accuracy but low speed. The one-stage algorithms directly calculate the class probabilities and position coordinates of the targets in the input images. They are faster than the two-stage algorithms but not as accurate. In recent years, one-stage models have made breakthroughs. With the proposal of excellent models, such as YOLOv4 [11], YOLOv5 [12] and YOLOX [13], it has been found that one-stage models can already match the detection accuracy of two-stage models while maintaining their high-speed characteristics. These models work well on ordinary images but do not perform well on aerial images. Y. Li et al. [14] believed that the detection accuracy of ordinary models is not high because of the special perspective of aerial images. Aerial remote sensing images are mostly taken at high altitudes at a top angle. Compared to conventional images captured horizontally, the targets' size is small, the extractable features are few and the background is complex.
As a result, many researchers have proposed various improved models based on deep learning for aerial image detection. For example, A. V. Etten [15] proposed the YOLT model by improving YOLOv2, which enhanced the detection performance for small targets by connecting features of multiple layers through a ResNet-like residual structure to obtain a more fine-grained feature representation. M. Ahmed et al. [16] proposed the Fused RetinaNet model, which uses a new contextual fusion module instead of a feature pyramid network to improve the representation of the underlying semantic and top-level spatial information. H. Liu et al. [17] introduced a hybrid attention module and deformable convolution in the C-CenterNet model to enhance the feature extraction and fusion of the model. S. Du et al. [18] proposed an improved YOLO model using a negative sample focusing mechanism and a dilated convolutional attention module to improve the detection accuracy of the model for small targets.
Although the above models achieve high detection accuracy, they are not lightweight and their detection speed is slower than models that use lightweight structures. Commonly used lightweight structures, including MobileNet [19,20,21] and ShuffleNet [22,23], achieve high detection speed but low accuracy. However, some remote sensing applications, such as real-time detection on UAVs, need a model with both high accuracy and high speed. Balancing the speed and accuracy of the detection model is the key problem in these applications and the focus of this paper. This work uses YOLOX-Nano [13] as the basic model. While keeping the model lightweight, several deep learning methods are used to improve its detection accuracy and speed as much as possible. Four aerial image datasets (NWPU VHR-10, RSOD, TGRS-HRRSD and DOTA) are tested to verify the generalization ability of our improved methods.
The main contributions of this paper are as follows:
1) This paper proposes a new lightweight model for remote sensing image detection, which has high precision and speed.
2) This paper incorporates the YOLOX-Nano model and slim-neck structure to reconstruct the network, which is very effective for balancing the speed and accuracy of the detection.
3) This paper proposes a new upsampling paradigm that combines linear interpolation and attention mechanisms. This paradigm can effectively improve detection accuracy.
The rest of this paper is organized as follows: Section 2 introduces the current popular lightweight and remote sensing detection models, and then introduces the YOLOX-Nano model that inspired this article. Section 3 focuses on our methods to improve YOLOX-Nano. Section 4 presents the experimental results of our methods and our analysis. Section 5 concludes our work.
To achieve real-time detection in specific situations, researchers have proposed lighter models, such as the MobileNet family built with depthwise separable convolution (DWC) [19,20,21] and the ShuffleNet family built with grouped pointwise convolution [22,23]. Some researchers have used these lightweight structures for detection; for example, the YOLOX-Nano model proposed by Z. Ge et al. uses the DWC from MobileNet to replace regular convolution [13], and RangiLyu used ShuffleNet as the backbone network to build the NanoDet model, greatly reducing the number of parameters and the computation of the model [24].
Some conventional target detection models, such as YOLOv3 and YOLOv4, also have lightweight versions, e.g., YOLOv3-Tiny and YOLOv4-Tiny. These lightweight models are obtained by simplifying the conventional models' architecture and reducing their computational effort. YOLOv3-Tiny does not use residual structures; it employs only a few conventional convolutional layers in the backbone, significantly reducing the network depth, and uses only two feature layers in the neck for classification and regression. The backbone network of YOLOv4-Tiny uses the CSPNet [25] structure, and its neck network also uses only two feature layers as outputs. Compared to YOLOv3 and YOLOv4, these models are very lightweight and widely used in industry, but there is no doubt that this simplification significantly reduces their accuracy.
Aerial remote sensing images are large, have complex backgrounds and contain many small targets that are difficult to detect. After thoroughly studying the characteristics of remote sensing images, previous researchers have proposed a series of solutions.
Some researchers have tried to introduce attention mechanisms and feature fusion algorithms to extract the most valuable features possible, improving the accuracy of small-target detection. For example, X. Luo et al. [26] added the improved efficient channel attention module (IECA) and the adaptive feature fusion algorithm (ASFF) to the YOLOv4 model, greatly improving the detection accuracy. D. Yan et al. [27] introduced the SE attention mechanism and feature pyramid structure (FPN) into the Faster R-CNN model to achieve high-accuracy detection of tailing pools in remotely sensed images. Some researchers have addressed the challenge of small-target detection from the remote sensing images' pre-processing and post-processing stages. For example, F. C. Akyon et al. [28] proposed the Slicing-Aided Hyper Inference (SAHI) framework, in which a large image is cut with overlap before it is detected. Then, each slice is fed to the detector one by one. Finally, the detection results are combined into a large complete image in the post-processing stage. L. Yang et al. [29] introduced the SAHI framework into their improved YOLOX model to achieve high-precision automatic detection of small objects in large remote sensing images.
All of the above researchers have made efforts and achieved good results in improving the accuracy of neural network models for remote sensing image detection. However, the methods they have adopted have increased the number of parameters and the computational effort of the original model. Therefore, they are not conducive to achieving real-time detection of aerial targets. Some researchers have applied lightweight models to remote sensing images to improve the model's speed. For example, J. Liu et al. [30] used a modified YOLOv4-Tiny to achieve real-time detection of insulator identification and defects in aerial images, while X. Li et al. [31] used MobileNet to replace the backbone network of the YOLO model, significantly reducing the parameters and computation and achieving fast detection of remotely sensed images.
Although the lightweight models have greatly improved the detection speed and achieved real-time detection, they also show a significant decrease in detection accuracy. How to balance the accuracy and speed of detection is the focus of our research. In this paper, we use YOLOX-Nano as the base model for improvement, and the following section provides a detailed description of our methodology.
YOLOX is a new YOLO model proposed by Z. Ge et al. in 2021 [13], surpassing all previous YOLO versions in detection performance. YOLOX makes many improvements over the previous YOLO versions, improving both detection accuracy and speed. YOLOX-Nano is a lightweight version of YOLOX that uses depthwise separable convolution to build the network. Its model structure is shown in Figure 1; the overall model contains three parts: the backbone network, the neck network and the detection head. The backbone network implements feature extraction, the neck network implements feature fusion and the detection head calculates the class and location coordinates of a target. YOLOX-Nano's backbone is CSPDarknet, which employs a cross-stage partial (CSP) structure [25]. The structure divides the input into two parts: one processed by a bottleneck structure and the other passed through a shortcut connection. The model then concatenates the two parts. This structure reduces memory consumption and enhances the CNN's learning ability. The neck network uses a PAFPN structure to fuse low-level and high-level features. The detection head is a decoupled head, which learns the parameters for classification and regression in separate branches, helping to improve the accuracy of classification and localization.
Compared with previous YOLO models, the main improvements of YOLOX-Nano are the anchor-free mode [32,33] and the decoupled head [34]. The anchor-free mode abandons predefined anchor boxes and directly predicts the target's position by detecting the object's center point. Compared with the anchor-based mode used in previous YOLO versions, eliminating the predefined anchor boxes greatly reduces computation and further helps achieve real-time, high-precision detection. Earlier YOLO versions coupled classification and regression in a single detection head. However, recent studies [34] have shown that classification and regression can interfere with one another, so decoupling them can improve the model's performance. YOLOX-Nano adopts this idea by using a decoupled detection head to separate classification and regression, and it adds a branch that calculates the object's confidence for determining whether it belongs to the foreground or background.
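To make the decoupled head concrete, the following is a minimal PyTorch sketch of a head with separate classification, regression and objectness branches. It is an illustrative simplification, not the exact YOLOX-Nano head (which, for example, uses depthwise separable convolutions and repeats this structure at several feature levels); the channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Minimal decoupled detection head: separate branches for
    classification and regression, plus an objectness branch."""
    def __init__(self, in_channels, num_classes, hidden=64):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, hidden, kernel_size=1)   # reduce channels
        self.cls_branch = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, num_classes, 1))                      # class scores
        self.reg_branch = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU())
        self.reg_out = nn.Conv2d(hidden, 4, 1)                      # box offsets (x, y, w, h)
        self.obj_out = nn.Conv2d(hidden, 1, 1)                      # objectness (foreground/background)

    def forward(self, x):
        x = self.stem(x)
        cls = self.cls_branch(x)
        reg_feat = self.reg_branch(x)
        return cls, self.reg_out(reg_feat), self.obj_out(reg_feat)

head = DecoupledHead(in_channels=128, num_classes=10)
cls, box, obj = head(torch.randn(1, 128, 20, 20))
print(cls.shape, box.shape, obj.shape)  # (1,10,20,20) (1,4,20,20) (1,1,20,20)
```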
YOLOX-Nano dramatically reduces the number of parameters by using depthwise separable convolution. The model has only 0.9 M parameters, making YOLOX-Nano the smallest and fastest version of YOLOX. To achieve real-time, high-precision aerial remote sensing image detection, we use YOLOX-Nano for training and detection on aerial remote sensing image datasets.
This paper makes a series of improvements to YOLOX-Nano and obtains a new lightweight model, whose network structure is shown in Figure 2. The model uses a more efficient spatial pyramid pooling structure (SPPF) in the backbone network and constructs a new slim-neck structure in the neck network by replacing DWC with GSConv and CSP with VoV-GSCSP. At the same time, we design a new upsampling structure (linear interpolation + ECA attention mechanism), and finally use the recent localization loss function SIoU to calculate the loss in the head. The SPPF and SIoU improve the model's detection speed and accuracy. Although GSConv and the new upsampling structure slightly increase the parameters and inference time, they enhance the model's fusion of remote sensing image features, which is essential for improving accuracy.
Our improvements are all about balancing the speed and accuracy of the model as much as possible. This paper reconstructs the network using some lightweight structures, and tries to avoid algorithms that increase the complexity of the model.
This paper improves all three parts of YOLOX (backbone, neck and head). The following is a detailed description of all the improved methods. Section 3.2 describes our improvement in the backbone network, mainly the introduction of a more efficient spatial pyramid pooling module; Section 3.3 describes our improvements in the neck network, including a new neck network and a new upsampling paradigm; Section 3.4 describes our improvement in the detection head, mainly the use of a new loss function.
The final stage of YOLOX-Nano's backbone uses a Spatial Pyramid Pooling (SPP) module [35]. The module unifies the size of the input feature maps through a special pooling operation. YOLOX-Nano uses the SPP module to fuse features; its structure is shown in Figure 3(a). Three pooling layers with different receptive fields process the feature map separately, and their outputs are then fused to combine local and global features. This enhances the expressiveness of the feature maps and effectively improves the model's accuracy.
Both classification and detection models can achieve good results using the SPP module, and in subsequent studies researchers have proposed many new pyramid pooling methods [12,36,37]. In this paper, SPPF (SPP-Fast) is selected to replace the SPP module to improve detection accuracy and speed. Its structure is shown in Figure 3(b). SPPF replaces the parallel pooling operations in SPP with serial ones, and all of its pooling layers use 5 × 5 kernels: two serial 5 × 5 pooling layers are equivalent to one 9 × 9 pooling layer, and three serial 5 × 5 pooling layers are equivalent to one 13 × 13 pooling layer. The serial operation has the same effect as the parallel operation but higher efficiency. Therefore, this paper uses SPPF to improve the efficiency of feature fusion.
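As a concrete illustration, the following is a minimal PyTorch sketch of an SPPF-style block that chains three 5 × 5 max-pools and concatenates the intermediate results. It is a simplified sketch (the YOLOX/YOLOv5 implementations also wrap the block in 1 × 1 convolutions with normalization and activation), and the channel count is an assumption.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """SPP-Fast: three serial 5x5 max-pools whose receptive fields match
    the parallel 5x5 / 9x9 / 13x13 pools of the original SPP module."""
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)   # keeps spatial size
        self.fuse = nn.Conv2d(channels * 4, channels, kernel_size=1)   # fuse local + global context

    def forward(self, x):
        p1 = self.pool(x)    # ~5x5 receptive field
        p2 = self.pool(p1)   # two serial 5x5 pools ~ one 9x9 pool
        p3 = self.pool(p2)   # three serial 5x5 pools ~ one 13x13 pool
        return self.fuse(torch.cat([x, p1, p2, p3], dim=1))

x = torch.randn(1, 64, 40, 40)
print(SPPF(64)(x).shape)  # torch.Size([1, 64, 40, 40])
```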
In the neck network, this paper first uses a new convolutional structure, GSConv [38], to construct a slim-neck to replace the original neck network. Second, a new paradigm is designed in the upsampling part. The two improvements are our main innovations, and the following is a detailed introduction to these two improvements.
GSConv is a new convolutional structure proposed by H. Li et al. [38] in June 2022. GSConv is essentially a fusion of regular convolution and DWC, and the operations of these two convolutions are shown in Figures 4 and 5, respectively.
DWC is a lighter version of conventional convolution and consists of two steps: depthwise convolution and pointwise convolution. First, each channel of the input is convolved with its own single-channel filter (convolution kernel), producing a feature map with the same number of channels as the input; this is the depthwise convolution. Then, a 1 × 1 convolution performs a regular convolution on this feature map, which is equivalent to a weighted sum across the channels and produces the new feature map; this is the pointwise convolution. DWC has roughly 1/3 of the parameters and computation of conventional convolution and greatly reduces the model's size and computation. Hence, it is widely used in various lightweight networks to speed up training and inference. However, a model using depthwise separable convolution also has an obvious disadvantage: its accuracy is much lower than that of the same model structure using conventional convolution.
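The following PyTorch sketch shows the two-step structure described above and compares parameter counts against a standard 3 × 3 convolution; the channel numbers are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel 3x3 depthwise
    convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # groups=in_ch gives every input channel its own 3x3 filter
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        # 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

dwc = DepthwiseSeparableConv(32, 64)
std = nn.Conv2d(32, 64, 3, padding=1, bias=False)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dwc), count(std))  # 2336 vs 18432 parameters
```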
The channel compression and expansion operations of convolution cause a loss of feature information. DWC reduces computation, but it severs the hidden connections between channels and relies on a single 1 × 1 convolution to mix them. Conventional convolution retains the relationships between channels, so it has higher accuracy. GSConv is proposed to balance the model's speed and accuracy. The computation process of GSConv is shown in Figure 6. It combines both kinds of convolution: it concatenates the results of a conventional convolution and a DWC to expand the channels, instead of expanding them with a 1 × 1 convolution alone. Therefore, GSConv preserves the correlation between channels better than DWC while still reducing computation.
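Below is a minimal PyTorch sketch of a GSConv-style layer following the description above and the published slim-neck design: half the output channels come from a standard convolution, the other half from a depthwise convolution of that result, followed by a channel shuffle. Details such as normalization, activation and the exact shuffle implementation are assumptions and may differ from the authors' code.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Sketch of GSConv: half the output channels come from a standard
    convolution, the other half from a depthwise convolution of that
    result; the two halves are concatenated and channel-shuffled."""
    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__()
        half = out_ch // 2
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(half), nn.SiLU())
        self.dwconv = nn.Sequential(
            nn.Conv2d(half, half, k, 1, k // 2, groups=half, bias=False),
            nn.BatchNorm2d(half), nn.SiLU())

    def forward(self, x):
        x1 = self.conv(x)                # dense (standard) convolution
        x2 = self.dwconv(x1)             # cheap depthwise convolution
        y = torch.cat([x1, x2], dim=1)   # concatenate the two halves
        # channel shuffle so dense and depthwise information interleave
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)

print(GSConv(64, 128)(torch.randn(1, 64, 40, 40)).shape)  # [1, 128, 40, 40]
```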
Slim-neck [38] is a neck network structure constructed with GSConv. Its overall design follows the neck network of YOLOv4 [11], which uses CSP (Cross Stage Partial) [25] modules. A CSP module is composed of multiple bottleneck modules; the structures of CSP and the bottleneck are shown in Figure 7.
H. Li et al. imitated the structure of CSP and constructed VoV-GSCSP using GSConv [38]. VoV-GSCSP is composed of GSbottleneck modules; the structures of VoV-GSCSP and GSbottleneck are shown in Figure 8. The difference between VoV-GSCSP and CSP lies in the convolution structure they use: the former uses GSConv, while the latter uses only ordinary convolution. VoV-GSCSP has fewer parameters and faster inference than a CSP structure built with conventional convolution, and higher accuracy than a CSP structure built with DWC. GSConv combines the high accuracy of conventional convolution with the low computation of DWC, so it is a convolutional structure that balances accuracy and speed. However, if GSConv were used to build the whole model, it would deepen the network and increase the resistance to the data flow, so the final result would be inferior to a model composed of conventional convolution; GSConv is therefore generally used only in the neck network. When the image data passes through the backbone network and reaches the neck, the feature map has become slender (the channel dimension reaches its maximum and the width and height dimensions reach their minimum), and no further transformations are required. Therefore, by applying the VoV-GSCSP constructed with GSConv to the neck network, the model can reduce parameters as much as possible while retaining accuracy and inference speed.
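To illustrate the cross-stage layout, the following sketch (reusing the GSConv class from the previous sketch) assembles a GSbottleneck and a VoV-GSCSP-style block; the branch widths, the use of a shortcut and the layer counts are assumptions based on the paper's figures rather than the reference implementation.

```python
import torch
import torch.nn as nn
# assumes the GSConv class defined in the previous sketch is in scope

class GSBottleneck(nn.Module):
    """Two stacked GSConv layers with a shortcut, mirroring a CSP bottleneck."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(GSConv(ch, ch), GSConv(ch, ch))

    def forward(self, x):
        return x + self.block(x)

class VoVGSCSP(nn.Module):
    """Sketch of VoV-GSCSP: split the input, run one half through
    GSBottlenecks, keep the other half as a shortcut, then fuse."""
    def __init__(self, in_ch, out_ch, n=1):
        super().__init__()
        half = in_ch // 2
        self.reduce1 = nn.Conv2d(in_ch, half, 1, bias=False)
        self.reduce2 = nn.Conv2d(in_ch, half, 1, bias=False)
        self.blocks = nn.Sequential(*[GSBottleneck(half) for _ in range(n)])
        self.fuse = nn.Conv2d(half * 2, out_ch, 1, bias=False)

    def forward(self, x):
        y1 = self.blocks(self.reduce1(x))   # processed branch
        y2 = self.reduce2(x)                # shortcut branch
        return self.fuse(torch.cat([y1, y2], dim=1))

print(VoVGSCSP(128, 128)(torch.randn(1, 128, 20, 20)).shape)  # [1, 128, 20, 20]
```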
Based on the above discussion, we replace the neck network of YOLOX-Nano with the slim-neck built with GSConv. The slim-neck structure is shown in Figure 9. Compared with DWC, GSConv better preserves the accuracy of the convolution operation while remaining lightweight. Although a GSConv model is slightly larger than a DWC model, its accuracy is much higher, so GSConv is undoubtedly a better lightweight convolution structure than DWC.
This paper further improves the model's accuracy by replacing all DWC in the original neck network with GSConv. In addition, as shown in the upsampling part of Figure 9, this paper combines linear interpolation and an attention mechanism. This new approach is explained in detail in the following subsection.
This paper proposes a new upsampling paradigm, linear interpolation + attention mechanism, to improve the model's ability to process remote sensing image features.
While most neural network parameters are learned during training, the upsampling algorithm (linear interpolation) used in the neck network for feature fusion involves no learnable parameters. We envisioned that if the upsampling process also underwent parameter learning, it might be possible to improve accuracy. In addition to linear interpolation, common upsampling methods include transposed convolution [39], dilated convolution [40] and other convolution algorithms that can expand the feature map size. Although these convolution methods enable parameter learning in upsampling, they also increase the model's parameters and computation because they add convolution operations, so we consider them unsuitable for lightweight models. Therefore, we combine linear interpolation with a lightweight attention module to propose a new upsampling paradigm.
Our method expands the feature map by linear interpolation and then adaptively adjusts the weights of the feature map through the attention module's learned parameters. By comparing the experimental results of multiple lightweight attention mechanisms, this paper finally chooses the Efficient Channel Attention (ECA) [41] mechanism to combine with upsampling. The structure of the ECA module is shown in Figure 10.
First, global average pooling reduces the feature map input to the ECA module to a one-dimensional vector. A one-dimensional convolution then operates on this vector. C denotes the number of channels, and k denotes the kernel size of the one-dimensional convolution, i.e., how many neighboring channels interact with each channel; k is generally set to 5 for the best effect.
The parameters of the one-dimensional convolution are learned adaptively by the network. After the one-dimensional convolution and the activation function (sigmoid), we obtain the attention map of the ECA module. The attention map is then multiplied element-wise with the upsampled feature map to strengthen its critical information, so that the network can also learn from the upsampled image features.
The one-dimensional convolution used by ECA contains very few parameters, so it is a very lightweight and effective attention mechanism. This paper combines it with the upsampling operation to improve the model's accuracy with almost no increase in parameters or computation.
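The following PyTorch sketch shows how such an interpolation-plus-ECA upsampling step could look; the kernel size, scale factor and interpolation mode are assumptions for illustration, not the exact configuration of the final model.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: global average pooling followed by a
    1-D convolution over the channel dimension (kernel size k)."""
    def __init__(self, k=5):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))                     # global average pooling -> (b, c)
        w = self.conv(w.unsqueeze(1)).squeeze(1)   # 1-D conv across channels
        w = torch.sigmoid(w).view(b, c, 1, 1)      # channel attention map
        return x * w                               # re-weight the feature map

class AttentiveUpsample(nn.Module):
    """Sketch of the proposed paradigm: interpolation enlarges the feature
    map, then ECA adaptively re-weights the upsampled channels."""
    def __init__(self, k=5):
        super().__init__()
        self.eca = ECA(k)

    def forward(self, x):
        x = nn.functional.interpolate(x, scale_factor=2, mode="nearest")
        return self.eca(x)

print(AttentiveUpsample()(torch.randn(1, 128, 20, 20)).shape)  # [1, 128, 40, 40]
```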
The localization loss function used by the YOLOX is an intersection over union (IoU) function [42]. The IoU is calculated by the intersection ratio between the prediction box and the ground truth box. The loss function LossIoU used by the YOLOX is as follows:
$$\mathrm{Loss}_{IoU} = 1 - IoU^2 \tag{1}$$
where $IoU = \frac{|B \cap G|}{|B \cup G|}$, and $B$ and $G$ denote the prediction box and the ground truth box, respectively.
The standard IoU loss function does not contain the squared term; the squared term in YOLOX comes from the Alpha-IoU proposed by J. He et al. [43], who proved that using a power (squared) term in IoU loss functions gives better detection results and suits small targets. Subsequently, many studies added penalty terms to the IoU to build new loss functions. For example, GIoU [44] added a penalty term based on the minimum rectangle enclosing the two boxes; DIoU [45] added a penalty term for the distance between the centroids of the two boxes; CIoU [45] further added a penalty term for the aspect ratio. Recently, Z. Gevorgyan proposed a new localization loss function, SIoU [46], which consists of four loss terms: angle, distance, shape and intersection over union.
Using the data in Figure 11, the angular loss Λ for SIoU can be calculated as shown in the following equations:
$$\Lambda = 1 - 2\sin^2\!\left(\arcsin\frac{C_h}{\sigma} - \frac{\pi}{4}\right) = \cos\!\left(2\left(\arcsin\frac{C_h}{\sigma} - \frac{\pi}{4}\right)\right) \tag{2}$$
$$\frac{C_h}{\sigma} = \sin(\alpha) \tag{3}$$
$$\sigma = \sqrt{(g_x - b_x)^2 + (g_y - b_y)^2} \tag{4}$$
$$C_h = \max(g_y, b_y) - \min(g_y, b_y) \tag{5}$$
where $\sigma$ is the distance between the two centroids, $C_h$ is the height difference between the centroids and $\Lambda$ in Eq (2) represents the degree of angular deviation between the two centroids, i.e., the angle loss value.
The distance loss Δ for SIoU can then be calculated as shown in the following equations:
$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_t}\right) = 2 - e^{-\gamma\rho_x} - e^{-\gamma\rho_y} \tag{6}$$
$$\rho_x = \left(\frac{g_x - b_x}{M_w}\right)^2 \tag{7}$$
$$\rho_y = \left(\frac{g_y - b_y}{M_h}\right)^2 \tag{8}$$
$$\gamma = 2 - \Lambda \tag{9}$$
where $M_w$ and $M_h$ are the width and height of the minimum enclosing rectangle, respectively. As seen from Eq (6), the angle loss $\Lambda$ enters the distance loss $\Delta$ through $\gamma$. When $\alpha$ tends to 0, the contribution of the distance loss is greatly reduced; conversely, when $\alpha$ is closer to $\pi/4$, the contribution of the distance loss is greater.
To calculate the shape loss $\Omega$, we need the widths and heights of the prediction and ground truth boxes. This paper denotes by $B_w$, $B_h$, $G_w$ and $G_h$ the width and height of the prediction box and the ground truth box, respectively. The calculation process is as follows:
$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta} = \left(1 - e^{-\omega_w}\right)^{\theta} + \left(1 - e^{-\omega_h}\right)^{\theta} \tag{10}$$
$$\omega_w = \frac{|B_w - G_w|}{\max(B_w, G_w)} \tag{11}$$
$$\omega_h = \frac{|B_h - G_h|}{\max(B_h, G_h)} \tag{12}$$
where $\theta$ controls the degree of attention paid to the shape; to avoid focusing too much on shape and reducing prediction accuracy, $\theta$ is typically set to 4. The final SIoU loss is calculated as follows:
$$\mathrm{Loss}_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2} \tag{13}$$
Compared with the previous IoU-series loss functions, SIoU considers the vector angle in the regression process, redefines the calculation of the penalty terms and accelerates the model's convergence and inference. This localization loss function can improve the model's accuracy and speed without increasing the number of parameters.
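For reference, the following is a minimal sketch of the SIoU computation following Eqs (2)–(13) for axis-aligned boxes given as (cx, cy, w, h); it is an illustrative implementation rather than the exact YOLOX training code, and numerical safeguards such as the eps terms are assumptions.

```python
import math
import torch

def siou_loss(pred, target, theta=4, eps=1e-7):
    """Illustrative SIoU loss (Eqs 2-13) for boxes given as (cx, cy, w, h)."""
    bx, by, bw, bh = pred.unbind(-1)
    gx, gy, gw, gh = target.unbind(-1)

    # IoU term
    iw = (torch.min(bx + bw / 2, gx + gw / 2) - torch.max(bx - bw / 2, gx - gw / 2)).clamp(min=0)
    ih = (torch.min(by + bh / 2, gy + gh / 2) - torch.max(by - bh / 2, gy - gh / 2)).clamp(min=0)
    inter = iw * ih
    iou = inter / (bw * bh + gw * gh - inter + eps)

    # angle loss, Eq (2): deviation of the centre-to-centre line from 45 degrees
    sigma = torch.sqrt((gx - bx) ** 2 + (gy - by) ** 2) + eps          # Eq (4)
    ch = (gy - by).abs()                                               # Eq (5)
    angle = 1 - 2 * torch.sin(torch.arcsin((ch / sigma).clamp(max=1.0)) - math.pi / 4) ** 2

    # distance loss, Eqs (6)-(9), normalised by the minimum enclosing box
    mw = torch.max(bx + bw / 2, gx + gw / 2) - torch.min(bx - bw / 2, gx - gw / 2)
    mh = torch.max(by + bh / 2, gy + gh / 2) - torch.min(by - bh / 2, gy - gh / 2)
    gamma = 2 - angle
    rho_x = ((gx - bx) / (mw + eps)) ** 2
    rho_y = ((gy - by) / (mh + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # shape loss, Eqs (10)-(12)
    ww = (bw - gw).abs() / torch.max(bw, gw)
    wh = (bh - gh).abs() / torch.max(bh, gh)
    shape = (1 - torch.exp(-ww)) ** theta + (1 - torch.exp(-wh)) ** theta

    return 1 - iou + (dist + shape) / 2                                # Eq (13)

pred = torch.tensor([[50.0, 50.0, 20.0, 30.0]])
gt = torch.tensor([[55.0, 52.0, 22.0, 28.0]])
print(siou_loss(pred, gt))
```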
Four datasets were tested to demonstrate the generalization performance of the proposed model. They are NWPU VHR-10 [47], RSOD [48], TGRS-HRRSD [49] and DOTA [59]. The following is a brief description of these four datasets.
NWPU VHR-10 is a well-known geospatial object detection dataset that is also commonly used for remote sensing target detection. It includes 650 positive images with objects and 150 negative images without objects, for 3745 object instances in all 800 images. The images have a spatial resolution of 0.5–2 m, and there are ten categories: airplane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge and vehicle. The dataset contains objects of various sizes, and it was mainly used to quickly test the effects of the various modules and efficiently find the best network architecture. We randomly selected 480 images as the training set, 160 as the validation set and 160 as the test set.
RSOD includes 936 images with 6950 object instances and has four categories: aircraft, playground, overpass and oil tank. Its spatial resolution is 0.8–1 m. We randomly selected 561 images as the training set, 188 as the validation set and 187 as the test set.
TGRS-HRRSD is a large high-resolution remote sensing dataset for target detection, comprising 21,761 images with 55,740 object instances. It has 13 categories: ship, bridge, ground track field, storage tank, basketball court, tennis court, airplane, baseball diamond, harbor, vehicle, crossroad, T junction and parking lot. The images have a spatial resolution of 0.15–1.2 m. The greatest advantage of this dataset is that the categories are balanced, with around 4000 instances per category. We used the official split directly: the training set has 5401 images, the validation set has 5417 images and the test set has 10,943 images.
DOTA includes 2806 large-size images with 403,318 object instances. It has 16 categories: plane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, soccer ball field, swimming pool and container crane. The images have a spatial resolution of 0.8–20 m. The dataset contains many images of enormous size, which would exhaust GPU memory if used directly for training and detection. Therefore, the usual practice is to cut the images before using this dataset, and we cropped each image into 640 × 640 patches, giving 21,310 images after cropping. We split the images into training, validation and test sets in a ratio of 6:2:2: the training set has 12,786 images, the validation set has 4262 images and the test set has 4262 images.
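The following is a small sketch of how a large aerial image could be cut into fixed-size tiles before detection; the 640 × 640 tile size matches the cropping described above, while the overlap value, the padding of border tiles and the helper name tile_image are illustrative assumptions.

```python
import numpy as np

def tile_image(image, tile=640, overlap=100):
    """Yield (x0, y0, patch) crops covering the whole image.
    Overlap is an assumed value; the paper does not fix it."""
    h, w = image.shape[:2]
    step = tile - overlap
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
            patch = np.zeros((tile, tile, image.shape[2]), dtype=image.dtype)
            patch[: y1 - y0, : x1 - x0] = image[y0:y1, x0:x1]   # pad border tiles
            yield x0, y0, patch                                  # offsets map boxes back

big = np.zeros((1500, 2000, 3), dtype=np.uint8)
print(sum(1 for _ in tile_image(big)))  # number of 640x640 tiles
```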
This study used four metrics to evaluate the detection model's performance: mAP, latency, FPS and parameters; these metrics are described in detail below.
Precision and recall are calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{14}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{15}$$
where, TP is the number of samples that were positive and also correctly classified as positive, TN is the number of samples that were negative and also correctly classified as negative, FP is the number of samples that were negative but incorrectly classified as positive, and FN is the number of samples that were positive but classified as negative.
AP denotes the average precision of a single class of objects, combining both the precision and recall metrics. Using recall as the horizontal coordinate and precision as the vertical coordinate, a P-R curve can be drawn. The AP value of this object's class is equal to the area under that curve. The formula for calculating the AP value is as follows:
$$AP = \int_0^1 P(r)\,dr \tag{16}$$
where P indicates precision and r indicates recall. After calculating the AP value of each category, we can calculate mAP, which is the mean of the AP values over all categories, as follows:
$$mAP = \frac{1}{n}\sum_{k=1}^{n} AP_k \tag{17}$$
When using AP and mAP, this paper uses subscripts to indicate the IoU threshold (a prediction counts as a positive sample when its IoU with the ground truth box exceeds the threshold). For example, AP50 denotes the AP value at an IoU threshold of 0.50, and mAP50-95 denotes the mAP averaged over thresholds 0.50, 0.55, 0.60, ..., 0.90 and 0.95. In this paper, we use mAP50-95 to measure the model's accuracy.
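As a worked illustration of Eqs (14)–(16), the following sketch computes AP as the area under a precision-recall curve built from ranked detections; real benchmarks such as COCO use interpolated precision over fixed recall points, so this simplified numeric integration is an assumption for illustration.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """Sketch of AP as the area under the precision-recall curve (Eq 16),
    given per-detection confidence scores and true/false-positive flags."""
    order = np.argsort(-np.asarray(scores))        # rank detections by confidence
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1 - tp)
    recall = cum_tp / max(num_gt, 1)               # Eq (15)
    precision = cum_tp / (cum_tp + cum_fp)         # Eq (14)
    # integrate precision over recall (trapezoid rule on the P-R curve)
    return float(np.trapz(precision, recall))

scores = [0.9, 0.8, 0.7, 0.6, 0.5]
is_tp  = [1, 1, 0, 1, 0]   # whether each detection matched a ground-truth box
print(average_precision(scores, is_tp, num_gt=4))
```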
Latency denotes the average time for the model to process one image, FPS denotes the number of image frames detected per second and parameters denotes the total number of parameters of the model. In the following experiments, the letter P denotes the number of parameters. We use latency and FPS to measure the model's detection speed and P to measure the model's size.
We ran all experiments in this paper with PyTorch 1.10.0, CUDA 10.2 and Python 3.8. Our machine had an NVIDIA GeForce RTX 2080Ti graphics processing unit (GPU) and an AMD Ryzen 5 3600X 6-core CPU. All experiments were performed with FP16 precision and a batch size of 1.
This work tests the existing spatial pyramid pooling structures [12,35,36,37] on the NWPU VHR-10 dataset using the YOLOX-Nano model; the results are shown in Table 1. They show that ASPP [36] and SPPCSPC [37] significantly improve the model's accuracy, but they increase the number of parameters, which is not conducive to a lightweight model. SPPF and SimSPPF do not increase the number of parameters; they change the parallel structure of SPP to a serial structure, which speeds up inference. SimSPPF replaces the SiLU activation function used in SPPF with ReLU; it is the fastest spatial pyramid pooling module but has slightly reduced accuracy. After overall consideration, we chose SPPF as the pyramid pooling module in YOLOX-Nano, since it improves the model's performance while scarcely increasing the number of parameters.
| Methods | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS |
| --- | --- | --- | --- | --- | --- |
| YOLOX+SPP | 86.16 | 52.55 | 0.90 | 19.77 | 50.58 |
| YOLOX+SPPF | 86.63 | 53.73 | 0.90 | 17.91 | 55.83 |
| YOLOX+SimSPPF | 84.41 | 52.61 | 0.90 | 17.55 | 56.98 |
| YOLOX+ASPP | 86.78 | 53.86 | 2.97 | 22.36 | 44.72 |
| YOLOX+SPPCSPC | 86.18 | 54.36 | 2.51 | 20.11 | 49.72 |
After replacing SPP with SPPF in YOLOX-Nano, we tested the method on all four datasets to verify the method's generalization ability and obtained the results shown in Table 2.
| Methods | NWPU VHR-10 mAP50-95 (%) | NWPU VHR-10 FPS | RSOD mAP50-95 (%) | RSOD FPS | TGRS-HRRSD mAP50-95 (%) | TGRS-HRRSD FPS | DOTA mAP50-95 (%) | DOTA FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOX (SPP) | 52.55 | 50.58 | 56.62 | 51.57 | 57.92 | 87.79 | 55.48 | 93.02 |
| SPPF | 53.73 | 55.83 | 57.39 | 55.46 | 58.79 | 89.77 | 57.01 | 95.37 |
This paper tests various combinations of GSConv and the neck network in YOLOX-Nano to balance the model's accuracy, speed and number of parameters; the experimental results are shown in Table 3. By comparing the performance of the three combinations, we finally chose Program Ⅲ, which replaces all of the DWC in the neck structure of YOLOX-Nano with GSConv. Our slim-neck structure is shown in Figure 9.
| Methods | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS |
| --- | --- | --- | --- | --- | --- |
| YOLOX | 86.16 | 52.55 | 0.90 | 19.77 | 50.58 |
| Program Ⅰ | 79.16 | 46.68 | 0.95 | 22.10 | 45.25 |
| Program Ⅱ | 82.56 | 49.59 | 0.77 | 19.08 | 52.41 |
| Program Ⅲ | 87.88 | 53.61 | 1.06 | 20.23 | 49.43 |

Note: Program Ⅰ replaces all of the convolutional structures in the neck network with GSConv; Program Ⅱ replaces all of the standard convolutions in the neck with GSConv and keeps the DWC; Program Ⅲ replaces all of the DWC in the neck with GSConv and keeps the standard convolutions.
After fusing the neck structure with the YOLOX model, the test results on the four datasets are shown in Table 4. GSConv is closer to the standard convolution than DWC in accuracy, and it can somewhat mitigate the slim-neck structure's accuracy loss. Compared to DWC, the slim-neck structure formed by GSConv slightly reduces the model's inference speed, but the accuracy is improved and the speed reduction is within our acceptable range. Our results show that GSConv maintains the accuracy of standard convolution better than DWC.
| Methods | NWPU VHR-10 mAP50-95 (%) | NWPU VHR-10 FPS | RSOD mAP50-95 (%) | RSOD FPS | TGRS-HRRSD mAP50-95 (%) | TGRS-HRRSD FPS | DOTA mAP50-95 (%) | DOTA FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOX | 52.55 | 50.58 | 56.62 | 51.57 | 57.92 | 87.79 | 55.48 | 93.02 |
| +slim-neck | 53.61 | 49.43 | 58.23 | 50.82 | 58.59 | 85.68 | 56.87 | 92.64 |
YOLOX uses nearest-neighbor interpolation for upsampling: each pixel value in the enlarged feature map is taken from its nearest pixel in the original feature map. Nearest-neighbor interpolation requires no learned parameters, which reduces computational cost but is not conducive to the network's adaptive learning. Therefore, we add an attention mechanism after the interpolation so that the network can adaptively learn which information should be enhanced after upsampling. To avoid increasing the number of parameters in the model, this paper compares several currently popular lightweight attention mechanisms combined with linear interpolation to find the most suitable one. The experimental results are shown in Table 5.
| Methods | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS |
| --- | --- | --- | --- | --- | --- |
| Interpolation | 86.16 | 52.55 | 0.90 | 19.77 | 50.58 |
| +SimAM [50] | 86.11 | 53.05 | 0.90 | 20.26 | 49.35 |
| +SA [51] | 87.02 | 52.87 | 0.90 | 21.18 | 47.21 |
| +ULSAM [52] | 85.28 | 52.14 | 0.90 | 24.17 | 41.37 |
| +NAM [53] | 87.26 | 53.71 | 0.90 | 20.87 | 47.91 |
| +ECA [41] | 87.35 | 54.69 | 0.90 | 20.34 | 49.16 |

Note: The dataset was NWPU VHR-10, and "Interpolation" in the table denotes the nearest-neighbor interpolation algorithm.
The attention mechanisms that we chose are all extremely lightweight. The experimental results show that these attention mechanisms bring almost no additional parameters after being added to the YOLOX model. The ECA module achieves the best result among all of these lightweight attention modules. Thus, we finally chose the ECA combined with linear interpolation to complete the upsampling process. The experimental results of the method on all four aerial datasets are shown in Table 6.
| Methods | NWPU VHR-10 mAP50-95 (%) | NWPU VHR-10 FPS | RSOD mAP50-95 (%) | RSOD FPS | TGRS-HRRSD mAP50-95 (%) | TGRS-HRRSD FPS | DOTA mAP50-95 (%) | DOTA FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOX | 52.55 | 50.58 | 56.62 | 51.57 | 57.92 | 87.79 | 55.48 | 93.02 |
| +ECA | 54.69 | 49.16 | 58.24 | 50.58 | 59.24 | 85.81 | 58.23 | 92.17 |
YOLOX-Nano uses the IoU loss as the localization loss function. This work tries other, more advanced loss functions, obtained by adding new penalty terms to the IoU, to improve the model's accuracy. Table 7 shows the results of testing these loss functions in YOLOX. For a fairer comparison, this work squares the penalty terms of the localization loss functions, following YOLOX.
| Methods | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS |
| --- | --- | --- | --- | --- | --- |
| IoU | 86.16 | 52.55 | 0.90 | 19.77 | 50.58 |
| GIoU | 86.36 | 52.67 | 0.90 | 19.86 | 50.35 |
| DIoU | 86.22 | 54.08 | 0.90 | 20.03 | 49.92 |
| CIoU | 85.05 | 53.30 | 0.90 | 20.67 | 48.37 |
| SIoU | 87.71 | 54.14 | 0.90 | 18.01 | 55.52 |
The experimental results show that SIoU undoubtedly has the best speed and accuracy, making it the best localization loss function for our real-time detection. It speeds up training and inference by redefining the penalty calculation, and its angle term makes the prediction box converge toward the ground truth box in the correct direction more quickly. The results of testing SIoU on the four aerial datasets are shown in Table 8.
| Methods | NWPU VHR-10 mAP50-95 (%) | NWPU VHR-10 FPS | RSOD mAP50-95 (%) | RSOD FPS | TGRS-HRRSD mAP50-95 (%) | TGRS-HRRSD FPS | DOTA mAP50-95 (%) | DOTA FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOX (IoU) | 52.55 | 50.58 | 56.62 | 51.57 | 57.92 | 87.79 | 55.48 | 93.02 |
| SIoU | 54.14 | 55.52 | 58.10 | 54.73 | 59.78 | 89.05 | 58.12 | 96.64 |
We added the above improvements to the YOLOX-Nano model, trained and tested it on the NWPU VHR-10 dataset and obtained the experimental results shown in Table 9. Compared with the original model, our improved YOLOX-Nano raised mAP50 on NWPU VHR-10 by 2.31% and mAP50-95 by 3.13%, and FPS increased by 2.32, while the number of parameters increased by only 0.16 M. The results demonstrate that our model better balances speed and accuracy.
| SPPF | Slim-Neck | ECA | SIoU | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  |  |  |  | 86.16 | 52.55 | 0.90 | 19.77 | 50.58 |
| √ |  |  |  | 86.63 (+0.47) | 53.73 (+1.18) | 0.90 | 17.91 (-1.86) | 55.83 (+5.25) |
| √ | √ |  |  | 87.82 (+1.19) | 54.41 (+0.68) | 1.06 (+0.16) | 19.32 (+1.41) | 51.75 (-4.08) |
| √ | √ | √ |  | 88.21 (+0.39) | 55.12 (+0.71) | 1.06 | 19.79 (+0.47) | 50.53 (-1.22) |
| √ | √ | √ | √ | 88.47 (+0.26) | 55.68 (+0.56) | 1.06 | 18.92 (-0.87) | 52.85 (+2.32) |
To validate our model's performance, this paper compares it with the currently popular lightweight models on the NWPU VHR-10 dataset. The results are shown in Table 10. Our model has the highest detection accuracy of all the lightweight models shown in Table 10. The accuracy of YOLOv5n is the closest to ours, but its detection speed is far inferior to our model. The faster models are YOLO-Fastest, NanoDet and especially the FastestDet model. They have extremely lightweight structures and very fast detection speed but very low accuracy. It can be seen that our model has the best balance of speed and accuracy among all lightweight models.
| Models | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS |
| --- | --- | --- | --- | --- | --- |
| YOLOv3-Tiny [10] | 75.46 | 42.38 | 8.86 | 48.46 | 20.53 |
| YOLOv4-Tiny [11] | 80.27 | 47.65 | 6.06 | 42.98 | 23.27 |
| YOLOv5n [12] | 87.23 | 54.06 | 1.90 | 26.12 | 38.28 |
| YOLO-Fastest [54] | 65.18 | 31.79 | 0.35 | 10.28 | 97.27 |
| FastestDet [55] | 65.67 | 32.82 | 0.24 | 9.32 | 107.29 |
| NanoDet [24] | 80.12 | 46.23 | 0.95 | 14.69 | 68.07 |
| YOLOX-Nano [13] | 86.16 | 52.55 | 0.90 | 19.77 | 50.58 |
| YOLOX-Nano++ (ours) | 88.47 | 55.68 | 1.06 | 18.92 | 52.85 |
In addition, to further verify the balance between the speed and accuracy of our model, this paper also compares it with the current popular conventional models, and the comparison results are shown in Table 11. Conventional models use conventional convolution and do not use lightweight convolution structures. All models listed in Table 11 have been used for the detection of aerial remote sensing images, so the comparative experiments have reference value. As seen from Table 11, in the NWPU VHR-10 dataset, our model has an absolute advantage in speed compared with the conventional model, and its accuracy is comparable to these models.
| Models | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS |
| --- | --- | --- | --- | --- | --- |
| Improved Faster R-CNN [27] | 81.64 | 48.29 | 60.42 | 273.24 | 3.66 |
| YOLOv4 [11] | 86.57 | 53.29 | 64.02 | 26.54 | 37.68 |
| YOLOv5m [12] | 92.61 | 61.02 | 21.20 | 34.76 | 28.77 |
| YOLOD [26] | 88.13 | 56.69 | 70.12 | 30.19 | 33.12 |
| RS-YOLOX [29] | 90.21 | 58.34 | 14.38 | 27.83 | 35.93 |
| YOLOX-Nano++ (ours) | 88.47 | 55.68 | 1.06 | 18.92 | 52.85 |
Our model can detect targets in remote sensing images quickly and accurately; some detection results on the four aerial remote sensing image datasets are shown in Figures 12–15. In Figure 15, large-size DOTA images are used: each image is first cut into smaller 640 × 640 images before detection, and after detection the results of the small images are merged back into a complete detection image.
Figure 16 shows a detection comparison between our model and other models on the NWPU VHR-10 dataset. Figure 16(a) shows the ground truth target boxes. Based on the experimental data in Tables 10 and 11, NanoDet and YOLOv5m were selected as representatives of the lightweight and conventional models, respectively. Figure 16(b)–(d) show the detection results of NanoDet, YOLOv5m and our model, respectively. It can be observed that NanoDet missed some targets, while our model achieved detection results comparable to the conventional YOLOv5m model, whose accuracy is only slightly higher than ours. The detection results of our model and YOLOv5m are both very close to the ground truth annotations.
This paper improves the YOLOX-Nano model using several lightweight improvement strategies to increase accuracy and speed, achieving a better balance of accuracy and speed.
Our two most significant innovations are the combination of YOLOX and slim-neck and the proposal of a new upsampling paradigm. The slim-neck constructed with GSConv is used because the DWC in the original model reduces accuracy; we need a neck that is lightweight yet preserves the accuracy of conventional convolution as much as possible, and GSConv meets this demand. The new upsampling paradigm is proposed because the original upsampling process involves no parameter learning; this paper therefore integrates an attention mechanism so that the model can learn features adaptively during upsampling. Although these two methods slightly increase the number of parameters (by 0.16 M), they bring a significant accuracy improvement, and we have confirmed their superiority for target detection in aerial images.
In addition, our two minor innovations are the use of the SPPF module and the SIoU function. Their primary effect is to improve the model's detection speed. SPPF can accelerate the pyramid pooling process, and SIoU can accelerate the regression calculation process. These two methods not only improve the detection speed, but also slightly improve the detection accuracy of the model. When we added these improvements to the YOLOX model, both the accuracy and speed increased, so our model undoubtedly has a better accuracy and speed balance.
At present, some advanced rotated object detection models, such as R3Det [56], S2A-Net [57] and Oriented R-CNN [58], have been applied to remote sensing images. These models can adjust the rotation angle of the prediction box according to the target's shape to fit the target better. Our model does not use rotated detection, so rotated object detection is one of the planned future improvements to our model. In addition, our work can be extended to small object detection [60,61]. In particular, the new upsampling paradigm is a feature-strengthening method that can enhance the features of small targets and improve their detection accuracy. Therefore, extending our method to small target detection is also part of our future work.
The main objective of this paper is to achieve fast and highly accurate detection of aerial remote sensing targets using a new lightweight model. We have improved the YOLOX-Nano model, keeping it lightweight while improving its accuracy and speed. We replaced the SPP module in YOLOX with a more efficient SPPF module. Then, we reconstructed the neck network with a new convolutional structure (GSConv), constructed a lightweight slim-neck structure and in the upsampling process, we proposed a new upsampling paradigm and introduced a lightweight attention mechanism ECA. Finally, we replaced the localization loss function of YOLOX with SIoU, while improving the accuracy and speed of the model.
All the improvement strategies in this paper aim to enhance the model's detection speed and accuracy as much as possible while keeping the model lightweight. The final improved model has superior accuracy and speed with only a 0.16 M increase in the number of parameters. Our model balances speed and accuracy well, making it an excellent model for the real-time detection of remotely sensed targets, and it can be mounted on a UAV to detect ground targets quickly and accurately. In future work, we will add rotated detection boxes to fit target positions better, and we will extend our work to small target detection and explore fusing our approach with small-target detection methods.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This research was funded by the Key R & D Projects in the Yunnan Province (Grant No. 202202AD080004), the Natural Science Foundation of China (Grant Nos. 62061049, 12263008), the Application and Foundation Project of the Yunnan Province (Grant No. 202001BB050032) and the Yunnan Provincial Department of Science and Technology-Yunnan University Joint Special Project for Double-Class Construction (Grant No. 202201BF070001-005).
The authors declare there is no conflict of interest.
[1] |
M. Lu, Y. Xu, H. Li, Vehicle Re-Identification based on UAV viewpoint: dataset and method, Remote Sens., 14 (2022), 4630. https://doi.org/10.3390/rs14184603 doi: 10.3390/rs14184603
![]() |
[2] |
S. Ijlil, A. Essahlaoui, M. Mohajane, N. Essahlaoui, E. M. Mili, A. V. Rompaey, Machine learning algorithms for modeling and mapping of groundwater pollution risk: A study to reach water security and sustainable development (Sdg) goals in a editerranean aquifer system, Remote Sens., 14 (2022), 2379. https://doi.org/10.3390/rs14102379 doi: 10.3390/rs14102379
![]() |
[3] |
Z. Jiang, Z. Song, Y. Bai, X. He, S. Yu, S. Zhang, et al., Remote sensing of global sea surface pH based on massive underway data and machine mearning, Remote Sens., 14 (2022), 2366. https://doi.org/10.3390/rs14102366 doi: 10.3390/rs14102366
![]() |
[4] |
Y. Zhao, L. Ge, H. Xie, G. Bai, Z. Zhang, Q. Wei, et al., ASTF: Visual abstractions of time-varying patterns in radio signals, IEEE Trans. Visual Comput. Graphics, 29 (2023), 214–224. https://doi.org/10.1109/TVCG.2022.3209469 doi: 10.1109/TVCG.2022.3209469
![]() |
[5] | R. Girshick, J. Donahue, T. Darrell J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2014), 580–587. https://doi.org/10.1109/CVPR.2014.81 |
[6] | R. Girshick, Fast R-CNN, in 2015 IEEE International Conference on Computer Vision (ICCV), (2015), 1440–1448. https://doi.org/10.1109/ICCV.2015.169 |
[7] |
S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell., 39 (2017), 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031 doi: 10.1109/TPAMI.2016.2577031
![]() |
[8] | J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 779–788. https://doi.org/10.1109/CVPR.2016.91 |
[9] | J. Redmon, A. Farhadi, YOLO9000: Better, Faster, Stronger, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 6517–6525. https://doi.org/10.1109/CVPR.2017.690 |
[10] | J. Redmon, A. Farhadi, YOLOv3: an incremental improvement, arXiv preprint, (2018), arXiv: 1804.02767. http://arXiv.org/abs/1804.02767 |
[11] | A. Bochkovskiy, C. Y. Wang, H. Liao, YOLOv4: optimal speed and accuracy of object detection, arXiv preprint, (2020), arXiv: 2004.10934. http://arXiv.org/abs/2004.10934 |
[12] | G. Jocher, Yolov5, 2020. Available from: https://github.com/ultralytics/yolov5. |
[13] | Z. Ge, S. Liu, F. Wang, Z. Li, J. Sun, YOLOX: Exceeding YOLO series in 2021, arXiv preprint, (2021), arXiv: 2107.08430. https://arXiv.org/abs/2107.08430 |
[14] |
Y. Li, X. Liu, H. Zhang, X. Li, X. Sun, Optical remote sensing image retrieval based on convolutional neural networks (in Chinese), Opt. Precis. Eng., 26 (2018), 200–207. https://doi.org/10.3788/ope.20182601.0200 doi: 10.3788/ope.20182601.0200
![]() |
[15] | A. Van Etten, You only look twice: Rapid multi-scale object detection in satellite imagery, arXiv preprint, (2018), arXiv: 1805.09512. https://doi.org/10.48550/arXiv.1805.09512 |
[16] |
M. Ahmed, Y. Wang, A. Maher, X. Bai, Fused RetinaNet for small target detection in aerial images, Int. J. Remote Sens., 43 (2022), 2813–2836. https://doi.org/10.1080/01431161.2022.2071115 doi: 10.1080/01431161.2022.2071115
![]() |
[17] |
H. Liu, G. Yuan, L. Yang, K. Liu, H. Zhou, An appearance defect detection method for cigarettes based on C‐CenterNet, Electronics, 11 (2022), 2182. https://doi.org/10.3390/electronics11142182 doi: 10.3390/electronics11142182
![]() |
[18] |
S. Du, B. Zhang, P. Zhang, P. Xiang, H. Xue, FA-YOLO: An improved YOLO model for infrared occlusion object detection under confusing background, Wireless Commun. Mobile Comput., 2021 (2021). https://doi.org/10.1155/2021/1896029 doi: 10.1155/2021/1896029
![]() |
[19] | A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, et al., MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint, (2017), arXiv: 1704.04861. https://doi.org/10.48550/arXiv.1704.04861 |
[20] | M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L. C. Chen, MobileNetV2: Inverted residuals and linear bottlenecks, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2018), 4510–4520. https://doi.org/10.1109/CVPR.2018.00474 |
[21] | A. Howard, M. Sandler, B. Chen, W. Wang, L. C. Chen, M. Tan, et al., Searching for mobileNetV3, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 1314–1324. https://doi.org/10.1109/ICCV.2019.00140 |
[22] | X. Zhang, X. Zhou, M. Lin, J. Sun, ShuffleNet: An extremely efficient convolutional neural network for mobile devices, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2018), 6848–6856. |
[23] | N. Ma, X. Zhang, H. T. Zheng, J. Sun, ShuffleNet V2: Practical guidelines for efficient CNN architecture design, in European Conference on Computer Vision (ECCV), (2018), 122–138. https://doi.org/10.1109/CVPR.2018.00716 |
[24] | RangiLyu, NanoDet-Plus: Super fast and high accuracy lightweight anchor-free object detection model, 2021. Available from: https://github.com/RangiLyu/nanodet. |
[25] | C. Y. Wang, H. Liao, Y. H. Wu, P. Y. Chen, J. W. Hsieh, I. H. Yeh, CSPNet: A bew backbone that can enhance learning capability of CNN, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2020), 1571–1580. https://doi.org/10.1109/CVPRW50498.2020.00203 |
[26] | X. Luo, Y. Wu, L. Zhao, YOLOD: A target detection method for UAV aerial imagery, Remote Sens., 14 (2022), 3240. https://doi.org/10.3390/rs14143240 |
[27] | D. Yan, G. Li, X. Li, H. Zhang, H. Lei, K. Lu, et al., An improved faster R-CNN method to detect tailings ponds from high-resolution remote sensing images, Remote Sens., 13 (2021), 2052. https://doi.org/10.3390/rs13112052 |
[28] | F. C. Akyon, S. O. Altinuc, A. Temizel, Slicing aided hyper inference and fine-tuning for small object detection, in 2022 IEEE International Conference on Image Processing (ICIP), (2022), 966–970. https://doi.org/10.1109/ICIP46576.2022.9897990 |
[29] | L. Yang, G. Yuan, H. Zhou, H. Liu, J. Chen, H. Wu, RS-YOLOX: A high-precision detector for object detection in satellite remote sensing images, Appl. Sci., 12 (2022), 8707. https://doi.org/10.3390/app12178707 |
[30] | J. Liu, C. Liu, Y. Wu, Z. Sun, H. Xu, Insulators' identification and missing defect detection in aerial images based on cascaded YOLO models, Comput. Intell. Neurosci., 2022 (2022). https://doi.org/10.1155/2022/7113765 |
[31] | X. Li, Y. Qin, F. Wang, F. Guo, J. T. W. Yeow, Pitaya detection in orchards using the MobileNet-YOLO model, in 2020 39th Chinese Control Conference (CCC), (2020), 6274–6278. https://doi.org/10.23919/CCC50068.2020.9189186 |
[32] | Z. Tian, C. Shen, H. Chen, T. He, FCOS: Fully convolutional one-stage object detection, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 9626–9635. https://doi.org/10.1109/ICCV.2019.00972 |
[33] | H. Law, J. Deng, CornerNet: Detecting objects as paired keypoints, Int. J. Comput. Vision, 128 (2020), 642–656. https://doi.org/10.1007/s11263-019-01204-1 |
[34] | G. Song, Y. Liu, X. Wang, Revisiting the sibling head in object detector, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), 11563–11572. |
[35] | K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell., 37 (2015), 1904–1916. https://doi.org/10.1109/TPAMI.2015.2389824 |
[36] | L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Trans. Pattern Anal. Mach. Intell., 40 (2018), 834–848. https://doi.org/10.1109/TPAMI.2017.2699184 |
[37] | C. Y. Wang, A. Bochkovskiy, H. Liao, YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2023), 7464–7475. |
[38] | H. Li, J. Li, H. Wei, Z. Liu, Z. Zhan, Q. Ren, Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles, arXiv preprint, (2022), arXiv: 2206.02424. https://doi.org/10.48550/arXiv.2206.02424 |
[39] | V. Dumoulin, F. Visin, A guide to convolution arithmetic for deep learning, arXiv preprint, (2018), arXiv: 1603.07285. https://doi.org/10.48550/arXiv.1603.07285 |
[40] | F. Yu, V. Koltun, T. Funkhouser, Dilated residual networks, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 636–644. https://doi.org/10.1109/CVPR.2017.75 |
[41] | Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, Q. Hu, ECA-Net: Efficient channel attention for deep convolutional neural networks, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), 11531–11539. https://doi.org/10.1109/CVPR42600.2020.01155 |
[42] | B. Jiang, R. Luo, J. Mao, T. Xiao, Y. Jiang, Acquisition of localization confidence for accurate object detection, in Proceedings of the European Conference on Computer Vision (ECCV), (2018), 784–799. |
[43] | J. He, S. Erfani, X. Ma, J. Bailey, Y. Chi, X. S. Hua, Alpha-IoU: A family of power intersection over union losses for bounding box regression, in Advances in Neural Information Processing Systems (NeurIPS), 2021. |
[44] | H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, S. Savarese, Generalized intersection over union: A metric and a loss for bounding box regression, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 658–666. https://doi.org/10.1109/CVPR.2019.00075 |
[45] | Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, D. Ren, Distance-IoU loss: Faster and better learning for bounding box regression, in Proceedings of the AAAI Conference on Artificial Intelligence, 2020. https://doi.org/10.1609/aaai.v34i07.6999 |
[46] | Z. Gevorgyan, SIoU loss: More powerful learning for bounding box regression, arXiv preprint, (2022), arXiv: 2205.12740. https://doi.org/10.48550/arXiv.2205.12740 |
[47] | G. Cheng, P. Zhou, J. Han, Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images, IEEE Trans. Geosci. Remote Sens., 54 (2016), 7405–7415. https://doi.org/10.1109/TGRS.2016.2601622 |
[48] | Y. Long, Y. Gong, Z. Xiao, Q. Liu, Accurate object localization in remote sensing images based on convolutional neural networks, IEEE Trans. Geosci. Remote Sens., 55 (2017), 2486–2498. https://doi.org/10.1109/TGRS.2016.2645610 |
[49] | X. Lu, Y. Zhang, Y. Yuan, Y. Feng, Gated and axis-concentrated localization network for remote sensing object detection, IEEE Trans. Geosci. Remote Sens., 58 (2020), 179–192. https://doi.org/10.1109/TGRS.2019.2935177 |
[50] | L. Yang, R. Y. Zhang, L. Li, X. Xie, SimAM: A simple, parameter-free attention module for convolutional neural networks, in Proceedings of the 38th International Conference on Machine Learning, 139 (2021), 11863–11874. |
[51] | Z. Zhong, Z. Q. Lin, R. Bidart, X. Hu, I. B. Daya, Z. Li, et al., Squeeze-and-attention networks for semantic segmentation, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), 13065–13074. |
[52] | R. Saini, N. K. Jha, B. Das, S. Mittal, C. K. Mohan, ULSAM: Ultra-lightweight subspace attention module for compact convolutional neural networks, in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), (2020), 1616–1625. https://doi.org/10.1109/WACV45572.2020.9093341 |
[53] | Y. Liu, Z. Shao, Y. Teng, N. Hoffmann, NAM: Normalization-based attention module, arXiv preprint, (2021), arXiv: 2111.12419. https://doi.org/10.48550/arXiv.2111.12419 |
[54] | X. Ma, Yolo-Fastest: yolo-fastest-v1.1.0, 2021. Available from: https://github.com/dog-qiuqiu/Yolo-Fastest. |
[55] | X. Ma, FastestDet: Ultra lightweight anchor-free real-time object detection algorithm, 2022. Available from: https://github.com/dog-qiuqiu/FastestDet. |
[56] | X. Yang, J. Yan, Z. Feng, T. He, R3Det: Refined single-stage detector with feature refinement for rotating object, in Proceedings of the AAAI Conference on Artificial Intelligence, 35 (2021), 3163–3173. https://doi.org/10.1609/aaai.v35i4.16426 |
[57] | J. Han, J. Ding, J. Li, G. S. Xia, Align deep features for oriented object detection, IEEE Trans. Geosci. Remote Sens., 60 (2022), 1–11. https://doi.org/10.1109/TGRS.2021.3062048 |
[58] | X. Xie, G. Cheng, J. Wang, X. Yao, J. Han, Oriented R-CNN for object detection, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), (2021), 3500–3509. https://doi.org/10.1109/ICCV48922.2021.00350 |
[59] | J. Ding, N. Xue, Y. Long, G. S. Xia, Q. Lu, Learning RoI transformer for oriented object detection in aerial images, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 2844–2853. https://doi.org/10.1109/CVPR.2019.00296 |
[60] | S. Zhong, H. Zhou, Z. Ma, F. Zhang, J. Duan, Multiscale contrast enhancement method for small infrared target detection, Optik, 271 (2022), 170134. https://doi.org/10.1016/j.ijleo.2022.170134 |
[61] | S. Zhong, H. Zhou, X. Cui, X. Cao, F. Zhang, J. Duan, Infrared small target detection based on local-image construction and maximum correntropy, Measurement, 211 (2023), 112662. https://doi.org/10.1016/j.measurement.2023.112662 |
Methods | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS |
YOLOX+SPP | 86.16 | 52.55 | 0.90 | 19.77 | 50.58 |
YOLOX+SPPF | 86.63 | 53.73 | 0.90 | 17.91 | 55.83 |
YOLOX+SimSPPF | 84.41 | 52.61 | 0.90 | 17.55 | 56.98 |
YOLOX+ASPP | 86.78 | 53.86 | 2.97 | 22.36 | 44.72 |
YOLOX+SPPCSPC | 86.18 | 54.36 | 2.51 | 20.11 | 49.72 |
Methods | NWPU VHR-10 mAP50-95 (%) | NWPU VHR-10 FPS | RSOD mAP50-95 (%) | RSOD FPS | TGRS-HRRSD mAP50-95 (%) | TGRS-HRRSD FPS | DOTA mAP50-95 (%) | DOTA FPS
YOLOX (SPP) | 52.55 | 50.58 | 56.62 | 51.57 | 57.92 | 87.79 | 55.48 | 93.02
SPPF | 53.73 | 55.83 | 57.39 | 55.46 | 58.79 | 89.77 | 57.01 | 95.37
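In these tables, P (M) is the parameter count in millions, latency is the per-image inference time, and FPS is simply 1000 divided by the latency in milliseconds (e.g., 1000/19.77 ≈ 50.58). As a reading aid only, the following is a minimal PyTorch sketch of the SPPF idea, not the authors' exact implementation: three cascaded 5 × 5 max-poolings reuse intermediate results and cover the same receptive fields as SPP's parallel 5/9/13 poolings, which is consistent with the latency drop from 19.77 ms to 17.91 ms above. The plain Conv2d layers (instead of conv-BN-SiLU blocks) are simplifications.

```python
import torch
import torch.nn as nn


class SPPF(nn.Module):
    """Sketch of SPPF: three cascaded k x k max-poolings whose concatenated
    outputs match the receptive fields of SPP's parallel 5/9/13 poolings."""

    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hid = c_in // 2
        self.reduce = nn.Conv2d(c_in, c_hid, kernel_size=1)     # channel reduction
        self.fuse = nn.Conv2d(c_hid * 4, c_out, kernel_size=1)  # fuse pooled features
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.reduce(x)
        y1 = self.pool(x)    # receptive field 5
        y2 = self.pool(y1)   # equivalent to a single 9 x 9 pooling
        y3 = self.pool(y2)   # equivalent to a single 13 x 13 pooling
        return self.fuse(torch.cat([x, y1, y2, y3], dim=1))


if __name__ == "__main__":
    feat = torch.randn(1, 64, 20, 20)
    print(SPPF(64, 64)(feat).shape)  # torch.Size([1, 64, 20, 20])
```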
Methods | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS |
YOLOX | 86.16 | 52.55 | 0.90 | 19.77 | 50.58 |
Program I | 79.16 | 46.68 | 0.95 | 22.10 | 45.25
Program II | 82.56 | 49.59 | 0.77 | 19.08 | 52.41
Program III | 87.88 | 53.61 | 1.06 | 20.23 | 49.43
Note: Program I replaces all convolutional structures in the neck network with GSConv; Program II replaces only the standard convolutions in the neck with GSConv and keeps the depth-wise convolutions (DWC); Program III replaces only the DWC in the neck with GSConv and keeps the standard convolutions. |
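For readers unfamiliar with GSConv, the following is a minimal sketch of a GSConv unit along the lines described in [38]: half of the output channels come from a standard convolution, the other half from a depth-wise convolution on that result, and a channel shuffle mixes the two halves. The 3 × 3 and 5 × 5 kernel sizes, the SiLU activation and the two-group shuffle are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class GSConv(nn.Module):
    """Sketch of a GSConv unit [38]: a standard-conv branch, a depth-wise
    branch on its output, concatenation and a channel shuffle."""

    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        c_half = c_out // 2
        self.conv = nn.Sequential(                     # dense (standard) branch
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )
        self.dwconv = nn.Sequential(                   # depth-wise branch
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )

    def forward(self, x):
        x1 = self.conv(x)
        x2 = self.dwconv(x1)
        y = torch.cat([x1, x2], dim=1)
        # channel shuffle with 2 groups interleaves dense and depth-wise features
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```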
Methods | NWPU VHR-10 mAP50-95 (%) | NWPU VHR-10 FPS | RSOD mAP50-95 (%) | RSOD FPS | TGRS-HRRSD mAP50-95 (%) | TGRS-HRRSD FPS | DOTA mAP50-95 (%) | DOTA FPS
YOLOX | 52.55 | 50.58 | 56.62 | 51.57 | 57.92 | 87.79 | 55.48 | 93.02
+slim-neck | 53.61 | 49.43 | 58.23 | 50.82 | 58.59 | 85.68 | 56.87 | 92.64
Methods | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS
Interpolation | 86.16 | 52.55 | 0.90 | 19.77 | 50.58 |
+SimAM [50] | 86.11 | 53.05 | 0.90 | 20.26 | 49.35 |
+SA [51] | 87.02 | 52.87 | 0.90 | 21.18 | 47.21 |
+ULSAM [52] | 85.28 | 52.14 | 0.90 | 24.17 | 41.37 |
+NAM [53] | 87.26 | 53.71 | 0.90 | 20.87 | 47.91 |
+ECA [41] | 87.35 | 54.69 | 0.90 | 20.34 | 49.16 |
Note: The dataset is NWPU VHR-10; "Interpolation" in the table denotes the nearest-neighbor interpolation algorithm. |
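The upsampling paradigm evaluated above pairs plain interpolation with a lightweight attention module, with ECA [41] giving the best mAP at essentially no parameter cost. Below is a minimal sketch of nearest-neighbor upsampling followed by ECA channel attention (global average pooling, a 1-D convolution across channels, and a sigmoid gate). Applying the attention after the interpolation and using a 1-D kernel size of 3 are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ECAUpsample(nn.Module):
    """Sketch: nearest-neighbor upsampling followed by ECA channel attention."""

    def __init__(self, k_size=3, scale=2):
        super().__init__()
        self.scale = scale
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=(k_size - 1) // 2, bias=False)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale, mode="nearest")
        w = F.adaptive_avg_pool2d(x, 1)                # (B, C, 1, 1) global context
        w = self.conv(w.squeeze(-1).transpose(1, 2))   # 1-D conv over the channel dim
        w = torch.sigmoid(w.transpose(1, 2).unsqueeze(-1))
        return x * w                                   # channel-wise re-weighting
```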
Methods | NWPU VHR-10 mAP50-95 (%) | NWPU VHR-10 FPS | RSOD mAP50-95 (%) | RSOD FPS | TGRS-HRRSD mAP50-95 (%) | TGRS-HRRSD FPS | DOTA mAP50-95 (%) | DOTA FPS
YOLOX | 52.55 | 50.58 | 56.62 | 51.57 | 57.92 | 87.79 | 55.48 | 93.02
+ECA | 54.69 | 49.16 | 58.24 | 50.58 | 59.24 | 85.81 | 58.23 | 92.17
Methods | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS |
IoU | 86.16 | 52.55 | 0.90 | 19.77 | 50.58 |
GIoU | 86.36 | 52.67 | 0.90 | 19.86 | 50.35 |
DIoU | 86.22 | 54.08 | 0.90 | 20.03 | 49.92 |
CIoU | 85.05 | 53.30 | 0.90 | 20.67 | 48.37 |
SIoU | 87.71 | 54.14 | 0.90 | 18.01 | 55.52 |
Methods | NWPU VHR-10 mAP50-95 (%) | NWPU VHR-10 FPS | RSOD mAP50-95 (%) | RSOD FPS | TGRS-HRRSD mAP50-95 (%) | TGRS-HRRSD FPS | DOTA mAP50-95 (%) | DOTA FPS
YOLOX (IoU) | 52.55 | 50.58 | 56.62 | 51.57 | 57.92 | 87.79 | 55.48 | 93.02
SIoU | 54.14 | 55.52 | 58.10 | 54.73 | 59.78 | 89.05 | 58.12 | 96.64
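The losses compared above all share the IoU term and differ only in the penalty added to it; GIoU [44], for example, subtracts the fraction of the smallest enclosing box not covered by the union. A minimal sketch of the GIoU loss is given below for illustration; SIoU [46] additionally introduces angle, distance and shape costs, which are omitted here.

```python
import torch


def giou_loss(box1, box2, eps=1e-7):
    """GIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format:
    loss = 1 - [IoU - |C \\ (A U B)| / |C|], C = smallest enclosing box."""
    # intersection
    x1 = torch.max(box1[..., 0], box2[..., 0])
    y1 = torch.max(box1[..., 1], box2[..., 1])
    x2 = torch.min(box1[..., 2], box2[..., 2])
    y2 = torch.min(box1[..., 3], box2[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)

    area1 = (box1[..., 2] - box1[..., 0]) * (box1[..., 3] - box1[..., 1])
    area2 = (box2[..., 2] - box2[..., 0]) * (box2[..., 3] - box2[..., 1])
    union = area1 + area2 - inter
    iou = inter / (union + eps)

    # smallest enclosing box C
    cx1 = torch.min(box1[..., 0], box2[..., 0])
    cy1 = torch.min(box1[..., 1], box2[..., 1])
    cx2 = torch.max(box1[..., 2], box2[..., 2])
    cy2 = torch.max(box1[..., 3], box2[..., 3])
    c_area = (cx2 - cx1) * (cy2 - cy1)

    giou = iou - (c_area - union) / (c_area + eps)
    return 1.0 - giou


if __name__ == "__main__":
    pred = torch.tensor([[0., 0., 4., 4.]])
    gt = torch.tensor([[1., 1., 5., 5.]])
    print(giou_loss(pred, gt))  # ~0.689
```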
SPPF | Slim-Neck | ECA | SIoU | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS
 |  |  |  | 86.16 | 52.55 | 0.90 | 19.77 | 50.58
√ |  |  |  | 86.63(+0.47) | 53.73(+1.18) | 0.90 | 17.91(-1.86) | 55.83(+5.25)
√ | √ |  |  | 87.82(+1.19) | 54.41(+0.68) | 1.06(+0.16) | 19.32(+1.41) | 51.75(-4.08)
√ | √ | √ |  | 88.21(+0.39) | 55.12(+0.71) | 1.06 | 19.79(+0.47) | 50.53(-1.22)
√ | √ | √ | √ | 88.47(+0.26) | 55.68(+0.56) | 1.06 | 18.92(-0.87) | 52.85(+2.32)
Models | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS |
YOLOv3-Tiny [10] | 75.46 | 42.38 | 8.86 | 48.46 | 20.53 |
YOLOv4-Tiny [11] | 80.27 | 47.65 | 6.06 | 42.98 | 23.27 |
YOLOv5n [12] | 87.23 | 54.06 | 1.90 | 26.12 | 38.28 |
YOLO-Fastest [54] | 65.18 | 31.79 | 0.35 | 10.28 | 97.27 |
FastestDet [55] | 65.67 | 32.82 | 0.24 | 9.32 | 107.29 |
NanoDet [24] | 80.12 | 46.23 | 0.95 | 14.69 | 68.07 |
YOLOX-Nano [13] | 86.16 | 52.55 | 0.90 | 19.77 | 50.58 |
YOLOX-Nano++(ours) | 88.47 | 55.68 | 1.06 | 18.92 | 52.85 |
Models | mAP50 (%) | mAP50-95 (%) | P (M) | Latency (ms) | FPS |
Improved Faster R-CNN [27] | 81.64 | 48.29 | 60.42 | 273.24 | 3.66 |
YOLOv4 [11] | 86.57 | 53.29 | 64.02 | 26.54 | 37.68 |
YOLOv5m [12] | 92.61 | 61.02 | 21.20 | 34.76 | 28.77 |
YOLOD [26] | 88.13 | 56.69 | 70.12 | 30.19 | 33.12 |
RS-YOLOX [29] | 90.21 | 58.34 | 14.38 | 27.83 | 35.93 |
YOLOX-Nano++(ours) | 88.47 | 55.68 | 1.06 | 18.92 | 52.85 |