With the increasing prominence of autonomous vehicles in recent years, rapid and accurate environmental perception has become crucial for operational safety and decision-making. To address the challenge of balancing accuracy and real-time performance under in-vehicle computational constraints, this paper presents an efficient object detection algorithm for self-driving cars that extracts hierarchical cross-scale features using a shifted-window attention mechanism. By combining this improved feature representation with a more efficient feature fusion neck and a detection head based on depth-wise separable convolution, the proposed approach significantly reduces model complexity and improves detection speed while maintaining near-identical detection accuracy. Compared with YOLOv11s, the method reduces floating-point operations from 21.5 G to 6.0 G, a decrease of 15.5 G, and increases throughput by 139 frames per second, while preserving high detection precision. This combination of efficiency and accuracy makes the proposed algorithm particularly well suited to resource-constrained self-driving systems.
Citation: Jingwen Qi, Jian Wang. SDOD: An efficient object detection method for self-driving cars based on hierarchical cross-scale features[J]. Electronic Research Archive, 2025, 33(9): 5591-5615. doi: 10.3934/era.2025249
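The shifted-window attention mechanism the abstract refers to follows the Swin Transformer pattern: self-attention is computed within fixed-size local windows, and alternating blocks cyclically shift the feature map so information can cross window boundaries. The paper's exact implementation is not reproduced here; the PyTorch sketch below illustrates only the window partitioning and shifting steps, with the function names, window size, and channel count being illustrative assumptions rather than the paper's.

```python
import torch

def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows of
    shape (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

def shift_windows(x, shift_size):
    """Cyclically roll the map so the next window-attention pass mixes
    tokens that sat on opposite sides of the previous window borders."""
    return torch.roll(x, shifts=(-shift_size, -shift_size), dims=(1, 2))

# Example: an 8x8 map with 4x4 windows. Attention runs inside each window,
# and the shifted pass (shift = window_size // 2) links neighbouring windows.
feat = torch.randn(1, 8, 8, 96)
regular_windows = window_partition(feat, 4)
shifted_windows = window_partition(shift_windows(feat, 2), 4)
```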
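Similarly, the depth-wise separable convolution used in the neck and detection head factorizes a standard convolution into a per-channel spatial convolution followed by a 1x1 pointwise convolution, which is the main source of the reported FLOPs saving. A minimal sketch, assuming a PyTorch implementation; the block structure, channel sizes, and activation are hypothetical and not taken from the paper:

```python
import torch
import torch.nn as nn

class DWSeparableConv(nn.Module):
    """Depth-wise separable convolution: a per-channel k x k spatial conv
    followed by a 1x1 pointwise conv. For C_in -> C_out channels this costs
    roughly k*k*C_in + C_in*C_out multiply-adds per pixel, versus
    k*k*C_in*C_out for a standard convolution."""
    def __init__(self, c_in, c_out, k=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, stride,
                                   padding=k // 2, groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# For k=3 and c_in = c_out = 256, the factorized form needs about
# 3*3*256 + 256*256 = 67,840 multiply-adds per pixel instead of
# 3*3*256*256 = 589,824 -- roughly an 8.7x reduction, the kind of saving
# that drives the drop in overall model FLOPs.
layer = DWSeparableConv(256, 256)
y = layer(torch.randn(1, 256, 40, 40))
```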