Research article

SDOD: An efficient object detection method for self-driving cars based on hierarchical cross-scale features

  • Published: 17 September 2025
  • With the increasing prominence of autonomous vehicles in recent years, rapid and accurate environmental perception has become crucial for operational safety and decision-making. To address the challenge of balancing accuracy and real-time performance under in-vehicle computational constraints, this paper presents an efficient object detection algorithm for self-driving cars that extracts hierarchical cross-scale features based on a shifted-window attention mechanism. By integrating this improved feature representation with a more efficient feature fusion neck and a detection head based on depth-wise separable convolution, the proposed approach significantly reduces model complexity and improves detection speed while maintaining near-identical detection accuracy. Experimental results demonstrate that the method simultaneously enhances processing speed and reduces model complexity while maintaining high detection precision: compared with YOLOv11s, floating-point operations fall from 21.5 G to 6.0 G, a decrease of 15.5 G, and throughput increases by 139 frames per second. This combination of efficiency and accuracy makes the proposed algorithm particularly well suited to resource-constrained self-driving systems.
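
    As a hedged illustration of the shifted-window mechanism named in the abstract, the sketch below shows a generic Swin-style block in PyTorch: self-attention runs inside fixed non-overlapping windows, and a cyclic shift lets alternating blocks exchange information across window borders. This is a minimal sketch of the general technique, not the paper's actual module; the names (`ShiftedWindowAttention`, `window_size`, `shift`) are illustrative assumptions, and the attention mask Swin applies to shifted windows is omitted for brevity.

    ```python
    # Minimal sketch of shifted-window self-attention (Swin-style), not the
    # authors' actual module; names and hyperparameters are assumptions.
    import torch
    import torch.nn as nn

    def window_partition(x, ws):
        """Split a (B, H, W, C) map into non-overlapping ws x ws windows."""
        B, H, W, C = x.shape
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

    def window_reverse(windows, ws, H, W):
        """Inverse of window_partition."""
        B = windows.shape[0] // ((H // ws) * (W // ws))
        x = windows.view(B, H // ws, W // ws, ws, ws, -1)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

    class ShiftedWindowAttention(nn.Module):
        def __init__(self, dim, window_size=7, num_heads=4, shift=True):
            super().__init__()
            self.ws = window_size
            self.shift = window_size // 2 if shift else 0
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, x):  # x: (B, H, W, C); H, W divisible by window size
            B, H, W, C = x.shape
            if self.shift:  # cyclic shift so windows straddle the old borders
                x = torch.roll(x, shifts=(-self.shift, -self.shift), dims=(1, 2))
            win = window_partition(x, self.ws)        # (num_windows*B, ws*ws, C)
            out, _ = self.attn(win, win, win)         # attention within each window
            x = window_reverse(out, self.ws, H, W)
            if self.shift:  # undo the shift
                x = torch.roll(x, shifts=(self.shift, self.shift), dims=(1, 2))
            return x

    # Example: a 28x28 feature map with 96 channels.
    feat = torch.randn(1, 28, 28, 96)
    block = ShiftedWindowAttention(96, window_size=7, num_heads=4)
    print(block(feat).shape)  # torch.Size([1, 28, 28, 96])
    ```

    Stacking such blocks with alternating `shift=False` / `shift=True` is what gives the hierarchy its cross-window information flow at linear cost in image size.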

    Citation: Jingwen Qi, Jian Wang. SDOD: An efficient object detection method for self-driving cars based on hierarchical cross-scale features[J]. Electronic Research Archive, 2025, 33(9): 5591-5615. doi: 10.3934/era.2025249
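
    The depth-wise separable convolution that the abstract credits for the lighter neck and detection head can likewise be sketched. The block below is the generic depthwise-then-pointwise factorization in PyTorch, not the authors' exact layer; `DWSeparableConv` and the BatchNorm/SiLU choices are illustrative assumptions. The closing comment gives the standard cost argument behind the reported FLOPs reduction.

    ```python
    # Minimal sketch of a depth-wise separable convolution block; the class
    # name and normalization/activation choices are assumptions.
    import torch
    import torch.nn as nn

    class DWSeparableConv(nn.Module):
        def __init__(self, c_in, c_out, k=3):
            super().__init__()
            # groups=c_in makes the k x k conv act on each channel independently
            self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2,
                                       groups=c_in, bias=False)
            self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)  # mixes channels
            self.bn = nn.BatchNorm2d(c_out)
            self.act = nn.SiLU()

        def forward(self, x):
            return self.act(self.bn(self.pointwise(self.depthwise(x))))

    # Per output position, a standard k x k conv costs k*k*c_in*c_out multiplies,
    # while the separable pair costs k*k*c_in + c_in*c_out — roughly a
    # (1/c_out + 1/k^2) fraction of the original, which is where the FLOPs
    # saving comes from.
    x = torch.randn(1, 64, 80, 80)
    print(DWSeparableConv(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
    ```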






  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)

Metrics

Article views(543) PDF downloads(18) Cited by(0)


Figures and Tables

Figures(13)  /  Tables(6)
