Research article

SDOD: An efficient object detection method for self-driving cars based on hierarchical cross-scale features

  • Published: 17 September 2025
  • With the increasing prominence of autonomous vehicles in recent years, rapid and accurate environmental perception has become crucial for operational safety and decision-making. To address the challenge of balancing accuracy and real-time performance under in-vehicle computational constraints, this paper presents an efficient object detection algorithm for self-driving cars that extracts hierarchical cross-scale features based on a shifted-window attention mechanism. By integrating this improved feature representation with a more efficient feature fusion neck and a detection head based on depth-wise separable convolution, the proposed approach significantly reduces model complexity and improves detection speed while maintaining near-identical detection accuracy. Experimental results demonstrate that the method simultaneously enhances processing speed and reduces model complexity while maintaining high detection precision: compared with YOLOv11s, floating-point operations fall from 21.5 G to 6.0 G, a decrease of 15.5 G, and throughput increases by 139 frames per second. This combination of efficiency and accuracy makes the proposed algorithm particularly well suited to resource-constrained self-driving systems.
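
    As a hedged illustration of the shifted-window mechanism named in the abstract, the sketch below shows a generic Swin-style block in PyTorch: self-attention runs inside fixed non-overlapping windows, and a cyclic shift lets alternating blocks exchange information across window borders. This is a minimal sketch of the general technique, not the paper's actual module; the names (`ShiftedWindowAttention`, `window_size`, `shift`) are illustrative assumptions, and the attention mask Swin applies to shifted windows is omitted for brevity.

    ```python
    # Minimal sketch of shifted-window self-attention (Swin-style), not the
    # authors' actual module; names and hyperparameters are assumptions.
    import torch
    import torch.nn as nn

    def window_partition(x, ws):
        """Split a (B, H, W, C) map into non-overlapping ws x ws windows."""
        B, H, W, C = x.shape
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

    def window_reverse(windows, ws, H, W):
        """Inverse of window_partition."""
        B = windows.shape[0] // ((H // ws) * (W // ws))
        x = windows.view(B, H // ws, W // ws, ws, ws, -1)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

    class ShiftedWindowAttention(nn.Module):
        def __init__(self, dim, window_size=7, num_heads=4, shift=True):
            super().__init__()
            self.ws = window_size
            self.shift = window_size // 2 if shift else 0
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, x):  # x: (B, H, W, C); H, W divisible by window size
            B, H, W, C = x.shape
            if self.shift:  # cyclic shift so windows straddle the old borders
                x = torch.roll(x, shifts=(-self.shift, -self.shift), dims=(1, 2))
            win = window_partition(x, self.ws)        # (num_windows*B, ws*ws, C)
            out, _ = self.attn(win, win, win)         # attention within each window
            x = window_reverse(out, self.ws, H, W)
            if self.shift:  # undo the shift
                x = torch.roll(x, shifts=(self.shift, self.shift), dims=(1, 2))
            return x

    # Example: a 28x28 feature map with 96 channels.
    feat = torch.randn(1, 28, 28, 96)
    block = ShiftedWindowAttention(96, window_size=7, num_heads=4)
    print(block(feat).shape)  # torch.Size([1, 28, 28, 96])
    ```

    Stacking such blocks with alternating `shift=False` / `shift=True` is what gives the hierarchy its cross-window information flow at linear cost in image size.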

    Citation: Jingwen Qi, Jian Wang. SDOD: An efficient object detection method for self-driving cars based on hierarchical cross-scale features[J]. Electronic Research Archive, 2025, 33(9): 5591-5615. doi: 10.3934/era.2025249
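
    The depth-wise separable convolution that the abstract credits for the lighter neck and detection head can likewise be sketched. The block below is the generic depthwise-then-pointwise factorization in PyTorch, not the authors' exact layer; `DWSeparableConv` and the BatchNorm/SiLU choices are illustrative assumptions. The closing comment gives the standard cost argument behind the reported FLOPs reduction.

    ```python
    # Minimal sketch of a depth-wise separable convolution block; the class
    # name and normalization/activation choices are assumptions.
    import torch
    import torch.nn as nn

    class DWSeparableConv(nn.Module):
        def __init__(self, c_in, c_out, k=3):
            super().__init__()
            # groups=c_in makes the k x k conv act on each channel independently
            self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2,
                                       groups=c_in, bias=False)
            self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)  # mixes channels
            self.bn = nn.BatchNorm2d(c_out)
            self.act = nn.SiLU()

        def forward(self, x):
            return self.act(self.bn(self.pointwise(self.depthwise(x))))

    # Per output position, a standard k x k conv costs k*k*c_in*c_out multiplies,
    # while the separable pair costs k*k*c_in + c_in*c_out — roughly a
    # (1/c_out + 1/k^2) fraction of the original, which is where the FLOPs
    # saving comes from.
    x = torch.randn(1, 64, 80, 80)
    print(DWSeparableConv(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
    ```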






  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)

Metrics

Article views(543) PDF downloads(18) Cited by(0)


Figures and Tables

Figures(13)  /  Tables(6)
