According to statistics from the end of 2022, the total length of highways in China has reached 5.3548 million kilometers, of which 5.3503 million kilometers are under maintenance, a maintenance coverage of 99.9%. Inefficient manual pavement inspection cannot meet the needs of detection at this scale. To address this issue, experiments were conducted on deep learning-based intelligent identification models trained on pavement distress data. The dataset includes pavement micro-cracks, which are particularly significant for preventive pavement maintenance. The two-stage model Faster R-CNN achieved a mean average precision (mAP) of 0.938, surpassing the one-stage object detection algorithms YOLOv5 (mAP: 0.91) and YOLOv7 (mAP: 0.932). To balance model size and detection performance, this study proposes an optimization method based on YOLOv5. This method achieves detection performance (mAP: 0.93) comparable to that of two-stage detectors, with only a minimal increase in the number of parameters. Overall, the two-stage model demonstrated excellent detection performance when using a residual network (ResNet) as the backbone, whereas the one-stage YOLO algorithm proved more suitable for practical engineering applications.
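The mAP values quoted above average a per-class average precision (AP) over all distress classes. The following is a minimal sketch of how such a score is commonly computed, not the authors' evaluation code. It assumes each detection has already been matched to ground truth at a fixed IoU threshold, and the class names and scores in the example are hypothetical.

```python
from typing import Dict, List, Tuple

# One detection: (confidence score, matched a ground-truth box?)
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Area under the precision-recall curve for a single class.

    Assumes the IoU matching step was done beforehand, so each detection
    is already flagged as a true positive or a false positive.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        true_positives += int(is_tp)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_gt)
    # Replace each precision by the best precision at equal or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles under the curve wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Mean of the per-class AP values; input maps class -> (detections, num_gt)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical toy data: two classes, made-up scores and match flags.
example = {
    "transverse_crack": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "micro_crack": ([(0.6, True)], 1),
}
print(f"mAP = {mean_average_precision(example):.3f}")
```

Published detector benchmarks differ in the IoU threshold and the interpolation scheme they use, so a score from this sketch is only comparable to others computed under the same settings.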
Citation: Hui Yao, Yaning Fan, Xinyue Wei, Yanhao Liu, Dandan Cao, Zhanping You. Research and optimization of YOLO-based method for automatic pavement defect detection[J]. Electronic Research Archive, 2024, 32(3): 1708-1730. doi: 10.3934/era.2024078