Fruits require different planting techniques at different growth stages. Traditionally, the maturity stage of a fruit is judged visually, which is time-consuming and labor-intensive. Fruits differ in size and color, and leaves or branches sometimes occlude them, limiting automatic detection of growth stages in real environments. Based on YOLOV4-Tiny, this study proposes a GCS-YOLOV4-Tiny model that (1) adds squeeze-and-excitation (SE) and spatial pyramid pooling (SPP) modules to improve accuracy and (2) uses group convolution to reduce the model size and achieve faster detection. The proposed GCS-YOLOV4-Tiny model was evaluated on three public fruit datasets. Results show that GCS-YOLOV4-Tiny performs favorably in terms of mAP, Recall, F1-score and Average IoU on the Mango YOLO and Rpi-Tomato datasets. In addition, with the smallest model size of 20.70 MB, GCS-YOLOV4-Tiny achieves mAP, Recall, F1-score, Precision and Average IoU of 93.42 ± 0.44%, 91.00 ± 1.87%, 90.80 ± 2.59%, 90.80 ± 2.77% and 76.94 ± 1.35%, respectively, on the F. margarita dataset. These results outperform the state-of-the-art YOLOV4-Tiny model, with a 17.45% increase in mAP and a 13.80% increase in F1-score. The proposed model detects different growth stages of fruits effectively and efficiently and can be extended to other fruits and crops for object or disease detection.
Citation: M. L. Huang, Y. S. Wu, GCS-YOLOV4-Tiny: A lightweight group convolution network for multi-stage fruit detection, Mathematical Biosciences and Engineering, 20 (2023), 241–268. https://doi.org/10.3934/mbe.2023011
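The three building blocks named in the abstract are standard components, and their roles can be illustrated compactly. Below is a minimal PyTorch sketch, not the authors' exact architecture: the channel count (128), reduction ratio (16), pooling kernels (5/9/13) and group count (4) are illustrative assumptions. It shows how an SE block re-weights channels, how SPP enlarges the receptive field by concatenating multi-scale max-pooled maps, and how group convolution cuts the parameter count of a convolutional layer.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global-average-pool to a channel descriptor,
    then gate each channel with a learned sigmoid weight."""
    def __init__(self, channels, reduction=16):  # reduction=16 is an assumed ratio
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # channel-wise re-weighting, shape unchanged

class SPP(nn.Module):
    """Spatial pyramid pooling: concatenate the input with max-pooled
    copies at several kernel sizes to enlarge the receptive field."""
    def __init__(self, kernel_sizes=(5, 9, 13)):  # assumed kernel set
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

# Group convolution: splitting 128 input channels into 4 groups cuts the
# weight count from 128*128*3*3 to 4*(32*32*3*3), i.e. by a factor of 4.
grouped = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=4)

x = torch.randn(1, 128, 26, 26)   # a hypothetical backbone feature map
y = SEBlock(128)(x)               # same shape, channels re-weighted
z = SPP()(y)                      # channels grow 4x: 128 -> 512
print(y.shape, z.shape, grouped(x).shape)
```

As the shape printout indicates, SE preserves the feature-map shape, SPP multiplies the channel count by the number of pooling branches plus one, and the grouped layer produces the same output shape as a standard 3 × 3 convolution with a quarter of the weights, which is the mechanism behind the reported reduction in model size.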