Object detection is a fundamental task in computer vision, and numerous generic object detectors have been proposed. This work presents a novel single-stage rotation detector that accurately detects oriented and multi-scale objects in diverse scenarios. The detector addresses challenges faced by current rotation detectors, such as arbitrary orientations, densely arranged objects, and loss discontinuity. First, the detector adopts a progressive (coarse-to-fine) regression scheme that uses horizontal anchors for speed and higher recall, and rotating anchors for oriented objects in cluttered backgrounds. Second, it includes a feature refinement module that helps minimize the problems related to feature angulation and reduces the number of bounding boxes generated. Finally, to address loss discontinuity, it uses a newly formulated adjustable loss function that can be extended to both single-stage and two-stage detectors. The proposed detector shows outstanding performance on benchmark datasets and significantly outperforms other state-of-the-art methods in terms of speed and accuracy.
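The loss discontinuity mentioned above can be illustrated with a small sketch. It assumes a rectangle whose angle is expressed in degrees and repeats every 180 degrees, so angles of 89 and -89 describe boxes only 2 degrees apart. The function names and the periodic distance are hypothetical illustrations of the general problem; they are not the adjustable loss function proposed in the paper, which the abstract does not specify.

```python
def naive_angle_loss(theta_pred, theta_gt):
    """Plain L1 distance between two angles in degrees (ignores periodicity)."""
    return abs(theta_pred - theta_gt)


def periodic_angle_loss(theta_pred, theta_gt, period=180.0):
    """L1 distance that treats angles `period` degrees apart as identical.

    Assumption: the box looks the same after a rotation of `period` degrees.
    """
    diff = abs(theta_pred - theta_gt) % period
    return min(diff, period - diff)


# Two nearly identical boxes that sit on opposite sides of the angle boundary.
pred_angle, gt_angle = 89.0, -89.0
print(naive_angle_loss(pred_angle, gt_angle))     # large penalty despite a small physical gap
print(periodic_angle_loss(pred_angle, gt_angle))  # small penalty that matches the physical gap
```

With the naive distance, a prediction that is almost correct receives a loss close to the maximum, which is the kind of jump at the boundary of the angle range that a rotation detector's loss has to avoid.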
Citation: Deepika Roselind Johnson, Rhymend Uthariaraj Vaidhyanathan. Detection and localization of multi-scale and oriented objects using an enhanced feature refinement algorithm[J]. Mathematical Biosciences and Engineering, 2023, 20(8): 15219-15243. doi: 10.3934/mbe.2023681