To address the limited resources of mobile devices and embedded platforms, we propose HR-LiteNet, a lightweight pose recognition network. Built on a high-resolution architecture, the network combines depthwise separable convolutions, Ghost modules, and the Convolutional Block Attention Module (CBAM) into two custom blocks, L_block and L_basic, with the aim of reducing parameter count and computational complexity while maintaining high accuracy. Experimental results show that on the MPII validation dataset, HR-LiteNet achieves an accuracy of 83.643% while using approximately 26.58 M fewer parameters and 8.04 fewer GFLOPs than HRNet. Moreover, HR-LiteNet outperforms other lightweight models in parameter count and computational cost at comparable accuracy. This design provides a practical solution for pose recognition in resource-constrained environments, striking a balance between accuracy and model compactness.
Citation: Zhiming Cai, Liping Zhuang, Jin Chen, Jinhua Jiang. Lightweight high-performance pose recognition network: HR-LiteNet[J]. Electronic Research Archive, 2024, 32(2): 1145-1159. doi: 10.3934/era.2024055
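As a rough illustration of why depthwise separable convolutions shrink a network, the sketch below compares the parameter count of a standard k × k convolution with its depthwise separable counterpart. The function names and channel sizes here are illustrative assumptions for this note, not values taken from HR-LiteNet itself:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    """Depthwise separable convolution: a per-channel k x k depthwise
    filter followed by a 1 x 1 pointwise convolution that mixes channels."""
    return c_in * k * k + c_in * c_out

# Hypothetical layer: 256 -> 256 channels with a 3 x 3 kernel.
standard = conv_params(256, 256, 3)          # 589,824 parameters
separable = dw_separable_params(256, 256, 3)  # 67,840 parameters

# The ratio is roughly 1/k^2 + 1/c_out, i.e. close to a 9x reduction here.
ratio = separable / standard
```

Factoring the convolution into a depthwise stage plus a pointwise stage cuts parameters (and, analogously, FLOPs) by roughly a factor of k², which is the same mechanism the paper leverages in its L_block and L_basic modules.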