An end-to-end stereo matching algorithm based on improved convolutional neural network

Yan Liu; Bingxue Lv; Yuheng Wang; Wei Huang

doi:10.3934/mbe.2020396

Mathematical Biosciences and Engineering

2020, Volume 17, Issue 6: 7787-7803. doi: 10.3934/mbe.2020396

Research article

College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 45000, China

Received: 28 July 2020 Accepted: 19 October 2020 Published: 06 November 2020

Deep end-to-end learning-based stereo matching methods have achieved great success, as shown by the leaderboards of several benchmark datasets. In a stereo vision system, depth information is obtained from a dense and accurate disparity map, which is computed by a robust stereo matching algorithm. However, previous works adopt network layers of the same size to train the feature parameters, and the resulting efficiency is too low for real scenarios. In this paper, we present an end-to-end stereo matching algorithm based on a "downsize" convolutional neural network (CNN) for autonomous driving scenarios. First, the road images are fed into the designed CNN to obtain the depth information. Then, the "downsize" fully connected layer, combined with subsequent network optimization, is employed to improve the accuracy of the algorithm. Finally, an improved loss function is used to approximate the similarity of positive and negative samples under a more relaxed constraint, which improves the matching quality of the output. The loss function error of the proposed method on the KITTI 2012 and KITTI 2015 datasets is reduced to 2.62% and 3.26%, respectively, and the runtime of the algorithm is also reduced. Experimental results show that the proposed end-to-end algorithm can obtain a dense disparity map, and that the corresponding depth information can be used by the binocular vision system in autonomous driving scenarios. In addition, our method achieves better performance than previous methods when the size of the network is compressed.
Keywords: image sensor; stereo matching; binocular vision; convolutional neural network
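
The abstract states that depth information is obtained from the disparity map. For a rectified stereo pair this is usually done with the standard pinhole relation Z = f * B / d, where d is the disparity in pixels, f the focal length in pixels and B the baseline between the two cameras. The sketch below illustrates that relation only; the function name, the focal length and the baseline are illustrative assumptions and are not taken from the paper, which may convert disparity to depth differently.

```python
from typing import List, Optional


def disparity_to_depth(
    disparity_px: List[List[float]],
    focal_length_px: float,
    baseline_m: float,
) -> List[List[Optional[float]]]:
    """Convert a dense disparity map to a depth map with Z = f * B / d.

    Assumes a rectified stereo pair. Pixels with non-positive disparity
    have no valid match and are returned as None.
    """
    depth_m: List[List[Optional[float]]] = []
    for row in disparity_px:
        depth_row: List[Optional[float]] = []
        for d in row:
            if d > 0:
                depth_row.append(focal_length_px * baseline_m / d)
            else:
                depth_row.append(None)  # invalid or unmatched pixel
        depth_m.append(depth_row)
    return depth_m


# Hypothetical camera parameters, chosen only for illustration.
example_disparity = [[10.0, 20.0], [0.0, 70.0]]
example_depth = disparity_to_depth(
    example_disparity, focal_length_px=700.0, baseline_m=0.5
)
print(example_depth)
```

Because depth is inversely proportional to disparity, a fixed disparity error costs far more depth accuracy for distant objects than for near ones, which is one reason a dense and accurate disparity map matters in driving scenes.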
Citation: Yan Liu, Bingxue Lv, Yuheng Wang, Wei Huang. An end-to-end stereo matching algorithm based on improved convolutional neural network[J]. Mathematical Biosciences and Engineering, 2020, 17(6): 7787-7803. doi: 10.3934/mbe.2020396

© 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)