International Journal of Control, Automation and Systems 2019; 17(10): 2586-2596
Published online July 6, 2019
https://doi.org/10.1007/s12555-018-0926-8
© The International Journal of Control, Automation, and Systems
Jun-Ning Zhang, Qun-Xing Su, Peng-Yuan Liu*, Hong-Yu Ge, and Ze-Feng Zhang
Army Engineering University Shijiazhuang Campus
We formulate structure from motion as a learning problem and propose an end-to-end learning framework to estimate image depth, optical flow, and camera motion. The framework is composed of multiple encoder-decoder networks. The key part of the network structure is the FlowNet, which improves the accuracy of the estimated camera ego-motion and depth. As in recent studies, we use an end-to-end learning approach with multi-view synthesis as a form of supervision, and propose multi-view consistency losses to constrain both depth and camera ego-motion, requiring only monocular video sequences for training. Compared with the recently popular depth-estimation networks that use a single image, our network learns to use motion parallax to correct depth. Although training MuDeepNet requires two adjacent frames to obtain motion parallax, it is tested with a single image; thus, MuDeepNet is a monocular system. Experiments on the KITTI dataset show that our MuDeepNet outperforms other methods.
Keywords: Deep learning, depth consistency loss, depth estimation, optical flow, optical flow consistency loss, visual odometry (VO).
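
The abstract describes multi-view synthesis as the supervisory signal: predicted depth and ego-motion are used to warp a neighboring frame into the target view, and the photometric difference serves as the training loss. The following is a minimal sketch of that idea in PyTorch, not the authors' implementation; the function name, tensor shapes, and the plain L1 photometric penalty are illustrative assumptions, and the paper's additional depth- and flow-consistency losses are omitted.

```python
import torch
import torch.nn.functional as F

def view_synthesis_loss(target, source, depth, pose, K):
    """Photometric loss from warping `source` into the target view.

    target, source: (B,3,H,W) adjacent frames
    depth:          (B,1,H,W) predicted depth of the target view
    pose:           (B,4,4) relative camera motion, target -> source
    K:              (B,3,3) camera intrinsics
    """
    B, _, H, W = target.shape
    dev = target.device
    # Pixel grid in homogeneous coordinates, shape (B,3,H*W).
    ys, xs = torch.meshgrid(torch.arange(H, device=dev),
                            torch.arange(W, device=dev), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0)
    pix = pix.float().view(1, 3, -1).expand(B, -1, -1)
    # Back-project to 3-D points in the target camera: X = D * K^{-1} p.
    cam = torch.linalg.inv(K) @ pix * depth.view(B, 1, -1)
    # Rigidly transform into the source frame and project: p_s ~ K (R X + t).
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=dev)], dim=1)
    src = K @ (pose @ cam_h)[:, :3, :]
    u = src[:, 0] / src[:, 2].clamp(min=1e-6)
    v = src[:, 1] / src[:, 2].clamp(min=1e-6)
    # Normalize to [-1,1] for grid_sample and warp the source image.
    grid = torch.stack([2 * u / (W - 1) - 1,
                        2 * v / (H - 1) - 1], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (warped - target).abs().mean()
```

Because depth, pose, and the warp are all differentiable, minimizing this loss trains the depth and ego-motion networks jointly from unlabeled monocular video, which is why only image sequences are needed at training time while inference runs on a single frame.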