Regular Papers

International Journal of Control, Automation and Systems 2019; 17(10): 2586-2596

Published online July 6, 2019

https://doi.org/10.1007/s12555-018-0926-8

© The International Journal of Control, Automation, and Systems

MuDeepNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose Using Multi-view Consistency Loss

Jun-Ning Zhang, Qun-Xing Su, Peng-Yuan Liu*, Hong-Yu Ge, and Ze-Feng Zhang

Army Engineering University Shijiazhuang Campus

Abstract

We formulate structure from motion as a learning problem and propose an end-to-end learning framework to estimate image depth, optical flow, and camera motion. The framework is composed of multiple encoder-decoder networks. The key component of the network structure is the FlowNet, which improves the accuracy of the estimated camera ego-motion and depth. As in recent studies, we use an end-to-end learning approach with multi-view synthesis as a form of supervision, and propose multi-view consistency losses to constrain both depth and camera ego-motion, requiring only monocular video sequences for training. Compared to recently popular depth-estimation networks that operate on a single image, our network learns to use motion parallax to correct depth. Although training MuDeepNet requires two adjacent frames to obtain motion parallax, it is tested on a single image; thus, MuDeepNet is a monocular system. Experiments on the KITTI dataset show that our MuDeepNet outperforms other methods.
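
To make the view-synthesis supervision concrete, the following is a minimal sketch (in PyTorch; not the authors' implementation) of the photometric loss this family of methods minimizes: a source frame is warped into the target view using the predicted depth and relative camera pose under a standard pinhole model, and the L1 difference to the target frame serves as the training signal. The function names backproject and view_synthesis_loss, the tensor shapes, and the plain L1 penalty are illustrative assumptions; the paper's full objective additionally includes the proposed multi-view consistency losses on depth and optical flow.

import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift each pixel to a 3-D point in the target camera frame.
    depth: (B, 1, H, W); K_inv: (B, 3, 3) inverse intrinsics."""
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(1, 3, -1)  # (1, 3, H*W)
    rays = K_inv @ pix.expand(B, -1, -1)          # per-pixel viewing rays
    return rays * depth.reshape(B, 1, -1)         # scale rays by predicted depth

def view_synthesis_loss(target, source, depth, T, K, K_inv):
    """Photometric loss between the target frame and the source frame
    warped into the target view.
    target, source: (B, 3, H, W) images; T: (B, 4, 4) pose target->source."""
    B, _, H, W = target.shape
    pts = backproject(depth, K_inv)                              # (B, 3, H*W)
    ones = torch.ones(B, 1, H * W, dtype=pts.dtype, device=pts.device)
    pts_src = (T @ torch.cat([pts, ones], dim=1))[:, :3]         # into source frame
    proj = K @ pts_src                                           # pinhole projection
    xy = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)               # divide by depth
    # Convert pixel coordinates to grid_sample's [-1, 1] range (x first, then y).
    grid = torch.stack([2 * xy[:, 0] / (W - 1) - 1,
                        2 * xy[:, 1] / (H - 1) - 1], dim=-1)
    warped = F.grid_sample(source, grid.reshape(B, H, W, 2),
                           padding_mode="border", align_corners=True)
    return (warped - target).abs().mean()                        # L1 photometric error

In a full training loop this term would typically be combined with a depth smoothness regularizer and the consistency losses described in the paper, so that the depth, flow, and pose networks supervise one another without ground-truth labels.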

Keywords: Deep learning, depth consistency loss, depth estimation, optical flow, optical flow consistency loss, visual odometry (VO).
