Monocular Depth Estimation Using Encoder-Decoder Architecture and Transfer Learning from Single RGB Image

Abstract

Depth estimation from a single RGB image has recently become one of the most important research topics, with applications in self-supervised driving in autonomous cars, image reconstruction, and scene segmentation. Estimating depth from a single monocular image is more challenging than from stereo images because each frame lacks the spatio-temporal features that make 3D depth perception easier. Existing models for monocular depth estimation often produce low-resolution, blurry depth maps and often fail to identify the boundaries of small objects. In this paper, we propose a simple encoder-decoder network that uses transfer learning to predict high-quality depth images from single RGB images. The encoder is initialized with features extracted from pre-trained networks and then fine-tuned with augmentation strategies, and the decoder computes the high-quality depth maps. Although the network has fewer trainable parameters and requires fewer training iterations, it outperforms existing state-of-the-art methods and captures accurate boundaries when evaluated on two standard datasets, KITTI and NYU Depth V2.
