Depth data is indispensable for reconstructing or understanding 3D scenes. It serves as a key ingredient for applications such as synthetic defocus, autonomous driving, and augmented reality.
Although active 3D sensors (e.g., LiDAR, ToF cameras, and structured-light scanners) can be employed, retrieving depth from monocular or stereo cameras is typically a more cost-effective approach.
However, estimating depth from images is inherently under-determined. To regularize the problem, one typically needs handcrafted models characterizing the properties of depth data or scene geometry.
With recent advances in deep learning, depth estimation has been cast as a learning task, leading to state-of-the-art performance. In this talk, I will present our new progress on depth estimation with convolutional neural networks (CNNs).
In particular, I will first introduce cascade residual learning (CRL), our two-stage deep architecture for stereo matching that produces high-quality disparity estimates. Observations with CRL inspire us to propose a domain-adaptation approach---zoom and learn (ZOLE)---for training a deep stereo matching algorithm without ground-truth data from the target domain.
By combining a view synthesis network with the first stage of CRL, we propose single view stereo matching (SVS) for single-image depth estimation, achieving performance superior to the classic stereo block-matching method, which takes two images as input.
Finally, I will present our efforts in applying these core techniques to rendering depth-of-field effects on dual-lens smartphones.