March 28, 2024, 4:42 a.m. | Suraj Patni, Aradhye Agarwal, Chetan Arora

arXiv:2403.18807v1 Announce Type: cross
Abstract: In the absence of parallax cues, a learning-based single image depth estimation (SIDE) model relies heavily on shading and contextual cues in the image. While this simplicity is attractive, such models must be trained on large and varied datasets, which are difficult to capture. It has been shown that using embeddings from pre-trained foundational models, such as CLIP, improves zero-shot transfer in several applications. Taking inspiration from this, in our paper we …

