Vision-Language Intelligence: Tasks, Representation Learning, and Large Models. (arXiv:2203.01922v1 [cs.CV])
Source: cs.CL updates on arXiv.org
This paper presents a comprehensive survey of vision-language (VL)
intelligence, organized chronologically. The survey is motivated by the
remarkable progress in both computer vision and natural language processing,
and by the recent shift from single-modality processing to multi-modality
comprehension. We divide the development of the field into three
periods: task-specific methods, vision-language pre-training (VLP)
methods, and larger models empowered by large-scale weakly labeled data. We
first take some common VL tasks as examples to introduce …