Nov. 12, 2022, 11:55 p.m. | /u/Singularian2501

Machine Learning www.reddit.com

Paper: [https://arxiv.org/abs/2210.09263](https://arxiv.org/abs/2210.09263)

Abstract:

>This paper surveys vision-language pre-training (VLP) methods for **multimodal intelligence** that have been developed in the last few years. We group these approaches into three categories: (*i*) VLP for image-text tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding; (*ii*) VLP for core computer vision tasks, such as (open-set) image classification, object detection, and segmentation; and (*iii*) VLP for video-text tasks, such as video captioning, video-text retrieval, and video question answering. For each …

