Web: http://arxiv.org/abs/2105.11333

Sept. 22, 2022, 1:14 a.m. | Jong Hak Moon, Hyungyung Lee, Woncheol Shin, Young-Hak Kim, Edward Choi

cs.CV updates on arXiv.org

Recently, a number of studies have demonstrated impressive performance on diverse
vision-language multi-modal tasks, such as image captioning and visual question
answering, by extending the BERT architecture with multi-modal pre-training
objectives. In this work, we explore a broad set of multi-modal representation
learning tasks in the medical domain, specifically using radiology images and
their unstructured reports. We propose Medical Vision Language Learner (MedViLL),
which adopts a BERT-based architecture combined with a novel multi-modal
attention masking scheme to maximize generalization performance for …
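The abstract describes a BERT-style joint encoder over radiology images and report text with a multi-modal attention mask. As a rough illustration only, the sketch below builds a generic seq2seq-style mask over a concatenated [image ; text] sequence, where image tokens attend bidirectionally and text tokens attend to all image tokens plus earlier text tokens. The function name, shapes, and the specific masking choice are assumptions for illustration and not necessarily MedViLL's exact scheme.

import torch

def build_multimodal_mask(num_img_tokens: int, num_txt_tokens: int) -> torch.Tensor:
    # Illustrative seq2seq-style attention mask for a joint [image ; text] sequence.
    # 1 = attention allowed, 0 = masked out. This is a generic sketch, not the
    # exact masking scheme proposed in the MedViLL paper.
    total = num_img_tokens + num_txt_tokens
    mask = torch.zeros(total, total)

    # Image block: full bidirectional attention among image tokens.
    mask[:num_img_tokens, :num_img_tokens] = 1

    # Text rows: every text token can see all image tokens ...
    mask[num_img_tokens:, :num_img_tokens] = 1
    # ... and only itself plus earlier text tokens (causal, lower-triangular).
    mask[num_img_tokens:, num_img_tokens:] = torch.tril(
        torch.ones(num_txt_tokens, num_txt_tokens)
    )
    return mask

if __name__ == "__main__":
    # Small example: 3 image tokens followed by 4 text tokens.
    print(build_multimodal_mask(num_img_tokens=3, num_txt_tokens=4))

A mask like this would typically be added (as large negative values for the zero entries) to the attention logits of each self-attention layer, which is the standard way BERT-style models restrict which positions may attend to which.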
