Web: http://arxiv.org/abs/2209.07118

Sept. 16, 2022, 1:16 a.m. | Zhihong Chen, Guanbin Li, Xiang Wan

cs.CL updates on arXiv.org

Medical vision-and-language pre-training (Med-VLP) has received considerable
attention owing to its applicability to extracting generic vision-and-language
representations from medical images and texts. Most existing methods
comprise three elements: uni-modal encoders (i.e., a vision encoder and a
language encoder), a multi-modal fusion module, and pretext tasks. Few
studies consider the importance of medical domain expert knowledge or
explicitly exploit such knowledge to facilitate Med-VLP. Although there
exist knowledge-enhanced vision-and-language pre-training (VLP) methods in the
general domain, most require off-the-shelf …


