April 2, 2024, 7:46 p.m. | Tongkun Su, Jun Li, Xi Zhang, Haibo Jin, Hao Chen, Qiong Wang, Faqin Lv, Baoliang Zhao, Yin Hu

cs.CV updates on arXiv.org

arXiv:2404.00226v1 Announce Type: new
Abstract: Multimodal pre-training demonstrates its potential in the medical domain, which learns medical visual representations from paired medical reports. However, many pre-training tasks require extra annotations from clinicians, and most of them fail to explicitly guide the model to learn the desired features of different pathologies. To the best of our knowledge, we are the first to utilize Visual Question Answering (VQA) for multimodal pre-training to guide the framework focusing on targeted pathological features. In this …

