June 24, 2022, 1:12 a.m. | İlker Kesen, Ozan Arkan Can, Erkut Erdem, Aykut Erdem, Deniz Yuret

cs.CV updates on arXiv.org

How to best integrate linguistic and perceptual processing in multi-modal
tasks that involve language and vision is an important open problem. In this
work, we argue that the common practice of using language in a top-down manner,
to direct visual attention over high-level visual features, may not be optimal.
We hypothesize that the use of language to also condition the bottom-up
processing from pixels to high-level features can provide benefits to the
overall performance. To support our claim, we propose …

