ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks | allainews.com

Feb. 28, 2024, 5:46 a.m. | Yang Liu, Xiaomin Yu, Gongyu Zhang, Christos Bergeles, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin

cs.CV updates on arXiv.org | arxiv.org

arXiv:2402.17298v1 Announce Type: new
Abstract: In this study, we address the challenging task of bridging the modality gap between learning from language and inference for visual tasks, including Visual Question Answering (VQA), Image Captioning (IC) and Visual Entailment (VE). We train models for these tasks in a zero-shot cross-modal transfer setting, a domain where the previous state-of-the-art method relied on fixed-scale noise injection, which often compromises the semantic content of the original modality embedding. To address this, we propose …
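The abstract attributes the weakness of earlier work to noise injected at a fixed scale. The sketch below illustrates only that point. It is not the ArcSin method, whose details are elided above. It assumes unit-normalized embeddings and isotropic Gaussian noise with a fixed standard deviation `sigma`. The 512-dimensional embedding size and every helper name here are hypothetical choices for illustration. The sketch measures how far the noisy embedding drifts from the original in cosine similarity.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (assumed: embeddings are normalized)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def inject_fixed_scale_noise(embedding, sigma, rng):
    """Hypothetical baseline: add isotropic Gaussian noise with a fixed std sigma."""
    return [x + rng.gauss(0.0, sigma) for x in embedding]


def mean_similarity_after_noise(dim, sigma, trials=200, seed=0):
    """Average cosine similarity between random unit embeddings and their noisy copies."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        emb = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        noisy = inject_fixed_scale_noise(emb, sigma, rng)
        total += cosine_similarity(emb, noisy)
    return total / trials


if __name__ == "__main__":
    # Larger fixed sigma -> the noisy embedding drifts further from the original.
    for sigma in (0.01, 0.05, 0.1):
        print(sigma, round(mean_similarity_after_noise(512, sigma), 3))
```

For a d-dimensional unit vector, the expected similarity under this noise model is roughly 1/sqrt(1 + dσ²). A single fixed σ therefore removes the same amount of semantic signal regardless of the individual embedding, which is the behaviour the abstract identifies as a limitation.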


More from arxiv.org / cs.CV updates on arXiv.org

Towards Arbitrary-Scale Histopathology Image Super-resolution: An Efficient Dual-branch Framework via Implicit Self-texture Enhancement | arxiv.org

REBUS: A Robust Evaluation Benchmark of Understanding Symbols | arxiv.org

Dreaming of Electrical Waves: Generative Modeling of Cardiac Excitation Waves using Diffusion Models | arxiv.org

ASCNet: Asymmetric Sampling Correction Network for Infrared Image Destriping | arxiv.org

Exposing Lip-syncing Deepfakes from Mouth Inconsistencies | arxiv.org

SSFlowNet: Semi-supervised Scene Flow Estimation On Point Clouds With Pseudo Label | arxiv.org

CMOSE: Comprehensive Multi-Modality Online Student Engagement Dataset with High-Quality Labels | arxiv.org

Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation | arxiv.org

FitDiff: Robust monocular 3D facial shape and reflectance estimation using Diffusion Models | arxiv.org
