Feb. 14, 2024, 5:41 a.m. | Yun-Da Tsai, Ting-Yu Yen, Pei-Fu Guo, Zhe-Yan Li, Shou-De Lin

cs.LG updates on arXiv.org

This paper addresses the challenge of modality mismatch in multimodal learning, where the modalities available at inference differ from those available at training. We propose Text-centric Alignment for Multi-Modality Learning (TAMML), a method that uses Large Language Models (LLMs) with in-context learning, together with foundation models, to improve the generalizability of multimodal systems under these conditions. By leveraging text as a unified semantic space, TAMML demonstrates significant improvements in handling unseen, diverse, and …
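The core idea — mapping every available modality into text so that mismatched modality sets at training and inference share one representation — can be sketched as below. This is a minimal illustration, not the paper's implementation: the converter functions, prompt format, and field names are all assumptions, and in a real TAMML-style pipeline the image converter would be a captioning foundation model rather than a stub.

```python
# Sketch of text-centric alignment: convert each modality to text,
# then concatenate into a single prompt for an LLM.
# All function names and formats here are illustrative assumptions.

def image_to_text(image_meta: dict) -> str:
    # Stand-in for a captioning foundation model; here we just
    # summarize whatever metadata is available.
    return f"An image of {image_meta.get('label', 'unknown content')}."

def tabular_to_text(row: dict) -> str:
    # Serialize structured features as natural-language key-value pairs.
    return "; ".join(f"{k} is {v}" for k, v in row.items())

def build_prompt(modalities: dict) -> str:
    # Only the modalities actually present are converted, so the same
    # pipeline tolerates a different modality set at inference time.
    parts = []
    if "image" in modalities:
        parts.append(image_to_text(modalities["image"]))
    if "table" in modalities:
        parts.append(tabular_to_text(modalities["table"]))
    if "text" in modalities:
        parts.append(modalities["text"])
    return " ".join(parts)

# Inference-time input missing the "text" modality seen in training:
prompt = build_prompt({
    "image": {"label": "a red sedan"},
    "table": {"year": 2019, "mileage": "40k miles"},
})
```

Because every modality lands in the same textual space, the downstream LLM consumes `prompt` uniformly regardless of which modalities were present.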

