June 28, 2024, 4:47 a.m. | Ruyang Liu, Chen Li, Yixiao Ge, Ying Shan, Thomas H. Li, Ge Li

cs.CV updates on arXiv.org

arXiv:2309.15785v2 Announce Type: replace
Abstract: The recent progress in Large Language Models (LLM) has spurred various advancements in image-language conversation agents, while how to build a proficient video-based dialogue system is still under exploration. Considering the extensive scale of LLM and visual backbone, minimal GPU memory is left for facilitating effective temporal modeling, which is crucial for comprehending and providing feedback on videos. To this end, we propose Branching Temporal Adapter (BT-Adapter), a novel method for extending image-language pretrained models …
