BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
June 28, 2024, 4:47 a.m. | Ruyang Liu, Chen Li, Yixiao Ge, Ying Shan, Thomas H. Li, Ge Li
cs.CV updates on arXiv.org
Abstract: Recent progress in Large Language Models (LLMs) has spurred various advancements in image-language conversation agents, but how to build a proficient video-based dialogue system is still under exploration. Given the extensive scale of the LLM and the visual backbone, minimal GPU memory is left for effective temporal modeling, which is crucial for comprehending and providing feedback on videos. To this end, we propose Branching Temporal Adapter (BT-Adapter), a novel method for extending image-language pretrained models …