all AI news
Neuro-Symbolic Video Search
March 19, 2024, 4:48 a.m. | Minkyu Choi, Harsh Goel, Mohammad Omama, Yunhao Yang, Sahil Shah, Sandeep Chinchali
cs.CV updates on arXiv.org arxiv.org
Abstract: The unprecedented surge in video data production in recent years necessitates efficient tools to extract meaningful frames from videos for downstream tasks. Long-term temporal reasoning is a key desideratum for frame retrieval systems. While state-of-the-art foundation models, like VideoLLaMA and ViCLIP, are proficient in short-term semantic understanding, they surprisingly fail at long-term reasoning across frames. A key reason for this failure is that they intertwine per-frame perception and temporal reasoning into a single deep network. …
abstract art arxiv cs.ai cs.cv data extract foundation key long-term neuro production reasoning retrieval search semantic state systems tasks temporal tools type understanding video video data videos
More from arxiv.org / cs.CV updates on arXiv.org
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
1 day, 16 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Intern Large Language Models Planning (f/m/x)
@ BMW Group | Munich, DE
Data Engineer Analytics
@ Meta | Menlo Park, CA | Remote, US