A Survey of Video Datasets for Grounded Event Understanding
June 17, 2024, 4:46 a.m. | Kate Sanders, Benjamin Van Durme
cs.CV updates on arXiv.org
Abstract: While existing video benchmarks largely consider specialized downstream tasks like retrieval or question-answering (QA), contemporary multimodal AI systems must be capable of well-rounded common-sense reasoning akin to human visual understanding. A critical component of human temporal-visual perception is our ability to identify and cognitively model "things happening", or events. Historically, video benchmark tasks have implicitly tested for this ability (e.g., video captioning, in which models describe visual events with natural language), but they do not …