all AI news
Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning
May 6, 2024, 4:45 a.m. | Sanjoy Kundu, Shubham Trehan, Sathyanarayanan N. Aakur
cs.CV updates on arXiv.org arxiv.org
Abstract: Learning to infer labels in an open world, i.e., in an environment where the target ``labels'' are unknown, is an important characteristic for achieving autonomy. Foundation models, pre-trained on enormous amounts of data, have shown remarkable generalization skills through prompting, particularly in zero-shot inference. However, their performance is restricted to the correctness of the target label's search space, i.e., candidate labels provided in the prompt. This target search space can be unknown or exceptionally large …
abstract arxiv autonomy commonsense cs.cv data environment foundation labels novel object prompting reasoning skills through type videos visual world
More from arxiv.org / cs.CV updates on arXiv.org
Retrieval-Augmented Egocentric Video Captioning
2 days, 3 hours ago |
arxiv.org
Mirror-Aware Neural Humans
2 days, 3 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US