ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition

June 11, 2024, 4:50 a.m. | Sanjoy Kundu, Shubham Trehan, Sathyanarayanan N. Aakur

cs.CV updates on arXiv.org

arXiv:2406.05722v1 Announce Type: new
Abstract: Learning to infer labels in an open world, i.e., in an environment where the target "labels" are unknown, is an important characteristic for achieving autonomy. Foundation models pre-trained on enormous amounts of data have shown remarkable generalization skills through prompting, particularly in zero-shot inference. However, their performance is limited by the correctness of the target label's search space. In an open world, this target search space can be unknown or exceptionally large, which severely restricts …
