May 14, 2024, 4:47 a.m. | Guangzhao Dai, Xiangbo Shu, Wenhao Wu, Rui Yan, Jiachao Zhang

cs.CV updates on arXiv.org arxiv.org

arXiv:2401.10039v2 Announce Type: replace
Abstract: Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown impressive performance in various visual recognition tasks. This advancement paves the way for notable performance in Zero-Shot Egocentric Action Recognition (ZS-EAR). Typically, VLMs handle ZS-EAR as a global video-text matching task, which often leads to suboptimal alignment of vision and linguistic knowledge. We propose a refined approach for ZS-EAR using VLMs, emphasizing fine-grained concept-description alignment that capitalizes on the rich semantic and contextual details in egocentric …
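For context, the "global video-text matching" baseline that the abstract contrasts against is typically a CLIP-style zero-shot classifier: one pooled video embedding is compared against a text embedding per candidate action. The sketch below illustrates that idea only; the encoders, prompts, and temperature are illustrative assumptions, not the paper's proposed fine-grained method.

```python
# Minimal sketch of zero-shot action recognition as global video-text matching.
# All embeddings here are random placeholders; in practice they would come
# from a pretrained VLM (e.g. a CLIP-style video/text encoder pair).
import torch
import torch.nn.functional as F

def zero_shot_action_scores(video_embedding: torch.Tensor,
                            text_embeddings: torch.Tensor,
                            temperature: float = 0.01) -> torch.Tensor:
    """Score each candidate action by cosine similarity between a single
    global video embedding (D,) and per-class text embeddings (C, D)."""
    v = F.normalize(video_embedding, dim=-1)   # (D,)
    t = F.normalize(text_embeddings, dim=-1)   # (C, D)
    return (t @ v) / temperature               # (C,) logits over classes

# Hypothetical usage with placeholder features.
D, C = 512, 4
prompts = ["a person cutting vegetables",
           "a person washing dishes",
           "a person opening a drawer",
           "a person pouring water"]
video_emb = torch.randn(D)        # pooled (global) video feature
text_embs = torch.randn(C, D)     # one text embedding per action prompt
probs = zero_shot_action_scores(video_emb, text_embs).softmax(dim=-1)
predicted_action = prompts[int(probs.argmax())]
```

Because the whole clip is collapsed into one vector, fine-grained cues (hands, objects, interactions) are averaged away; the paper's fine-grained concept-description alignment is motivated by exactly this limitation.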
