all AI news
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding
Feb. 27, 2024, 5:47 a.m. | Yuxuan Wang, Yueqian Wang, Pengfei Wu, Jianxin Liang, Dongyan Zhao, Zilong Zheng
cs.CV updates on arXiv.org arxiv.org
Abstract: Despite progress in video-language modeling, the computational challenge of interpreting long-form videos in response to task-specific linguistic queries persists, largely due to the complexity of high-dimensional video data and the misalignment between language and visual cues over space and time. To tackle this issue, we introduce a novel approach called Language-guided Spatial-Temporal Prompt Learning (LSTP). This approach features two key components: a Temporal Prompt Sampler (TPS) with optical flow prior that leverages temporal information to …
abstract arxiv challenge complexity computational cs.cl cs.cv data form issue language modeling progress prompt prompt learning queries space space and time spatial temporal text text understanding type understanding video video data videos visual visual cues
More from arxiv.org / cs.CV updates on arXiv.org
Multi-View Spectrogram Transformer for Respiratory Sound Classification
2 days, 22 hours ago |
arxiv.org
GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation
2 days, 22 hours ago |
arxiv.org
OTMatch: Improving Semi-Supervised Learning with Optimal Transport
2 days, 22 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Senior Machine Learning Engineer
@ GPTZero | Toronto, Canada
ML/AI Engineer / NLP Expert - Custom LLM Development (x/f/m)
@ HelloBetter | Remote
Doctoral Researcher (m/f/div) in Automated Processing of Bioimages
@ Leibniz Institute for Natural Product Research and Infection Biology (Leibniz-HKI) | Jena
Seeking Developers and Engineers for AI T-Shirt Generator Project
@ Chevon Hicks | Remote
Senior Applied Data Scientist
@ dunnhumby | London
Principal Data Architect - Azure & Big Data
@ MGM Resorts International | Home Office - US, NV