all AI news
Zero-shot Prompt-based Video Encoder for Surgical Gesture Recognition
April 1, 2024, 4:44 a.m. | Mingxing Rao, Yinhong Qin, Soheil Kolouri, Jie Ying Wu, Daniel Moyer
cs.CV updates on arXiv.org arxiv.org
Abstract: Purpose: Surgical video is an important data stream for gesture recognition. Thus, robust visual encoders for those data-streams is similarly important. Methods: Leveraging the Bridge-Prompt framework, we fine-tune a pre-trained vision-text model (CLIP) for gesture recognition in surgical videos. This can utilize extensive outside video data such as text, but also make use of label meta-data and weakly supervised contrastive losses. Results: Our experiments show that prompt-based video encoder outperforms standard encoders in surgical gesture …
abstract arxiv bridge clip cs.cv data data stream encoder framework gesture recognition prompt recognition robust text type video video data videos vision visual zero-shot
More from arxiv.org / cs.CV updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Software Engineer, Data Tools - Full Stack
@ DoorDash | Pune, India
Senior Data Analyst
@ Artsy | New York City