[R] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

March 2, 2024, 4:56 p.m. | /u/Successful-Western27

Machine Learning www.reddit.com

Training AI to understand and describe video content requires captioned datasets, which are expensive to annotate by hand. Researchers from Snap, UC Merced, and the University of Trento have put together a new dataset, Panda-70M, that aims to help.

The dataset contains 70 million high-resolution YouTube clips paired with descriptive captions. The key idea is an automated pipeline in which multiple cross-modal "teacher" models generate captions from different inputs such as video, subtitles, and images. …
