May 27, 2023, 3:49 p.m. | /u/AvvYaa

Computer Vision www.reddit.com

I just uploaded a video on my Youtube covering all the major techniques and challenges for training multi-modal models that can combine multiple input sources like images, text, audio, etc to perform amazing cross-modal tasks like text-image retrieval, multimodal vector arithmetic, visual question answering, and language modelling. So many amazing results of the past few years have left my jaws on the floor.

I thought it was a good time to make a video about this topic since more and …

audio challenges computervision etc image images language language models major modelling multimodal multiple question answering retrieval text text-image training vector video youtube

Software Engineer for AI Training Data (School Specific)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Python)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Tier 2)

@ G2i Inc | Remote

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US