May 27, 2023, 3:48 p.m. | /u/AvvYaa

Machine Learning www.reddit.com

I just uploaded a video on my Youtube covering all the major techniques and challenges for training multi-modal models that can combine multiple input sources like images, text, audio, etc to perform amazing cross-modal tasks like text-image retrieval, multimodal vector arithmetic, visual question answering, and language modelling.

I thought it was a good time to make a video about this topic since more and more recent LLMs are moving away from text-only into visual-language domains (GPT-4, PaLM-2, etc). So in …

audio challenges etc good image images language language models machinelearning major modelling multimodal multiple question answering retrieval text text-image thought training vector video youtube

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Social Insights & Data Analyst (Freelance)

@ Media.Monks | Jakarta

Cloud Data Engineer

@ Arkatechture | Portland, ME, USA