Dec. 5, 2023, 3:21 p.m. | AssemblyAI

AssemblyAI www.youtube.com

Multimodality is the ability of an AI model to work with different types (or "modalities") of data, such as text, audio, and images. It is what allows a model like GPT-4 to write code from a diagram, and a model like DALL-E 3 to generate an image from a description.

In this video, we'll learn how multimodality works in AI and how multimodal models differ from multimodal interfaces.

Links:

Intro repository: https://github.com/AssemblyAI-Examples/chatgpt-image-interface
Introduction to Diffusion Models: https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction/
How DALL-E …

