Scaling multimodal understanding to long videos | allainews.com

Nov. 14, 2023, 8:28 p.m. | Google AI (noreply@blogger.com)

Google AI Blog ai.googleblog.com

Posted by Isaac Noble, Software Engineer, Google Research, and Anelia Angelova, Research Scientist, Google DeepMind

When building machine learning models for real-life applications, we need to consider inputs from multiple modalities in order to capture various aspects of the world around us. For example, audio, video, and text all provide varied and complementary information about a visual input. However, building multimodal models is challenging due to the heterogeneity of the modalities. Some of the modalities might be well synchronized in …

ai applications audio building computer vision deepmind engineer example google google deepmind google research information life machine machine learning machine learning models multimodal multimodal learning multiple research research scientist scaling software software engineer text understanding video video analysis videos world

More from ai.googleblog.com / Google AI Blog

Generative AI to quantify uncertainty in weather forecasting 1 month ago | ai.googleblog.com

climate decisions engineer example +17

AutoBNN: Probabilistic time series forecasting with compositional bayesian neural networks 1 month ago | ai.googleblog.com

bayesian data economic engineer +23

Computer-aided diagnosis for lung cancer screening 1 month, 1 week ago | ai.googleblog.com

cancer cancer screening computer diagnosis +16

Using AI to expand global access to reliable flood forecasts 1 month, 1 week ago | ai.googleblog.com

billion disaster engineering environment +13

ScreenAI: A visual language model for UI and visually-situated language understanding 1 month, 2 weeks ago | ai.googleblog.com

charts communication design diagrams +24

SCIN: A new resource for representative dermatology images 1 month, 2 weeks ago | ai.googleblog.com

crowd-sourcing dataset datasets dermatology +14

MELON: Reconstructing 3D objects from images with unknown poses 1 month, 2 weeks ago | ai.googleblog.com

3d objects capacity computer vision engineer +16

HEAL: A framework for health equity assessment of machine learning performance 1 month, 2 weeks ago | ai.googleblog.com

assessment clinical core differences +17

Cappy: Outperforming and boosting large multi-task language models with a small scorer 1 month, 2 weeks ago | ai.googleblog.com

boosting engineers framework google +25

Founding AI Engineer, Agents

@ Occam AI | New York

View on ai-jobs.net

AI Engineer Intern, Agents

@ Occam AI | US

View on ai-jobs.net

AI Research Scientist

@ Vara | Berlin, Germany and Remote

View on ai-jobs.net

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net