Visual captions: Using large language models to augment video conferences with dynamic visuals | allainews.com

June 6, 2023, 4:58 p.m. | Google AI (noreply@blogger.com)

Google AI Blog ai.googleblog.com

Posted by Ruofei Du, Research Scientist, and Alex Olwal, Senior Staff Research Scientist, Google Augmented Reality

Recent advances in video conferencing have significantly improved remote video communication through features like live captioning and noise cancellation. However, there are various situations where dynamic visual augmentation would be useful to better convey complex and nuanced information. For example, when discussing what to order at a Japanese restaurant, your friends could share visuals that would help you feel more confident about ordering the …

alex augmentation augmented reality captioning communication conferences conferencing deep learning dynamic features google hci language language models large language models natural-language understanding noise reality research staff through video video conferencing visuals

More from ai.googleblog.com / Google AI Blog

Generative AI to quantify uncertainty in weather forecasting 1 month ago | ai.googleblog.com

climate decisions engineer example +17

AutoBNN: Probabilistic time series forecasting with compositional bayesian neural networks 1 month ago | ai.googleblog.com

bayesian data economic engineer +23

Computer-aided diagnosis for lung cancer screening 1 month, 1 week ago | ai.googleblog.com

cancer cancer screening computer diagnosis +16

Using AI to expand global access to reliable flood forecasts 1 month, 1 week ago | ai.googleblog.com

billion disaster engineering environment +13

ScreenAI: A visual language model for UI and visually-situated language understanding 1 month, 1 week ago | ai.googleblog.com

charts communication design diagrams +24

SCIN: A new resource for representative dermatology images 1 month, 1 week ago | ai.googleblog.com

crowd-sourcing dataset datasets dermatology +14

MELON: Reconstructing 3D objects from images with unknown poses 1 month, 1 week ago | ai.googleblog.com

3d objects capacity computer vision engineer +16

HEAL: A framework for health equity assessment of machine learning performance 1 month, 2 weeks ago | ai.googleblog.com

assessment clinical core differences +17

Cappy: Outperforming and boosting large multi-task language models with a small scorer 1 month, 2 weeks ago | ai.googleblog.com

boosting engineers framework google +25

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

Research Scientist (Computer Science)

@ Nanyang Technological University | NTU Main Campus, Singapore

View on ai-jobs.net

Intern - Sales Data Management

@ Deliveroo | Dubai, UAE (Main Office)

View on ai-jobs.net