LLM-grounded Video Diffusion Models
May 7, 2024, 4:48 a.m. | Long Lian, Baifeng Shi, Adam Yala, Trevor Darrell, Boyi Li
cs.CV updates on arXiv.org arxiv.org
Abstract: Text-conditioned diffusion models have emerged as a promising tool for neural video generation. However, current models still struggle with intricate spatiotemporal prompts and often generate restricted or incorrect motion. To address these limitations, we introduce LLM-grounded Video Diffusion (LVD). Instead of directly generating videos from the text inputs, LVD first leverages a large language model (LLM) to generate dynamic scene layouts based on the text inputs and subsequently uses the generated layouts to guide a …
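The two-stage pipeline the abstract describes can be sketched in Python. This is a minimal illustration, not the paper's implementation: `llm_scene_layout` stands in for the actual LLM call, and the diffusion-guidance stage is omitted entirely. Stage 1 turns a text prompt into a dynamic scene layout, represented here as per-frame bounding boxes; Stage 2 (stubbed) would use those boxes to guide the video diffusion model.

```python
# Hypothetical sketch of LVD's first stage, under assumed names and data shapes.
# In the real system an LLM produces the dynamic layout; here we hard-code one.

def interpolate_layout(start_box, end_box, num_frames):
    """Linearly interpolate a bounding box (x, y, w, h) across frames."""
    frames = []
    for t in range(num_frames):
        alpha = t / (num_frames - 1) if num_frames > 1 else 0.0
        frames.append(tuple(
            s + alpha * (e - s) for s, e in zip(start_box, end_box)
        ))
    return frames

def llm_scene_layout(prompt, num_frames=8):
    # Placeholder for the LLM call: maps the prompt to per-frame boxes
    # for each object. One object moving left to right, for illustration.
    return {
        "ball": interpolate_layout(
            (0.1, 0.5, 0.2, 0.2),  # starting box (normalized coordinates)
            (0.7, 0.5, 0.2, 0.2),  # ending box
            num_frames,
        )
    }

layout = llm_scene_layout("a ball rolling to the right")
# Each object now has one box per frame; a guided diffusion sampler
# would condition generation on these boxes frame by frame.
print(len(layout["ball"]))  # 8
```

The key design point the abstract highlights is this decoupling: the LLM handles the spatiotemporal reasoning (where each object is in each frame), so the diffusion model only needs to render content consistent with an explicit layout rather than infer motion from text alone.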