May 7, 2024, 4:48 a.m. | Long Lian, Baifeng Shi, Adam Yala, Trevor Darrell, Boyi Li

cs.CV updates on arXiv.org

arXiv:2309.17444v3 Announce Type: replace
Abstract: Text-conditioned diffusion models have emerged as a promising tool for neural video generation. However, current models still struggle with intricate spatiotemporal prompts and often generate restricted or incorrect motion. To address these limitations, we introduce LLM-grounded Video Diffusion (LVD). Instead of directly generating videos from the text inputs, LVD first leverages a large language model (LLM) to generate dynamic scene layouts based on the text inputs and subsequently uses the generated layouts to guide a …