Pixel-wise RL on Diffusion Models: Reinforcement Learning from Rich Feedback
April 9, 2024, 4:41 a.m. | Mo Kordzanganeh, Danial Keshvary, Nariman Arian
cs.LG updates on arXiv.org arxiv.org
Abstract: Latent diffusion models are the state of the art for synthetic image generation. To align these models with human preferences, training them with reinforcement learning on human feedback is crucial. Black et al. (2024) introduced denoising diffusion policy optimisation (DDPO), which accounts for the iterative denoising nature of the generation by modelling it as a Markov chain with a final reward. As the reward is a single value that determines the model's performance on the entire image, …
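The Markov-chain framing the abstract describes can be sketched in a few lines: each denoising step is an action with a log-probability, and a reward graded at the end of the chain weights the whole trajectory (the DDPO setting), whereas a pixel-wise reward map would let credit flow to individual regions. This is a minimal illustrative sketch, not the paper's implementation; every name, shape, and value below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ddpo_style_loss(log_probs, reward):
    """REINFORCE-style surrogate loss for one denoising trajectory.

    log_probs: per-step log-probabilities of the sampled denoising actions
    reward:    a single scalar grading the final image (DDPO setting)
    The gradient of L = -reward * sum_t log pi(a_t | s_t) is the
    policy-gradient estimate for the whole Markov chain.
    """
    return -reward * np.sum(log_probs)

# Scalar-reward DDPO: one number grades the entire image.
traj_log_probs = np.array([-0.5, -0.3, -0.7])      # 3 denoising steps
scalar_loss = ddpo_style_loss(traj_log_probs, reward=1.0)  # 1.5

# Pixel-wise variant (the direction of this paper's title): a reward
# *map* assigns credit per pixel instead of one global scalar.
reward_map = np.array([[1.0, 0.2],
                       [0.0, 0.8]])                 # hypothetical 2x2 rewards
pixel_log_probs = rng.normal(-0.5, 0.1, size=(3, 2, 2))  # step x H x W
pixelwise_loss = -np.sum(reward_map * pixel_log_probs.sum(axis=0))
```

The scalar loss multiplies every step's log-probability by the same reward, so all pixels receive identical credit; the pixel-wise version weights each spatial location independently, which is what a "rich" feedback signal enables.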