Feb. 8, 2024, 4:49 p.m. | /u/ExaminationNo8522

Machine Learning www.reddit.com

I was looking into training a diffusion model using RLHF and came across this implementation, [kvablack/ddpo-pytorch: DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support (github.com)](https://github.com/kvablack/ddpo-pytorch/tree/main), but the code itself just seems to be backpropagating through the UNet using a fancy (and, at first glance, differentiable!) loss function. What distinguishes reinforcement learning from ordinary model training? Are the two the same, and is it merely a matter of terminology?

Copying the relevant code here:

for i, sample …
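To make the question concrete, here is a minimal sketch of the kind of loss I mean. It is not the DDPO code: it is a toy two-armed bandit with a score-function (REINFORCE) update, and every name in it (`black_box_reward`, `train`, the learning rate, the baseline decay) is a made-up assumption for illustration. The reward is just a number handed back for a sampled action and is never differentiated. Only the log-probability of the action that was actually sampled is, which may be what separates this from a supervised loss against fixed targets.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def black_box_reward(action):
    # Hypothetical reward: a plain score with no gradient path to the policy.
    return 1.0 if action == 1 else 0.0


def train(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]   # policy parameters
    baseline = 0.0        # running mean reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # The training data comes from sampling the model's own policy.
        action = 0 if rng.random() < probs[0] else 1
        reward = black_box_reward(action)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient ascent on advantage * log pi(action).
        # For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi_k.
        for k in range(len(logits)):
            grad_logp = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_logp
    return softmax(logits)


final_probs = train()
print(final_probs)
```

The surrogate `advantage * log pi(action)` is differentiable in the policy parameters, so it looks like an ordinary loss, but the "labels" are actions the model sampled itself, weighted by a scalar reward that is treated as a constant.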
