[D] What makes PPO reinforcement learning and not just having a fancy loss function? | allainews.com

Feb. 8, 2024, 4:49 p.m. | /u/ExaminationNo8522

Machine Learning www.reddit.com

I was looking at training a diffusion model using RLHF and came across this repository, [kvablack/ddpo-pytorch: DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support (github.com)](https://github.com/kvablack/ddpo-pytorch/tree/main). The code itself just seems to be backpropagating through the UNet based on a fancy (and, at first glance, differentiable!) loss function. What distinguishes reinforcement learning from just normal model training? Are the two the same, and is it merely a matter of terminology?

Copying the relevant code here:

for i, sample …
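To make the question concrete, here is a minimal sketch of the generic PPO clipped surrogate loss in plain Python. It is not the code from the linked repository; the function name `ppo_clipped_loss` and the default `clip_range=0.2` are assumptions chosen for illustration. The thing to notice is what the loss is computed from: log-probabilities of actions the policy itself sampled earlier, and advantages that enter as plain numbers. The reward behind those advantages is never differentiated, so the loss is differentiable in the policy's log-probabilities even when the reward is a black box.

```python
import math

def ppo_clipped_loss(new_logps, old_logps, advantages, clip_range=0.2):
    """Mean PPO clipped surrogate loss (to be minimized) over a batch.

    new_logps:  log-probs of the sampled actions under the current policy
    old_logps:  log-probs of the same actions under the policy that sampled them
    advantages: reward-derived scalars, treated as constants (no gradient)
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Importance ratio between the current policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_range, 1 + clip_range].
        clipped = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
        # Pessimistic choice: take the larger of the two losses.
        losses.append(max(-adv * ratio, -adv * clipped))
    return sum(losses) / len(losses)

# Hypothetical numbers: the current policy makes one sampled action twice as
# likely as before, and that action had a positive advantage.
loss = ppo_clipped_loss([math.log(2.0)], [0.0], [1.0])
```

In a real training loop the same arithmetic would run on tensors, with gradients flowing only through `new_logps`. The data is generated by the model being trained, and the learning signal is a score rather than a target output, which is where this departs from ordinary supervised training.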


