Jan. 29, 2024, 5:30 a.m. | /u/aaaprocrastinating

Machine Learning www.reddit.com

For some reason I just could not wrap my mind around the data distribution problem with DPO. In the paper it says:

https://preview.redd.it/6c9z61o4bbfc1.png?width=2164&format=png&auto=webp&s=c6b5ed46937da04e5912023e2f46ae7821a9a446

My question is: why does it matter so much that the preference data distribution aligns with the reference model's output distribution? My understanding is that during training, the parameters of the SFT model are updated so that chosen responses (y_w) have a higher probability of being generated and rejected responses (y_l) have a lower probability of being generated, …
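For context, here is a minimal sketch of the standard DPO objective from Rafailov et al. (2023), which is presumably what the screenshot above shows. The function name and argument names are mine; it assumes the per-sequence log-probabilities of the chosen and rejected responses have already been summed over tokens:

```python
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-example DPO loss (illustrative sketch).

    Each argument is the total log-probability of the chosen (w) or
    rejected (l) response under the trainable policy or the frozen
    reference (SFT) model.
    """
    # The implicit rewards are log-ratios against the reference model,
    # so the reference distribution enters the gradient for every pair.
    chosen_logratio = policy_logp_w - ref_logp_w
    rejected_logratio = policy_logp_l - ref_logp_l
    # -log sigmoid(beta * margin): raise p(y_w) and lower p(y_l),
    # but only relative to what the reference model already assigns.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio))
```

Note that both the chosen and rejected terms are measured relative to the reference model's log-probabilities, which is where the question about the preference data distribution versus the reference model's output distribution comes in.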

