[D] Question about Direct Preference Optimization (DPO) equation
Jan. 16, 2024, 9:05 a.m. | /u/erap129
Machine Learning www.reddit.com
https://preview.redd.it/6ubjn8ekprcc1.png?width=1324&format=png&auto=webp&s=c932f5c030c2fb6b5f0f136934b047bc364d1dcc
I don't understand the division by pi_ref (both for y_w and for y_l). I know the purpose is that the finetuned model shouldn't stray too far from the reference model, but just looking at it mathematically - why should pi_ref(y_w|x) be close to pi_theta(y_w|x)?
At least for y_w, it seems like the loss would benefit from pi_ref(y_w|x) being as close as possible to 0, because we want to …
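A minimal numeric sketch may help here, assuming the standard DPO loss, L = -log sigma( beta*log(pi_theta(y_w|x)/pi_ref(y_w|x)) - beta*log(pi_theta(y_l|x)/pi_ref(y_l|x)) ). The key point is that the pi_ref terms are constants with respect to theta: they only shift the margin inside the sigmoid, acting as a per-example baseline, so nothing in the gradient pushes pi_ref and pi_theta toward each other. The function names below are illustrative, not from any library:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (x, y_w, y_l) triple, working in log space,
    so log(pi_theta / pi_ref) = logp_theta - logp_ref."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log sigmoid(margin); the ref log-probs are constants w.r.t. theta,
    # so they only offset the margin - they are not themselves optimized.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss depends only on the *relative* improvement over the reference:
# shifting every log-prob by the same constant leaves it unchanged.
a = dpo_loss(logp_w=-2.0, logp_l=-3.0, ref_logp_w=-2.5, ref_logp_l=-2.5)
b = dpo_loss(logp_w=-4.0, logp_l=-5.0, ref_logp_w=-4.5, ref_logp_l=-4.5)
# a == b, even though the absolute log-probs differ by 2 nats
```

This also shows why the loss doesn't just drive pi_ref(y_w|x) toward 0: pi_ref is frozen, and the loss falls as pi_theta raises y_w *relative to* pi_ref while lowering y_l relative to it.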