Jan. 17, 2024, 7:02 p.m. | /u/Puzzleheaded_Stay_62

Machine Learning www.reddit.com

I wanted to point out a potential error in the derivation of the gradient of the DPO loss function.

The loss function in Equation 7 states:

https://preview.redd.it/n2y68o10t1dc1.png?width=1130&format=png&auto=webp&s=6ec12ea6f75edc2fabee51e35c799d2c549611f6
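For readers who can't load the image: the DPO loss as given in Equation 7 of the paper is, written from memory and worth double-checking against the linked screenshot,

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```

where \(y_w\) and \(y_l\) are the chosen and rejected completions, \(\sigma\) is the logistic sigmoid, and \(\beta\) scales the implicit reward.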

whereas in the derivation in the appendix (Equation 21), the negative sign is reversed, as shown below.

https://preview.redd.it/v68pyfy1t1dc1.png?width=1218&format=png&auto=webp&s=08401b5f0c3a49d5ce97f781a89fc10e9991e1f5

However, the overall gradient used in the main section of the paper is correct, so this seems to be an issue only in the appendix.
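One way to see which sign is right, independent of the paper's notation: the DPO loss applies \(-\log\sigma(\cdot)\) to a single scalar argument \(u\) (the scaled log-ratio difference, chosen minus rejected), and \(\tfrac{d}{du}\,[-\log\sigma(u)] = \sigma(u)-1 = -\sigma(-u)\). This is exactly the minus sign that is easy to drop when moving from the loss to its gradient. A minimal sanity check via finite differences (the function names here are my own, not the paper's):

```python
import math

def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))

def neg_log_sigmoid(u: float) -> float:
    # Inner term of the DPO loss: -log sigma(u), where u is the
    # scaled log-ratio difference (chosen minus rejected).
    return -math.log(sigmoid(u))

def analytic_grad(u: float) -> float:
    # d/du [-log sigma(u)] = sigma(u) - 1 = -sigma(-u).
    # This negative sign is the one at issue between Eq. 7 and Eq. 21.
    return -sigmoid(-u)

def numeric_grad(u: float, eps: float = 1e-6) -> float:
    # Central finite difference as an independent check.
    return (neg_log_sigmoid(u + eps) - neg_log_sigmoid(u - eps)) / (2.0 * eps)

if __name__ == "__main__":
    for u in (-2.0, -0.5, 0.0, 1.0, 3.0):
        print(f"u={u:+.1f}  analytic={analytic_grad(u):+.6f}  "
              f"numeric={numeric_grad(u):+.6f}")
```

The analytic and numeric gradients agree, confirming that the gradient of the loss carries \(-\sigma(-u)\), which matches the sign convention in the paper's main-text gradient.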

Please let me know if my understanding is correct (A …

