Jan. 17, 2024, 7:02 p.m. | /u/Puzzleheaded_Stay_62

Machine Learning www.reddit.com

I wanted to point out a potential error in the derivation of the gradient of the DPO loss function.

The loss function in Equation 7 states:

https://preview.redd.it/n2y68o10t1dc1.png?width=1130&format=png&auto=webp&s=6ec12ea6f75edc2fabee51e35c799d2c549611f6
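For readers who can't load the image: the DPO loss as given in Equation 7 of the paper is, written from memory and worth double-checking against the linked screenshot,

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```

where \(y_w\) and \(y_l\) are the chosen and rejected completions, \(\sigma\) is the logistic sigmoid, and \(\beta\) scales the implicit reward.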

whereas in the derivation in the appendix (Equation 21), the negative sign is reversed, as shown below.

https://preview.redd.it/v68pyfy1t1dc1.png?width=1218&format=png&auto=webp&s=08401b5f0c3a49d5ce97f781a89fc10e9991e1f5

However, the overall gradient used in the main section of the paper is correct, so this seems to be an issue only in the appendix.
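One way to see which sign is right, independent of the paper's notation: the DPO loss applies \(-\log\sigma(\cdot)\) to a single scalar argument \(u\) (the scaled log-ratio difference, chosen minus rejected), and \(\tfrac{d}{du}\,[-\log\sigma(u)] = \sigma(u)-1 = -\sigma(-u)\). This is exactly the minus sign that is easy to drop when moving from the loss to its gradient. A minimal sanity check via finite differences (the function names here are my own, not the paper's):

```python
import math

def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))

def neg_log_sigmoid(u: float) -> float:
    # Inner term of the DPO loss: -log sigma(u), where u is the
    # scaled log-ratio difference (chosen minus rejected).
    return -math.log(sigmoid(u))

def analytic_grad(u: float) -> float:
    # d/du [-log sigma(u)] = sigma(u) - 1 = -sigma(-u).
    # This negative sign is the one at issue between Eq. 7 and Eq. 21.
    return -sigmoid(-u)

def numeric_grad(u: float, eps: float = 1e-6) -> float:
    # Central finite difference as an independent check.
    return (neg_log_sigmoid(u + eps) - neg_log_sigmoid(u - eps)) / (2.0 * eps)

if __name__ == "__main__":
    for u in (-2.0, -0.5, 0.0, 1.0, 3.0):
        print(f"u={u:+.1f}  analytic={analytic_grad(u):+.6f}  "
              f"numeric={numeric_grad(u):+.6f}")
```

The analytic and numeric gradients agree, confirming that the gradient of the loss carries \(-\sigma(-u)\), which matches the sign convention in the paper's main-text gradient.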

Please let me know if my understanding is correct (A …

