[D] Question on the loss function in DeepMind's Beyond Human Data paper. Why use reward-weighted loss if the reward is only ever 1 or 0, as opposed to just training on successes? | allainews.com

Dec. 31, 2023, 3:12 p.m. | /u/30299578815310

Machine Learning www.reddit.com

In the [paper](https://arxiv.org/pdf/2312.06585.pdf), they say that they assign binary rewards of 1 and 0 to the model's outputs. If the code ran successfully, or the math problem was solved, or w/e, then the reward is 1. Otherwise it is 0.

Later in the paper they say they use a reward-weighted negative log-likelihood loss for training.
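For reference, a reward-weighted negative log-likelihood over $N$ sampled outputs $y_i$ for inputs $x_i$ can be written as below. This is a generic form assumed for illustration; the paper's exact notation and normalization may differ.

$$
\mathcal{L}_{\text{RW}}(\theta) = -\frac{1}{N}\sum_{i=1}^{N} r(x_i, y_i)\,\log p_\theta(y_i \mid x_i), \qquad r(x_i, y_i) \in \{0, 1\}
$$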

If the reward is only ever 0 or 1 though, isn't this just normal negative log-likelihood loss, but where you only train on the successes (the gradient …
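A minimal numerical sketch of the question, assuming per-sample sequence log-probabilities have already been computed. The function names and numbers are hypothetical and not taken from the paper. With binary rewards the zero-reward terms vanish, so under a sum reduction the reward-weighted loss equals plain NLL on the successes; under a mean over all $N$ samples it is that loss scaled by the success fraction $k/N$.

```python
import math

def reward_weighted_nll(log_probs, rewards, reduction="mean"):
    """Hypothetical helper: -sum_i r_i * log p(y_i | x_i), optionally averaged over all N samples."""
    total = -sum(r * lp for r, lp in zip(rewards, log_probs))
    return total / len(log_probs) if reduction == "mean" else total

def filtered_nll(log_probs, rewards, reduction="mean"):
    """Hypothetical helper: plain NLL computed only on samples with reward 1 ("train on successes")."""
    kept = [lp for r, lp in zip(rewards, log_probs) if r == 1]
    total = -sum(kept)
    return total / len(kept) if reduction == "mean" else total

# Made-up sequence log-probabilities for N = 4 sampled outputs
log_probs = [math.log(0.5), math.log(0.2), math.log(0.8), math.log(0.1)]
rewards = [1, 0, 1, 0]  # binary rewards: 1 = success, 0 = failure

# Sum reduction: the two losses match, since zero-reward terms contribute nothing
print(reward_weighted_nll(log_probs, rewards, "sum"), filtered_nll(log_probs, rewards, "sum"))

# Mean reduction: the losses differ only by the factor k/N (successes / total samples)
rw = reward_weighted_nll(log_probs, rewards)
fl = filtered_nll(log_probs, rewards)
print(rw, fl, rw / fl)
```

In this sketch the only difference is normalization: the $k/N$ factor changes the effective step size, and how much each batch counts when $k$ varies between batches, but not the direction of a given batch's gradient.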
