Oct. 16, 2023, 9 a.m. | The Berkeley Artificial Intelligence Research Blog (bair.berkeley.edu)

Rethinking the Role of PPO in RLHF

TL;DR: In RLHF, there is a tension between the reward learning phase, which uses human preferences in the form of comparisons, and the RL fine-tuning phase, which optimizes a single, non-comparative reward. What if we performed RL in a comparative way?

Figure 1: This diagram illustrates the difference between reinforcement learning from absolute feedback and relative feedback. By incorporating a new component, the pairwise policy gradient, we can unify the reward modeling stage and …
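To make the comparative framing concrete, here is a minimal toy sketch contrasting an absolute-reward policy-gradient update with a pairwise (relative) one. The setup is purely illustrative, assuming a one-step bandit stand-in for response generation, a fixed vector of reward-model scores, and PyTorch for autograd; it is not the post's actual pairwise policy gradient implementation, and it omits ingredients a practical method would need, such as clipping or a KL penalty.

# Toy sketch only: a one-step "bandit" stand-in for RLHF fine-tuning.
# All names and hyperparameters below are illustrative assumptions.

import torch

torch.manual_seed(0)

NUM_RESPONSES = 8                                        # stand-in for candidate responses to one prompt
logits = torch.zeros(NUM_RESPONSES, requires_grad=True)  # toy policy parameters
reward = torch.linspace(-1.0, 1.0, NUM_RESPONSES)        # toy reward-model scores (absolute scale)
optimizer = torch.optim.SGD([logits], lr=0.1)


def absolute_update():
    # Standard policy gradient: weight the log-probability by the raw reward,
    # so the update depends on the reward model's absolute scale and offset.
    dist = torch.distributions.Categorical(logits=logits)
    a = dist.sample()
    loss = -reward[a] * dist.log_prob(a)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


def pairwise_update():
    # Comparative (pairwise) update: sample two responses and learn only from
    # their reward gap, so any constant shift in the reward model cancels out.
    dist = torch.distributions.Categorical(logits=logits)
    a1, a2 = dist.sample(), dist.sample()
    gap = reward[a1] - reward[a2]                        # relative feedback for the pair
    loss = -gap * (dist.log_prob(a1) - dist.log_prob(a2))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


for _ in range(200):
    pairwise_update()

print(torch.softmax(logits, dim=-1).detach())            # mass shifts toward higher-reward responses

Because the pairwise loss depends only on reward differences, adding a constant to every reward-model score leaves the update unchanged, which matches the invariance inherent in comparison-based preference data.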
