[D] Can Direct Preference Optimization (DPO) be used to replace any type of RL for LLMs, or is it better suited for just scenarios like RLHF?
Oct. 16, 2023, 2:22 p.m. | /u/30299578815310
r/MachineLearning | www.reddit.com
I read a really fascinating paper where RL was used to make LLMs better at interacting in embodied environments: [https://arxiv.org/abs/2310.08588](https://arxiv.org/abs/2310.08588)
The technique is called Reinforcement Learning with Environmental Feedback (RLEF).
The paper used PPO, but I'm wondering whether DPO could be used in its place.
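For context on the comparison: where PPO optimizes against a scalar reward signal, DPO trains directly on *pairs* of preferred/rejected responses with a simple supervised loss (from the DPO paper by Rafailov et al.). A minimal sketch of that per-pair loss, with illustrative function and argument names of my own choosing:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for a single preference pair.

    logp_w / logp_l         : policy log-probs of the chosen / rejected response
    ref_logp_w / ref_logp_l : frozen reference-model log-probs of the same responses
    beta                    : strength of the implicit KL penalty (hyperparameter)
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid of the margin; minimized by raising the relative
    # likelihood of the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Note the loss only needs ranked response pairs, not an online reward signal, which is the crux of whether it can stand in for PPO in a setting like RLEF where feedback comes from the environment.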