[D] Can Direct Preference Optimization (DPO) be used to replace any type of RL for LLMs, or is it better suited for just scenarios like RLHF?
Oct. 16, 2023, 2:22 p.m. | /u/30299578815310
r/MachineLearning | www.reddit.com
I read a really fascinating paper where RL was used to make LLMs better at interacting in embodied environments: [https://arxiv.org/abs/2310.08588](https://arxiv.org/abs/2310.08588)
The technique is called Reinforcement Learning with Environmental Feedback (RLEF).
The paper used PPO, but I'm wondering whether DPO could be used in its place.
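For context on the comparison: where PPO optimizes against a scalar reward signal, DPO trains directly on *pairs* of preferred/rejected responses with a simple supervised loss (from the DPO paper by Rafailov et al.). A minimal sketch of that per-pair loss, with illustrative function and argument names of my own choosing:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for a single preference pair.

    logp_w / logp_l         : policy log-probs of the chosen / rejected response
    ref_logp_w / ref_logp_l : frozen reference-model log-probs of the same responses
    beta                    : strength of the implicit KL penalty (hyperparameter)
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid of the margin; minimized by raising the relative
    # likelihood of the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Note the loss only needs ranked response pairs, not an online reward signal, which is the crux of whether it can stand in for PPO in a setting like RLEF where feedback comes from the environment.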