[D] Reinforced Self-Training (ReST) for Language Modeling (Video Paper Discussion)
Sept. 3, 2023, 12:16 p.m. | /u/ykilcher
Machine Learning www.reddit.com
ReST uses a bootstrap-like method to produce its own extended dataset and trains on ever higher-quality subsets of it to increase the reward its outputs receive. Because the same generated data can be reused multiple times, the method has an efficiency advantage over online RL techniques such as PPO.
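The loop described above can be illustrated with a toy sketch. It alternates a "Grow" step, which samples a dataset from the current policy and scores it once, with several "Improve" steps, which filter that same dataset at rising reward thresholds and fine-tune on what survives. Everything concrete here is a hypothetical stand-in, not the paper's implementation: the policy is a categorical distribution over four outputs, `reward()` is a fixed lookup table in place of a learned reward model, `fine_tune()` simply moves the policy toward the empirical distribution of the filtered samples instead of taking gradient steps, and the thresholds `(0.3, 0.6, 0.8)` are arbitrary.

```python
import random
from collections import Counter

# Hypothetical toy setup: the "policy" is a categorical distribution over a few
# candidate outputs, and REWARD stands in for a learned reward model.
OUTPUTS = ["a", "b", "c", "d"]
REWARD = {"a": 0.1, "b": 0.4, "c": 0.7, "d": 0.9}  # assumed scores


def reward(output: str) -> float:
    return REWARD[output]


def sample(policy: dict, n: int, rng: random.Random) -> list:
    """Draw n outputs from the current policy."""
    return rng.choices(list(policy), weights=list(policy.values()), k=n)


def fine_tune(policy: dict, data: list, lr: float = 0.5) -> dict:
    """Toy stand-in for supervised fine-tuning: mix the policy with the
    empirical distribution of the filtered samples."""
    if not data:
        return policy
    counts = Counter(data)
    target = {o: counts[o] / len(data) for o in policy}
    return {o: (1 - lr) * policy[o] + lr * target[o] for o in policy}


def rest(policy: dict, grow_steps: int = 2, thresholds=(0.3, 0.6, 0.8),
         n: int = 200, seed: int = 0) -> dict:
    rng = random.Random(seed)
    for _ in range(grow_steps):
        # Grow: generate and score a dataset once from the current policy.
        dataset = [(y, reward(y)) for y in sample(policy, n, rng)]
        # Improve: reuse the same samples, keeping ever higher-reward subsets.
        for tau in thresholds:
            filtered = [y for y, r in dataset if r >= tau]
            policy = fine_tune(policy, filtered)
    return policy


initial = {o: 1 / len(OUTPUTS) for o in OUTPUTS}
final = rest(initial)
print({o: round(p, 3) for o, p in final.items()})
```

Each Grow step pays the generation cost once, and every threshold in the inner Improve loop reuses those samples. An online method such as PPO would instead draw fresh samples for each policy update, which is the efficiency difference the post points to.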
Paper: [https://arxiv.org/abs/2308.08998](https://arxiv.org/abs/2308.08998)
Abstract:
Reinforcement learning from human feedback (RLHF) can improve the quality of large language model's (LLM) outputs by aligning them with human preferences. We propose a …