Reinforced Self-Training (ReST) for Language Modeling (Paper Explained)
Sept. 3, 2023, 12:06 p.m. | Yannic Kilcher
ReST uses a bootstrap-like method to produce its own extended dataset and trains on ever higher-quality subsets of it to improve its own reward. The method allows the same generated data to be reused multiple times, giving it an efficiency advantage over online RL techniques such as PPO.
Paper: https://arxiv.org/abs/2308.08998
Abstract:
Reinforcement learning from human feedback (RLHF) can improve the quality of large language model's (LLM) outputs by aligning them with human preferences. We propose a …
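The generate-then-filter loop described in the summary can be sketched in a few lines. This is a toy illustration only: `generate`, `reward`, and `finetune` are hypothetical stand-ins (the paper uses a language-model policy, a learned reward model, and supervised fine-tuning), and the shrinking-fraction schedule merely mimics the paper's rising reward threshold across Improve steps.

```python
import random

def generate(policy, prompt):
    # "Grow" step: sample a continuation from the current policy.
    # Here the "policy" is just a list of candidate strings (toy stand-in).
    return prompt + random.choice(policy)

def reward(sample):
    # Hypothetical reward: longer samples score higher, purely for illustration.
    return len(sample)

def finetune(policy, data):
    # Stand-in for supervised fine-tuning on the filtered data:
    # the toy "policy" absorbs the last character of each kept sample.
    return policy + [s[-1] for s in data]

prompts = ["a", "b", "c"]
policy = ["x", "xy", "xyz"]

for step in range(3):  # outer Grow steps: regenerate the dataset from the policy
    dataset = [generate(policy, p) for p in prompts for _ in range(4)]
    scored = sorted(dataset, key=reward, reverse=True)
    # Inner Improve steps: keep an ever smaller, higher-reward subset and
    # fine-tune on it -- the same generated data is reused across steps.
    for frac in (0.8, 0.5, 0.2):
        subset = scored[: max(1, int(len(scored) * frac))]
        policy = finetune(policy, subset)
```

The key efficiency point from the summary shows up in the inner loop: each Grow step's dataset is filtered and trained on several times, rather than being discarded after a single gradient update as in fully online RL.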