All AI News
Topic: PPO
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
6 days, 10 hours ago | arxiv.org
Teaching Large Language Models to Reason with Reinforcement Learning
1 month, 2 weeks ago | arxiv.org
[D] OpenRLHF - A Ray-based High-performance RLHF framework
4 months, 4 weeks ago | www.reddit.com
Reinforcement Learning from Human Feedback (RLHF)
5 months, 3 weeks ago | pub.towardsai.net
[P] The N Implementation Details of RLHF with PPO
5 months, 4 weeks ago | www.reddit.com
The N Implementation Details of RLHF with PPO
5 months, 4 weeks ago | huggingface.co
Rethinking the Role of PPO in RLHF
6 months, 1 week ago | bair.berkeley.edu
How Does PPO With Clipping Work?
6 months, 2 weeks ago | towardsdatascience.com
Reinforced Self-Training (ReST) for Language Modeling (Paper Explained)
7 months, 2 weeks ago | www.youtube.com
How to Code RLHF on Llama 2 w/ LoRA, 4-bit, TRL, DPO
7 months, 3 weeks ago | www.youtube.com
[D] How to actually do the final PPO with a reward model in RLHF?
8 months, 4 weeks ago | www.reddit.com
Research Focus: Week of July 17, 2023
9 months ago | www.microsoft.com
Direct Preference Optimization: Forget RLHF (PPO)
10 months, 2 weeks ago | www.youtube.com