[P] Reproducing the "Self-Rewarding Language Models" Paper by MetaAI
March 15, 2024, 8:42 p.m. | /u/FallMindless3563
Machine Learning www.reddit.com
After reading the Self-Rewarding Language Models paper from the team at Meta, we found it very approachable and reproducible, so we spent some time implementing it.
The scripts provided take any base model and put it in a loop of:
1) Supervised fine-tuning (SFT) on an initial dataset
2) Generating new prompts with the SFT model
3) Generating N responses per prompt
4) Scoring each generated response from 1 to 5 with the model itself
5) Running DPO on preference pairs built from those self-assigned rewards
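The bridge from steps 3–5 is turning per-response scores into (chosen, rejected) preference pairs for DPO. A minimal sketch of that step, assuming the common convention of pairing each prompt's highest-scored response against its lowest (function and field names here are illustrative, not taken from the repo):

```python
# Sketch: build DPO preference pairs from self-assigned 1-5 scores.
# Assumes each prompt maps to a list of {"text": ..., "score": ...} dicts.

def build_dpo_pairs(prompt_responses):
    """Pair each prompt's best-scored response (chosen) with its
    worst-scored one (rejected); skip prompts where all scores tie,
    since they carry no preference signal."""
    pairs = []
    for prompt, scored in prompt_responses.items():
        best = max(scored, key=lambda r: r["score"])
        worst = min(scored, key=lambda r: r["score"])
        if best["score"] > worst["score"]:
            pairs.append({
                "prompt": prompt,
                "chosen": best["text"],
                "rejected": worst["text"],
            })
    return pairs

# Toy example: one prompt with a clear winner, one with all ties.
data = {
    "Explain DPO briefly.": [
        {"text": "response A", "score": 4},
        {"text": "response B", "score": 2},
        {"text": "response C", "score": 5},
    ],
    "A prompt with tied scores": [
        {"text": "x", "score": 3},
        {"text": "y", "score": 3},
    ],
}
pairs = build_dpo_pairs(data)
# Only the first prompt yields a pair: chosen="response C", rejected="response B".
```

The resulting pairs are the standard input format for off-the-shelf DPO trainers; the tied-score prompt is dropped rather than paired arbitrarily.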
…