ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained)
May 1, 2024, 3:03 p.m. | Yannic Kilcher | www.youtube.com
Abstract:
While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfavored generation style is sufficient for preference-aligned SFT. Building on this foundation, we introduce a straightforward and innovative reference model-free monolithic odds ratio preference optimization algorithm, ORPO, eliminating the necessity for an additional …
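The abstract above is cut off before it states the objective. The sketch below illustrates the idea it describes: ordinary SFT plus a small penalty on the disfavored response, with no reference model. It assumes the formulation from the ORPO paper as commonly summarized. The loss is the SFT negative log-likelihood on the chosen response plus a λ-weighted term, -log σ(log(odds(y_w|x) / odds(y_l|x))). Here odds(y|x) = P(y|x) / (1 - P(y|x)), and P(y|x) is the length-normalized (per-token average) likelihood. The function names, the plain-list inputs, and the default `lam=0.1` are illustrative assumptions, not the authors' code.

```python
import math


def mean_logprob(token_logprobs):
    """Length-normalized log-likelihood: the average per-token log-probability."""
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logprob):
    """log(P / (1 - P)) with P = exp(avg_logprob).

    Assumes avg_logprob < 0 (i.e. P < 1); log1p(-exp(a)) computes log(1 - P) stably.
    """
    return avg_logprob - math.log1p(-math.exp(avg_logprob))


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(chosen_logprobs, rejected_logprobs, lam=0.1):
    """Sketch of an ORPO-style loss for one (chosen, rejected) pair.

    Inputs are per-token log-probabilities of each response under the model
    being trained; no reference model is involved. `lam` is a hypothetical
    default weight for the odds-ratio term.
    Returns (total_loss, sft_loss, odds_ratio_loss).
    """
    avg_w = mean_logprob(chosen_logprobs)
    avg_l = mean_logprob(rejected_logprobs)
    # Standard SFT term: negative (average) log-likelihood of the chosen response.
    sft_loss = -avg_w
    # Odds-ratio term: small when the chosen response's odds exceed the rejected one's.
    log_odds_ratio = log_odds(avg_w) - log_odds(avg_l)
    or_loss = -log_sigmoid(log_odds_ratio)
    return sft_loss + lam * or_loss, sft_loss, or_loss


if __name__ == "__main__":
    total, sft, orl = orpo_loss([-0.2, -0.3, -0.1], [-1.5, -2.0, -1.0])
    print(f"total={total:.4f} sft={sft:.4f} odds_ratio={orl:.4f}")
```

When the two responses are equally likely, the odds-ratio term equals log 2. It shrinks toward zero as the chosen response becomes relatively more likely, so it acts as the "minor penalty" on the disfavored style that rides on top of the usual SFT loss. In a real training loop, the per-token log-probabilities would come from the model's logits and the loss would be averaged over a batch.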