Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained
Dec. 22, 2023, noon | AI Coffee Break with Letitia
Source: AI Coffee Break with Letitia, www.youtube.com
📜 Rafailov, Rafael, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. "Direct preference optimization: Your language model is secretly a reward model." arXiv preprint arXiv:2305.18290 (2023). https://arxiv.org/abs/2305.18290
Thanks to our Patrons who support us in Tiers 2, 3, and 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, @Mutual_Information, Kshitij
Outline:
00:00 DPO motivation
00:53 Finetuning …
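The title's claim is that the policy's log-probability ratio against a frozen reference model, scaled by β, acts as an implicit reward. Preference pairs can therefore be fit with a Bradley-Terry style logistic loss instead of training a separate reward model and running RL. The Python sketch below shows the per-example loss under a few assumptions: you already have summed sequence log-probabilities for the chosen and rejected responses under both models, and the function names and the default β = 0.1 are illustrative choices. It is not code from the paper or the video.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def implicit_reward(policy_logp: float, ref_logp: float, beta: float) -> float:
    """beta * log(pi_theta(y|x) / pi_ref(y|x)), computed from log-probabilities."""
    return beta * (policy_logp - ref_logp)


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(r_chosen - r_rejected)."""
    # beta = 0.1 is an illustrative default (assumption), not a prescribed value.
    margin = (implicit_reward(policy_chosen_logp, ref_chosen_logp, beta)
              - implicit_reward(policy_rejected_logp, ref_rejected_logp, beta))
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


# Usage with made-up (hypothetical) log-probabilities:
# identical policy and reference -> margin 0 -> loss = log(2), about 0.693
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# policy raises the chosen response's log-prob -> loss drops, about 0.644
print(dpo_loss(-9.0, -12.0, -10.0, -12.0))
```

Writing the loss as softplus(-margin) avoids the overflow that a naive log(sigmoid(...)) can hit for large margins. In a real training loop the same expression would be averaged over a batch and differentiated with respect to the policy only, with the reference model kept frozen.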