Jan. 23, 2024, midnight | schmidphilipp1995@gmail.com (Philipp Schmid)

philschmid blog www.philschmid.de

In this blog post you will learn how to align LLMs using Hugging Face TRL and RLHF through Direct Preference Optimization (DPO).

blog direct preference optimization face generativeai hugging face huggingface learn llm llms optimization rlhf through will

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York