July 26, 2023, 2:46 p.m. | /u/Emergency_Apricot_77

Machine Learning www.reddit.com

Hi,

I want to get hands-on with the RLHF pipeline. I found an online reward model that could potentially be used: [https://huggingface.co/OpenAssistant/reward-model-deberta-v3-large-v2](https://huggingface.co/OpenAssistant/reward-model-deberta-v3-large-v2)

One thing that's unclear is how I can use this model to fine-tune something like GPT-NeoX-20B. My end goal is currently just a one-shot answering model (not necessarily a chat model).
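In case it helps frame the question: the reward model is just a DeBERTa sequence classifier that scores a (question, answer) pair with a single logit, and that scalar is what a PPO loop would maximize. A minimal sketch of scoring with it (assuming `transformers` and `torch` are installed; the question/answer strings are made-up examples):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# The OpenAssistant reward model is a sequence classifier: given a
# (question, answer) pair it emits one scalar logit, used as the reward.
model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(model_name)
reward_model.eval()

def reward(question: str, answer: str) -> float:
    # Question and answer go in as a sentence pair, following the model
    # card's usage; the single output logit is the scalar reward.
    inputs = tokenizer(question, answer, return_tensors="pt")
    with torch.no_grad():
        return reward_model(**inputs).logits[0].item()

# Hypothetical example pair, just to show the call shape.
score = reward(
    "Explain nuclear fusion like I am five.",
    "Nuclear fusion is when two small atoms squeeze together and release energy.",
)
print(score)
```

From there, one common route (not the only one) is `trl`'s `PPOTrainer` with `AutoModelForCausalLMWithValueHead` wrapped around the policy: generate a completion for each prompt, score it with a function like `reward()` above, and pass the scalars into the PPO step. At 20B parameters you'd likely need parameter-efficient fine-tuning (e.g. LoRA) and 8-bit loading rather than full-precision PPO.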

