Aug. 30, 2023, 1:47 a.m. | /u/seventh_day123

Machine Learning www.reddit.com

We recently ran some experiments; see the tech report: [https://arxiv.org/abs/2308.12050v1](https://arxiv.org/abs/2308.12050v1)

We train an SFT model and a reward model (RM), then align the LLM with either DT or MLE with filtering (ReST), using the RM datasets, the SFT datasets, and samples generated by the SFT model.

https://preview.redd.it/195op5q636lb1.png?width=1081&format=png&auto=webp&s=a9fa862e8a9ab05819484af8619f73d918fdc26a

DT is Decision Transformer alignment.

MLE is ReST-like alignment (maximum likelihood estimation with filtering).
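
To make the difference between the two concrete, here is a rough sketch of how the training data could be prepared in each case. It is not taken from the tech report: the function names, the `<reward_k>` token format, the bucket count, the threshold, and the toy reward function are all assumptions made for illustration. The idea is that the ReST-like MLE path drops low-reward samples before fine-tuning, while the DT path keeps every sample and tells the model how good it was.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (prompt, response)
RewardFn = Callable[[str, str], float]


def filter_for_mle(samples: List[Sample], reward_fn: RewardFn,
                   threshold: float) -> List[Sample]:
    """ReST-like filtering: keep only samples scored at or above the threshold.

    The surviving samples would then be used for ordinary MLE fine-tuning.
    """
    return [(p, r) for p, r in samples if reward_fn(p, r) >= threshold]


def tag_for_dt(samples: List[Sample], reward_fn: RewardFn,
               n_bins: int = 5, low: float = 0.0,
               high: float = 1.0) -> List[Sample]:
    """DT-style conditioning: keep all samples, prefix each prompt with a
    reward bucket token (hypothetical format) so the model can be conditioned
    on a target reward at inference time.
    """
    tagged = []
    for p, r in samples:
        frac = (reward_fn(p, r) - low) / (high - low)
        bucket = min(n_bins - 1, max(0, int(frac * n_bins)))
        tagged.append((f"<reward_{bucket}> {p}", r))
    return tagged


def toy_reward(prompt: str, response: str) -> float:
    # Stand-in for a trained RM (assumption): longer responses score higher.
    return min(len(response.split()) / 10.0, 1.0)


samples = [
    ("Explain RLHF.", "A training method."),
    ("Explain RLHF.", "RLHF fine-tunes a language model against a learned "
                      "reward model of human preferences."),
]

mle_data = filter_for_mle(samples, toy_reward, threshold=0.5)
dt_data = tag_for_dt(samples, toy_reward)
```

With this toy reward, `mle_data` keeps only the longer response, while `dt_data` keeps both and marks them with different reward tokens. At inference time a DT-style model would be prompted with the highest bucket token.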

https://preview.redd.it/u6x28fook5lb1.png?width=1118&format=png&auto=webp&s=4a87898129c1238c00071d43809f5daf440b26d8
