June 24, 2022, 1:12 a.m. | Charlie Snell, Ilya Kostrikov, Yi Su, Mengjiao Yang, Sergey Levine

cs.CL updates on arXiv.org

Large language models distill broad knowledge from text corpora. However,
they can be inconsistent when it comes to completing user-specified tasks. This
issue can be addressed by finetuning such models via supervised learning on
curated datasets, or via reinforcement learning. In this work, we propose
implicit language Q-learning (ILQL), a novel offline-RL-motivated method
designed for use on language models, which combines the flexible utility
optimization framework of traditional RL algorithms with supervised learning's
ability to leverage …

Tags: arxiv, natural language generation, learning, RL
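The abstract describes marrying Q-learning's utility optimization with supervised learning's use of a fixed dataset. Below is a minimal sketch of the implicit Q-learning losses that the name ILQL alludes to, assuming per-token Q and V heads on top of a language model; the function names, tensor shapes, and the expectile parameter tau are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def expectile_loss(diff, tau=0.7):
    # Asymmetric L2: weights positive residuals by tau and negative ones
    # by (1 - tau), pushing V toward an upper expectile of Q rather than
    # its mean.
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

def ilql_style_losses(q_values, target_q_values, v_values, next_v_values,
                      actions, rewards, gamma=0.99, tau=0.7):
    # q_values / target_q_values: (batch, seq, vocab) per-token Q estimates
    #   from the online and frozen target Q heads.
    # v_values / next_v_values:   (batch, seq) state-value estimates for the
    #   current and successor token positions (next_v should be zeroed at
    #   terminal positions).
    # actions: (batch, seq) token ids actually present in the offline data.
    # rewards: (batch, seq) per-token rewards (often zero until sequence end).
    q_taken = q_values.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    target_q_taken = target_q_values.gather(-1, actions.unsqueeze(-1)).squeeze(-1)

    # V regresses toward Q of the dataset's actions under an expectile loss,
    # approximating a max over in-distribution actions without ever
    # evaluating actions the data does not contain.
    v_loss = expectile_loss(target_q_taken.detach() - v_values, tau)

    # Q regresses toward the one-step Bellman target r + gamma * V(s'),
    # a plain supervised regression over the fixed dataset.
    td_target = rewards + gamma * next_v_values.detach()
    q_loss = F.mse_loss(q_taken, td_target)

    return q_loss + v_loss

One common way to use such learned values at generation time is to add beta * (Q - V) to the base model's per-token logits, so decoding stays anchored to the supervised distribution while favoring high-advantage tokens.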
