Web: http://arxiv.org/abs/2206.11871

June 24, 2022, 1:10 a.m. | Charlie Snell, Ilya Kostrikov, Yi Su, Mengjiao Yang, Sergey Levine

cs.LG updates on arXiv.org

Large language models distill broad knowledge from text corpora. However,
they can be inconsistent when it comes to completing user-specified tasks. This
issue can be addressed by finetuning such models via supervised learning on
curated datasets, or via reinforcement learning. In this work, we propose a
novel offline-RL-motivated method, implicit language Q-learning (ILQL),
designed for use on language models, that combines the flexible utility
optimization framework of traditional RL algorithms with supervised learning's
ability to leverage …

Tags: arXiv, natural language generation, learning, RL
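The abstract only gestures at how ILQL blends value learning with supervised data, so here is a minimal sketch of the implicit Q-learning-style update (expectile regression of a value function toward dataset Q-values plus a TD backup) that ILQL builds on. This is an illustrative approximation, not the paper's implementation: the function names, tensor shapes, and hyperparameters (tau, gamma) are assumptions, and the paper's per-token architecture and additional regularizers are omitted.

```python
# Sketch of an implicit Q-learning (IQL)-style loss, in the spirit of ILQL.
# Assumes PyTorch and toy per-step tensors of shape [B]; illustrative only.
import torch
import torch.nn.functional as F

def expectile_loss(diff: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    """Asymmetric L2 loss: fits an upper expectile of the Q-distribution,
    so the value estimate improves without querying out-of-dataset actions."""
    weight = torch.where(diff > 0, torch.full_like(diff, tau),
                         torch.full_like(diff, 1.0 - tau))
    return (weight * diff.pow(2)).mean()

def ilql_style_losses(q_pred, v_pred, v_next, target_q, reward, done,
                      gamma=0.99, tau=0.7):
    """
    q_pred:   Q(s, a) for dataset (token) actions
    v_pred:   V(s) from the value head
    v_next:   V(s') for the next state
    target_q: Q(s, a) from a frozen target network
    reward, done: per-step reward and terminal flags
    """
    # Value loss: expectile regression of V(s) toward the target Q(s, a).
    v_loss = expectile_loss(target_q.detach() - v_pred, tau)
    # Q loss: standard TD backup toward r + gamma * V(s').
    td_target = reward + gamma * (1.0 - done) * v_next.detach()
    q_loss = F.mse_loss(q_pred, td_target)
    return v_loss, q_loss

# Toy usage with random tensors:
B = 8
v_loss, q_loss = ilql_style_losses(torch.randn(B), torch.randn(B),
                                   torch.randn(B), torch.randn(B),
                                   torch.randn(B), torch.zeros(B))
```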
