Offline RL for Natural Language Generation with Implicit Language Q Learning. (arXiv:2206.11871v1 [cs.CL])
cs.LG updates on arXiv.org
Large language models distill broad knowledge from text corpora. However,
they can be inconsistent when it comes to completing user-specified tasks. This
issue can be addressed by finetuning such models via supervised learning on
curated datasets, or via reinforcement learning. In this work, we propose a
novel method motivated by offline RL, implicit language Q-learning (ILQL),
designed for use on language models, that combines the flexible utility
optimization framework of traditional RL algorithms with supervised learning's
ability to leverage …
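The abstract above is cut off before it describes the mechanism, so the following is only a toy sketch under stated assumptions. It assumes two ingredients that ILQL is generally described as using: a value function V fit to Q with an asymmetric expectile (squared) loss, borrowed from implicit Q-learning, and decoding that reweights a base language model's next-token distribution by exp(beta * (Q - V)). Q and V are plain numbers here rather than neural network outputs, and the function names `expectile_loss` and `perturbed_token_distribution` are hypothetical, not the authors' code.

```python
import math
from typing import List, Sequence


def expectile_loss(diff: float, tau: float) -> float:
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2.

    Assumption: diff = Q(s, a) - V(s). With tau > 0.5, positive errors are
    weighted more heavily, which pushes V toward an upper expectile of Q.
    """
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


def perturbed_token_distribution(
    base_logits: Sequence[float],
    q_values: Sequence[float],
    v_value: float,
    beta: float,
) -> List[float]:
    """Reweight a base LM's next-token distribution by exp(beta * (Q - V)).

    Adding beta * (Q - V) to each logit and applying softmax gives
    probabilities proportional to pi_base(a | s) * exp(beta * (Q(s, a) - V(s))).
    """
    if len(base_logits) != len(q_values):
        raise ValueError("need one Q-value per vocabulary token")
    scores = [l + beta * (q - v_value) for l, q in zip(base_logits, q_values)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    # Toy 3-token vocabulary: the base LM is indifferent, but token 0 has a
    # higher (hypothetical) Q-value, so it receives more probability mass.
    probs = perturbed_token_distribution([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], 0.0, beta=2.0)
    print([round(p, 3) for p in probs])
    print(expectile_loss(1.0, 0.7), expectile_loss(-1.0, 0.7))
```

Setting beta to 0 recovers the base model's distribution unchanged; larger beta shifts mass toward tokens with higher estimated advantage Q - V, which is one way to read "combining utility optimization with supervised learning" at decoding time.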