Feb. 14, 2024, 5:42 a.m. | Yang Liu, Peng Sun, Hang Li

cs.LG updates on arXiv.org

By formally defining the training processes of large language models (LLMs), which usually encompass pre-training, supervised fine-tuning, and reinforcement learning with human feedback, within a single, unified machine learning paradigm, we can glean pivotal insights for advancing LLM technologies. This position paper delineates the parallels between the training methods of LLMs and the strategies employed for developing agents in two-player games, as studied in game theory, reinforcement learning, and multi-agent systems. We propose a re-conceptualization of LLM …
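As a rough illustration of the "single, unified paradigm" framing (not taken from the paper itself), the three stages the abstract lists can be viewed as one training loop that only swaps the data source and the objective at each stage. All names below are hypothetical placeholders.

```python
# Minimal sketch, assuming a generic model and a gradient-style update rule.
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class Stage:
    name: str
    data: Iterable[Any]                    # raw corpus, SFT pairs, or preference data
    objective: Callable[[Any, Any], float] # next-token loss, SFT loss, or RLHF-style loss


def train(model: Any, stages: list[Stage], update: Callable[[Any, float], None]) -> None:
    """Run pre-training, supervised fine-tuning, and RLHF as one loop:
    each stage differs only in its data and objective."""
    for stage in stages:
        for batch in stage.data:
            loss = stage.objective(model, batch)
            update(model, loss)  # e.g. one optimizer step
```

Under this view, comparing stages to moves by two interacting agents (the paper's game-theoretic angle) amounts to asking which player supplies the data and which supplies the objective at each stage.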

