PERL: Parameter Efficient Reinforcement Learning from Human Feedback

March 19, 2024, 4:41 a.m. | Hakim Sidahmed, Samrat Phatale, Alex Hutcheson, Zhuonan Lin, Zhang Chen, Zac Yu, Jarvis Jin, Roman Komarytsia, Christiane Ahlheim, Yonghao Zhu, Simral

cs.LG updates on arXiv.org

arXiv:2403.10704v1 Announce Type: new
Abstract: Reinforcement Learning from Human Feedback (RLHF) has proven to be a strong method for aligning pretrained Large Language Models (LLMs) with human preferences. However, training models with RLHF is computationally expensive and the overall process is complex. In this work, we study RLHF where the underlying models are trained using the parameter-efficient method of Low-Rank Adaptation (LoRA) introduced by Hu et al. [2021]. We investigate the setup of "Parameter Efficient Reinforcement Learning" (PERL), in which …
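To make the LoRA idea mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not the PERL implementation: the function names (`matmul`, `lora_forward`, `trainable_params`), the toy dimensions, and the scaling value `alpha` are all illustrative assumptions. It only shows the general scheme described by Hu et al. [2021], where a frozen weight matrix W is augmented with a trainable low-rank product B·A, with B initialized to zero so the adapted layer starts out identical to the pretrained one.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w, a, b, alpha, rank):
    """Apply a linear layer whose weight is w + (alpha / rank) * (b @ a).

    x: (n x d_in) inputs, w: (d_out x d_in) frozen weight,
    a: (rank x d_in) and b: (d_out x rank) trainable LoRA factors.
    """
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update, shape (d_out x d_in)
    w_eff = [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]
    w_eff_t = [list(col) for col in zip(*w_eff)]
    return matmul(x, w_eff_t)


def trainable_params(d_in, d_out, rank):
    """Number of trainable parameters in the two LoRA factors."""
    return rank * (d_in + d_out)


# Toy sizes chosen for illustration only (assumption, not from the paper).
d_in, d_out, rank, alpha = 8, 4, 2, 4.0
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable
b = [[0.0] * rank for _ in range(d_out)]                              # trainable, zero init
x = [[rng.gauss(0, 1) for _ in range(d_in)]]

base = matmul(x, [list(col) for col in zip(*w)])
adapted = lora_forward(x, w, a, b, alpha, rank)
```

Because `b` starts at zero, `adapted` equals `base` before any training step. In this toy setting the full weight matrix has `d_in * d_out = 32` entries while the LoRA factors have `trainable_params(8, 4, 2) = 24`; the savings become substantial only when the rank is much smaller than the layer dimensions, as is typical for large models.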