MetaRM: Shifted Distributions Alignment via Meta-Learning

May 2, 2024, 4:42 a.m. | Shihan Dou, Yan Liu, Enyu Zhou, Tianlong Li, Haoxiang Jia, Limao Xiong, Xin Zhao, Junjie Ye, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

cs.LG updates on arXiv.org

arXiv:2405.00438v1 Announce Type: new
Abstract: The success of Reinforcement Learning from Human Feedback (RLHF) in language model alignment depends critically on the capability of the reward model (RM). However, as training progresses, the output distribution of the policy model shifts, which reduces the RM's ability to distinguish between responses. This issue is compounded when the RM, trained on a specific data distribution, struggles to generalize to examples outside that distribution. These two issues can …
