all AI news for `reward model` | allainews.com

Self-Instruct Framework, Explained 6 days ago | towardsdatascience.com

alignment challenges dall explained +24

Soft Preference Optimization: Aligning Language Models to Expert Distributions 6 days, 2 hours ago | arxiv.org

abstract arxiv cs.ai cs.lg +17

MetaRM: Shifted Distributions Alignment via Meta-Learning 1 week ago | arxiv.org

abstract alignment arxiv capability +22

PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling 2 weeks, 2 days ago | arxiv.org

abstract arxiv behavior cs.lg +9

Enabling Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration 2 weeks, 3 days ago | arxiv.org

abstract arxiv collaboration cs.cl +17

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment 2 weeks, 6 days ago | arxiv.org

abstract alignment arxiv cross-lingual +20

NEW WizardLM-2 8x22B: Fine-tune & Stage-DPO align 3 weeks, 2 days ago | www.youtube.com

huggingface llm microsoft mistral +9

Towards Understanding the Influence of Reward Margin on Preference Model Performance 1 month ago | arxiv.org

abstract arxiv challenges cs.ai +20

Aligning Diffusion Models by Optimizing Human Utility 1 month ago | arxiv.org

abstract alignment arxiv cs.cv +13

Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data 1 month ago | arxiv.org

abstract arxiv cs.ai cs.cl +19

Asymptotics of Language Model Alignment 1 month ago | arxiv.org

abstract alignment arxiv cs.it +12

Regularized Best-of-N Sampling to Mitigate Reward Hacking for Language Model Alignment 1 month ago | arxiv.org

abstract alignment arxiv cs.ai +14

Prior Constraints-based Reward Model Training for Aligning Large Language Models 1 month ago | arxiv.org

abstract arxiv comparison constraints +20

Alibaba Researchers Propose Reward Learning on Policy (RLP): An Unsupervised AI Framework that Refines a … 1 month, 1 week ago | www.marktechpost.com

ai framework ai paper summary ai shorts alibaba +29

Preference as Reward, Maximum Preference Optimization with Importance Sampling 1 month, 1 week ago | arxiv.org

abstract algorithm arxiv cs.ai +19

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model 1 month, 1 week ago | arxiv.org

arxiv cs.ai cs.cv cs.lg +7

[D] Is DPO still the best way to affordably fine-tune a model? 1 month, 2 weeks ago | www.reddit.com

direct preference optimization human language language model +6

Google AI Proposes PERL: A Parameter Efficient Reinforcement Learning Technique that can Train a Reward … 1 month, 2 weeks ago | www.reddit.com

google language language model lora +7

Google AI Proposes PERL: A Parameter Efficient Reinforcement Learning Technique that can Train a Reward … 1 month, 2 weeks ago | www.marktechpost.com

ai paper summary ai shorts alignment applications +29

Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback 1 month, 2 weeks ago | arxiv.org

abstract agent agents annotation +12

Reward Model Ensembles Help Mitigate Overoptimization 1 month, 3 weeks ago | arxiv.org

abstract arxiv cs.lg feedback +17

Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation 1 month, 4 weeks ago | arxiv.org

abstract adversarial arxiv cs.ai +21

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences 2 months ago | arxiv.org

abstract analysis arxiv comparative analysis +16

Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization 2 months, 1 week ago | arxiv.org

abstract arxiv become case +27

Understanding Direct Preference Optimization 2 months, 2 weeks ago | towardsdatascience.com

ai author blog dall +15

Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model 2 months, 2 weeks ago | arxiv.org

arxiv cs.ai cs.cl exploration +9

This AI Paper from Google AI Proposes Online AI Feedback (OAIF): A Simple and Effective … 2 months, 2 weeks ago | www.marktechpost.com

advantages ai paper ai shorts alignment +25

SemiReward: A General Reward Model for Semi-supervised Learning 2 months, 2 weeks ago | arxiv.org

arxiv cs.ai cs.lg general +5

Bayesian Reward Models for LLM Alignment 2 months, 2 weeks ago | arxiv.org

abstract alignment arxiv bayesian +17

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations 2 months, 2 weeks ago | arxiv.org

abstract annotations arxiv breaking +19

Direct Preference Optimization with an Offset 2 months, 2 weeks ago | arxiv.org

abstract arxiv binary cs.ai +19

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences 2 months, 3 weeks ago | arxiv.org

abstract alignment arxiv cs.ai +21

Mitigating Reward Hacking via Information-Theoretic Reward Modeling 2 months, 3 weeks ago | arxiv.org

abstract arxiv challenge cs.ai +19

Enhancing Language Model Alignment through Reward Transformation and Multi-Objective Optimization 2 months, 3 weeks ago | www.marktechpost.com

accuracy ai shorts alignment applications +22

Direct Language Model Alignment from Online AI Feedback 3 months ago | arxiv.org

alignment cs.ai cs.cl cs.hc +11

Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models 3 months ago | arxiv.org

beyond cs.lg divergence feedback +13

Preference Poisoning Attacks on Reward Model Learning 3 months ago | arxiv.org

application attacks cs.ai cs.cl +12

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking 3 months ago | www.youtube.com

abstract alignment discuss distribution +12

Advancing AI Alignment with Human Values Through WARM 3 months ago | www.unite.ai

ai alignment ai systems algorithms alignment +21

Items published with this topic over the last 90 days.
