April 30, 2024, 4:50 a.m. | Aaron J. Li, Satyapriya Krishna, Himabindu Lakkaraju

cs.CL updates on arXiv.org

arXiv:2404.18870v1 Announce Type: new
Abstract: The surge in the development of Large Language Models (LLMs) has led to improved performance on cognitive tasks as well as an urgent need to align these models with human values in order to safely exploit their power. Despite the effectiveness of preference learning algorithms like Reinforcement Learning From Human Feedback (RLHF) in aligning models with human preferences, their assumed improvements in model trustworthiness have not been thoroughly verified. Toward this end, this study investigates how models that have been …
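For context on the preference learning the abstract refers to, below is a minimal sketch (not from the paper) of the Bradley-Terry pairwise loss commonly used to train the reward model in RLHF: given scalar scores for a preferred ("chosen") and a dispreferred ("rejected") response, the loss pushes the model to score the chosen response higher. The function name `preference_loss` and the toy scores are illustrative assumptions.

```python
# Minimal sketch of the Bradley-Terry preference loss used in RLHF
# reward-model training (illustrative; not the paper's code).
import torch
import torch.nn.functional as F

def preference_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch;
    # minimizing this drives the chosen response's score above the rejected one's.
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy usage with made-up reward scores for three preference pairs.
chosen = torch.tensor([1.2, 0.7, 2.0])
rejected = torch.tensor([0.3, 0.9, 1.1])
print(preference_loss(chosen, rejected))
```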

