April 9, 2024, 4:51 a.m. | Qiusi Zhan, Richard Fang, Rohan Bindu, Akul Gupta, Tatsunori Hashimoto, Daniel Kang

cs.CL updates on arXiv.org

arXiv:2311.05553v3 Announce Type: replace
Abstract: As large language models (LLMs) have increased in their capabilities, so too has their potential for dual use. To reduce harmful outputs, producers and vendors of LLMs have used reinforcement learning with human feedback (RLHF). In tandem, LLM vendors have been increasingly enabling fine-tuning of their most powerful models. However, concurrent work has shown that fine-tuning can remove RLHF protections. We may expect that the most powerful models currently available (GPT-4) are less susceptible to fine-tuning …
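
For context, the fine-tuning the abstract refers to is the kind exposed through vendor APIs. Below is a minimal sketch of what launching such a job looks like with the OpenAI fine-tuning endpoint; this is not the authors' code, and the file name, model identifier, and access assumptions (e.g., that a GPT-4-class model is available for fine-tuning on the account) are illustrative.

```python
# Minimal sketch of vendor-API fine-tuning (illustrative, not the paper's code).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of chat-formatted training examples (hypothetical file name).
training_file = client.files.create(
    file=open("training_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch the fine-tuning job; the resulting model is then queried like any other.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # GPT-4-class fine-tuning requires special vendor access
)
print(job.id, job.status)
```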
