Feb. 29, 2024, 5:48 a.m. | Shuo Yang, Gjergji Kasneci

cs.CL updates on arXiv.org

arXiv:2402.18284v1 Announce Type: new
Abstract: The wide adoption of ChatGPT has highlighted the potential of reinforcement learning from human feedback (RLHF). However, its training pipeline relies on manual ranking, a resource-intensive process. To reduce labor costs, we propose a self-supervised text ranking approach for applying Proximal Policy Optimization (PPO) to fine-tune language models while eliminating the need for human annotators. Our method begins with probabilistic sampling to encourage a language model to generate diverse responses for each input. We then employ the TextRank and ISODATA algorithms …
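The abstract names two concrete steps: probabilistic sampling to obtain diverse responses, and TextRank-based scoring of those responses (the role of ISODATA is cut off by the truncation, so it is omitted here). Below is a minimal sketch of what those two steps could look like. The base model ("gpt2"), the Jaccard similarity measure, and all hyperparameters are illustrative assumptions, not the paper's actual choices.

```python
# Sketch of: (1) probabilistic sampling for diverse responses,
# (2) TextRank-style ranking via a similarity graph + PageRank.
# Model, similarity function, and hyperparameters are assumptions.
import itertools
import networkx as nx
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # assumed base model
model = AutoModelForCausalLM.from_pretrained("gpt2")

def sample_responses(prompt: str, k: int = 8) -> list[str]:
    """Step 1: temperature / nucleus sampling yields k diverse responses."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,                  # stochastic decoding, not greedy
        temperature=1.0,
        top_p=0.95,
        num_return_sequences=k,
        max_new_tokens=64,
        pad_token_id=tokenizer.eos_token_id,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; the paper's actual measure may differ."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

def textrank_scores(responses: list[str]) -> dict[int, float]:
    """Step 2: build a similarity graph over responses and run PageRank,
    the core of TextRank; a higher score marks a more central response."""
    g = nx.Graph()
    g.add_nodes_from(range(len(responses)))
    for i, j in itertools.combinations(range(len(responses)), 2):
        w = jaccard(responses[i], responses[j])
        if w > 0:
            g.add_edge(i, j, weight=w)
    return nx.pagerank(g, weight="weight")

responses = sample_responses("Explain reinforcement learning in one sentence.")
ranking = sorted(textrank_scores(responses).items(), key=lambda kv: -kv[1])
for idx, score in ranking:
    print(f"{score:.3f}  {responses[idx][:80]}")
```

In an RLHF-style pipeline, a ranking like this could stand in for the manual preference ordering that PPO fine-tuning normally consumes, which is the labor-saving substitution the abstract describes.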
