Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles
Feb. 16, 2024, 5:43 a.m. | Zhiwei Tang, Dmitry Rybin, Tsung-Hui Chang
cs.LG updates on arXiv.org
Abstract: In this study, we delve into an emerging optimization challenge involving a black-box objective function that can only be gauged via a ranking oracle, a situation frequently encountered in real-world scenarios, especially when the function is evaluated by human judges. This challenge is inspired by Reinforcement Learning with Human Feedback (RLHF), an approach recently employed to enhance the performance of Large Language Models (LLMs) using human guidance. We introduce ZO-RankSGD, an innovative zeroth-order optimization algorithm designed …
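The truncated abstract does not spell out how ZO-RankSGD works, but the general idea of rank-based zeroth-order optimization can be illustrated with a minimal sketch: perturb the current point in several random directions, ask the ranking oracle to order the perturbed points from best to worst, and combine the directions with rank-dependent weights into a descent step. Everything below (function names, the weighting scheme, all parameter values) is a hypothetical illustration of this class of methods, not the paper's actual algorithm.

```python
import numpy as np

def zo_rank_step(x, oracle_rank, m=4, mu=0.1, lr=0.05, rng=None):
    """One hypothetical zeroth-order step driven only by a ranking oracle.

    oracle_rank(points) must return indices of `points` ordered from best
    (lowest objective) to worst; objective values themselves are never seen.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # Sample m random perturbation directions around the current point.
    dirs = rng.standard_normal((m, x.size))
    points = x + mu * dirs
    order = oracle_rank(points)  # best-to-worst indices
    # Rank-based weights: the best direction gets +1, the worst gets -1.
    weights = np.linspace(1.0, -1.0, m)
    step = sum(w * dirs[i] for w, i in zip(weights, order)) / m
    return x + lr * step

# Toy usage: minimize ||x - target||^2 using only pairwise-comparable ranks,
# mimicking a human judge who can order candidates but not score them.
target = np.array([1.0, -2.0])

def rank_by_distance(points):
    return np.argsort([np.sum((p - target) ** 2) for p in points])

x = np.zeros(2)
rng = np.random.default_rng(0)
for _ in range(500):
    x = zo_rank_step(x, rank_by_distance, rng=rng)
```

After a few hundred steps, `x` drifts into a small neighborhood of `target`, despite the optimizer never observing a single objective value; this is the setting (ranking feedback only, as in RLHF-style human evaluation) that the paper formalizes.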