[R] Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision - OpenAI Superalignment Team 2023
Dec. 14, 2023, 10:10 p.m. | /u/APaperADay
Machine Learning www.reddit.com
**Direct paper link**: [https://cdn.openai.com/papers/weak-to-strong-generalization.pdf](https://cdn.openai.com/papers/weak-to-strong-generalization.pdf)
**Code**: [https://github.com/openai/weak-to-strong](https://github.com/openai/weak-to-strong)
**Abstract**:
>Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior—for example, to evaluate whether a model faithfully followed instructions or generated safe outputs. However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to weakly supervise superhuman models. We study an analogy to this problem: can weak …
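The core analogy in the abstract — fine-tuning a stronger model on labels produced by a weaker supervisor and asking whether the student can outperform its supervision — can be illustrated with a toy sketch. This is not the paper's setup (the paper fine-tunes GPT-4-family language models on labels from smaller models; see the linked code); it is a minimal classification analogy with invented names, where the "weak supervisor" is a noisy labeler and the "strong student" is an averaged perceptron trained on those noisy labels.

```python
# Toy analogy to weak-to-strong generalization (illustrative only, not the
# paper's method): a "weak supervisor" emits noisy labels for a linearly
# separable task, and a "strong student" (an averaged perceptron) is trained
# on those labels. The question is whether the student's accuracy can exceed
# the accuracy of the supervision it was trained on.
import random

random.seed(0)

def true_label(x):
    # Ground truth: sign of a fixed linear function of the 2-D input.
    return 1 if x[0] + 2 * x[1] > 0 else -1

def weak_label(x, flip_prob=0.15):
    # Weak supervisor: correct label, corrupted with probability flip_prob.
    y = true_label(x)
    return -y if random.random() < flip_prob else y

# Training and evaluation data, with weak supervision on the training set.
train = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
test = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
weak_labels = [weak_label(x) for x in train]

# "Strong student": averaged perceptron fit to the weak labels.
w, b = [0.0, 0.0], 0.0
w_sum, b_sum, n = [0.0, 0.0], 0.0, 0
for _ in range(20):
    for x, y in zip(train, weak_labels):
        if y * (w[0] * x[0] + w[1] * x[1] + b) <= 0:
            w[0] += y * x[0]
            w[1] += y * x[1]
            b += y
        w_sum[0] += w[0]
        w_sum[1] += w[1]
        b_sum += b
        n += 1
w_avg = [w_sum[0] / n, w_sum[1] / n]
b_avg = b_sum / n

def predict(x):
    return 1 if w_avg[0] * x[0] + w_avg[1] * x[1] + b_avg > 0 else -1

supervisor_acc = sum(weak_label(x) == true_label(x) for x in test) / len(test)
student_acc = sum(predict(x) == true_label(x) for x in test) / len(test)
print(f"weak supervisor accuracy: {supervisor_acc:.2f}")
print(f"strong student accuracy:  {student_acc:.2f}")
```

In this toy setting the student typically recovers most of the ground-truth decision boundary despite the label noise; the paper studies whether an analogous gap-closing effect holds when the "student" is a far more capable language model than its supervisor.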