Dec. 14, 2023, 10:10 p.m. | /u/APaperADay

r/MachineLearning

**Blog post**: [https://openai.com/research/weak-to-strong-generalization](https://openai.com/research/weak-to-strong-generalization)

**Direct paper link**: [https://cdn.openai.com/papers/weak-to-strong-generalization.pdf](https://cdn.openai.com/papers/weak-to-strong-generalization.pdf)

**Code**: [https://github.com/openai/weak-to-strong](https://github.com/openai/weak-to-strong)

**Abstract**:

>Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior—for example, to evaluate whether a model faithfully followed instructions or generated safe outputs. However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to weakly supervise superhuman models. We study an analogy to this problem: can weak …

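Below is a minimal sketch of the weak-to-strong setup the abstract describes: train a weak supervisor on ground-truth labels, have it label a transfer set, and finetune a stronger student only on those imperfect weak labels. This is an illustrative toy, not the paper's method as released in the repo above: it assumes small MLP classifiers on synthetic data in place of pretrained GPT-series models, and all names and hyperparameters are made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_data(n, d=20):
    # Toy binary task: label = sign of a fixed linear function (stand-in for a real task).
    X = torch.randn(n, d)
    w = torch.arange(1, d + 1, dtype=torch.float32)
    y = (X @ w > 0).long()
    return X, y

def mlp(d_in, width):
    # Capacity stands in for model strength: a wider student plays the "strong" model.
    return nn.Sequential(nn.Linear(d_in, width), nn.ReLU(), nn.Linear(width, 2))

def train(model, X, y, epochs=200, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return model

def accuracy(model, X, y):
    with torch.no_grad():
        return (model(X).argmax(dim=1) == y).float().mean().item()

d = 20
X_weak, y_weak = make_data(500, d)            # ground truth seen only by the weak supervisor
X_transfer, _ = make_data(2000, d)            # unlabeled from the student's point of view
X_test, y_test = make_data(2000, d)

# 1. Train the weak supervisor on ground-truth labels.
weak = train(mlp(d, 4), X_weak, y_weak)

# 2. The weak supervisor labels the transfer set; its labels are imperfect.
with torch.no_grad():
    weak_labels = weak(X_transfer).argmax(dim=1)

# 3. Finetune the strong student on the weak labels alone.
strong = train(mlp(d, 128), X_transfer, weak_labels)

# Weak-to-strong generalization: does the student outperform its own supervisor
# when both are scored against held-out ground truth?
print(f"weak supervisor test acc: {accuracy(weak, X_test, y_test):.3f}")
print(f"strong student  test acc: {accuracy(strong, X_test, y_test):.3f}")
```

In the paper, the strong students are pretrained language models, and naive finetuning on weak labels recovers only part of the weak-to-strong gap; the authors report that additions such as an auxiliary confidence loss close more of it. This toy version shows only the basic supervisor-student pipeline, and the gap-recovery effect may not appear at this scale.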
