Dec. 14, 2023, 10:10 p.m. | /u/APaperADay

r/MachineLearning

**Blog post**: [https://openai.com/research/weak-to-strong-generalization](https://openai.com/research/weak-to-strong-generalization)

**Direct paper link**: [https://cdn.openai.com/papers/weak-to-strong-generalization.pdf](https://cdn.openai.com/papers/weak-to-strong-generalization.pdf)

**Code**: [https://github.com/openai/weak-to-strong](https://github.com/openai/weak-to-strong)

**Abstract**:

>Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior—for example, to evaluate whether a model faithfully followed instructions or generated safe outputs. However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to weakly supervise superhuman models. We study an analogy to this problem: can weak …

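Below is a minimal sketch of the weak-to-strong setup the abstract describes: train a weak supervisor on ground-truth labels, have it label a transfer set, and finetune a stronger student only on those imperfect weak labels. This is an illustrative toy, not the paper's method as released in the repo above: it assumes small MLP classifiers on synthetic data in place of pretrained GPT-series models, and all names and hyperparameters are made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_data(n, d=20):
    # Toy binary task: label = sign of a fixed linear function (stand-in for a real task).
    X = torch.randn(n, d)
    w = torch.arange(1, d + 1, dtype=torch.float32)
    y = (X @ w > 0).long()
    return X, y

def mlp(d_in, width):
    # Capacity stands in for model strength: a wider student plays the "strong" model.
    return nn.Sequential(nn.Linear(d_in, width), nn.ReLU(), nn.Linear(width, 2))

def train(model, X, y, epochs=200, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return model

def accuracy(model, X, y):
    with torch.no_grad():
        return (model(X).argmax(dim=1) == y).float().mean().item()

d = 20
X_weak, y_weak = make_data(500, d)            # ground truth seen only by the weak supervisor
X_transfer, _ = make_data(2000, d)            # unlabeled from the student's point of view
X_test, y_test = make_data(2000, d)

# 1. Train the weak supervisor on ground-truth labels.
weak = train(mlp(d, 4), X_weak, y_weak)

# 2. The weak supervisor labels the transfer set; its labels are imperfect.
with torch.no_grad():
    weak_labels = weak(X_transfer).argmax(dim=1)

# 3. Finetune the strong student on the weak labels alone.
strong = train(mlp(d, 128), X_transfer, weak_labels)

# Weak-to-strong generalization: does the student outperform its own supervisor
# when both are scored against held-out ground truth?
print(f"weak supervisor test acc: {accuracy(weak, X_test, y_test):.3f}")
print(f"strong student  test acc: {accuracy(strong, X_test, y_test):.3f}")
```

In the paper, the strong students are pretrained language models, and naive finetuning on weak labels recovers only part of the weak-to-strong gap; the authors report that additions such as an auxiliary confidence loss close more of it. This toy version shows only the basic supervisor-student pipeline, and the gap-recovery effect may not appear at this scale.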
