Sept. 19, 2023, 9:40 p.m. | /u/nthngdy

Machine Learning | www.reddit.com

Paper: [https://arxiv.org/abs/2309.08351](https://arxiv.org/abs/2309.08351)



>Self-supervised pre-training of language models usually consists in predicting probability distributions over extensive token vocabularies. In this study, we propose an innovative method that shifts away from probability prediction and instead focuses on reconstructing input embeddings in a contrastive fashion via Contrastive Weight Tying (CWT). We apply this approach to pretrain Headless Language Models in both monolingual and multilingual contexts. Our method offers practical advantages, substantially reducing training computational requirements by up to 20 times, while simultaneously …
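For readers skimming the thread, here is a minimal sketch of what a contrastive embedding-reconstruction objective of this kind might look like in PyTorch. The function name, the InfoNCE-style formulation with in-batch negatives, and the temperature are assumptions for illustration, not the paper's exact CWT loss.

```python
import torch
import torch.nn.functional as F

def contrastive_embedding_loss(hidden_states, target_embeddings, temperature=0.1):
    """Sketch of a contrastive objective that replaces the vocabulary softmax.

    hidden_states:     (N, d) model outputs at the predicted positions
    target_embeddings: (N, d) tied input embeddings of the target tokens
    Each output is trained to match its own target embedding against the other
    targets in the batch (in-batch negatives); details are assumptions.
    """
    h = F.normalize(hidden_states, dim=-1)
    e = F.normalize(target_embeddings, dim=-1)
    logits = h @ e.t() / temperature                     # (N, N) similarities
    labels = torch.arange(h.size(0), device=h.device)    # aligned pair is the positive
    return F.cross_entropy(logits, labels)
```

The appeal, as the abstract describes, is that the loss never materializes logits over the full vocabulary, which is where much of the compute and memory in standard language-model pre-training goes.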
