April 16, 2024, 11 a.m. | /u/cephtahrioh

Deep Learning www.reddit.com

I'm training a transformer model without a causal mask on the WikiText-2 dataset to understand how a transformer would use future tokens when predicting the next token. However, in my tests the transformer without a causal mask performs worse than the one with a causal mask. Intuitively, this shouldn't be the case, because the model has access to future tokens (including the next token itself), which should be strongly predictive of the next token.

For these results, I …
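For reference, here is a minimal sketch (not the poster's code) of what the causal mask changes in self-attention, assuming PyTorch 2.x with `scaled_dot_product_attention`; the tensor shapes and names are illustrative:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, heads, seq_len, head_dim = 1, 2, 5, 8
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Causal attention: position t may only attend to positions <= t.
causal_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# No mask: every position attends to the whole sequence, including the token
# it is being trained to predict at the next step, so the next-token objective
# can largely be solved by copying (label leakage) rather than prediction.
full_out = F.scaled_dot_product_attention(q, k, v)

# Equivalent explicit mask, for stacks without an is_causal flag.
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
scores = scores.masked_fill(mask, float("-inf"))
manual_causal_out = F.softmax(scores, dim=-1) @ v

print(torch.allclose(causal_out, manual_causal_out, atol=1e-6))  # True
```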

