April 16, 2024, 11 a.m. | /u/cephtahrioh

Deep Learning www.reddit.com

I'm training a transformer model without a causal mask on the WikiText-2 dataset to understand how a transformer would use future tokens when predicting the next token. However, in my tests, the transformer without a causal mask performs worse than the one with a causal mask. Intuitively, this shouldn't be the case, because the model has access to future tokens (including the next token itself), which should be highly informative for predicting the next token.

For these results, I …
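To make the comparison concrete, here is a minimal PyTorch sketch of the two setups being compared (illustrative only, not the OP's actual code; model sizes and the dummy batch are assumptions). The only difference between the runs is whether a causal attention mask is passed to the encoder; the next-token loss is computed identically.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not the OP's configuration.
vocab_size, d_model, seq_len = 10_000, 256, 128

embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=4,
)
lm_head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (2, seq_len))  # dummy batch

# Causal run: position i may only attend to positions <= i.
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
h_causal = encoder(embed(tokens), mask=causal_mask)

# Non-causal run: no mask, so every position attends to all positions,
# including the future token it is being asked to predict.
h_full = encoder(embed(tokens))

# Next-token loss, computed the same way for both runs:
# position t predicts token t+1.
logits = lm_head(h_full)  # (batch, seq_len, vocab_size)
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
```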

