May 13, 2022, 1:33 a.m. | /u/Novel_Cucumber_1588

Deep Learning www.reddit.com

I'm training an autoregressive model (transformer encoder + decoder)

where both the input and the output are tokenized text.

I've been using NLL loss, and even though it decreased significantly, the predictions are just repetitions of a single token.

for example,

input: hello world

output: aaaaaaaaaaaaaaaaa

I've been looking into the model architecture and the loss function, but I haven't caught any bugs yet.

Could you suggest any tips …
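This symptom (loss dropping while greedy decoding collapses to one repeated token) is often caused by misaligned teacher forcing or a missing causal mask in the decoder rather than by the loss function itself. A minimal PyTorch sketch of the alignment worth checking — all token ids, sizes, and the dummy logits are hypothetical, not from the post:

```python
import torch
import torch.nn as nn

# Hypothetical setup — vocab size, special ids, and the target sequence
# are illustrative only.
vocab_size = 10
pad_id, bos_id, eos_id = 0, 1, 2
tgt = torch.tensor([[5, 7, eos_id]])  # (batch=1, seq_len=3)

# Teacher forcing: the decoder INPUT is the target shifted right by one
# (prepend <bos>, drop the last token); the LOSS is computed against the
# unshifted target. If input and target are not offset like this, the
# model can trivially copy its input during training and then collapse
# at inference time.
decoder_input = torch.cat(
    [torch.full((tgt.size(0), 1), bos_id), tgt[:, :-1]], dim=1
)

# Causal mask: position i must not attend to positions > i. Without it,
# training loss drops sharply but autoregressive decoding degenerates.
L = tgt.size(1)
causal_mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
# (pass this as tgt_mask to nn.TransformerDecoder / nn.Transformer)

# NLL expects log-probabilities, not raw logits; ignore padding so it
# doesn't dominate the loss. The random logits here stand in for the
# decoder's output.
logits = torch.randn(tgt.size(0), L, vocab_size)
log_probs = torch.log_softmax(logits, dim=-1)
loss = nn.NLLLoss(ignore_index=pad_id)(
    log_probs.view(-1, vocab_size), tgt.view(-1)
)
```

Two other things worth checking: that NLLLoss is fed log-probabilities (via `log_softmax`) rather than raw logits, and that inference starts from `<bos>` and feeds back the model's own predictions, not the ground truth.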

autoregressive model deeplearning training
