Web: https://www.reddit.com/r/deeplearning/comments/uogjud/q_predicting_only_one_token_in_autoregressive/

May 13, 2022, 1:33 a.m. | /u/Novel_Cucumber_1588

Deep Learning reddit.com

I'm training an autoregressive model (transformer encoder + decoder) where text is given as input and the output is also text (both tokenized).

I've been using NLL loss, and even though the loss decreased significantly, the predictions are just repetitions of a single token.

For example:

input: hello world

output: aaaaaaaaaaaaaaaaa


I've been looking into the model architecture and loss function, but haven't found any bugs in them yet.
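One thing that may be worth ruling out in a setup like this (an assumption, not a diagnosis of the code in this post, which isn't shown): whether the decoder inputs are offset by one position from the labels during teacher forcing. If they are not, the loss can drop while free-running generation stays degenerate. The sketch below uses only the standard library; the token ids, `BOS`/`EOS` values, and both helper names are hypothetical. It shows the expected shift and a simple way to quantify how collapsed a generated sequence is.

```python
from collections import Counter

BOS, EOS = 1, 2  # hypothetical special-token ids, not taken from the post


def shift_for_teacher_forcing(target_ids, bos=BOS, eos=EOS):
    """Return (decoder_input, labels), offset by one position.

    At step t the decoder sees decoder_input[: t + 1] and is trained
    to predict labels[t], i.e. the *next* token, never the current one.
    """
    decoder_input = [bos] + list(target_ids)
    labels = list(target_ids) + [eos]
    return decoder_input, labels


def dominant_token_ratio(token_ids):
    """Fraction of positions taken by the single most frequent token."""
    if not token_ids:
        return 0.0
    _, count = Counter(token_ids).most_common(1)[0]
    return count / len(token_ids)


target = [17, 42, 8]  # made-up token ids for illustration
dec_in, labels = shift_for_teacher_forcing(target)

# The input at position t + 1 must equal the label at position t.
assert all(dec_in[t + 1] == labels[t] for t in range(len(target)))

collapsed = [5] * 17  # stands in for an output like "aaaaaaaaaaaaaaaaa"
print(dominant_token_ratio(collapsed))  # 1.0
```

Tracking a ratio like this on generated validation samples, alongside the NLL, would make the collapse visible during training instead of only at inference time.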


Could you suggest any tips …

autoregressive model deeplearning model training
