Nov. 5, 2023, 6:47 a.m. | Siyu Ren, Zhiyong Wu, Kenny Q. Zhu

cs.CL updates on arXiv.org arxiv.org

Neural language models are probabilistic models of human text. They are
predominantly trained using maximum likelihood estimation (MLE), which is
equivalent to minimizing the forward cross-entropy between the empirical data
distribution and the model distribution. However, various degeneration
phenomena are still widely observed when decoding from the distributions
learned by such models. We establish that the forward cross-entropy is
suboptimal as a distance metric for aligning human and model distribution due
to its (1) recall-prioritization (2) negative diversity ignorance and …
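As a minimal sketch (not from the paper), the toy Python example below illustrates the equivalence stated in the abstract: for a hypothetical unigram model over a small vocabulary, the average negative log-likelihood of the data (the MLE objective) equals the forward cross-entropy between the empirical data distribution and the model distribution.

```python
import math
from collections import Counter

# Hypothetical observed tokens and a hypothetical model distribution.
data = ["a", "b", "a", "a", "c", "b"]
p_model = {"a": 0.5, "b": 0.3, "c": 0.2}

# MLE objective: average negative log-likelihood of the data under the model.
nll = -sum(math.log(p_model[tok]) for tok in data) / len(data)

# Forward cross-entropy H(p_data, p_model) = -sum_x p_data(x) * log p_model(x),
# with p_data taken as the empirical distribution of the observed tokens.
counts = Counter(data)
p_data = {tok: c / len(data) for tok, c in counts.items()}
cross_entropy = -sum(p_data[tok] * math.log(p_model[tok]) for tok in p_data)

# The two objectives coincide, so minimizing one minimizes the other.
assert abs(nll - cross_entropy) < 1e-12
print(nll, cross_entropy)
```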

