H2O-Danube-1.8B Technical Report. (arXiv:2401.16818v1 [cs.CL])
cs.CL updates on arXiv.org
We present H2O-Danube-1.8B, a 1.8B language model trained on 1T tokens
following the core principles of Llama 2 and Mistral. We leverage and refine
various techniques for pre-training large language models. Although our model
is trained on significantly fewer total tokens compared to reference models of
similar size, it exhibits highly competitive metrics across a multitude of
benchmarks. We additionally release a chat model trained with supervised
fine-tuning followed by direct preference optimization. We make H2O-Danube-1.8B
openly available under Apache …
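The abstract names supervised fine-tuning followed by direct preference optimization (DPO) for the chat model. As a rough illustration of the DPO objective (Rafailov et al., 2023), and not the authors' actual training code, the PyTorch sketch below contrasts policy and frozen-reference log-probabilities of preferred ("chosen") and dispreferred ("rejected") responses; the function name and the beta value are illustrative assumptions.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a tensor of per-sequence log-probabilities
    (summed token log-probs) for the chosen/rejected responses under
    the policy being trained and a frozen reference model. beta
    controls how far the policy may drift from the reference.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the implicit reward margin of chosen over rejected.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probabilities for a batch of 4 pairs.
b = 4
policy_chosen = torch.randn(b, requires_grad=True)
policy_rejected = torch.randn(b, requires_grad=True)
ref_chosen, ref_rejected = torch.randn(b), torch.randn(b)
loss = dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
loss.backward()  # gradients flow only into the policy log-probs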