March 16, 2022, 4:38 p.m. | /u/Thomjazz

r/MachineLearning

The [BigScience project](https://bigscience.huggingface.co) has just started the training of its main model, and the training can be **followed live** here: [https://twitter.com/BigScienceLLM](https://twitter.com/BigScienceLLM) and here: [https://huggingface.co/bigscience/tr11-176B-ml-logs/tensorboard#scalars&tagFilter=loss](https://huggingface.co/bigscience/tr11-176B-ml-logs/tensorboard#scalars&tagFilter=loss)

Here is more information on the model, dataset, engineering, training and hardware:

1. **The model**:

* 176B-parameter decoder-only architecture (GPT-like)
* 70 layers - 112 attention heads per layer - hidden dimensionality of 14336 - 2048-token sequence length
* ALiBi positional embeddings - GeLU activation function
* **Read more**:
* Blog post summarizing how …
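As a rough sanity check, the listed dimensions do add up to about 176B parameters under a standard transformer layout. The sketch below assumes the usual 12·d² weights per decoder block (Q/K/V/O attention projections plus a 4x-expansion MLP) and a ~250k multilingual vocabulary; the vocabulary size is an assumption, not stated in the post.

```python
# Back-of-the-envelope parameter count for the announced 176B architecture.
n_layers = 70
d_model = 14336
vocab_size = 250_000  # assumed multilingual tokenizer size; not from the post

# Standard decoder block: 4*d^2 for attention (Q, K, V, O projections)
# plus 8*d^2 for the two MLP matrices (4x hidden expansion),
# ignoring biases and layernorm parameters.
params_per_layer = 12 * d_model ** 2
total_params = n_layers * params_per_layer + vocab_size * d_model

print(f"{total_params / 1e9:.1f}B")  # prints 176.2B
```

Note that ALiBi adds no learned positional parameters, so the embedding term covers the token vocabulary only.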

