March 16, 2022, 4:38 p.m. | /u/Thomjazz

r/MachineLearning | www.reddit.com

The [BigScience project](https://bigscience.huggingface.co) has just started training its main model, and the run can be **followed live** here: [https://twitter.com/BigScienceLLM](https://twitter.com/BigScienceLLM) and here: [https://huggingface.co/bigscience/tr11-176B-ml-logs/tensorboard#scalars&tagFilter=loss](https://huggingface.co/bigscience/tr11-176B-ml-logs/tensorboard#scalars&tagFilter=loss)

Here is more information on the model, dataset, engineering, training, and hardware:

1. **The model**:

* 176B-parameter decoder-only architecture (GPT-like); see the parameter-count sketch after this list
* 70 layers - 112 attention heads per layer - hidden dimensionality of 14336 - 2048-token sequence length
* ALiBi positional embeddings - GeLU activation function
* **Read more**:
  * Blog post summarizing how …
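
As a sanity check on the headline number, here is a minimal sketch of where ~176B comes from, using the standard decoder-only estimate of roughly 12·L·d² parameters for the transformer blocks plus V·d for the token embeddings. The vocabulary size of 250,880 (the BLOOM tokenizer) is an assumption not stated in the post:

```python
# Rough parameter count from the hyperparameters listed above.
n_layers = 70
d_model = 14336
vocab_size = 250_880  # assumption: BLOOM tokenizer size, not stated in the post

# Per transformer block: ~4*d^2 for attention (Q, K, V, output projections)
# plus ~8*d^2 for a 4x-wide MLP, ignoring biases and LayerNorms.
block_params = 12 * d_model**2
embedding_params = vocab_size * d_model

total = n_layers * block_params + embedding_params
print(f"blocks:     {n_layers * block_params / 1e9:.1f}B")  # ~172.6B
print(f"embeddings: {embedding_params / 1e9:.1f}B")         # ~3.6B
print(f"total:      {total / 1e9:.1f}B")                    # ~176.2B
```

The head dimension also checks out: 14336 / 112 heads = 128 per head, a common choice at this scale.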

bigscience, language, language model, machinelearning, training
