[N] Live and open training of BigScience's 176B multilingual language model has just started
March 16, 2022, 4:38 p.m. | /u/Thomjazz
Machine Learning www.reddit.com
Here is more information on the model, dataset, engineering, training and hardware:
1. **The model**:
* 176B-parameter decoder-only architecture (GPT-like)
* 70 layers - 112 attention heads per layer - hidden dimensionality of 14336 - sequence length of 2048 tokens
* ALiBi positional embeddings - GeLU activation function
* **Read more**:
* Blog post summarizing how …
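
As a rough sanity check on the numbers above, here is a minimal sketch that estimates the parameter count from the layer count and hidden size. It assumes a standard GPT-style block (four attention projections plus an MLP with 4x expansion) and ignores biases and layer norms; the post does not state these details, so treat them as assumptions. The function name `estimate_decoder_params` and the `vocab_size` argument are hypothetical, introduced only for illustration.

```python
def estimate_decoder_params(n_layers: int, d_model: int, vocab_size: int = 0) -> int:
    """Approximate weight count of a GPT-like decoder (biases and layer norms ignored)."""
    attention = 4 * d_model ** 2      # Q, K, V and output projections
    mlp = 8 * d_model ** 2            # two matrices of d_model x 4*d_model (assumed 4x expansion)
    embedding = vocab_size * d_model  # token embedding; vocabulary size is not given in the post
    return n_layers * (attention + mlp) + embedding


# Figures taken from the post
n_layers, d_model, n_heads = 70, 14336, 112

head_dim = d_model // n_heads
core = estimate_decoder_params(n_layers, d_model)

print(f"per-head dimension: {head_dim}")             # 128
print(f"transformer blocks: {core / 1e9:.1f}B")      # 172.6B
```

Under these assumptions the transformer blocks alone account for roughly 172.6B weights. The gap to the quoted 176B would plausibly come mostly from the token embedding matrix, whose size depends on a vocabulary size not listed here.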