May 5, 2023, 9:34 a.m. | /u/Raikoya

Machine Learning www.reddit.com

[https://huggingface.co/blog/starcoder](https://huggingface.co/blog/starcoder)

>StarCoder and StarCoderBase are Large Language Models for Code (Code LLMs) trained on permissively licensed data from GitHub, including 80+ programming languages, Git commits, GitHub issues, and Jupyter notebooks. Similar to LLaMA, we trained a \~15B parameter model for 1 trillion tokens. We fine-tuned the StarCoderBase model on 35B Python tokens, resulting in a new model that we call StarCoder.
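
To put the quoted figures in proportion, here is a small back-of-envelope sketch. It assumes "\~15B" means exactly 15e9 parameters. It also uses the common "about 6 FLOPs per parameter per token" rule of thumb for training compute, which is a general heuristic and not a number from the post. The helper names are hypothetical and only for illustration.

```python
# Back-of-envelope arithmetic for the StarCoder figures quoted above.
# Assumptions (not from the post): "~15B" is treated as exactly 15e9
# parameters, and training compute uses the rough 6 * N * D heuristic.

PARAMS = 15e9            # ~15B parameters (approximate)
PRETRAIN_TOKENS = 1e12   # 1 trillion tokens for StarCoderBase
PYTHON_FT_TOKENS = 35e9  # 35B Python tokens used to fine-tune StarCoder


def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter (hypothetical helper)."""
    return tokens / params


def approx_train_flops(tokens: float, params: float) -> float:
    """Rough training compute: ~6 FLOPs per parameter per token (heuristic)."""
    return 6 * params * tokens


if __name__ == "__main__":
    ratio = tokens_per_param(PRETRAIN_TOKENS, PARAMS)
    ft_share = PYTHON_FT_TOKENS / PRETRAIN_TOKENS
    flops = approx_train_flops(PRETRAIN_TOKENS, PARAMS)
    print(f"tokens per parameter (pretraining): {ratio:.1f}")
    print(f"Python fine-tuning as share of pretraining tokens: {ft_share:.1%}")
    print(f"approx. pretraining compute: {flops:.1e} FLOPs")
```

Under these assumptions, pretraining works out to roughly 66.7 tokens per parameter. The Python fine-tuning stage adds about 3.5% as many tokens as pretraining. The heuristic puts pretraining compute on the order of 9e22 FLOPs. Treat the compute figure as an order-of-magnitude estimate only.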
