May 3, 2023, 6:45 p.m. | /u/JonDurbin

Machine Learning www.reddit.com

**TL;DR: the Alpaca dataset has some issues, and the code was super slow. I updated it to be much faster. It now supports the chat completion API, so you can use gpt-3.5-turbo (at 1/10 the cost) as well as gpt-4, and it uses the Databricks Dolly 15k dataset for samples.**

### Project/data resources

* [GitHub Repo](https://github.com/jondurbin/airoboros)
* [100k synthetic prompts, gpt-3.5-turbo](https://storage.googleapis.com/airoboros-dump/gpt-3.5-turbo-100k/instructions.jsonl)
* [random seed topics used](https://storage.googleapis.com/airoboros-dump/gpt-3.5-turbo-100k/topics.txt)
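
If you want to poke at the instructions dump after downloading it, here is a minimal sketch for reading a JSONL file (one JSON object per line). The post doesn't list the field names in `instructions.jsonl`, so this only counts whatever keys show up rather than assuming a schema. `parse_jsonl` and `key_counts` are hypothetical helper names, not part of airoboros.

```python
import json
from collections import Counter
from typing import Iterable


def parse_jsonl(lines: Iterable[str]) -> list:
    """Parse an iterable of JSONL lines into a list of objects, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        records.append(json.loads(line))
    return records


def key_counts(records: list) -> Counter:
    """Count how often each top-level key appears.

    Assumption: each line is a JSON object (dict); anything else is ignored.
    """
    counts = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts
```

With a local copy of the file, something like `key_counts(parse_jsonl(open("instructions.jsonl", encoding="utf-8")))` should show which fields are actually present before you rely on any of them.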

### Usage

(Python) install: `pip install airoboros`

Be sure to set `OPENAI_API_KEY` or pass …
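
A small sketch of checking that `OPENAI_API_KEY` is set before kicking off a run, so a missing key fails early with a clear message. This is not airoboros code; `get_openai_api_key` is a hypothetical helper, and it only covers the environment-variable route mentioned above.

```python
import os
from typing import Mapping, Optional


def get_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a dict makes this easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```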
