March 1, 2024, 4:19 p.m. | /u/nihalnayak

Machine Learning www.reddit.com

Excited to share our work on synthetic task generation.

Introducing Bonito 🐟, an open-source model that converts your raw, unannotated data into synthetic instruction-tuning datasets. You can then use those datasets to build a specialized LLM for your proprietary or private data!

Check out our work below:
Paper: [https://arxiv.org/abs/2402.18334](https://arxiv.org/abs/2402.18334)
Code: [https://github.com/BatsResearch/bonito](https://github.com/BatsResearch/bonito)
Model: [https://huggingface.co/BatsResearch/bonito-v1](https://huggingface.co/BatsResearch/bonito-v1)
