[R] Xwin-Math: A Series of Powerful SFT Math LLMs and Evaluation Toolkit
Nov. 24, 2023, 8:52 a.m. | /u/Left_Beat210
Machine Learning www.reddit.com
GitHub link: [Xwin-LM/Xwin-Math at main · Xwin-LM/Xwin-LM (github.com)](https://github.com/Xwin-LM/Xwin-LM/tree/main/Xwin-Math)
Model link: [Xwin-LM (Xwin-LM) (huggingface.co)](https://huggingface.co/Xwin-LM)
Gradio Demo: [Gradio](https://09776cc5ec5f786eb0.gradio.live/)
[Math capability on GSM8K and MATH benchmark](https://preview.redd.it/abwe37nml82c1.png?width=6200&format=png&auto=webp&s=d07e5b29ac86eebcea79d853c2d8be1e77e4d26d)
The [Xwin-Math-70B-V1.0](https://huggingface.co/Xwin-LM/Xwin-Math-70B-V1.0) model achieves **31.8 pass@1 on the MATH benchmark** and **87.0 pass@1 on the GSM8K benchmark**. This places it first among all open-source CoT (chain-of-thought) models.
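For readers unfamiliar with the metric, pass@1 is the percentage of problems solved by a single sampled answer per problem. Below is a minimal sketch of how such a score could be computed. The function name `pass_at_1`, the toy data, and the exact-string comparison are all assumptions for illustration; the answer matching used by the Xwin-Math evaluation toolkit is not described in this post and may well differ.

```python
def pass_at_1(first_answers, references):
    """Percentage of problems whose single sampled answer matches the reference.

    Assumption: answers are compared by exact string match after stripping
    whitespace. Real math evaluators usually normalize expressions first.
    """
    if len(first_answers) != len(references):
        raise ValueError("need exactly one answer per reference")
    if not references:
        raise ValueError("empty benchmark")
    correct = sum(a.strip() == r.strip() for a, r in zip(first_answers, references))
    return 100.0 * correct / len(references)


# Hypothetical toy data, not taken from GSM8K or MATH.
answers = ["18", "3", "70000", "540"]
refs = ["18", "3", "70000", "20"]
score = pass_at_1(answers, refs)
print(f"pass@1 = {score:.1f}")
```

On this toy set three of the four answers match, so the score is 75.0. The figures quoted in the post are on the same 0 to 100 scale.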
The [Xwin-Math-7B-V1.0](https://huggingface.co/Xwin-LM/Xwin-Math-7B-V1.0) …