[R] Xwin-Math: A Series of Powerful SFT Math LLMs and Evaluation Toolkit

Nov. 24, 2023, 8:52 a.m. | /u/Left_Beat210

Machine Learning www.reddit.com

Hi, Xwin-Math aims to improve the mathematical reasoning capabilities of LLMs. We are now releasing the first version, a series of Llama 2 SFT (supervised fine-tuned) models that use a chain-of-thought (CoT) prompt.

GitHub link: [Xwin-LM/Xwin-Math at main · Xwin-LM/Xwin-LM (github.com)](https://github.com/Xwin-LM/Xwin-LM/tree/main/Xwin-Math)

Model link: [Xwin-LM (Xwin-LM) (huggingface.co)](https://huggingface.co/Xwin-LM)

Gradio Demo: [Gradio](https://09776cc5ec5f786eb0.gradio.live/)

[Math capability on GSM8K and MATH benchmark](https://preview.redd.it/abwe37nml82c1.png?width=6200&format=png&auto=webp&s=d07e5b29ac86eebcea79d853c2d8be1e77e4d26d)

The [Xwin-Math-70B-V1.0](https://huggingface.co/Xwin-LM/Xwin-Math-70B-V1.0) model achieves **31.8 pass@1 on the MATH benchmark** and **87.0 pass@1 on the GSM8K benchmark**. These scores place it first among all open-source CoT models.
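For readers unfamiliar with the metric: pass@1 with a single sampled response per problem is simply the percentage of problems whose final answer is correct. Here is a minimal sketch of that computation. The helper names (`extract_final_answer`, `pass_at_1`) and the "take the last number in the response" rule are assumptions for illustration only; the post does not describe how the Xwin-Math evaluation toolkit extracts or compares answers.

```python
import re


def extract_final_answer(text):
    """Return the last number in a CoT response as a string, or None.

    Assumption: the final answer is the last number mentioned. A real
    toolkit likely uses a stricter, format-aware rule.
    """
    # Drop thousands separators so "1,250" is read as "1250".
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None


def pass_at_1(responses, references):
    """Percentage of problems whose single response matches the reference."""
    correct = 0
    for response, reference in zip(responses, references):
        predicted = extract_final_answer(response)
        if predicted is not None and float(predicted) == float(reference):
            correct += 1
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Toy data, not taken from GSM8K or MATH.
    responses = [
        "3 + 4 = 7. The answer is 7",
        "So the total is 1,250 dollars.",
        "I am not sure.",
    ]
    references = ["7", "1250", "5"]
    print(f"pass@1 = {pass_at_1(responses, references):.1f}")
```

MATH answers are often symbolic expressions rather than plain numbers, so a numeric comparison like this would only be a starting point for GSM8K-style problems.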

The [Xwin-Math-7B-V1.0](https://huggingface.co/Xwin-LM/Xwin-Math-7B-V1.0) …


