all AI news
ReFT: Reasoning with Reinforced Fine-Tuning
June 28, 2024, 4:42 a.m. | Trung Quoc Luong, Xinbo Zhang, Zhanming Jie, Peng Sun, Xiaoran Jin, Hang Li
cs.CL updates on arXiv.org arxiv.org
Abstract: One way to enhance the reasoning capability of Large Language Models (LLMs) is to conduct Supervised Fine-Tuning (SFT) using Chain-of-Thought (CoT) annotations. This approach does not show sufficiently strong generalization ability, however, because the training only relies on the given CoT data. In math problem-solving, for example, there is usually only one annotated reasoning path for each question in the training data. Intuitively, it would be better for the algorithm to learn from multiple annotated …
abstract annotations arxiv capability cs.cl data example fine-tuning however language language models large language large language models llms math problem problem-solving reasoning replace sft show supervised fine-tuning thought training tuning type
More from arxiv.org / cs.CL updates on arXiv.org
ReFT: Reasoning with Reinforced Fine-Tuning
2 days, 11 hours ago |
arxiv.org
Exploring Defeasibility in Causal Reasoning
2 days, 11 hours ago |
arxiv.org
A Large Language Model Approach to Educational Survey Feedback Analysis
2 days, 11 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Scientist
@ Ford Motor Company | Chennai, Tamil Nadu, India
Systems Software Engineer, Graphics
@ Parallelz | Vancouver, British Columbia, Canada - Remote
Engineering Manager - Geo Engineering Team (F/H/X)
@ AVIV Group | Paris, France
Data Analyst
@ Microsoft | San Antonio, Texas, United States
Azure Data Engineer
@ TechVedika | Hyderabad, India
Senior Data & AI Threat Detection Researcher (Cortex)
@ Palo Alto Networks | Tel Aviv-Yafo, Israel