Feb. 13, 2024, 5:42 a.m. | Zhen-Yu Zhang, Siwei Han, Huaxiu Yao, Gang Niu, Masashi Sugiyama

cs.LG updates on arXiv.org

To improve the ability of large language models (LLMs) to handle complex reasoning problems, chain-of-thoughts (CoT) methods were proposed to guide LLMs to reason step by step, so that problems can be solved by progressing from simple to complex tasks. State-of-the-art approaches for generating such a chain rely on interactive collaboration: the learner generates candidate intermediate thoughts, the LLM evaluates them, and those evaluations guide the generation of subsequent thoughts. However, a widespread yet understudied problem is that the evaluation from the LLM is typically noisy and unreliable, potentially …

