LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

April 9, 2024, 4:50 a.m. | Shibo Hao, Yi Gu, Haotian Luo, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao, Zhen Wang, Zhiting Hu

cs.CL updates on arXiv.org

arXiv:2404.05221v1 Announce Type: new
Abstract: Generating accurate step-by-step reasoning is essential for Large Language Models (LLMs) to address complex problems and enhance robustness and interpretability. Despite the flux of research on developing advanced reasoning approaches, systematically analyzing the diverse LLMs and reasoning strategies in generating reasoning chains remains a significant challenge. The difficulties stem from the lack of two key elements: (1) an automatic method for evaluating the generated reasoning chains on different tasks, and (2) a unified formalism and …
