A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning
March 26, 2024, 4:52 a.m. | Ruixin Hong, Hongming Zhang, Xinyu Pang, Dong Yu, Changshui Zhang
cs.CL updates on arXiv.org (arxiv.org)
Abstract: Logical reasoning has been an ongoing pursuit in the field of AI. Despite significant advancements made by large language models (LLMs), they still struggle with complex logical reasoning problems. To enhance reasoning performance, one promising direction is scalable oversight, which requires LLMs to identify their own errors and then improve by themselves. Various self-verification methods have been proposed in pursuit of this goal. Nevertheless, whether existing models understand their own errors well is still under …