Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks. (arXiv:2205.05718v1 [cs.CL]) | allainews.com

May 13, 2022, 1:11 a.m. | Katherine M. Collins, Catherine Wong, Jiahai Feng, Megan Wei, Joshua B. Tenenbaum

cs.LG updates on arXiv.org arxiv.org

Human language offers a powerful window into our thoughts -- we tell stories,
give explanations, and express our beliefs and goals through words. Abundant
evidence also suggests that language plays a developmental role in structuring
our learning. Here, we ask: how much of human-like thinking can be captured by
learning statistical patterns in language alone? We first contribute a new
challenge benchmark for comparing humans and distributional large language
models (LLMs). Our benchmark contains two problem-solving domains (planning and
explanation …

arxiv behavior benchmarking distribution human human-like language language models large language models reasoning

More from arxiv.org / cs.LG updates on arXiv.org

Discovering Nuclear Models from Symbolic Machine Learning 14 hours ago | arxiv.org

abstract arxiv behavior challenge +12

Advancing Network Intrusion Detection: Integrating Graph Neural Networks with Scattering Transform and Node2Vec for Enhanced … 14 hours ago | arxiv.org

abstract analysis anomaly anomaly detection +19

A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection 14 hours ago | arxiv.org

arxiv closer look covid covid-19 +9

RELIANCE: Reliable Ensemble Learning for Information and News Credibility Evaluation 14 hours ago | arxiv.org

abstract arxiv challenge cs.cl +19

Artwork Protection Against Neural Style Transfer Using Locally Adaptive Adversarial Color Attack 14 hours ago | arxiv.org

abstract adversarial artists artwork +18

GestaltMML: Enhancing Rare Genetic Disease Diagnosis through Multimodal Machine Learning Combining Facial Images and Clinical … 14 hours ago | arxiv.org

abstract arxiv clinical cs.cv +19

Isolated pulsar population synthesis with simulation-based inference 14 hours ago | arxiv.org

abstract arxiv astro-ph.he astro-ph.im +15

Domain-Specific Fine-Tuning of Large Language Models for Interactive Robot Programming 14 hours ago | arxiv.org

abstract advanced applications arxiv +27

Training of Neural Networks with Uncertain Data -- A Mixture of Experts Approach 14 hours ago | arxiv.org

abstract arxiv cs.lg data +17

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

Data Analyst (Commercial Excellence)

@ Allegro | Poznan, Warsaw, Poland

View on ai-jobs.net

Senior Machine Learning Engineer

@ Motive | Pakistan - Remote

View on ai-jobs.net

Summernaut Customer Facing Data Engineer

@ Celonis | Raleigh, US, North Carolina

View on ai-jobs.net

Data Engineer Mumbai

@ Nielsen | Mumbai, India

View on ai-jobs.net