all AI news
Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks. (arXiv:2205.05718v1 [cs.CL])
cs.LG updates on arXiv.org arxiv.org
Human language offers a powerful window into our thoughts -- we tell stories,
give explanations, and express our beliefs and goals through words. Abundant
evidence also suggests that language plays a developmental role in structuring
our learning. Here, we ask: how much of human-like thinking can be captured by
learning statistical patterns in language alone? We first contribute a new
challenge benchmark for comparing humans and distributional large language
models (LLMs). Our benchmark contains two problem-solving domains (planning and
explanation …
arxiv behavior benchmarking distribution human human-like language language models large language models reasoning