Feb. 26, 2024, 5:42 a.m. | Grgur Kovač, Rémy Portelas, Masataka Sawayama, Peter Ford Dominey, Pierre-Yves Oudeyer

cs.LG updates on arXiv.org

arXiv:2402.14846v1 Announce Type: cross
Abstract: The standard way to study Large Language Models (LLMs) through benchmarks or psychology questionnaires is to provide many different queries from similar minimal contexts (e.g., multiple choice questions). However, due to LLMs' highly context-dependent nature, conclusions from such minimal-context evaluations may provide little information about the model's behavior in deployment (where it will be exposed to many new contexts). We argue that context-dependence should be studied as another dimension of LLM comparison alongside others such …
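The contrast the abstract draws, evaluating a fixed questionnaire item in a minimal context versus across varied surrounding contexts, can be illustrated with a small sketch. The `query_model` stub, the example item, the contexts, and the stability metric below are illustrative assumptions, not the evaluation protocol from the paper.

```python
# Minimal sketch (not the paper's protocol): probing how an LLM's answer to a
# fixed multiple-choice item shifts when the surrounding context changes.
# `query_model` is a hypothetical stand-in for whatever model API is in use.
from collections import Counter

def query_model(prompt: str) -> str:
    """Hypothetical model call; replace with a real LLM client."""
    raise NotImplementedError("plug in an actual model client here")

ITEM = (
    "How much do you value tradition?\n"
    "(A) Not at all  (B) Somewhat  (C) Very much\n"
    "Answer with a single letter."
)

# Contexts prepended to the same item to simulate different deployment settings;
# the empty string corresponds to the standard minimal-context evaluation.
CONTEXTS = [
    "",
    "You are helping a user plan a family holiday.\n\n",
    "You are debugging a Python script with a user.\n\n",
    "The user has been discussing recent scientific discoveries.\n\n",
]

def context_stability(item: str, contexts: list[str]) -> float:
    """Fraction of contexts whose answer matches the most common answer.

    1.0 means the answer is fully stable across contexts; lower values
    indicate context-dependence on this item.
    """
    answers = [query_model(ctx + item).strip()[:1].upper() for ctx in contexts]
    _, count = Counter(answers).most_common(1)[0]
    return count / len(answers)

# Example usage (requires a real query_model implementation):
# print(f"stability: {context_stability(ITEM, CONTEXTS):.2f}")
```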
