Feb. 26, 2024, 5:42 a.m. | Grgur Kovač, Rémy Portelas, Masataka Sawayama, Peter Ford Dominey, Pierre-Yves Oudeyer

cs.LG updates on arXiv.org arxiv.org

arXiv:2402.14846v1 Announce Type: cross
Abstract: The standard way to study Large Language Models (LLMs) through benchmarks or psychology questionnaires is to provide many different queries from similar minimal contexts (e.g. multiple-choice questions). However, due to LLMs' highly context-dependent nature, conclusions from such minimal-context evaluations may provide little information about the model's behavior in deployment (where it will be exposed to many new contexts). We argue that context-dependence should be studied as another dimension of LLM comparison alongside others such …
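To make the contrast between minimal-context evaluation and context-dependence concrete, here is a minimal sketch, not the paper's actual protocol: the same multiple-choice item is embedded in several different conversational contexts and the spread of answers is compared. The `query_model` function, the contexts, and the question are hypothetical placeholders.

```python
# Minimal sketch (assumed, not the paper's method): probe context-dependence
# by asking the same questionnaire item under different preceding contexts.

from collections import Counter

QUESTION = (
    "Which value matters most to you?\n"
    "(A) Tradition  (B) Achievement  (C) Benevolence  (D) Stimulation\n"
    "Answer with a single letter."
)

# Varied contexts standing in for situations the model may face in deployment.
CONTEXTS = [
    "",  # minimal context, as in standard questionnaire-style evaluation
    "You are a helpful assistant chatting with a teenager about music.\n\n",
    "You are an assistant advising a CEO on quarterly strategy.\n\n",
    "The following is a casual conversation about gardening.\n\n",
]

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; replace with a real one."""
    return "A"  # placeholder answer

def answer_distribution(question: str, contexts: list[str]) -> Counter:
    """Collect the model's answers to the same item across contexts."""
    answers = Counter()
    for ctx in contexts:
        answers[query_model(ctx + question)] += 1
    return answers

if __name__ == "__main__":
    # All mass on one answer would indicate stability; a spread across
    # answers would indicate context-dependence of the expressed choice.
    print(answer_distribution(QUESTION, CONTEXTS))
```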

