April 1, 2024, 4:42 a.m. | Peng Ding, Jiading Fang, Peng Li, Kangrui Wang, Xiaochen Zhou, Mo Yu, Jing Li, Matthew R. Walter, Hongyuan Mei

cs.LG updates on arXiv.org arxiv.org

arXiv:2403.19913v1 Announce Type: cross
Abstract: Large language models such as ChatGPT and GPT-4 have recently achieved astonishing performance on a variety of natural language processing tasks. In this paper, we propose MANGO, a benchmark to evaluate their capabilities to perform text-based mapping and navigation. Our benchmark includes 53 mazes taken from a suite of textgames: each maze is paired with a walkthrough that visits every location but does not cover all possible paths. The task is question-answering: for each maze, …

arxiv benchmark cs.ai cs.cl cs.lg cs.ro language language models large language large language models mapping navigation type

Data Scientist (m/f/x/d)

@ Symanto Research GmbH & Co. KG | Spain, Germany

Senior DevOps/MLOps

@ Global Relay | Vancouver, British Columbia, Canada

Senior Statistical Programmer for Clinical Development

@ Novo Nordisk | Aalborg, North Denmark Region, DK

Associate, Data Analysis

@ JLL | USA-CLIENT Boulder CO-Google

AI Compiler Engineer, Model Optimization, Quantization & Framework

@ Renesas Electronics | Duesseldorf, Germany

Lead AI Security Researcher

@ Grammarly | United States; Hybrid