June 5, 2024, 4:44 a.m. | Maxime Griot, Jean Vanderdonckt, Demet Yuksel, Coralie Hemptinne

cs.LG updates on arXiv.org arxiv.org

arXiv:2406.02394v1 Announce Type: cross
Abstract: Large Language Models (LLMs) like ChatGPT demonstrate significant potential in the medical field, often evaluated using multiple-choice questions (MCQs) similar to those found on the USMLE. Despite their prevalence in medical education, MCQs have limitations that might be exacerbated when assessing LLMs. To evaluate the effectiveness of MCQs in assessing the performance of LLMs, we developed a fictional medical benchmark focused on a non-existent gland, the Glianorex. This approach allowed us to isolate the knowledge …

abstract arxiv case case study chatgpt cs.ai cs.cl cs.lg data education found language language models languages large language large language models limitations llms medical medical data medical field multiple potential questions study type usmle

Senior Machine Learning Engineer

@ GPTZero | Toronto, Canada

ML/AI Engineer / NLP Expert - Custom LLM Development (x/f/m)

@ HelloBetter | Remote

Senior Research Engineer/Specialist - Motor Mechanical Design

@ GKN Aerospace | Bristol, GB

Research Engineer (Motor Mechanical Design)

@ GKN Aerospace | Bristol, GB

Senior Research Engineer (Electromagnetic Design)

@ GKN Aerospace | Bristol, GB

Associate Research Engineer Clubs | Titleist

@ Acushnet Company | Carlsbad, CA, United States