GPT-4's assessment of its performance in a USMLE-based case study | allainews.com

Feb. 16, 2024, 5:47 a.m. | Uttam Dhakal, Aniket Kumar Singh, Suman Devkota, Yogesh Sapkota, Bishal Lamichhane, Suprinsa Paudyal, Chandra Dhakal

cs.CL updates on arXiv.org

arXiv:2402.09654v1 Announce Type: cross
Abstract: This study investigates how GPT-4 assesses its own performance in healthcare applications. Using a simple prompting technique, the LLM was given questions taken from the United States Medical Licensing Examination (USMLE) questionnaire and asked to report a confidence score both before and after each question was posed. The questionnaire was divided into two groups: questions with feedback (WF) and questions with no feedback (NF) post-question. The model was asked to provide absolute and …


