June 7, 2024, 4:45 a.m. | Navreet Kaur, Monojit Choudhury, Danish Pruthi

cs.LG updates on arXiv.org arxiv.org

arXiv:2312.08800v2 Announce Type: replace-cross
Abstract: As corporations rush to integrate large language models (LLMs) to their search offerings, it is critical that they provide factually accurate information that is robust to any presuppositions that a user may express. In this work, we introduce UPHILL, a dataset consisting of health-related queries with varying degrees of presuppositions. Using UPHILL, we evaluate the factual accuracy and consistency of InstructGPT, ChatGPT, and BingChat models. We find that while model responses rarely disagree with true …

abstract arxiv corporations cs.ai cs.cl cs.hc cs.lg dataset express health information language language models large language large language models llms queries replace robust search type work

Senior Data Engineer

@ Displate | Warsaw

Junior Data Analyst - ESG Data

@ Institutional Shareholder Services | Mumbai

Intern Data Driven Development in Sensor Fusion for Autonomous Driving (f/m/x)

@ BMW Group | Munich, DE

Senior MLOps Engineer, Machine Learning Platform

@ GetYourGuide | Berlin

Data Engineer, Analytics

@ Meta | Menlo Park, CA

Data Engineer

@ Meta | Menlo Park, CA