Aug. 10, 2023, 4:47 a.m. | Debarag Banerjee, Pooja Singh, Arjun Avadhanam, Saksham Srivastava

cs.CL updates on arXiv.org

Autonomous conversational agents, i.e., chatbots, are becoming an increasingly
common mechanism for enterprises to provide support to customers and partners.
In order to rate chatbots, especially ones powered by generative AI tools like
Large Language Models (LLMs), we need to be able to accurately assess their
performance. This is where chatbot benchmarking becomes important. In this
paper, we propose the use of a novel benchmark that we call the E2E (End to
End) benchmark, and show how the E2E benchmark …
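The abstract is truncated before it describes how E2E scores are computed, so the following is only a minimal sketch of the general idea of benchmarking a chatbot end to end: compare each chatbot answer against a reference ("golden") answer and average the scores. The bag-of-words cosine similarity used here is a deliberately simple stand-in, not the paper's actual metric, and the example question/answer pairs are invented for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings (toy metric)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def benchmark_score(pairs: list[tuple[str, str]]) -> float:
    """Average similarity of (chatbot answer, reference answer) pairs, in [0, 1]."""
    return sum(cosine_similarity(ans, ref) for ans, ref in pairs) / len(pairs)


# Hypothetical evaluation set: each pair is (chatbot answer, golden answer).
pairs = [
    ("The order ships in two days.", "Your order will ship within two days."),
    ("I cannot help with that.", "Refunds are processed in 5 business days."),
]
print(round(benchmark_score(pairs), 3))
```

In a real harness the lexical similarity would typically be replaced by an embedding-based or LLM-judged semantic score, but the aggregation step — one score per question, averaged over the benchmark — stays the same.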

