Benchmarking LLM powered Chatbots: Methods and Metrics. (arXiv:2308.04624v1 [cs.CL])
cs.CL updates on arXiv.org
Autonomous conversational agents, i.e., chatbots, are becoming an increasingly
common mechanism for enterprises to provide support to customers and partners.
In order to rate chatbots, especially ones powered by Generative AI tools like
Large Language Models (LLMs), we need to be able to accurately assess their
performance. This is where chatbot benchmarking becomes important. In this
paper, we propose the use of a novel benchmark that we call the E2E (End to
End) benchmark, and show how the E2E benchmark …