Dec. 7, 2023, 8:17 a.m. | Amber Roberts

Towards Data Science (Medium) | towardsdatascience.com

Image created by author using Dall-E

Benchmarking OpenAI function calling and explanations

Thanks to Roger Yang for his contributions to this piece

Observability for third-party large language models (LLMs) is approached largely through benchmarking and evaluation, since models such as Anthropic’s Claude, OpenAI’s GPT models, and Google’s PaLM 2 are proprietary. In this blog post, we benchmark OpenAI’s GPT models, with function calling and explanations enabled, against various performance metrics. We are specifically interested in how the GPT models and OpenAI features …
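As context for what is being benchmarked, the sketch below shows the general shape of an OpenAI function-calling request: the caller supplies a JSON Schema description of a tool, and the model can return structured arguments for it instead of free text. The `get_current_weather` function, its parameters, and the model name are illustrative assumptions, not details from the article.

```python
import json

# Hypothetical tool schema in OpenAI's function-calling format.
# Everything here (name, parameters) is an illustrative assumption.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

def build_request(user_prompt: str) -> dict:
    """Assemble a chat-completion request body that enables function calling."""
    return {
        "model": "gpt-4",  # model choice is an assumption
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

request_body = build_request("What's the weather in Austin?")
print(json.dumps(request_body, indent=2))
```

In practice this body would be sent through the OpenAI client (e.g. `client.chat.completions.create(**request_body)`), and a benchmark like the one described here would then score whether the returned tool call and its arguments are correct.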

