Calling All Functions
Towards Data Science - Medium towardsdatascience.com
Benchmarking OpenAI function calling and explanations
Thanks to Roger Yang for his contributions to this piece.
Because models like Anthropic’s Claude, OpenAI’s GPT models, and Google’s PaLM 2 are proprietary, observability for third-party large language models (LLMs) relies largely on benchmarking and evaluations. In this blog post, we benchmark OpenAI’s GPT models with function calling and explanations against various performance metrics. We are specifically interested in how the GPT models and OpenAI features …