promptfoo: How to benchmark Llama2 Uncensored vs. GPT-3.5 on your own inputs | allainews.com

Sept. 10, 2023, 4:19 p.m. | Simon Willison's Weblog (simonwillison.net)

promptfoo is a CLI and library for "evaluating LLM output quality". The tutorial in its documentation on comparing Llama 2 to gpt-3.5-turbo is a good illustration of how it works: YAML files configure the prompts, and further YAML defines assertions such as "not-icontains: AI language model".
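
To make the quoted assertion concrete, here is a minimal Python sketch of what a check like "not-icontains: AI language model" amounts to: the output passes only if the given phrase is absent, ignoring case. This is not promptfoo's implementation or API; the function names `not_icontains` and `failing_outputs`, the model labels, and the sample outputs are all hypothetical, chosen only for illustration.

```python
def not_icontains(output: str, value: str) -> bool:
    """Return True when `value` does not occur in `output`, ignoring case.

    Hypothetical helper: it mirrors the idea of a "not-icontains"
    assertion and is not taken from promptfoo itself.
    """
    return value.casefold() not in output.casefold()


def failing_outputs(outputs: dict[str, str], value: str) -> list[str]:
    """List the labels whose output contains the unwanted phrase."""
    return [name for name, text in outputs.items() if not not_icontains(text, value)]


# Made-up outputs standing in for two models answering the same prompt.
outputs = {
    "model-a": "As an AI language model, I cannot help with that.",
    "model-b": "Mix the flour and water, then let the dough rest.",
}

print(failing_outputs(outputs, "AI language model"))  # ['model-a']
```

In promptfoo itself the equivalent check is declared in the YAML configuration rather than written as code, which is what lets the same assertion run against every model being compared.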
