Sept. 10, 2023, 4:19 p.m.

Simon Willison's Weblog simonwillison.net

promptfoo: How to benchmark Llama2 Uncensored vs. GPT-3.5 on your own inputs


promptfoo is a CLI and library for "evaluating LLM output quality". The tutorial in its documentation on comparing Llama 2 to gpt-3.5-turbo is a good illustration of how it works: it uses YAML files to configure the prompts, and more YAML to define assertions such as "not-icontains: AI language model".
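To make that assertion concrete, here is a minimal Python sketch of what a check like "not-icontains" appears to express: the output passes only if it does not contain the given string, ignoring case. This is not promptfoo's own code; the `not_icontains` function and the sample outputs are hypothetical, made up for illustration.

```python
def not_icontains(output: str, value: str) -> bool:
    """Pass when `value` does not occur in `output`, ignoring case."""
    return value.casefold() not in output.casefold()


# Hypothetical model outputs, invented for this example (not real results).
outputs = {
    "model_a": "As an AI language model, I cannot help with that.",
    "model_b": "Sure, here is a short answer.",
}

# Apply the same assertion to every output, as an eval tool would do per test case.
results = {
    name: not_icontains(text, "AI language model")
    for name, text in outputs.items()
}

for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

A check like this is a cheap way to flag canned refusals across many prompts without reading every response by hand.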

