s
March 31, 2024, 9:53 p.m. |

Simon Willison's Weblog simonwillison.net

Your AI Product Needs Evals


Hamel Husain: "I’ve seen many successful and unsuccessful approaches to building LLM products. I’ve found that unsuccessful products almost always share a common root cause: a failure to create robust evaluation systems."


I've been frustrated about this for a while: I know I need to move beyond "vibe checks" for the systems I have started to build on top of LLMs, but I was lacking a thorough guide about how to build automated (and manual) …

ai beyond building checks evals evaluation failure found generativeai llm llms product products robust systems testing

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Senior Data Engineer

@ Quantexa | Sydney, New South Wales, Australia

Staff Analytics Engineer

@ Warner Bros. Discovery | NY New York 230 Park Avenue South