Dec. 5, 2023, 5:06 p.m. | /u/AndreeSmothers

Machine Learning

How are folks evaluating the quality of their LLM applications? I'm running a therapist chatbot in production (small scale: tens of active users), and I've spent a lot of time fine-tuning prompts, but it's all just guesswork.

I'll make a tweak to the prompt, run a few test conversations, and just kind of go by the vibes to decide whether it's better or worse than before the tweak. Is this what y'all are doing too, or am I missing something???
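One way to make the "tweak the prompt, then check the vibes" loop more repeatable is to run every prompt variant against the same fixed set of test messages and score the replies with simple automated checks, so two versions can be compared on identical inputs. This is a minimal sketch under stated assumptions: the test messages, the rubric in `CHECKS`, and the `generate_reply(system_prompt, message)` signature are all hypothetical, and `fake_generate_reply` is a stand-in for the real chatbot call.

```python
from dataclasses import dataclass
from typing import Callable

# Fixed, hypothetical test messages reused for every prompt variant,
# so scores from different tweaks are comparable.
TEST_CASES = [
    "I've been feeling really anxious about work lately.",
    "I can't sleep and I don't know why.",
    "My friend said something hurtful and I keep replaying it.",
]


@dataclass
class Check:
    name: str
    passed: Callable[[str], bool]


# Hypothetical rubric: cheap, automatable checks applied to each reply.
CHECKS = [
    Check("asks a follow-up question", lambda r: "?" in r),
    Check("stays reasonably short", lambda r: len(r.split()) <= 80),
    Check("avoids diagnosing the user", lambda r: "you have" not in r.lower()),
]


def score_variant(generate_reply: Callable[[str, str], str], system_prompt: str) -> float:
    """Return the fraction of (test message, check) pairs that pass."""
    results = []
    for message in TEST_CASES:
        reply = generate_reply(system_prompt, message)
        results.extend(check.passed(reply) for check in CHECKS)
    return sum(results) / len(results)


def fake_generate_reply(system_prompt: str, message: str) -> str:
    """Stand-in for the production model call (hypothetical behavior)."""
    if "ask questions" in system_prompt:
        return "That sounds hard. What feels most stressful about it right now?"
    return "You have anxiety. Try to relax."


if __name__ == "__main__":
    old = score_variant(fake_generate_reply, "You are a supportive therapist.")
    new = score_variant(fake_generate_reply, "You are a supportive therapist. Gently ask questions.")
    print(f"old prompt: {old:.2f}, new prompt: {new:.2f}")  # old prompt: 0.33, new prompt: 1.00
```

Rule-based checks like these only catch surface problems; they don't replace reading transcripts, but they make regressions between prompt versions visible instead of relying on memory of earlier test chats.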
