June 16, 2023, noon | code_your_own_AI


A new evaluation leaderboard for open-source instruction-tuned LLMs.

Latest arXiv pre-print on the evaluation of LLMs:
"INSTRUCTEVAL: Towards Holistic Evaluation of
Instruction-Tuned Large Language Models"
https://arxiv.org/pdf/2306.04757.pdf
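Benchmarks of this kind typically score instruction-tuned models by exact-match accuracy on held-out questions. A minimal sketch of that scoring step, with hypothetical predictions and answers (not code from the paper):

```python
# Minimal sketch: exact-match accuracy, the metric behind many
# multiple-choice LLM benchmarks. The data below is illustrative only.

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answers."""
    if not references:
        return 0.0
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

preds = ["B", "C", "A", "D"]
refs  = ["B", "C", "B", "D"]
print(exact_match_accuracy(preds, refs))  # → 0.75
```

Real harnesses add prompt formatting and answer extraction on top, but the aggregate score reduces to this comparison.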

Three other leaderboards, from Stanford, Hugging Face, and LMSYS:
----------------------------------------------------------------------------------------------------

HuggingFace leaderboard:
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

LMsys leaderboard:
https://chat.lmsys.org/?leaderboard

Stanford HELM leaderboard:
https://crfm.stanford.edu/helm/latest/?
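Unlike the static-benchmark boards above, the LMSYS leaderboard ranks models with Elo ratings computed from pairwise human votes. A minimal sketch of one Elo update, with an illustrative K-factor (not LMSYS's actual parameters):

```python
# Minimal sketch of the Elo update used by pairwise-comparison
# leaderboards such as the LMSYS Chatbot Arena. The starting ratings
# and K-factor here are illustrative assumptions.

def elo_update(r_a, r_b, score_a, k=32):
    """Return updated ratings after one battle.

    score_a is 1.0 if model A wins, 0.0 if B wins, 0.5 for a tie.
    """
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

print(elo_update(1000, 1000, 1.0))  # → (1016.0, 984.0)
```

Beating an equally rated opponent moves both ratings by k/2; upsets against higher-rated models move them further, which is what lets the ranking converge from noisy human votes.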


#benchmark
#chatgpt
#gpt4
#largelanguagemodels

