June 16, 2023, noon | code_your_own_AI


A new evaluation leaderboard for open-source instruction-tuned LLMs.

Latest arXiv pre-print on the evaluation of LLMs:
"INSTRUCTEVAL: Towards Holistic Evaluation of
Instruction-Tuned Large Language Models"
https://arxiv.org/pdf/2306.04757.pdf
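Benchmarks of this kind typically score instruction-tuned models by exact-match accuracy on held-out questions. A minimal sketch of that scoring step, with hypothetical predictions and answers (not code from the paper):

```python
# Minimal sketch: exact-match accuracy, the metric behind many
# multiple-choice LLM benchmarks. The data below is illustrative only.

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answers."""
    if not references:
        return 0.0
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

preds = ["B", "C", "A", "D"]
refs  = ["B", "C", "B", "D"]
print(exact_match_accuracy(preds, refs))  # → 0.75
```

Real harnesses add prompt formatting and answer extraction on top, but the aggregate score reduces to this comparison.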

Three other leaderboards, from Stanford, Hugging Face, and LMSYS:
----------------------------------------------------------------------------------------------------

HuggingFace leaderboard:
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

LMsys leaderboard:
https://chat.lmsys.org/?leaderboard

Stanford HELM leaderboard:
https://crfm.stanford.edu/helm/latest/?
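Unlike the static-benchmark boards above, the LMSYS leaderboard ranks models with Elo ratings computed from pairwise human votes. A minimal sketch of one Elo update, with an illustrative K-factor (not LMSYS's actual parameters):

```python
# Minimal sketch of the Elo update used by pairwise-comparison
# leaderboards such as the LMSYS Chatbot Arena. The starting ratings
# and K-factor here are illustrative assumptions.

def elo_update(r_a, r_b, score_a, k=32):
    """Return updated ratings after one battle.

    score_a is 1.0 if model A wins, 0.0 if B wins, 0.5 for a tie.
    """
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

print(elo_update(1000, 1000, 1.0))  # → (1016.0, 984.0)
```

Beating an equally rated opponent moves both ratings by k/2; upsets against higher-rated models move them further, which is what lets the ranking converge from noisy human votes.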


#benchmark
#chatgpt
#gpt4
#largelanguagemodels

