[D] Stanford's BioMedLM: Paper-Reported Accuracy vs. Evaluated Accuracy Doesn't Make Sense | allainews.com

March 28, 2024, 5:32 p.m. | /u/aadityaura

Machine Learning www.reddit.com

Stanford has released [#BioMedLM](https://twitter.com/hashtag/BioMedLM?src=hashtag_click), a 2.7B-parameter language model trained on biomedical data. However, the accuracy I get when evaluating it does not match the accuracy reported in the paper, and the gap does not seem to make sense.

Below is the evaluation report from the LM Evaluation Harness framework on the MultiMedQA benchmarks (MedMCQA, MedQA, MMLU, PubMedQA).

https://preview.redd.it/vd21crtn14rc1.png?width=1442&format=png&auto=webp&s=ee905e8277006e40c37b7e5b87003165bd0de4b5

https://preview.redd.it/6ot7mibo14rc1.png?width=1164&format=png&auto=webp&s=5d76fcce909fb07d5404e148b0cdc2fbc6dae43c
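When comparing harness numbers with a paper's table, it helps to know how the harness turns model outputs into accuracy. For multiple-choice tasks it compares the model's log-likelihood for each answer option and picks the highest; for some tasks it also reports a length-normalized `acc_norm`. Below is a minimal Python sketch of that scoring rule, assuming the per-choice log-likelihoods have already been computed. The function names and the numbers are hypothetical and are not taken from the screenshots above.

```python
from __future__ import annotations

from typing import Sequence


def score_multiple_choice(
    loglikelihoods: Sequence[float], choices: Sequence[str], gold: int
) -> dict[str, float]:
    """Score one multiple-choice item by comparing per-choice log-likelihoods.

    Hypothetical helper: loglikelihoods[i] is assumed to be the model's summed
    log-probability of choices[i] as a continuation of the question prompt.
    """
    if not choices or len(loglikelihoods) != len(choices):
        raise ValueError("need exactly one log-likelihood per choice")
    if any(len(c) == 0 for c in choices):
        raise ValueError("choice strings must be non-empty for length normalization")
    indices = range(len(choices))
    # acc: the prediction is the choice with the highest raw log-likelihood.
    pred = max(indices, key=lambda i: loglikelihoods[i])
    # acc_norm: divide each log-likelihood by the choice's character length,
    # so a longer answer is not penalized just for having more tokens.
    pred_norm = max(indices, key=lambda i: loglikelihoods[i] / len(choices[i]))
    return {"acc": float(pred == gold), "acc_norm": float(pred_norm == gold)}


def aggregate(results: Sequence[dict[str, float]]) -> dict[str, float]:
    """Average per-item scores into dataset-level accuracies (hypothetical helper)."""
    if not results:
        raise ValueError("no results to aggregate")
    return {k: sum(r[k] for r in results) / len(results) for k in ("acc", "acc_norm")}


if __name__ == "__main__":
    # Toy item: raw scores favour "no", length-normalized scores favour "maybe".
    item = score_multiple_choice([-3.0, -2.5, -4.0], ["yes", "no", "maybe"], gold=2)
    print(item)  # {'acc': 0.0, 'acc_norm': 1.0}
```

In this toy item the two metrics disagree on the same predictions, so it matters which metric is lined up against a reported number.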
