March 28, 2024, 5:32 p.m. | /u/aadityaura

Machine Learning www.reddit.com

Stanford has released [\#BioMedLM](https://twitter.com/hashtag/BioMedLM?src=hashtag_click), a 2.7B-parameter language model trained on biomedical data. However, the evaluation results do not seem to make sense.

Here is the evaluation report, produced with the LM Evaluation Harness framework on MultiMedQA (MedMCQA, MedQA, MMLU, PubMedQA).


https://preview.redd.it/vd21crtn14rc1.png?width=1442&format=png&auto=webp&s=ee905e8277006e40c37b7e5b87003165bd0de4b5

https://preview.redd.it/6ot7mibo14rc1.png?width=1164&format=png&auto=webp&s=5d76fcce909fb07d5404e148b0cdc2fbc6dae43c
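
One quick way to check whether scores like these "make sense" is to compare each one against the random-guess baseline for its task. Below is a minimal Python sketch of that check, not part of the harness itself. The dictionary keys are just labels I picked, the option counts assume the 4-option MedQA variant and the 3-way (yes/no/maybe) PubMedQA setup, and the helper names are hypothetical.

```python
import math

# Answer options per question for each benchmark.
# Assumption: 4-option MedQA variant; PubMedQA is yes/no/maybe.
NUM_CHOICES = {"medmcqa": 4, "medqa_4options": 4, "pubmedqa": 3, "mmlu": 4}


def chance_accuracy(task):
    """Expected accuracy of uniform random guessing on a task."""
    return 1.0 / NUM_CHOICES[task]


def above_chance(task, accuracy, n_questions, z=1.96):
    """Return True if accuracy beats chance by more than z standard errors.

    Uses the normal approximation to the binomial under the
    random-guessing null hypothesis.
    """
    p0 = chance_accuracy(task)
    stderr = math.sqrt(p0 * (1.0 - p0) / n_questions)
    return accuracy > p0 + z * stderr
```

For example, `above_chance("medmcqa", 0.26, 1000)` is `False` while `above_chance("medmcqa", 0.50, 1000)` is `True`. Those numbers are made-up placeholders, not the scores from the report above; plug in the real accuracies and test-set sizes from the screenshots.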

