[D] MetaGPT grossly misreported baseline numbers and got an ICLR Oral!

Feb. 22, 2024, 5:06 p.m. | /u/Signal-Aardvark-4179

Machine Learning www.reddit.com

OpenReview: [https://openreview.net/forum?id=VtmBAGCN7o](https://openreview.net/forum?id=VtmBAGCN7o)

I was browsing ICLR reviews and was surprised to see that MetaGPT had been submitted to ICLR. The acceptance decision states that it was awarded an Oral (the highest presentation tier at ICLR).

In the paper, the authors report the following comparison on HumanEval:

|Method|Pass@1 (%)|
|:-|:-|
|MetaGPT|85.9|
|GPT-4|67.0|
|GPT-3.5-Turbo (reported in the author response)|48.1|

However, the actual GPT-4 and GPT-3.5-Turbo numbers on this benchmark are much higher (see the EvalPlus leaderboard: https://evalplus.github.io/leaderboard.html). The EvalPlus leaderboard results have been reproduced numerous times, …
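
For anyone unfamiliar with the metric, here is a minimal sketch of how Pass@k is commonly computed. It is not taken from the post or from the MetaGPT paper. It assumes the standard unbiased estimator introduced alongside HumanEval, 1 - C(n-c, k)/C(n, k), averaged over problems, where n is the number of samples generated per problem and c is the number that pass the unit tests. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical, and the sample counts in the usage note are toy values, not reported results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated for the problem
    c: samples that passed all unit tests
    k: budget of attempts being scored (k <= n)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, as a percentage.

    results: one (n, c) pair per problem (hypothetical input format).
    """
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With one sample per problem (n = 1), Pass@1 is simply the share of problems solved: `benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 1)], k=1)` scores three of four toy problems as solved. The sampling setup (number of samples, temperature, prompt) is part of what should be checked when comparing Pass@1 numbers across papers.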
