[D] MetaGPT grossly misreported baseline numbers and got an ICLR Oral!

Feb. 22, 2024, 5:06 p.m. | /u/Signal-Aardvark-4179

Machine Learning www.reddit.com

OpenReview: [https://openreview.net/forum?id=VtmBAGCN7o](https://openreview.net/forum?id=VtmBAGCN7o)

I was looking through ICLR reviews and was surprised to see that MetaGPT had been submitted to ICLR. The acceptance decision states that it was awarded an Oral (the highest acceptance tier at ICLR).

Looking at the paper, they report these comparisons on HumanEval:

|Method|Pass@1|
|:-|:-|
|MetaGPT|85.9|
|GPT-4|67.0|
|GPT-3.5-Turbo (in the response)|48.1|

However, the actual GPT-4 and GPT-3.5-Turbo numbers on this benchmark are much higher (see the EvalPlus leaderboard: https://evalplus.github.io/leaderboard.html). The results from the EvalPlus leaderboard have been reproduced numerous times, …
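For anyone unfamiliar with the metric in the table: Pass@1 is the k=1 case of pass@k, usually computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), averaged over problems, where n is the number of samples generated per problem and c is how many of them pass the unit tests. Here is a minimal sketch of that calculation. The function names and the sample counts are my own illustration, not taken from the MetaGPT paper or the EvalPlus code, and neither source may compute its numbers exactly this way.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them pass."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems, as a percentage.

    `results` holds one (n, c) pair per problem (hypothetical data layout).
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Made-up example: 4 problems, one greedy sample each, 3 of them solved.
example = [(1, 1), (1, 0), (1, 1), (1, 1)]
score = benchmark_pass_at_k(example, k=1)
```

With a single sample per problem, Pass@1 reduces to the fraction of problems solved, so differences in prompts, decoding settings, or model snapshot can move a baseline number quite a bit.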
