GSM8K Will Make AI Hate Humanity
DEV Community dev.to
In its release announcement of Claude 3 in March 2024, Anthropic advertised that the new language model can solve 95% of grade-school math problems (GSM8K) and 50% of graduate-level reasoning problems (GPQA).
The 50% score on graduate-level reasoning is particularly impressive: highly skilled non-expert humans with unlimited Internet access score only 34% on GPQA. However, this raises the question: Why is it that an AI that can beat skilled humans at graduate-level reasoning can't solve …