April 10, 2024, 9:48 p.m. | /u/the_snow_princess

Machine Learning www.reddit.com

I have seen that Devin broke the record for the SWE-bench score, followed by SWE-agent (an open-source alternative to Devin). I have also seen that Claude 2 scored around 5%. But what about other projects?
What sources can I check for this?

And in general, what is your view on the test? I've seen people saying that the tasks are very easy (for humans), which of course doesn't mean machines are able to deal with them well.
But anyway, do you …

