[D] AI Agents: too early, too expensive, too unreliable | allainews.com

May 22, 2024, 2:27 p.m. | /u/madredditscientist

Machine Learning www.reddit.com

[**Reference: Full blog post**](https://www.kadoa.com/blog/ai-agents-hype-vs-reality)

There has been a lot of hype about the promise of autonomous agent-based LLM workflows. By now, all major LLMs are capable of interacting with external tools and functions, letting the LLM perform sequences of tasks automatically.

But reality is proving more challenging than anticipated.

The [WebArena leaderboard](https://docs.google.com/spreadsheets/d/1M801lEpBbKSNwP-vDBkC_pF7LdyGU1f_ufZb_NWNBZQ/edit#gid=0), which benchmarks LLMs agents against real-world tasks, shows that even the best-performing models have a success rate of only 35.8%.

# Challenges in Practice

After seeing many attempts …

agent agents ai agents autonomous challenges functions hype llm llms machinelearning major practice reality tasks tools workflows

More from www.reddit.com / Machine Learning

[D] Need help finding an old Geoffrey Hinton video 8 hours ago | www.reddit.com

digit geoff geoff hinton hinton +12

[P] Created an open source version of "Math Notes" from Apple with GPT-4o! 10 hours ago | www.reddit.com

apple gpt gpt-4o machinelearning +3

[D] How to network at a conference 20 hours ago | www.reddit.com

big conference cvpr google +11

[R] CFG++ : A simple fix for addressing the flaws of CFG in diffusion models 1 day, 2 hours ago | www.reddit.com

challenges classifier design diffusion +12

[D] Nemotron-4 340b detailed analysis 1 day, 11 hours ago | www.reddit.com

analysis llm look machinelearning +2

I Trained an LLM on My WhatsApp Chats to Impersonate Me [P] 1 day, 15 hours ago | www.reddit.com

chat chat history export feature +12

[P] Improved Text2SQL Dataset Now Available on Huggingface! 1 day, 15 hours ago | www.reddit.com

download experiment free machinelearning +1

[D] Discussing Apple's Deployment of a 3 Billion Parameter AI Model on the iPhone 15 … 1 day, 18 hours ago | www.reddit.com

ai model apple billion deployment +10

[D]Enhancing Weather Forecast Accuracy Through Data Fusion 1 day, 20 hours ago | www.reddit.com

accuracy cities city cloud +11

Senior Data Engineer

@ Displate | Warsaw

View on ai-jobs.net

Associate Director, Technology & Data Lead - Remote

@ Novartis | East Hanover

View on ai-jobs.net

Product Manager, Generative AI

@ Adobe | San Jose

View on ai-jobs.net

Associate Director – Data Architect Corporate Functions

@ Novartis | Prague

View on ai-jobs.net

Principal Data Scientist

@ Salesforce | California - San Francisco

View on ai-jobs.net

Senior Analyst Data Science

@ Novartis | Hyderabad (Office)

View on ai-jobs.net