[D] AI Agents: too early, too expensive, too unreliable | allainews.com

May 22, 2024, 2:27 p.m. | /u/madredditscientist

Machine Learning www.reddit.com

[**Reference: Full blog post**](https://www.kadoa.com/blog/ai-agents-hype-vs-reality)

There has been a lot of hype about the promise of autonomous agent-based LLM workflows. By now, all major LLMs are capable of interacting with external tools and functions, letting the LLM perform sequences of tasks automatically.

But reality is proving more challenging than anticipated.

The [WebArena leaderboard](https://docs.google.com/spreadsheets/d/1M801lEpBbKSNwP-vDBkC_pF7LdyGU1f_ufZb_NWNBZQ/edit#gid=0), which benchmarks LLMs agents against real-world tasks, shows that even the best-performing models have a success rate of only 35.8%.

# Challenges in Practice

After seeing many attempts …

agent agents ai agents autonomous challenges functions hype llm llms machinelearning major practice reality tasks tools workflows

More from www.reddit.com / Machine Learning

[R] M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec 7 hours ago | www.reddit.com

audio codec machinelearning spatial +1

[P] C-GAN based MNIST model evaluator/validator 10 hours ago | www.reddit.com

building gan gans generative +5

[R] [CVPR 2024] AV-RIR: Audio-Visual Room Impulse Response Estimation 12 hours ago | www.reddit.com

audio cvpr machinelearning room +1

[Research] Exploiting the Layered Intrinsic Dimensionality for Practical Adversarial Training 13 hours ago | www.reddit.com

adversarial adversarial training aes algorithm +16

[D] Patenting in ML 15 hours ago | www.reddit.com

academia algorithms application applications +10

[R] Weight Rescaling: Applying Initialization Strategies During Training 20 hours ago | www.reddit.com

machinelearning strategies training

[P] llama.ttf: A font which is also an LLM 1 day ago | www.reddit.com

llama llm machinelearning

[D] Thought Space in LLMs? 1 day, 3 hours ago | www.reddit.com

concepts create generate image +12

Cuda advanced learning materials, [D] 1 day, 6 hours ago | www.reddit.com

advanced books course cuda +9

Senior Data Engineer

@ Displate | Warsaw

View on ai-jobs.net

Content Designer

@ Glean | Palo Alto, CA

View on ai-jobs.net

IT&D Data Solution Architect

@ Reckitt | Hyderabad, Telangana, IN, N/A

View on ai-jobs.net

Python Developer

@ Riskinsight Consulting | Hyderabad, Telangana, India

View on ai-jobs.net

Technical Lead (Java/Node.js)

@ LivePerson | Hyderabad, Telangana, India (Remote)

View on ai-jobs.net

Backend Engineer - Senior and Mid-Level - Sydney Hybrid or AU remote

@ Displayr | Sydney, New South Wales, Australia

View on ai-jobs.net