Jan. 5, 2024, 8:18 p.m. | /u/Singularian2501

Machine Learning | www.reddit.com

Paper: [https://arxiv.org/abs/2401.01614](https://arxiv.org/abs/2401.01614)

Blog: [https://osu-nlp-group.github.io/SeeAct/](https://osu-nlp-group.github.io/SeeAct/)

Code: [https://github.com/OSU-NLP-Group/SeeAct](https://github.com/OSU-NLP-Group/SeeAct)

Abstract:

>The recent development of **large multimodal models (LMMs), especially GPT-4V(ision) and Gemini**, has been quickly expanding the capability boundaries of multimodal models beyond traditional tasks like image captioning and visual question answering. In this work, we explore the potential of LMMs like GPT-4V as a generalist web agent that can follow natural language instructions to complete tasks on any given website. We propose SEEACT, a generalist web agent that harnesses the power of …
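
The abstract describes an LMM such as GPT-4V driving a browser from natural-language instructions. Below is a minimal conceptual sketch of such an observe-decide-act loop, assuming a Playwright browser and a placeholder `query_lmm()` helper standing in for GPT-4V or Gemini; it is not the SeeAct implementation from the repo above, and the action schema is hypothetical.

```python
# Conceptual sketch of an LMM-driven web agent loop (not the paper's code).
from playwright.sync_api import sync_playwright

def query_lmm(instruction: str, screenshot: bytes, candidates: list[str]) -> dict:
    """Hypothetical placeholder: send the task, a screenshot, and candidate
    elements to a large multimodal model and parse its chosen action."""
    raise NotImplementedError  # plug in GPT-4V, Gemini, or another LMM API here

def run_task(url: str, instruction: str, max_steps: int = 10) -> None:
    with sync_playwright() as p:
        page = p.chromium.launch(headless=True).new_page()
        page.goto(url)
        for _ in range(max_steps):
            shot = page.screenshot()  # visual observation of the current page
            # crude candidate list of interactable elements for grounding
            candidates = page.locator("a, button, input").all_inner_texts()
            action = query_lmm(instruction, shot, candidates)
            if action["op"] == "stop":      # model judges the task complete
                break
            if action["op"] == "click":
                page.get_by_text(action["target"]).first.click()
            elif action["op"] == "type":
                page.get_by_text(action["target"]).first.fill(action["value"])
```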
