[R] GPT-4V(ision) is a Generalist Web Agent, if Grounded - The Ohio State University 2024 - Can successfully complete 50% of the tasks on live websites!
Jan. 5, 2024, 8:18 p.m. | /u/Singularian2501
Machine Learning www.reddit.com
Blog: [https://osu-nlp-group.github.io/SeeAct/](https://osu-nlp-group.github.io/SeeAct/)
Code: [https://github.com/OSU-NLP-Group/SeeAct](https://github.com/OSU-NLP-Group/SeeAct)
Abstract:
>The recent development on **large multimodal models (LMMs), especially GPT-4V(ision) and Gemini**, has been quickly expanding the capability boundaries of multimodal models beyond traditional tasks like image captioning and visual question answering. In this work, we explore the potential of LMMs like GPT-4V as a generalist web agent that can follow natural language instructions to complete tasks on any given website. We propose SEEACT, a generalist web agent that harnesses the power of …