[D] [P] Web browsing UI-based AI agent: GPT-4V-Act
Oct. 21, 2023, 8:37 a.m. | /u/a6oo
Machine Learning www.reddit.com
(A demo video can be found on GitHub.)
Hi there!
I'd like to share with you a project I recently developed. My inspiration came from a recent post about [Set-of-Mark visual grounding in GPT-4V](https://www.reddit.com/r/MachineLearning/comments/17bcikh/r_setofmark_som_unleashes_extraordinary_visual/). Fascinatingly, my tests showed that GPT-4V, equipped with this capability, could inspect a UI screenshot and provide the precise pixel coordinates needed for steering a mouse/keyboard to perform a specified task.
Motivated by this, I built a proof-of-concept web browser embedded with a …
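To make the coordinate hand-off concrete, here is a minimal sketch of how a model reply could be turned into a mouse click. This is not the project's code. The JSON reply format, the `Click` type, and the `parse_click` helper are all assumptions made up for illustration, and the post does not say how GPT-4V-Act actually encodes its actions.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    """A click target in screenshot pixel coordinates (hypothetical type)."""
    x: int
    y: int


def parse_click(reply: str, width: int, height: int) -> Click:
    """Parse an assumed JSON reply such as {"action": "click", "x": 412, "y": 96}.

    Raises ValueError if the action is not a click or if the point
    lies outside a screenshot of the given width and height.
    """
    data = json.loads(reply)
    if data.get("action") != "click":
        raise ValueError(f"unsupported action: {data.get('action')!r}")

    x, y = int(data["x"]), int(data["y"])
    # Reject coordinates that do not land on the screenshot at all.
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("coordinates fall outside the screenshot")
    return Click(x, y)


if __name__ == "__main__":
    reply = '{"action": "click", "x": 412, "y": 96}'
    print(parse_click(reply, width=1280, height=720))
```

Checking the bounds before acting is a cheap guard: if the model returns a point that is not on the screenshot, the step fails instead of clicking somewhere unintended.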