[R] GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
Nov. 15, 2023, 12:13 a.m. | /u/Successful-Western27
Machine Learning www.reddit.com
The agent can interpret smartphone interfaces and interact with them in a more human-like manner than previous attempts.
The key innovation lies in GPT-4V's ability to process both text and image …