March 17, 2024, 5:31 p.m. | 1littlecoder

Mobile device agents based on Multimodal Large Language Models (MLLMs) are becoming a popular application. In this paper, we introduce Mobile-Agent, an autonomous multimodal mobile device agent. Mobile-Agent first leverages visual perception tools to accurately identify and locate both the visual and textual elements within an app's front-end interface. Based on the perceived vision context, it then autonomously plans and decomposes the complex operation task, and navigates mobile apps through operations step by step. Different from previous solutions that …
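The abstract describes a perceive-plan-act loop: capture the screen, locate visual and textual UI elements with perception tools, ask the MLLM for the next operation given the task, and execute it on the device. Below is a minimal Python sketch of that loop under those assumptions; every helper name (capture_screenshot, detect_elements, query_mllm, execute_action) is a hypothetical placeholder, not the paper's actual API.

```python
# Hypothetical sketch of a perceive-plan-act mobile agent loop, as
# outlined in the abstract. All helpers below are stand-in stubs,
# not Mobile-Agent's real interface.
from dataclasses import dataclass


@dataclass
class UIElement:
    label: str    # text or icon description recovered by perception
    bbox: tuple   # (x1, y1, x2, y2) location on the screen


def capture_screenshot() -> bytes:
    """Placeholder: grab the device's current screen (e.g. via adb)."""
    return b""


def detect_elements(screenshot: bytes) -> list[UIElement]:
    """Placeholder: OCR plus icon detection to locate UI elements."""
    return []


def query_mllm(task: str, elements: list[UIElement], history: list[str]) -> str:
    """Placeholder: ask the multimodal LLM for the next operation."""
    return "STOP"


def execute_action(action: str) -> None:
    """Placeholder: send a tap/swipe/type command to the device."""


def run_agent(task: str, max_steps: int = 20) -> None:
    history: list[str] = []
    for _ in range(max_steps):
        screen = capture_screenshot()                  # perceive
        elements = detect_elements(screen)             # locate UI elements
        action = query_mllm(task, elements, history)   # plan the next step
        if action == "STOP":                           # task judged complete
            break
        execute_action(action)                         # act on the device
        history.append(action)


run_agent("Open the weather app and check tomorrow's forecast")
```

In a real system the perception step would presumably combine OCR for text with an icon detector for visual elements, and the model's action string would be parsed into concrete tap, swipe, or type commands.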
