Feb. 14, 2024, 5:47 a.m. | Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K. Wong, Zhenguo Li, Hengshuang Zhao

cs.CV updates on arXiv.org

Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on LLMs. Capable of processing multi-frame video inputs and textual queries, DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively …
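The interface the abstract describes — multi-frame video plus a free-form textual query in, a natural-language interpretation plus control-relevant output — can be sketched minimally. All names below (`DriveQuery`, `DriveResponse`, `answer_query`, the `control` fields) are hypothetical illustrations, not the paper's actual API; the real system would run a vision encoder over the frames and feed the resulting tokens to the LLM alongside the question.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DriveQuery:
    frames: List[str]   # identifiers/paths of consecutive video frames
    question: str       # free-form textual query about the driving scene


@dataclass
class DriveResponse:
    answer: str               # natural-language interpretation and reasoning
    control: Dict[str, float] # hypothetical low-level signals


def answer_query(q: DriveQuery) -> DriveResponse:
    """Placeholder for the model call: illustrates the I/O contract only."""
    if not q.frames:
        raise ValueError("at least one video frame is required")
    return DriveResponse(
        answer=f"Interpreted {len(q.frames)} frames for: {q.question!r}",
        control={"speed": 0.0, "turn_angle": 0.0},
    )
```

A caller would batch recent camera frames with the user's question and read both the textual explanation and the control fields from the response.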

