all AI news
PALO: A Polyglot Large Multimodal Model for 5B People
Feb. 23, 2024, 5:46 a.m. | Muhammad Maaz, Hanoona Rasheed, Abdelrahman Shaker, Salman Khan, Hisham Cholakal, Rao M. Anwer, Tim Baldwin, Michael Felsberg, Fahad S. Khan
cs.CV updates on arXiv.org arxiv.org
Abstract: In pursuit of more inclusive Vision-Language Models (VLMs), this study introduces a Large Multilingual Multimodal Model called \textsc{Palo}. \textsc{Palo} offers visual reasoning capabilities in 10 major languages, including English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese, that span a total of $\sim$5B people (65\% of the world population). Our approach involves a semi-automated translation approach to adapt the multimodal instruction dataset from English to the target languages using a fine-tuned Large Language …
arxiv cs.cl cs.cv multimodal multimodal model people polyglot type
More from arxiv.org / cs.CV updates on arXiv.org
Retrieval-Augmented Egocentric Video Captioning
1 day, 10 hours ago |
arxiv.org
Mirror-Aware Neural Humans
1 day, 10 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US