April 19, 2023, 1:14 a.m.

Simon Willison's Weblog (simonwillison.net)

LLaVA: Large Language and Vision Assistant


Yet another multi-modal model, this one combining a vision model (pre-trained CLIP ViT-L/14) with a LLaMA-derived language model (Vicuna). The results I get from their demo are even more impressive than those from MiniGPT-4. The release also includes a new training dataset, LLaVA-Instruct-150K, which was derived from GPT-4 output and is therefore subject to the same warnings about the OpenAI terms of service. A rough sketch of the architecture follows.
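As a rough illustration of the design described above, here is a minimal PyTorch sketch of a LLaVA-style model: CLIP vision features are projected into the language model's embedding space and prepended to the text token embeddings. The single linear projection matches the paper's description, but the checkpoint names and the surrounding glue code are my assumptions, not the authors' implementation.

import torch
import torch.nn as nn
from transformers import CLIPVisionModel, AutoModelForCausalLM

class LlavaSketch(nn.Module):
    def __init__(self,
                 vision_name="openai/clip-vit-large-patch14",
                 llm_name="lmsys/vicuna-7b-v1.5"):  # assumed checkpoints
        super().__init__()
        self.vision = CLIPVisionModel.from_pretrained(vision_name)
        self.llm = AutoModelForCausalLM.from_pretrained(llm_name)
        # Linear projection from the CLIP hidden size to the LLM hidden size.
        self.proj = nn.Linear(self.vision.config.hidden_size,
                              self.llm.config.hidden_size)

    def forward(self, pixel_values, input_ids):
        # Patch-level visual features (dropping the CLS token), mapped into
        # "visual tokens" that live in the LLM's embedding space.
        patches = self.vision(pixel_values).last_hidden_state[:, 1:, :]
        visual_tokens = self.proj(patches)
        # Embed the text prompt, then prepend the visual tokens so the LLM
        # attends to the image before generating its answer.
        text_embeds = self.llm.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs_embeds)

In the actual two-stage training recipe described in the paper, the projection layer is trained first on image-caption pairs with the rest frozen, and then the projection and LLM are fine-tuned together on the instruction-following data.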


Via Hacker News

Tags: ai, assistant, clip, computervision, dataset, demo, generativeai, gpt-4, llama, llms, minigpt-4, openai, training, vicuna, vision, vit
