LLaVA: Large Language and Vision Assistant
Simon Willison's Weblog (simonwillison.net)
Yet another multi-modal model, combining a pre-trained vision model (CLIP ViT-L/14) with a LLaMA-derived language model (Vicuna). The results I get from their demo are even more impressive than those from MiniGPT-4. It also includes a new training dataset, LLaVA-Instruct-150K, generated using GPT-4 and hence subject to the same warnings about the OpenAI terms of service.
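The way these two pre-trained components get wired together can be sketched in a few lines. The LLaVA paper connects them with a single trainable linear layer that projects the vision encoder's patch features into the language model's token-embedding space; the dimensions below match CLIP ViT-L/14 (1024) and Vicuna-13B (5120), but everything else here is a toy illustration using random arrays, not the real weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# CLIP ViT-L/14 emits 1024-d patch features; Vicuna-13B embeds
# tokens in 5120-d. Patch/token counts here are arbitrary.
vision_dim, lm_dim = 1024, 5120
num_patches, num_text_tokens = 256, 8

# Frozen vision encoder output: one feature vector per image patch
# (stubbed with random numbers).
patch_features = rng.standard_normal((num_patches, vision_dim))

# The trainable projection: a single linear layer mapping visual
# features into the language model's embedding space.
W = rng.standard_normal((vision_dim, lm_dim)) * 0.02
visual_tokens = patch_features @ W  # shape (num_patches, lm_dim)

# Text prompt embeddings from the LM's own embedding table
# (also stubbed here).
text_tokens = rng.standard_normal((num_text_tokens, lm_dim))

# The language model then attends over the concatenated sequence:
# [projected visual tokens] + [text tokens].
lm_input = np.concatenate([visual_tokens, text_tokens], axis=0)
print(lm_input.shape)  # (264, 5120)
```

Because only the projection layer (and later the LLM, during instruction tuning) is trained, the expensive vision encoder stays frozen throughout.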
Via Hacker News
Tags: ai, assistant, clip, computervision, dataset, demo, generativeai, gpt, gpt-4, language, llama, llms, minigpt, minigpt-4, openai, service, terms, training, vicuna, vision, vit