DOCCI: Descriptions of Connected and Contrasting Images
May 1, 2024, 4:43 a.m. | Yasumasa Onoe, Sunayana Rane, Zachary Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer,
cs.LG updates on arXiv.org arxiv.org
Abstract: Vision-language datasets are vital for both text-to-image (T2I) and image-to-text (I2T) research. However, current datasets lack descriptions with fine-grained detail that would allow for richer associations to be learned by models. To fill the gap, we introduce Descriptions of Connected and Contrasting Images (DOCCI), a dataset with long, human-annotated English descriptions for 15k images that were taken, curated and donated by a single researcher intent on capturing key challenges such as spatial relations, counting, text …