Data Curation for Image Captioning with Text-to-Image Generative Models. (arXiv:2305.03610v1 [cs.CV])
cs.CL updates on arXiv.org
Recent advances in image captioning are mainly driven by large-scale
vision-language pretraining, relying heavily on computational resources and
increasingly large multimodal datasets. Instead of scaling up pretraining data,
we ask whether it is possible to improve performance by improving the quality
of the samples in existing datasets. We pursue this question through two
approaches to data curation: one that assumes that some examples should be
avoided due to mismatches between the image and caption, and one that assumes
that the …