May 8, 2023, 12:45 a.m. | Wenyan Li, Jonas F. Lotz, Chen Qiu, Desmond Elliott

cs.CL updates on arXiv.org

Recent advances in image captioning are mainly driven by large-scale
vision-language pretraining, relying heavily on computational resources and
increasingly large multimodal datasets. Instead of scaling up pretraining data,
we ask whether it is possible to improve performance by improving the quality
of the samples in existing datasets. We pursue this question through two
approaches to data curation: one that assumes that some examples should be
avoided due to mismatches between the image and caption, and one that assumes
that the …

