April 24, 2024, 4:40 a.m. | /u/jferments

Computer Vision www.reddit.com

I have a large (>2.5 million files) dataset of NSFW images that I would like to auto-generate detailed (~100-150 token) captions for, using a visual language model similar to CogVLM or Llava.

I have tried both CogVLM and Llava, and unfortunately both models are far too heavily censored to complete the task. The responses range from outright refusals to caption the images to captions so heavily filtered for "appropriateness" that they fail to describe the important features …
