April 24, 2024, 4:40 a.m. | /u/jferments

Computer Vision www.reddit.com

I have a large dataset (>2.5 million files) of NSFW images for which I would like to auto-generate detailed (~100-150 token) captions, using a visual language model similar to CogVLM or LLaVA.

I have tried both CogVLM and LLaVA, and unfortunately both models are far too heavily censored to complete the task. The responses range from outright refusal to caption the images to captions so heavily filtered for "appropriateness" that they fail to describe the important features …

