Sept. 28, 2022, 8:18 a.m. | /u/Academy-


Hi everyone. I hope this fits in this sub.

I’m curious whether there is any way to fine-tune a Stable Diffusion (text2img) model on, say, a larger corpus (1000+) of (img, text) pairs.

I have seen posts on [textual inversion](https://towardsdatascience.com/how-to-fine-tune-stable-diffusion-using-textual-inversion-b995d7ecc095), which lets you fine-tune the underlying embeddings. However, that method seems to work best with 3-5 new examples and doesn’t scale well beyond that.
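For concreteness, here is my mental model of textual inversion as code, assuming the Hugging Face `transformers`/`diffusers` stack (the checkpoint, the `<my-concept>` token name, and the learning rate are placeholders I picked, not taken from any particular post):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"  # assumed checkpoint
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

# Register a new pseudo-token for the concept and grow the embedding table.
placeholder = "<my-concept>"  # hypothetical token name
tokenizer.add_tokens(placeholder)
text_encoder.resize_token_embeddings(len(tokenizer))
placeholder_id = tokenizer.convert_tokens_to_ids(placeholder)

# Freeze the whole encoder, then train only the embedding matrix. In the
# training loop you would zero the gradient of every row except the new one.
text_encoder.requires_grad_(False)
embeddings = text_encoder.get_input_embeddings()
embeddings.weight.requires_grad_(True)
optimizer = torch.optim.AdamW([embeddings.weight], lr=5e-4)

# After loss.backward(), mask out gradients for all pre-existing tokens:
#   mask = torch.arange(embeddings.weight.grad.shape[0]) != placeholder_id
#   embeddings.weight.grad[mask] = 0.0
```

Since only a single embedding vector is being learned, it makes sense to me that this caps out quickly: one vector can only absorb so much from 1000+ pairs.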

So, is there a way to efficiently fine-tune on more data? Compute cost …
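The brute-force alternative I can picture is just continuing the standard denoising-loss training on my own pairs. A rough sketch, again assuming the `diffusers` stack (data loading, device placement, and mixed precision omitted; `train_step` is a hypothetical helper, not a library function):

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"  # assumed checkpoint
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

# Only the UNet is trained; the VAE and text encoder stay frozen.
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def train_step(pixel_values, captions):
    """One step on a batch of image tensors (scaled to [-1, 1]) and captions."""
    with torch.no_grad():
        latents = vae.encode(pixel_values).latent_dist.sample()
        latents = latents * vae.config.scaling_factor
        tokens = tokenizer(captions, padding="max_length", truncation=True,
                           max_length=tokenizer.model_max_length,
                           return_tensors="pt")
        encoder_states = text_encoder(tokens.input_ids)[0]

    # Standard diffusion objective: add noise, then predict that noise.
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=latents.device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    pred = unet(noisy_latents, timesteps,
                encoder_hidden_states=encoder_states).sample
    loss = F.mse_loss(pred, noise)

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Backpropagating through the full UNet like this is exactly the compute cost I’m worried about, hence the question about something more efficient.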

