Sept. 12, 2022, 3:30 p.m. | /u/cccntu

Machine Learning www.reddit.com

A few weeks ago, before Stable Diffusion was officially released, I found that fine-tuning DALL·E Mini's VQGAN decoder can improve reconstruction quality on anime images. See:

https://preview.redd.it/eekf9hjt3gn91.png?width=1280&format=png&auto=webp&s=25938a4ad284e6cfff958ad0d69968cd2c01ed18

And with only a few lines of code changed, I was able to fine-tune the Stable Diffusion VAE decoder as well. See:

https://preview.redd.it/45xogflo5gn91.png?width=1129&format=png&auto=webp&s=43f98e863b918bba9d7471a0cfa7de4dcc8df98c

You can find the exact training code used in this repo: [https://github.com/cccntu/fine-tune-models/](https://github.com/cccntu/fine-tune-models/)

More details about the models are also in the repo.
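The core idea — freeze everything except the VAE decoder and train it with a reconstruction loss — can be sketched in a few lines of PyTorch. This is a minimal illustration with a toy stand-in model, not the actual code from the repo; `TinyVAE` and its dimensions are hypothetical:

```python
# Minimal sketch of decoder-only fine-tuning: the encoder (and hence the
# latents) stays frozen, and only the decoder weights are updated with a
# simple reconstruction loss. A toy linear autoencoder stands in for the
# real Stable Diffusion / VQGAN autoencoder.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):  # hypothetical stand-in model
    def __init__(self, dim=16, latent=4):
        super().__init__()
        self.encoder = nn.Linear(dim, latent)
        self.decoder = nn.Linear(latent, dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyVAE()

# Freeze the encoder so only the decoder receives gradients.
for p in model.encoder.parameters():
    p.requires_grad_(False)

# Optimizer is given only the decoder's parameters.
opt = torch.optim.Adam(model.decoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(8, 16)  # stand-in batch of "images"
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(model(x), x)  # reconstruction loss against the input
    loss.backward()
    opt.step()
```

Because the encoder is untouched, the latent space stays compatible with the original diffusion model — only the mapping from latents back to pixels improves.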

And you can play with the former model at [https://github.com/cccntu/anim_e](https://github.com/cccntu/anim_e)

