Oct. 2, 2022, 1:37 a.m. | /u/0x00groot

Machine Learning www.reddit.com

Code: [https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth](https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth)

Colab: [https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth\_Stable\_Diffusion.ipynb](https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb)


Tested on a Tesla T4 GPU on Google Colab. Training is still fairly fast, with no further precision loss compared to the previous 12 GB version. I have also added a table to help choose the best flags for your memory and speed requirements.


|`mixed_precision`|`train_batch_size`|`gradient_accumulation_steps`|`gradient_checkpointing`|`use_8bit_adam`|VRAM usage (GB)|Speed (it/s)|
|:-|:-|:-|:-|:-|:-|:-|
|fp16|1|1|TRUE|TRUE|9.92|0.93|
|no|1|1|TRUE|TRUE|10.08|0.42|
|fp16|2|1|TRUE|TRUE|10.4|0.66|
|fp16|1|1|FALSE|TRUE|11.17|1.14|
|no|1|1|FALSE|TRUE|11.17|0.49|
|fp16|1|2|TRUE|TRUE|11.56|1|
|fp16|2|1|FALSE|TRUE|13.67|0.82|
|fp16|1|2|FALSE|TRUE|13.7|0.83|
|fp16|1|1|TRUE|FALSE|15.79|0.77|

It might also work on a 10 GB RTX 3080 now, but I haven't tested that. Let me know if anybody here can test it.
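For reference, a launch command for the lowest-memory row in the table above (fp16, batch size 1, gradient checkpointing, 8-bit Adam) might look something like the sketch below. The model name, data paths, instance prompt, learning rate, and step count are placeholders to adjust for your own run; the flag names follow the diffusers DreamBooth example script, and `--use_8bit_adam` requires the `bitsandbytes` package.

```shell
# Sketch of a launch command for the ~10 GB configuration
# (fp16, train_batch_size=1, gradient checkpointing, 8-bit Adam).
# Paths, prompt, and hyperparameters below are illustrative placeholders.
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="./instance_images" \
  --output_dir="./dreambooth_model" \
  --instance_prompt="a photo of sks person" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --mixed_precision="fp16" \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --max_train_steps=800
```

Dropping `--gradient_checkpointing` or `--mixed_precision="fp16"` trades memory for speed per the table above.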

