PI #009: Let's Open the Stable Diffusion Black Box

Part 1: How Does a Stable Diffusion Model Learn to Generate Images?

Jul 27, 2023

Hello there, I am Paul Iusztin, and within this newsletter, I will deliver your weekly piece of MLE & MLOps wisdom straight to your inbox 🔥

This week we will cover stable diffusion models. More concretely:

Get an Intuition of How Diffusion Models Work
How Diffusion models generate new images

→ Next week, we will release Part 2, where we cover how stable diffusion models are trained and controlled using text prompts.

Before diving into the topic, I want to celebrate that my Full Stack 7-Steps MLOps Framework course reached 410+ stars on GitHub.

I want to thank everyone who starred and read my course. This means a lot to me.

If you haven’t read it yet and you are interested in learning how to design, build, deploy, and monitor an end-to-end ML batch system, check it out on GitHub and Medium.

Now, let’s get back to our diffusion models 👇

#1. Get an Intuition of How Diffusion Models Work

We will use a dataset with cats as an example.

Thus, let's say that we want to train a stable diffusion model to generate new cats.

To do that, we need two steps:

#𝟭. 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗲 𝘁𝗵𝗲 𝗱𝗮𝘁𝗮𝘀𝗲𝘁 - 𝗮𝗱𝗱 𝗚𝗮𝘂𝘀𝘀𝗶𝗮𝗻 𝗻𝗼𝗶𝘀𝗲

We take every image from the dataset and gradually add Gaussian noise to it.

Now, for every original cat image, we have multiple copies with increasing levels of noise.
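Here is a minimal sketch of the noising step, just to make the idea concrete. The function name `make_noisy_versions`, the linear noise schedule, and the blank 8x8 "cat" are my own assumptions for illustration, not values taken from a real stable diffusion setup.

```python
import numpy as np

def make_noisy_versions(image, num_levels=5, max_beta=0.2, seed=0):
    """Return progressively noisier copies of `image`.

    Assumption: pixel values are already scaled to roughly [-1, 1].
    """
    rng = np.random.default_rng(seed)
    # Hypothetical linear schedule: the noise variance added at each level.
    betas = np.linspace(1e-4, max_beta, num_levels)
    noisy_versions = []
    current = image.astype(np.float64)
    for beta in betas:
        noise = rng.standard_normal(current.shape)
        # Shrink the signal slightly and mix in fresh Gaussian noise.
        current = np.sqrt(1.0 - beta) * current + np.sqrt(beta) * noise
        noisy_versions.append(current.copy())
    return noisy_versions

cat = np.zeros((8, 8))  # stand-in for a real cat image
versions = make_noisy_versions(cat)
print(len(versions), versions[0].shape)
```

Every entry in `versions` is noisier than the previous one, which gives the model examples at various noise levels for the same cat.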

#𝟮. 𝗧𝗿𝗮𝗶𝗻 𝘁𝗵𝗲 𝗺𝗼𝗱𝗲𝗹 - 𝗿𝗲𝗺𝗼𝘃𝗲 𝘁𝗵𝗲 𝗻𝗼𝗶𝘀𝗲

The actual job of the model is to take a noisy image and remove the noise from it.

Thus, when training the diffusion model:
- it will take a noisy image as input
- it will try to remove the noise
- the loss is computed between the "cleaned" image & the original, noise-free image
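The three bullets above can be sketched as a single loss computation. This is a toy illustration under my own assumptions: `denoising_loss`, `do_nothing_model`, and `oracle_model` are hypothetical names, the "models" are plain Python functions instead of neural networks, and I use the mean squared error as the loss. Note that practical DDPM implementations usually train the model to predict the noise rather than the clean image, but the intuition is the same.

```python
import numpy as np

def denoising_loss(model, clean_image, noise_std, rng):
    """MSE between the model's cleaned output and the original image."""
    # 1. Build the noisy input from the clean image.
    noisy = clean_image + noise_std * rng.standard_normal(clean_image.shape)
    # 2. Let the model try to remove the noise.
    cleaned = model(noisy)
    # 3. Compare the cleaned image with the original one.
    return float(np.mean((cleaned - clean_image) ** 2))

def do_nothing_model(noisy):
    # Returns its input unchanged, so all the noise is still there.
    return noisy

def oracle_model(noisy):
    # Cheats: it already knows that the clean image is 0.5 everywhere.
    return np.full_like(noisy, 0.5)

rng = np.random.default_rng(0)
clean = np.full((8, 8), 0.5)  # stand-in for a real cat image

loss_bad = denoising_loss(do_nothing_model, clean, 0.3, rng)
loss_good = denoising_loss(oracle_model, clean, 0.3, rng)
print(loss_bad > loss_good)
```

During real training, an optimizer would adjust the model's weights to push this loss down across many images and noise levels.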

There are some missing pieces to the algorithm I explained, but this is the central intuition that will get you started 🔥

→ 𝘖𝘬... 𝘛𝘩𝘢𝘵'𝘴 𝘤𝘰𝘰𝘭, 𝘣𝘶𝘵 𝘩𝘰𝘸 𝘥𝘰𝘦𝘴 𝘪𝘵 𝘨𝘦𝘯𝘦𝘳𝘢𝘵𝘦 𝘯𝘦𝘸 𝘪𝘮𝘢𝘨𝘦𝘴?

Well, the model learned to add cat features to a noisy image.

Thus, you can throw some Gaussian noise into the model,

... and after some magic happens, it will generate a completely new cat for you.

Get an intuition of how diffusion models work [Image by the Author].

Note that here the process isn't controlled with a prompt as most of you are used to, but I will present that soon.

#2. How Diffusion models generate new images

The process of generating new images with stable diffusion can be explained in 5 simple steps.

The generation of new images is an iterative process (aka the sampling process).

The most naive sampling algorithm is called DDPM.

Here it is 👇

#𝟭. Sample Gaussian Noise with the size of the desired output.

For a 516x516 image, sample noise with a shape of 516x516.

#𝟮. Pass the sample through the diffusion model.

The model returns its estimate of the noise contained in the sample.

#𝟯. Subtract the noise returned by the model from the sample.

#𝟰. Add back some Gaussian noise to stabilize the process.

This is important, as the model was trained on inputs that contain Gaussian noise.

Thus, if you only subtract noise from the image, the next input won't match the distribution the model saw during training.

#𝟱. Repeat steps 2-4 for T iterations.

One important note is that all the operations are conditioned on the current timestep.

Thus, the model will return different results for different timesteps, and you subtract different noise scales from the sample.

Also, one step won't remove all the noise. That is why we need T steps to get high-quality samples.
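The 5 steps above can be sketched as a short sampling loop. Treat it as an illustration under assumptions, not as a production sampler: `ddpm_sample` and `dummy_predict_noise` are hypothetical names, the linear beta schedule and the 50 steps are values I picked for the example, and the dummy model predicts zero noise, so the output is not an actual cat.

```python
import numpy as np

def ddpm_sample(predict_noise, shape, num_steps=50, seed=0):
    """Sketch of DDPM sampling with a noise-predicting model."""
    rng = np.random.default_rng(seed)
    # Assumed linear noise schedule; it has to match the one used in training.
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Step 1: sample Gaussian noise with the size of the desired output.
    x = rng.standard_normal(shape)
    # Step 5: repeat steps 2-4, going from the last timestep down to the first.
    for t in reversed(range(num_steps)):
        # Step 2: the model estimates the noise, conditioned on the timestep.
        eps = predict_noise(x, t)
        # Step 3: subtract the predicted noise, scaled for this timestep.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Step 4: add back some Gaussian noise (skipped on the final step).
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

def dummy_predict_noise(x, t):
    # Placeholder for a trained network: it always predicts "no noise".
    return np.zeros_like(x)

sample = ddpm_sample(dummy_predict_noise, (16, 16))
print(sample.shape)
```

Notice how the timestep `t` shows up twice: it is passed to the model, and it selects the scale of the noise that is subtracted and added back.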

How diffusion models generate new images [Image by the Author].

To conclude…

To generate new images using diffusion models, you have to:
- sample Gaussian noise
- pass it through the model
- subtract the predicted noise from the sample
- add back a bit of Gaussian noise and repeat for T steps

That’s it 🔥

These are this week’s insights about stable diffusion.

See you next Thursday at 9:00 am CET.

Have a fantastic weekend!

Paul

Whenever you’re ready, here is how I can help you:

The Full Stack 7-Steps MLOps Framework: a 7-lesson FREE course that will walk you step-by-step through how to design, implement, train, deploy, and monitor an ML batch system using MLOps good practices. It contains the source code + 2.5 hours of reading & video materials on Medium.
Machine Learning & MLOps Blog: here, I approach in-depth topics about designing and productionizing ML systems using MLOps.
