Given an image data point $x_0$ sampled from the image dataset distribution $q(x)$, let us gradually add Gaussian noise in a series of $T$ time steps, producing a sequence of noisy images $(x_1, x_2, \dots, x_T)$, where $x_t$ is the image after the first $t$ steps.
More formally, this process can be described as follows: our image of $N$ pixels can be flattened to a vector $x \in \mathbb{R}^N$, and for type-checking purposes the pixel values can be normalized into intensities in $[-1, 1]$ (instead of $\{0, \dots, 255\}$). Each transition step is a conditional probability distribution $q(x_t \mid x_{t-1})$, giving us the probability density for the image $x_t$ given the previous time step's $x_{t-1}$. We call this process Markovian because it satisfies the Markov property: each step only relies on the previous step (more formally, $q(x_t \mid x_{t-1}, x_{t-2}, \dots, x_0) = q(x_t \mid x_{t-1})$).
To clarify some notation:
Retrieving image $x_t$ means sampling from a Gaussian distribution denoted by $q(x_t \mid x_{t-1}) = \mathcal{N}(x_t;\ \mu_t,\ \Sigma_t)$, with
Mean: $\mu_t = \sqrt{1-\beta_t}\, x_{t-1}$ of each pixel
Covariance: $\Sigma_t = \beta_t \mathbf{I}$ (where $\mathbf{I}$ is the identity matrix)
What this means is that each individual pixel (with variance $\beta_t$ at step $t$) is distributed independently of the others, since the off-diagonal entries are $0$.
Using the reparameterization trick from VAEs, this means

$$x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \mathbf{I})$$
$\beta_t$ is a constant given by our noise scheduler, specifying the variance (noise intensity) added at each time step. One interesting question one might have is why include the coefficient $\sqrt{1-\beta_t}$ for the mean. According to the reparameterization trick, since $x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_t$, and because $x_{t-1}$ and $\epsilon_t$ are sampled from independent Gaussians, we can state

$$\operatorname{Var}(x_t) = (1-\beta_t)\operatorname{Var}(x_{t-1}) + \beta_t\operatorname{Var}(\epsilon_t) = (1-\beta_t)\operatorname{Var}(x_{t-1}) + \beta_t$$
Since we normalized our image to have (approximately) unit variance, $\operatorname{Var}(x_{t-1}) \approx 1$, so then $\operatorname{Var}(x_t) \approx (1-\beta_t) + \beta_t = 1$. Apparently, this is called “variance preserving”: if our input has $\operatorname{Var}(x_{t-1}) = 1$, then $\operatorname{Var}(x_t) = 1$ as well. The point is that the variance stays constant through the entire forward process!
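As a quick numerical sanity check of this variance-preserving claim, here is a minimal NumPy sketch (my own toy code, not part of the original derivation; the fixed $\beta = 0.02$ and the name `forward_step` are just illustrative choices) that repeatedly applies $x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_t$:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_step(x_prev, beta_t, rng):
    """One forward diffusion step via the reparameterization trick:
    x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps

# Stand-in for a normalized image: unit-variance values (assumption for the toy check).
x = rng.standard_normal(100_000)
beta = 0.02  # a fixed, assumed noise intensity for every step

for t in range(1000):
    x = forward_step(x, beta, rng)

print(np.var(x))  # stays close to 1.0 -- the forward process is variance preserving
```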
Variance/Noise Schedule
Originally, the authors of DDPM utilized a linear schedule for $\beta_t$; the cosine schedule compared below was introduced later as an improvement.
Variance Schedule of Linear (top) vs Cosine (bottom)
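As a rough sketch of how these two schedules can be generated (the helper names `linear_betas`/`cosine_betas` and the endpoint values are my own assumptions, and the cosine form follows the usual improved-DDPM recipe rather than anything specified here):

```python
import numpy as np

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule: beta_t increases linearly over T steps."""
    return np.linspace(beta_start, beta_end, T)

def cosine_betas(T, s=0.008):
    """Cosine schedule (in the style of the improved-DDPM paper): define
    alpha_bar_t with a squared cosine, then recover beta_t from the ratio
    of consecutive alpha_bar values, clipping to keep each step stable."""
    steps = np.arange(T + 1)
    f = np.cos(((steps / T) + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, 0.999)

T = 1000
print(linear_betas(T)[:3])  # tiny noise intensities early on
print(cosine_betas(T)[:3])  # cosine ramps up even more gently at the start
```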
Simplified Sampling Form
The joint distribution of the entire trajectory of $T$ time steps, the product of all the different transition PDFs —

$$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$$

— can be expressed as a simpler closed-form expression if we define additional variables
$\alpha_t = 1 - \beta_t$, defining the “fraction” of the previous step’s signal retained
$\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, denoting the fraction of the original image left after $t$ time steps.
$\epsilon_t \sim \mathcal{N}(0, \mathbf{I})$ is the Gaussian noise added at time step $t$
and induct on $x_t$ (equivalently, on $t$):

$$
\begin{aligned}
x_t &= \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\, \epsilon_t \\
&= \sqrt{\alpha_t}\left(\sqrt{\alpha_{t-1}}\, x_{t-2} + \sqrt{1-\alpha_{t-1}}\, \epsilon_{t-1}\right) + \sqrt{1-\alpha_t}\, \epsilon_t \\
&= \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{1-\alpha_t \alpha_{t-1}}\, \bar{\epsilon} \\
&\;\;\vdots \\
&= \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon
\end{aligned}
$$
The key combined variance step is as follows: since $\epsilon_t$ and $\epsilon_{t-1}$ are sampled independently, the linear combination of independent Gaussians stays Gaussian, and yields a merged standard deviation as follows

$$\sqrt{\alpha_t(1-\alpha_{t-1}) + (1-\alpha_t)} = \sqrt{1 - \alpha_t \alpha_{t-1}}$$
which allows us to replace the two separate noise terms with sampling from a shared $\bar{\epsilon} \sim \mathcal{N}(0, \mathbf{I})$. Thus we can write

$$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\right)$$
and thus produce a sample

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \mathbf{I}).$$
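Below is a small NumPy sketch (again my own naming, and an assumed linear schedule) showing that sampling $x_t$ directly with this closed form agrees, in distribution, with running the forward chain one step at a time:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 200
betas = np.linspace(1e-4, 0.02, T)   # an assumed linear schedule
alphas = 1.0 - betas                 # fraction of signal kept per step
alpha_bars = np.cumprod(alphas)      # fraction of x_0 left after t steps

def q_sample(x0, t, rng):
    """Jump straight to step t: x_t = sqrt(a_bar_t) x_0 + sqrt(1 - a_bar_t) eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def q_sample_sequential(x0, t, rng):
    """The slow way: apply q(x_s | x_{s-1}) one step at a time up to step t."""
    x = x0
    for s in range(t + 1):
        eps = rng.standard_normal(x.shape)
        x = np.sqrt(alphas[s]) * x + np.sqrt(betas[s]) * eps
    return x

x0 = rng.standard_normal(50_000)  # stand-in for a batch of normalized pixels
t = 150
direct = q_sample(x0, t, rng)
slow = q_sample_sequential(x0, t, rng)
# Both should have mean ~ 0 and variance ~ 1 (variance preserving).
print(np.mean(direct), np.var(direct))
print(np.mean(slow), np.var(slow))
```

In practice this closed form is what makes training efficient: we can noise any image to an arbitrary step $t$ in one shot instead of simulating the whole chain.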
As $T \to \infty$, we should have reached an isotropic Gaussian distribution, one where $x_T$ follows a perfect Gaussian distribution of mean $0$ and covariance $\mathbf{I}$. Note that $\bar{\alpha}_T \to 0$ because every $\alpha_t < 1$!
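A quick numerical check of this, assuming the same linear schedule as in the sketch above:

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear schedule
alpha_bar_T = np.prod(1.0 - betas)     # product of 1000 numbers, each < 1
print(alpha_bar_T)  # very close to 0: essentially none of x_0 survives at step T
```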
This is advantageous because we already know how to sample Gaussian noise, so figuring out how to reverse the Gaussian noise in the reverse diffusion process allows us to generate random images!
Reverse Diffusion Process
We want to learn the reverse distribution $q(x_{t-1} \mid x_t)$ to acquire new images like those in our dataset, by learning a deep learning model $p_\theta$ where we learn an estimated mean $\mu_\theta(x_t, t)$ and variance $\Sigma_\theta(x_t, t)$ through parameters $\theta$:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$
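To make this parameterization concrete, here is a minimal sketch of a single reverse sampling step. The `model` is a placeholder stand-in (not an actual trained network) that returns a per-pixel mean and variance; everything about it, including the toy behavior below, is assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def reverse_step(model, x_t, t, rng):
    """Sample x_{t-1} ~ N(mu_theta(x_t, t), Sigma_theta(x_t, t)).
    `model` stands in for a trained network that predicts the per-pixel
    mean and variance of the reverse transition."""
    mu, var = model(x_t, t)
    eps = rng.standard_normal(x_t.shape)
    return mu + np.sqrt(var) * eps

# Toy stand-in "model": shrinks x_t slightly and reports a small fixed variance.
def toy_model(x_t, t):
    return 0.99 * x_t, 0.01 * np.ones_like(x_t)

x = rng.standard_normal(1024)      # start from pure Gaussian noise, x_T
for t in reversed(range(1000)):    # walk the chain back from t = T to t = 1
    x = reverse_step(toy_model, x, t, rng)
```

The loop mechanics are the point here: generation is just repeated sampling from the learned reverse transition, starting from noise.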
One specific path we may take from $x_T$ to $x_0$ is represented by

$$p_\theta(x_{0:T}) = p(x_T)\prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)$$
But the PDF of the entire reverse diffusion process, $p_\theta(x_0)$, is an “integral” over all the possible pathways we can take to reach $x_0$:

$$p_\theta(x_0) = \int p_\theta(x_{0:T})\, dx_{1:T}$$