Question on training UNet2DModel to predict the clean image instead of the noise #64

Open
tmquan opened this issue Jun 1, 2023 · 2 comments
tmquan commented Jun 1, 2023

Thanks for the notebooks.

I have one question about this notebook: https://github.com/huggingface/diffusion-models-class/blob/main/unit1/02_diffusion_models_from_scratch.ipynb

# The training loop
for epoch in range(n_epochs):

    for x, y in train_dataloader:

        # Get some data and prepare the corrupted version
        x = x.to(device) # Data on the GPU
        noise_amount = torch.rand(x.shape[0]).to(device) # Pick random noise amounts
        noisy_x = corrupt(x, noise_amount) # Create our noisy x

        # Get the model prediction
        pred = net(noisy_x, 0).sample #<<< Using timestep 0 always, adding .sample

        # Calculate the loss
        loss = loss_fn(pred, x) # How close is the output to the true 'clean' x?

        # Backprop and update the params:
        opt.zero_grad()
        loss.backward()
        opt.step()

        # Store the loss for later
        losses.append(loss.item())

    # Print out the average of the loss values for this epoch:
    avg_loss = sum(losses[-len(train_dataloader):])/len(train_dataloader)
    print(f'Finished epoch {epoch}. Average loss for this epoch: {avg_loss:05f}')

In case I want the network to predict the clean images, should the pred line be changed to the following, attaching noise_amount as the timestep?

        # Get the model prediction
        pred = net(noisy_x, noise_amount).sample #<<< Passing noise_amount as the timestep instead of 0
@stefgina commented:

Predicting the clean image is a lot harder for the model than predicting the noise; don't expect similar results!

@johnowhitaker (Collaborator) commented:

The model in diffusers expects a timestep as the second argument, but since we're training from scratch we can choose to ignore it by always passing 0 as the timestep. In the text, I call out how to change this if you want to add timestep conditioning:
We can replicate the training shown above using this model in place of our original one. We need to pass both x and timestep to the model (here I always pass t=0 to show that it works without this timestep conditioning and to keep the sampling code easy, but you can also try feeding in (amount*1000) to get a timestep equivalent from the corruption amount).
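
Concretely, that timestep-conditioned variant would look something like this inside the training loop (a minimal sketch: noise_amount is the per-sample corruption amount already computed in the loop, and the 1000x scaling is just the rough mapping suggested above, not a calibrated schedule):

        # Get the model prediction, conditioning on a timestep derived
        # from the corruption amount (values in [0, 1) mapped to ~[0, 1000))
        timesteps = noise_amount * 1000
        pred = net(noisy_x, timesteps).sample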

To change what the network is predicting (the 'target'), this is the relevant line:
loss = loss_fn(pred, x) # How close is the output to the true 'clean' x?

Here we compare the output of the network (pred) with the clean image. If you want to predict the noise, you might use loss_fn(pred, noise) (you will also then have to change the sampling method).
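
As a sketch of that noise-prediction variant (assuming corrupt is modified to also return the noise it mixes in, since the notebook's version returns only the corrupted image), the relevant lines of the loop body would become:

        # Assumed modification: corrupt() returns (noisy_x, noise), so the
        # noise used for corruption is available as the training target
        noisy_x, noise = corrupt(x, noise_amount)

        # Get the model prediction (timestep handling unchanged)
        pred = net(noisy_x, 0).sample

        # Calculate the loss against the noise rather than the clean image
        loss = loss_fn(pred, noise)

The sampling loop would then also need to turn the predicted noise back into an estimate of the clean image (for example by inverting the corruption mix) before each update step.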
