Imports and Setup

In this blog post we’ll build an autoencoder in PyTorch from scratch, and have you encoding and decoding your first images! Hop over to Google Colab and open a blank notebook. To begin, we’ll want to install torch and torchvision, since we’ll rely on them for the building blocks of our model and for the dataset. To install them in our Colab notebook:

!pip install torch torchvision

And then we also need to import them:

import torch, torchvision
from torch import nn, optim

Constructing Encoder and Decoder

Creating the network is also relatively easy: we create two classes, one that represents the encoder and another that represents the decoder. These classes need to have a very specific format and should always inherit from the nn.Module base class. Inheriting from nn.Module automatically gives our classes some useful and essential properties.

In Python, to inherit from another class, we simply pass the parent class name into the parentheses of the class that we’re creating and call the super() function in the constructor, as follows:

class Encoder(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()

        self.layer = nn.Linear(784, 16)

    def forward(self, x):
        x = self.layer(x)

        return x

Two other things are noteworthy here: the layer attribute that we’re creating and the forward function. Creating a class and making it inherit from nn.Module is only half the work; we also need to declare how many layers our network will have and what kind of layers they are. For this simple encoder, we create a single fully connected layer, which takes as input a tensor of one shape and outputs a tensor of a different shape. In our case, the layer takes as input a tensor with 784 dimensions and outputs a tensor with 16 dimensions.

Next up is the forward method. The forward method specifies how our neural network is connected, or ‘wired’ internally, if you will. You can imagine the input to the forward method as the input to our neural network; here the input is designated by the variable x. x is immediately passed through the layer that we declared in our constructor, meaning we receive that layer’s output back in x, and we return it. Whatever we return from the forward method is the output of our neural network.
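As a quick sanity check, we can pass a random tensor through the encoder and inspect the shapes (the Encoder class from above is restated here so the snippet is self-contained):

```python
import torch
from torch import nn

class Encoder(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()
        self.layer = nn.Linear(784, 16)

    def forward(self, x):
        return self.layer(x)

enc = Encoder()
x = torch.randn(4, 784)   # a batch of 4 flattened 28x28 images
z = enc(x)                # calling the module invokes forward()
print(z.shape)            # torch.Size([4, 16])
```

Note that we call the module itself rather than forward() directly; nn.Module takes care of dispatching the call to forward for us.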

We rinse and repeat the same steps for the decoder, except that the layer in the decoder has its input and output shapes reversed. This means that we’re creating two separate neural networks: one is the encoder and the other is the decoder.

class Decoder(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()

        self.layer = nn.Linear(16, 784)

    def forward(self, x):
        x = self.layer(x)

        return x

Chaining Encoder and Decoder Together

Now we have two separate networks that should ultimately form our entire pipeline, so we need to chain them together. We can do this by creating a third class that also inherits from nn.Module and holds two member variables: instances of the Encoder and Decoder that we just created. This showcases the amazing modularity of PyTorch, which allows us to integrate one network as part of another. In the forward function, as before, we pass the input x through the encoder, feed the result to the decoder, and return the decoder’s output.

class AE(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()

        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)

        return x
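To confirm the chaining works end to end, here is a self-contained sketch that restates all three classes and pushes a random batch through the full autoencoder; the output should come back in the same shape as the input:

```python
import torch
from torch import nn

class Encoder(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()
        self.layer = nn.Linear(784, 16)

    def forward(self, x):
        return self.layer(x)

class Decoder(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()
        self.layer = nn.Linear(16, 784)

    def forward(self, x):
        return self.layer(x)

class AE(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.randn(4, 784)   # 4 flattened images in, ...
out = AE()(x)
print(out.shape)          # torch.Size([4, 784]) -- 4 reconstructions out
```

Internally, each image is squeezed down to 16 dimensions and expanded back to 784; the shapes on the outside never reveal the bottleneck.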

Training Setup

We’ve created and wired our network; now we still need to set up a few things before we can begin training. Ideally we want to train on a GPU to accelerate the process, and since we’re on Google Colab, we can run our model on the provided GPU with the following statements:

#  use gpu if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# create a model from `AE` autoencoder class
# load it to the specified device, either gpu or cpu
model = AE().to(device)

Next up we need to specify an optimizer and a criterion. The optimizer is the procedure by which we improve our neural network during training; the criterion is the metric that measures how well or how badly we performed in each training step. We’ll use a very common optimizer and criterion: the Adam optimizer and the mean squared error loss:

# create an optimizer object
# Adam optimizer with learning rate 1e-3
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# mean-squared error loss
criterion = nn.MSELoss()

The mean squared error loss essentially compares the pixel values of the output to those of the input: it averages the squared differences between corresponding pixels.
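We can verify this against a manual computation on a tiny made-up tensor of “pixels”:

```python
import torch
from torch import nn

criterion = nn.MSELoss()

output = torch.tensor([0.0, 0.5, 1.0])    # reconstructed "pixels"
target = torch.tensor([0.0, 1.0, 1.0])    # original "pixels"

loss = criterion(output, target)
manual = ((output - target) ** 2).mean()  # average of squared differences
print(loss.item())                        # 0.0833... (= 0.25 / 3)
```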

Preparing the Dataset

To train our network we also need some data. We’ll train on the good old MNIST dataset, which consists of grayscale images of handwritten digits, 28x28 pixels in dimension. Downloading it is also straightforward:

transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])

train_dataset = torchvision.datasets.MNIST(
    root="~/torch_datasets", train=True, transform=transform, download=True
)

test_dataset = torchvision.datasets.MNIST(
    root="~/torch_datasets", train=False, transform=transform, download=True
)

You can see that we’re actually downloading two datasets, one for training and another for testing. We also need to declare two data loaders, which will allow us to conveniently feed our dataset to our network:

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=128, shuffle=True, num_workers=4, pin_memory=True
)

test_loader = torch.utils.data.DataLoader(
    test_dataset, batch_size=32, shuffle=False, num_workers=4
)

The Training Loop

Now we’ve completed all the setup and can start training. The training loop is the most integral component of our code, so we’ll go through it step by step. First, we create a main loop that specifies the number of epochs we’ll train for. In every epoch our network sees the entire dataset, not all at once, but rather in small portions called batches. This is where the data loaders that we previously created come into play. We create a loop statement that gets a batch of ‘features’ (input images) to pass to the autoencoder:

epochs = 5
for epoch in range(epochs):
    loss = 0
    for batch_features, _ in train_loader:
        #training loop steps go here

These batches of data need to be flattened so that they have the exact shape that we specified earlier when we constructed our encoder layer. We therefore flatten the 28x28 pixel images into a linear tensor with 784 dimensions, which we can do with the view() function in PyTorch. We also need to make sure that we send this batch of data to the device that we’re training on (the GPU, same as the network):

# reshape mini-batch data to [N, 784] matrix
# load it to the active device
batch_features = batch_features.view(-1, 784).to(device)
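A minimal example of what view() does to the batch shape here:

```python
import torch

# a mini-batch exactly as the MNIST DataLoader yields it: [N, channels, H, W]
batch = torch.randn(128, 1, 28, 28)

flat = batch.view(-1, 784)    # -1 lets PyTorch infer the batch dimension
print(flat.shape)             # torch.Size([128, 784])
```

Using -1 instead of a hard-coded 128 keeps this working even for the last, possibly smaller, batch of an epoch.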

Another important function call that should generally be made before we feed the batch to our network is ‘zero_grad()’ on the optimizer. This call resets the gradients to zero before backpropagation. Otherwise we would accumulate gradients across batches, which is deliberate in some settings (when training RNNs, for example), but here we want a fresh gradient for every batch.

# reset the gradients back to zero
# PyTorch accumulates gradients on subsequent backward passes
optimizer.zero_grad()
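We can observe this accumulation behavior directly. A small sketch (using a toy linear layer rather than our autoencoder) that calls backward() twice without resetting in between:

```python
import torch
from torch import nn, optim

layer = nn.Linear(2, 1)
optimizer = optim.Adam(layer.parameters(), lr=1e-3)
x = torch.randn(4, 2)

layer(x).sum().backward()
first_grad = layer.weight.grad.clone()

layer(x).sum().backward()                  # second backward WITHOUT zero_grad()
accumulated = layer.weight.grad.clone()    # now twice the single-pass gradient

optimizer.zero_grad(set_to_none=False)     # gradients are zero tensors again
```

(zero_grad() can also set gradients to None rather than zero, depending on its set_to_none argument; either way, the stale gradients are gone before the next backward pass.)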

Next, we pass the batch to our network to perform a forward pass, and receive what comes out the other end in the outputs variable. After that we obtain the training loss by comparing the outputs to the inputs with our specified criterion. Note that train_loss here is not just a number but rather a tensor that carries functionality:

# compute reconstructions
outputs = model(batch_features)

# compute training reconstruction loss
train_loss = criterion(outputs, batch_features)

After obtaining the training loss for a batch, we perform backpropagation and take an optimizer step.

# compute accumulated gradients
train_loss.backward()

# perform parameter update based on current gradients
optimizer.step()
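To make the effect of optimizer.step() concrete, here is a toy sketch (a small linear layer standing in for our model) showing that a single backward/step cycle actually changes the parameters:

```python
import torch
from torch import nn, optim

layer = nn.Linear(3, 1)       # a tiny stand-in for our model
optimizer = optim.Adam(layer.parameters(), lr=1e-3)
weights_before = layer.weight.detach().clone()

loss = layer(torch.randn(8, 3)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()              # parameters move as Adam prescribes

print(torch.equal(weights_before, layer.weight.detach()))   # False
```

backward() only computes and stores gradients; it is step() that actually modifies the weights using them.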

We also have a variable that accumulates the loss over every batch, which we then average outside the inner loop and print to screen to verify that the network is improving and that the overall loss is decreasing:

    # add the mini-batch training loss to epoch loss
    loss += train_loss.item()

### Outside the batch loop ###
# compute the epoch training loss
loss = loss / len(train_loader)

# display the epoch training loss
print("epoch : {}/{}, loss = {:.6f}".format(epoch + 1, epochs, loss))

The training loop in its entirety will look like this:

epochs = 5
for epoch in range(epochs):
    loss = 0
    for batch_features, _ in train_loader:
        # reshape mini-batch data to [N, 784] matrix
        # load it to the active device
        batch_features = batch_features.view(-1, 784).to(device)

        # reset the gradients back to zero
        # PyTorch accumulates gradients on subsequent backward passes
        optimizer.zero_grad()

        # compute reconstructions
        outputs = model(batch_features)

        # compute training reconstruction loss
        train_loss = criterion(outputs, batch_features)

        # compute accumulated gradients
        train_loss.backward()

        # perform parameter update based on current gradients
        optimizer.step()

        # add the mini-batch training loss to epoch loss
        loss += train_loss.item()

    # compute the epoch training loss
    loss = loss / len(train_loader)

    # display the epoch training loss
    print("epoch : {}/{}, loss = {:.6f}".format(epoch + 1, epochs, loss))

Testing the Model

Lastly, we also want to see how well our model performs qualitatively. We can do this by looking at some of the model’s outputs and comparing them with the original inputs. We’ll need to import the matplotlib library for that and use our test_loader, which holds data that was not included in the training dataset. This gives us a good idea of how well our model generalizes to new data:

import matplotlib.pyplot as plt

# no gradients needed during evaluation
with torch.no_grad():
    # grab a single batch of unseen test images
    batch_features, _ = next(iter(test_loader))
    batch_features = batch_features.view(-1, 784).to(device)
    outputs = model(batch_features)

    # original image on top, reconstruction below
    plt.subplot(211)
    plt.imshow(batch_features[0].cpu().view(28, 28).numpy(), cmap="gray")

    plt.subplot(212)
    plt.imshow(outputs[0].cpu().view(28, 28).numpy(), cmap="gray")

    plt.show()