
ML Series: A Practical Tutorial on Building a Convolutional Neural Network (CNN)

Hanif

Welcome back to the ML Series! In the previous article we implemented a Multilayer Perceptron model, an introduction to neural networks using a very basic architecture. Today we are going to implement a Convolutional Neural Network (CNN) using PyTorch and test it on the MNIST handwritten digits dataset.

Introduction to CNNs

Imagine a computer program that can learn to recognize objects in pictures just by looking at a bunch of examples. That's the magic of Convolutional Neural Networks (CNNs)!

CNNs are a special type of artificial neural network inspired by the structure of the visual cortex in animals. They excel at tasks involving images and videos. An image fed into the network passes through convolutional layers that act like filters, extracting essential features such as edges, textures, and patterns. These features become progressively more complex as they move through successive layers.

The output from these layers is then flattened and fed into fully connected layers, which function similarly to a traditional neural network to perform classification. Ultimately, the CNN processes and learns from the image features, enabling it to output classifications such as identifying objects or recognizing handwritten digits.

This combination of feature extraction and classification within a single framework makes CNNs powerful tools for image recognition tasks.

Components of a CNN

A Convolutional Neural Network makes use of two very distinct operations:

Convolutional Layers: These layers use filters (like tiny templates) to scan an image, identifying patterns and extracting features like edges, shapes, and textures.

Pooling Layers: These layers reduce the spatial size of the feature maps while retaining important features, making the network more efficient. The trade-off is that too much pooling can discard important information in the process.

Some other components of the Convolutional Network are:

Activation Function: An activation function (e.g., ReLU, sigmoid, tanh) is applied element-wise to the output of the convolutional layer. It introduces non-linearity, allowing the network to learn complex patterns.

Fully Connected Layer: Transforms the output of the previous layers into a single, streamlined list (vector). Then, it applies fully connected operations, much like a multilayer perceptron, to make classifications or predictions.

Softmax or Output Layer: Generates the final output probabilities or values for the target classes.
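
To make these operations concrete, here is a minimal sketch (using PyTorch, which we import properly in the next section) that passes a dummy grayscale image through one convolution, a ReLU, and a max-pooling step, printing the shape after each stage:

python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A dummy batch of one 28x28 grayscale image (batch, channels, height, width)
x = torch.randn(1, 1, 28, 28)

# One convolutional layer: 1 input channel -> 8 feature maps, 3x3 filters
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

out = F.relu(conv(x))        # convolution + non-linearity
print(out.shape)             # torch.Size([1, 8, 28, 28]) - padding keeps 28x28

out = F.max_pool2d(out, 2)   # 2x2 max pooling halves height and width
print(out.shape)             # torch.Size([1, 8, 14, 14])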

Importing Required Packages

So, first of all, we are going to import all the packages needed for our model:

python
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from mlxtend.data import loadlocal_mnist
%matplotlib inline
import os

Loading the MNIST Dataset

We are going to be using the MNIST dataset from Kaggle; you can go ahead and download it here. The next step is to load it into train and test variables:

python
training_images_filepath = '/kaggle/input/d/hojjatk/mnist-dataset/train-images.idx3-ubyte'
training_labels_filepath = '/kaggle/input/d/hojjatk/mnist-dataset/train-labels.idx1-ubyte'
test_images_filepath = '/kaggle/input/d/hojjatk/mnist-dataset/t10k-images.idx3-ubyte'
test_labels_filepath = '/kaggle/input/d/hojjatk/mnist-dataset/t10k-labels.idx1-ubyte'

X_train, y_train = loadlocal_mnist(training_images_filepath, training_labels_filepath)
X_test, y_test = loadlocal_mnist(test_images_filepath, test_labels_filepath)
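
As a quick sanity check, we can print the shapes of the arrays loadlocal_mnist returns; the images come back flattened, one row of 784 pixels (28 × 28) per image:

python
print(X_train.shape)  # (60000, 784) - 60,000 flattened training images
print(y_train.shape)  # (60000,)
print(X_test.shape)   # (10000, 784)
print(y_test.shape)   # (10000,)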

Previewing the Dataset

To preview some of the handwritten digits, we define a show function that plots images with plt.imshow; we then pick five random images each from the training and test sets.

python
def show(images, titles):
    plt.figure(figsize=(10, 5))
    for index, (image, title) in enumerate(zip(images, titles), start=1):
        plt.subplot(2, 5, index)
        plt.imshow(image, cmap=plt.cm.gray)
        plt.title(title, fontsize=9)
    plt.show()

images = []
titles = []

# Reshape the flattened rows back into 28x28 images for plotting
X_train_reshape = X_train.reshape(60000, 28, 28)
X_test_reshape = X_test.reshape(10000, 28, 28)

for i in range(5):
    # randint is inclusive on both ends, so cap at len - 1
    r = random.randint(0, len(X_train_reshape) - 1)
    images.append(X_train_reshape[r])
    titles.append('training image: ' + str(y_train[r]))

for i in range(5):
    r = random.randint(0, len(X_test_reshape) - 1)
    images.append(X_test_reshape[r])
    titles.append('testing image: ' + str(y_test[r]))

show(images, titles)

Creating DataLoader

In machine learning, especially with large datasets, efficiently managing and processing data is crucial for training and evaluation. PyTorch's DataLoader is a powerful utility that simplifies this task by creating an iterable over a dataset, facilitating the handling of batches of data. It enables automatic batching, shuffling, and parallel data loading via worker processes, which accelerates the training process.

By serving the data in manageable chunks and supporting parallel processing, the DataLoader ensures that the training loop operates smoothly even with extensive and complex datasets.

The CustomDataset class below wraps our feature and label arrays into a format that PyTorch understands. The train_loader and test_loader objects then let us iterate over the training and testing data in batches, handle shuffling, and provide the data in a form ready for the training and evaluation loops.

python
import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.long)
    
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# Create train and test datasets
train_dataset = CustomDataset(X_train, y_train)
test_dataset = CustomDataset(X_test, y_test)

# Create train and test loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)
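
As a quick check, we can pull a single batch from train_loader and inspect its shape; each batch holds 64 flattened images and their 64 labels, which is why the training loop later reshapes the data to (batch, 1, 28, 28).

python
# Fetch one batch to verify what the loader yields
images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 784]) - a batch of flattened images
print(labels.shape)  # torch.Size([64])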

Setting Random Seed

To ensure that each run of the code with the same seed yields the same results, we set seeds for Python's built-in random module, NumPy, and PyTorch. Fixing the randomness across these libraries and environments gives us consistent model training and evaluation.

python
def set_seed(seed):
    # Set the seed for Python's built-in random module
    random.seed(seed)
    # Set the seed for NumPy
    np.random.seed(seed)
    # Set the seed for PyTorch
    torch.manual_seed(seed)
    # If using GPU
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Ensure deterministic behavior in CUDA operations
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

seed = 42
set_seed(seed)

Defining the CNN Model

Then we define our CNN model, with three convolutional layers and two fully connected layers. The ReLU activation function is used after every convolutional layer and the first fully connected layer, and max pooling with a stride of two sits right before the fully connected layers. The output passes through a log softmax activation function because this is a classification task. The input size of fc1 comes from tracing the shapes: the two padded convolutions keep the image at 28×28, the unpadded conv3 shrinks it to 26×26, and 2×2 max pooling halves that to 13×13 over 128 channels, giving 128 × 13 × 13 = 21632 features.

python
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
        # Fully connected layers
        self.fc1 = nn.Linear(21632, 128)  # 128 channels * 13 * 13 = 21632 features after flattening
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        
        x = F.max_pool2d(x, 2)
        x = F.relu(self.fc1(x.view(-1, 21632)))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
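
If you want to verify the 21632 figure yourself, a quick way is to trace a dummy input through the convolutional part of the network and print the flattened size:

python
# Sanity check: trace a dummy image through the convolutional layers
m = CNN()
x = torch.randn(1, 1, 28, 28)   # one fake grayscale image
x = F.relu(m.conv1(x))          # -> [1, 32, 28, 28] (padding keeps the size)
x = F.relu(m.conv2(x))          # -> [1, 64, 28, 28]
x = F.relu(m.conv3(x))          # -> [1, 128, 26, 26] (kernel 3, no padding)
x = F.max_pool2d(x, 2)          # -> [1, 128, 13, 13]
print(x.view(1, -1).shape)      # torch.Size([1, 21632]) = 128 * 13 * 13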

Initializing Model and Optimizer

The next step is to initialize our CNN model and define our optimizer with weight decay; we will be using the cross-entropy loss function. Then, to speed up training, we check whether a GPU is available on the device and move the model to it; the data batches are moved to the same device inside the training and test loops.

python
# Initialize the model
model = CNN()

# Define the loss function
criterion = nn.CrossEntropyLoss()

# Define the optimizer with weight decay (L2 regularization)
optimizer_with_weight_decay = optim.SGD(model.parameters(), lr=0.01, weight_decay=0.01)

# Check if CUDA is available and move model and data to GPU if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Move the model to the device; data batches are moved to it inside the train and test loops
model.to(device)
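
Optionally, we can print the model and count its trainable parameters as a sanity check; with this architecture the total comes to roughly 2.9 million, most of which sit in fc1, the layer mapping the 21632 flattened features down to 128 units.

python
# Count trainable parameters as a quick sanity check
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(model)
print(f'Trainable parameters: {num_params:,}')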

Training the Model

Time to train our model! In each iteration, a batch of data is reshaped to the expected input shape, passed through the model in a forward pass, and scored with the loss function; backpropagation then computes the gradients and the optimizer updates the weights. The average loss for each epoch is recorded and displayed in a graph at the end.

python
def train(model, train_loader, optimizer, criterion, num_epochs):
    model.train()
    losses = []
    average_losses = []
    for epoch in range(num_epochs):
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)  # Move data to the same device as the model
            data = data.view(-1, 1, 28, 28)  # Reshape flattened rows to (batch, 1, 28, 28)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            if batch_idx % 100 == 0:
                losses.append(loss.item())
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(data), len(train_loader.dataset),
                    100. * batch_idx / len(train_loader), loss.item()))
        # Record the average loss for this epoch, then reset for the next one
        average_losses.append(sum(losses) / len(losses))
        losses = []

    plt.plot(average_losses, label=f'{num_epochs} Epochs')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Training Loss')
    plt.legend()
    plt.show()

Testing the Model

We can now test our CNN model on the test dataset. For each batch we pass the data through the model to get predictions. We then calculate the loss using the criterion, sum the loss over all batches, and determine the model's accuracy by comparing predicted labels with the actual labels.

Finally, it prints the average loss and overall accuracy, offering insight into the model's performance on unseen data. In our run, this came to an average loss of 0.0003 and 99% accuracy!

python
# Test function
def test(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)  # Move data to the same device as the model
            data = data.view(-1, 1, 28, 28)  # Reshape flattened rows to (batch, 1, 28, 28)
            output = model(data)
            test_loss += criterion(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    accuracy = 100. * correct / len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset), accuracy))

Running Training and Testing

With our Convolutional Neural Network (CNN) fully defined, we can proceed to train and test our model. We'll train the model for 11 epochs, allowing it to learn progressively from the data.

While a higher number of epochs generally helps the model capture intricate patterns and improve accuracy, it's crucial to balance this with the risk of overfitting, where the model performs well on training data but poorly on new, unseen data. Thus, careful monitoring and adjustment of epochs, along with other hyperparameters, ensure optimal model performance.
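
One common safeguard, not used in this tutorial but worth knowing, is early stopping: evaluate on held-out data after each epoch and stop once the loss stops improving. A minimal sketch, assuming hypothetical train_one_epoch and evaluate helpers (they are not defined in this article), might look like this:

python
# Hypothetical early-stopping loop: stop when held-out loss stops improving
best_loss = float('inf')
patience, bad_epochs = 3, 0

for epoch in range(50):
    train_one_epoch(model, train_loader)      # hypothetical helper
    val_loss = evaluate(model, test_loader)   # hypothetical helper returning loss
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f'Stopping early at epoch {epoch}')
            break

With that caveat noted, let's run the training and test functions we defined: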

python
train(model, train_loader, optimizer_with_weight_decay, criterion, 11)
test(model, test_loader)

Live Model Hosted on Hugging Face Spaces

To see the handwriting model in action, I created a live demo on Hugging Face Spaces, which you can feel free to play with using the link below.

Handwriting Classifier - hanif-cnn-handwriting-classifier.hf.space

Final Thoughts

In this article, we have walked through the essential steps to build, train, and evaluate a Convolutional Neural Network using PyTorch, tailored for recognizing handwritten digits. From data preparation to defining the network architecture, training with appropriate techniques, and finally evaluating performance, each step plays a vital role in developing a robust machine learning model.

CNNs, with their powerful ability to capture spatial hierarchies in images, continue to be pivotal in applications well beyond digit recognition, such as object detection and image segmentation. By understanding and applying these concepts, you can leverage CNNs to tackle complex visual recognition tasks.