ML Series: The Basic Structure of a Neural Network: A Step-by-Step Tutorial in Python
Introduction
Neural networks are algorithms created explicitly to simulate biological neural networks. The idea behind neural networks was to create an artificial system that functions like the human brain.
At the most basic level, a Multilayer Perceptron (MLP) is the simplest form of a neural network.
MLPs were initially inspired by the Perceptron, a supervised machine learning algorithm for binary classification. The Perceptron was only capable of handling linearly separable data, so the multi-layer perceptron was introduced to overcome this limitation.
An MLP is an artificial neural network that consists of an input layer, one or more hidden layers, and an output layer, along with an activation function and a set of weights and biases.
In this article, we will implement a basic neural network from scratch making use of mainly the NumPy library.
The goal is to implement a Multilayer Perceptron with an input layer, two hidden layers, and an output layer. We will use the Softmax activation function for the output layer and the ReLU activation function for the hidden layers. The steps we are going to take are:
1. Data Preparation
2. Initialize parameters
3. Define activation functions
4. Initialize weights and biases
5. Forward Propagation
6. Backward Propagation
7. Train the Neural Network
8. Evaluate the accuracy on test data
9. Make Prediction
1. Data Preparation
We are going to be utilizing the Iris dataset from sklearn. It is a simple dataset with four features (Sepal Length, Sepal Width, Petal Length, Petal Width) and three flower classes (Iris Setosa, Iris Versicolor, Iris Virginica).
import numpy as np
import copy
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target.reshape(-1, 1)
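Before moving on, it can help to confirm the data looks as described. A quick, optional sanity check:

# Optional sanity check: 150 samples, 4 features, labels 0-2
print(X.shape)        # (150, 4)
print(np.unique(y))   # [0 1 2]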
2. Initialize Parameters
The next step is to one-hot encode the y values. We do this to convert the categorical labels into numerical vectors, making them compatible with our neural network. We then split the data into training and test sets.
def one_hot_encode(y):
    maps = {0: [1., 0., 0.], 1: [0., 1., 0.], 2: [0., 0., 1.]}
    new_y = []
    for i in y:
        new_y.append(maps[i[0]])
    return np.array(new_y)

y_onehot = one_hot_encode(y)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_onehot, test_size=0.2, random_state=42)
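To see what the encoding does, a label of 1 becomes the vector [0., 1., 0.]. A quick illustrative check, using the one_hot_encode function defined above:

# Illustrative check of the one-hot encoding
sample = np.array([[0], [2], [1]])
print(one_hot_encode(sample))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]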
3. Define activation functions
Activation functions take the weighted sum of inputs and map it to an output, for example a real number (regression) or a class probability (classification). They introduce non-linearity into the network, which allows it to learn complex patterns.
For our neural network, we are going to be using the ReLU activation function for our hidden layers, and the Softmax activation function for the output layer.
def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / np.sum(e_x, axis=1, keepdims=True)

# Define derivative of activation function (ReLU derivative)
def relu_derivative(x):
    return np.where(x > 0, 1, 0)
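Softmax turns each row of scores into probabilities that sum to 1. Subtracting the row-wise maximum before exponentiating is a common trick to avoid numerical overflow and does not change the result. A quick illustrative check using the softmax defined above:

# Illustrative check: each row of the softmax output sums to 1
scores = np.array([[2.0, 1.0, 0.1]])
probs = softmax(scores)
print(probs)              # approximately [[0.659 0.242 0.099]]
print(probs.sum(axis=1))  # [1.]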
4. Initialize weights and biases
Here we initialize the weights and biases of each layer before training the neural network. NumPy's random module takes care of that.
# Step 4: Initialize weights and biases
def initialize_parameters(hidden_size, hidden_size2):
    # Define the network architecture
    input_size = X_train.shape[1]   # 4 features
    output_size = y_train.shape[1]  # 3 unique labels
    # Initialize weights and biases
    np.random.seed(42)
    weights = {
        'h1': np.random.uniform(-0.1, 0.1, (input_size, hidden_size)),
        'h2': np.random.uniform(-0.1, 0.1, (hidden_size, hidden_size2)),
        'out': np.random.uniform(-0.1, 0.1, (hidden_size2, output_size))
    }
    biases = {
        'h1': np.random.uniform(-0.1, 0.1, hidden_size),
        'h2': np.random.uniform(-0.1, 0.1, hidden_size2),
        'out': np.random.uniform(-0.1, 0.1, output_size)
    }
    return weights, biases
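With two hidden layers, the weight matrices connect the 4 input features to the first hidden layer, the first hidden layer to the second, and the second to the 3 output classes. A quick illustrative shape check (the hidden sizes here are arbitrary examples):

# Illustrative shape check with example hidden sizes
w, b = initialize_parameters(hidden_size=8, hidden_size2=6)
print(w['h1'].shape, w['h2'].shape, w['out'].shape)  # (4, 8) (8, 6) (6, 3)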
5. Forward Propagation
The forward propagation algorithm takes the weights and biases, computes the linear output of each layer, and passes it through an activation function, repeating this for each layer until the final output is obtained.
def forward(X, weights, biases):
    z1 = np.dot(X, weights['h1']) + biases['h1']     # W1.X + b1
    a1 = relu(z1)
    z2 = np.dot(a1, weights['h2']) + biases['h2']    # W2.A1 + b2
    a2 = relu(z2)
    z3 = np.dot(a2, weights['out']) + biases['out']  # Wout.A2 + bout
    a3 = softmax(z3)
    return a1, a2, a3
6. Backward Propagation
The backward propagation algorithm calculates the error of the output layer, propagates that error backward through the hidden layers, and computes the gradients using the chain rule. It then goes a step further and uses those gradients to update the weights and biases.
def backward_propagation(X, y, a1, a2, a3, learning_rate, weights, biases):
    m = y.shape[0]
    dz3 = a3 - y
    dw3 = np.dot(a2.T, dz3) / m
    db3 = np.sum(dz3, axis=0) / m
    dz2 = np.dot(dz3, weights['out'].T) * relu_derivative(a2)
    dw2 = np.dot(a1.T, dz2) / m
    db2 = np.sum(dz2, axis=0) / m
    dz1 = np.dot(dz2, weights['h2'].T) * relu_derivative(a1)
    dw1 = np.dot(X.T, dz1) / m
    db1 = np.sum(dz1, axis=0) / m
    # Update weights and biases
    weights['h1'] -= learning_rate * dw1
    weights['h2'] -= learning_rate * dw2
    weights['out'] -= learning_rate * dw3
    biases['h1'] -= learning_rate * db1
    biases['h2'] -= learning_rate * db2
    biases['out'] -= learning_rate * db3
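The simple output-layer error dz3 = a3 - y works because it is the gradient of the cross-entropy loss combined with the softmax output, so we never need to compute the loss itself. If you want to monitor the loss during training, a minimal optional sketch (the small constant avoids log(0)):

# Optional: cross-entropy loss, whose gradient with respect to z3 is a3 - y
def cross_entropy(y_true, y_pred):
    m = y_true.shape[0]
    return -np.sum(y_true * np.log(y_pred + 1e-9)) / m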
7. Train the Neural Network
Then we move on to training the model using the training data. The training process repeatedly performs forward propagation and backward propagation to update the weights and biases until the specified number of epochs has been reached.
def train_mlp(hidden_layer_size, hidden_layer_size2, epochs, learning_rate):
    scores_so_far = {}
    weights, biases = initialize_parameters(hidden_layer_size, hidden_layer_size2)
    # Training the network
    for epoch in range(epochs):
        a1, a2, a3 = forward(X_train, weights, biases)
        backward_propagation(X_train, y_train, a1, a2, a3, learning_rate, weights, biases)
        predictions = np.argmax(a3, axis=1)
        true_labels = np.argmax(y_train, axis=1)
        curr_accuracy = accuracy_score(true_labels, predictions)
        print(f'Accuracy: {curr_accuracy}, epoch: {epoch}')
8. Evaluate the accuracy on test data
During training we keep a snapshot of the weights and outputs at each epoch, and after the loop we pick the best-scoring snapshot. The predicted outputs are compared against the true labels, and the accuracy on both the training and test sets is displayed.
        # Keep a snapshot of the weights, biases and outputs for this epoch
        scores_so_far[curr_accuracy] = [copy.deepcopy(weights), copy.deepcopy(biases), copy.deepcopy(a3)]
    # Evaluate the model using the best-scoring snapshot
    weights_new = scores_so_far[max(scores_so_far.keys())][0]
    biases_new = scores_so_far[max(scores_so_far.keys())][1]
    _, _, out_test = forward(X_test, weights_new, biases_new)
    y_pred_train = np.argmax(scores_so_far[max(scores_so_far.keys())][2], axis=1)
    train_labels = np.argmax(y_train, axis=1)
    accuracy = accuracy_score(train_labels, y_pred_train)
    print(f'Model Train Accuracy: {accuracy * 100:.2f}')
    y_pred = np.argmax(out_test, axis=1)
    test_labels = np.argmax(y_test, axis=1)
    accuracy = accuracy_score(test_labels, y_pred)
    print(f'Model Test Accuracy: {accuracy * 100:.2f}')
    # Return the best weights and biases so later predictions match the reported accuracy
    return weights_new, biases_new

# Call the train_mlp function
weights, biases = train_mlp(hidden_layer_size=90, hidden_layer_size2=55, epochs=200, learning_rate=0.1)
9. Make prediction
We can then use the weights and biases obtained from training the model to predict new values. Voila, we have a fully functional deep learning model!
def predict(X):
    _, _, prediction = forward(X, weights, biases)
    # Threshold the softmax probabilities at 0.5 to get one-hot style predictions
    predictions = (prediction > 0.5).astype(int)
    return predictions

predictions = predict(X_test)
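Each row of predictions is a one-hot style vector, so it can be mapped back to the flower names. A minimal illustrative sketch (iris.target_names comes from the dataset we loaded earlier):

# Illustrative: convert one-hot style predictions back to class names
predicted_classes = iris.target_names[np.argmax(predictions, axis=1)]
print(predicted_classes[:5])  # e.g. ['versicolor' 'setosa' ...]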
Conclusion
We have successfully developed a Multilayer Perceptron model from the ground up using mainly the NumPy library. The model we created is a much less complex version of the deep learning models we interact with every day, e.g. ChatGPT, Gemini, Google Assistant, Midjourney, etc.
The article Building a Neural Network from Scratch using Numpy and Math Libraries: A Step-by-Step Tutorial In Python was a major source of inspiration for my article.
Thank you for reading!
The full code for the project is highlighted below 👇