A quick note on MLP implementation using numpy.
MLP step by step
Here, we implement forward and backward propagation for both one-layer and two-layer Multi-Layer Perceptrons (MLPs) using numpy. The formulas for forward and backward propagation are given alongside the corresponding Python code.
1. One-Layer MLP
Forward Propagation
Given:
- $X$ is the input matrix of shape $(m, n)$, where $m$ is the number of samples and $n$ is the number of input features.
- $W$ is the weight matrix of shape $(n, k)$, where $k$ is the number of output units.
- $b$ is the bias term of shape $(1, k)$.
Forward Pass Formula:
$$Z = XW + b, \qquad A = \sigma(Z)$$
Where:
- $\sigma$ is the sigmoid activation function: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$.
Code:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Note: expects the sigmoid output A = sigmoid(Z), not the pre-activation Z
    return x * (1 - x)

class OneLayerMLP:
    def __init__(self, input_size, output_size):
        # Initialize weights and biases
        self.weights = np.random.randn(input_size, output_size)  # Input to output weights
        self.bias = np.zeros((1, output_size))                   # Bias

    def forward(self, X):
        # Forward pass
        self.X = X
        self.Z = np.dot(X, self.weights) + self.bias  # Z = X * W + b
        self.A = sigmoid(self.Z)                      # Activation function output
        return self.A
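As a quick sanity check, the forward pass can be run on a small random batch; the sizes below (4 samples, 3 features, 2 outputs) are arbitrary toy values chosen for illustration:

np.random.seed(0)
X = np.random.randn(4, 3)                       # 4 samples, 3 input features
mlp = OneLayerMLP(input_size=3, output_size=2)
A = mlp.forward(X)
print(A.shape)                                  # (4, 2); every entry lies in (0, 1) due to the sigmoid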
Backward Propagation
Loss Function: Mean Squared Error (MSE):
$$L = \frac{1}{m} \sum_{i=1}^{m} (A_i - Y_i)^2$$
Where:
- $A$ is the predicted output from the network and $Y$ is the target.
Gradient of Loss with Respect to Output (dropping the constant factor, which is absorbed into the learning rate):
$$\frac{\partial L}{\partial A} = A - Y$$
Gradient with Respect to $Z$ (chain rule through the sigmoid, using $\sigma'(Z) = A(1 - A)$):
$$\frac{\partial L}{\partial Z} = \frac{\partial L}{\partial A} \odot A(1 - A)$$
Gradient with Respect to Weights and Bias:
$$\frac{\partial L}{\partial W} = \frac{1}{m} X^{T} \frac{\partial L}{\partial Z}, \qquad \frac{\partial L}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(\frac{\partial L}{\partial Z}\right)_{i}$$
Weight and Bias Update:
$$W \leftarrow W - \eta \, \frac{\partial L}{\partial W}, \qquad b \leftarrow b - \eta \, \frac{\partial L}{\partial b}$$
Where $\eta$ is the learning rate.
Code:
def mean_squared_error_derivative(y_true, y_pred):
    # Derivative of the squared error w.r.t. the prediction
    # (constant factor dropped; the 1/m average is applied later in dW and db)
    return y_pred - y_true

class OneLayerMLP:
    def __init__(self, input_size, output_size):
        self.weights = np.random.randn(input_size, output_size)
        self.bias = np.zeros((1, output_size))

    def forward(self, X):
        self.X = X
        self.Z = np.dot(X, self.weights) + self.bias
        self.A = sigmoid(self.Z)
        return self.A

    def backward(self, Y, learning_rate=0.1):
        m = Y.shape[0]
        # Gradient for output layer
        dA = mean_squared_error_derivative(Y, self.A)
        dZ = dA * sigmoid_derivative(self.A)
        # Gradients for weights and bias
        dW = np.dot(self.X.T, dZ) / m
        db = np.sum(dZ, axis=0, keepdims=True) / m
        # Update weights and bias
        self.weights -= learning_rate * dW
        self.bias -= learning_rate * db
        loss = np.mean((Y - self.A) ** 2)  # MSE loss
        return loss
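Putting forward and backward together, a minimal training-loop sketch is shown below; the toy data (logical OR, which a single sigmoid layer can fit), epoch count, and learning rate are arbitrary choices for illustration:

np.random.seed(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [1]], dtype=float)   # logical OR targets

model = OneLayerMLP(input_size=2, output_size=1)
for epoch in range(5000):
    model.forward(X)                              # caches X, Z, A on the instance
    loss = model.backward(Y, learning_rate=0.5)
    if epoch % 1000 == 0:
        print(f"epoch {epoch}, loss {loss:.4f}")

print(model.forward(X).round(2))                  # predictions should approach [[0], [1], [1], [1]]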
2. Two-Layer MLP
Forward Propagation
For a two-layer MLP, the architecture consists of:
- Input layer: $X$
- Hidden layer: with weights $W_1$ and bias $b_1$
- Output layer: with weights $W_2$ and bias $b_2$
First Layer (Input to Hidden Layer):
$$Z_1 = X W_1 + b_1, \qquad A_1 = \sigma(Z_1)$$
Second Layer (Hidden to Output Layer):
$$Z_2 = A_1 W_2 + b_2, \qquad A_2 = \sigma(Z_2)$$
Where $A_2$ is the final output.
Code:
class TwoLayerMLP:
    def __init__(self, input_size, hidden_size, output_size):
        self.W1 = np.random.randn(input_size, hidden_size)   # Input to hidden weights
        self.b1 = np.zeros((1, hidden_size))                  # Hidden layer bias
        self.W2 = np.random.randn(hidden_size, output_size)  # Hidden to output weights
        self.b2 = np.zeros((1, output_size))                  # Output layer bias

    def forward(self, X):
        self.X = X
        self.Z1 = np.dot(X, self.W1) + self.b1        # Hidden layer linear output
        self.A1 = sigmoid(self.Z1)                    # Hidden layer activation
        self.Z2 = np.dot(self.A1, self.W2) + self.b2  # Output layer linear output
        self.A2 = sigmoid(self.Z2)                    # Output layer activation
        return self.A2
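As a quick shape check (toy sizes chosen arbitrarily for illustration), the intermediate activations cached by forward can be inspected directly:

np.random.seed(0)
X = np.random.randn(5, 4)                         # 5 samples, 4 features
mlp = TwoLayerMLP(input_size=4, hidden_size=3, output_size=2)
out = mlp.forward(X)
print(mlp.Z1.shape, mlp.A1.shape, out.shape)      # (5, 3) (5, 3) (5, 2)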
Backward Propagation
We compute the gradients of the loss with respect to weights and biases for both layers.
Gradient for Output Layer:
$$\frac{\partial L}{\partial A_2} = A_2 - Y, \qquad \frac{\partial L}{\partial Z_2} = \frac{\partial L}{\partial A_2} \odot A_2(1 - A_2)$$
Gradients for Weight $W_2$ and Bias $b_2$:
$$\frac{\partial L}{\partial W_2} = \frac{1}{m} A_1^{T} \frac{\partial L}{\partial Z_2}, \qquad \frac{\partial L}{\partial b_2} = \frac{1}{m} \sum_{i=1}^{m} \left(\frac{\partial L}{\partial Z_2}\right)_{i}$$
Gradients for Hidden Layer:
To propagate the error back to the hidden layer:
$$\frac{\partial L}{\partial A_1} = \frac{\partial L}{\partial Z_2} W_2^{T}, \qquad \frac{\partial L}{\partial Z_1} = \frac{\partial L}{\partial A_1} \odot A_1(1 - A_1)$$
Gradients for Weight $W_1$ and Bias $b_1$:
$$\frac{\partial L}{\partial W_1} = \frac{1}{m} X^{T} \frac{\partial L}{\partial Z_1}, \qquad \frac{\partial L}{\partial b_1} = \frac{1}{m} \sum_{i=1}^{m} \left(\frac{\partial L}{\partial Z_1}\right)_{i}$$
All four parameters are then updated with the same gradient-descent rule as in the one-layer case.
Code:
class TwoLayerMLP:
    def __init__(self, input_size, hidden_size, output_size):
        self.W1 = np.random.randn(input_size, hidden_size)
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size)
        self.b2 = np.zeros((1, output_size))

    def forward(self, X):
        self.X = X
        self.Z1 = np.dot(X, self.W1) + self.b1
        self.A1 = sigmoid(self.Z1)
        self.Z2 = np.dot(self.A1, self.W2) + self.b2
        self.A2 = sigmoid(self.Z2)
        return self.A2

    def backward(self, Y, learning_rate=0.1):
        m = Y.shape[0]
        # Output layer gradients
        dA2 = mean_squared_error_derivative(Y, self.A2)
        dZ2 = dA2 * sigmoid_derivative(self.A2)
        dW2 = np.dot(self.A1.T, dZ2) / m
        db2 = np.sum(dZ2, axis=0, keepdims=True) / m
        # Hidden layer gradients
        dA1 = np.dot(dZ2, self.W2.T)
        dZ1 = dA1 * sigmoid_derivative(self.A1)
        dW1 = np.dot(self.X.T, dZ1) / m
        db1 = np.sum(dZ1, axis=0, keepdims=True) / m
        # Update weights and biases
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
        loss = np.mean((Y - self.A2) ** 2)  # MSE loss
        return loss
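As a usage sketch, XOR is the classic target that a single layer cannot fit but a two-layer MLP can; the hidden size, epoch count, learning rate, and random seed below are illustrative, untuned choices, and with an unlucky initialization the loss can occasionally stall in a poor local minimum:

np.random.seed(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

model = TwoLayerMLP(input_size=2, hidden_size=4, output_size=1)
for epoch in range(20000):
    model.forward(X)
    loss = model.backward(Y, learning_rate=0.5)

print(model.forward(X).round(2))   # typically approaches [[0], [1], [1], [0]]
print(f"final loss: {loss:.4f}")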