Building the Neural Network Framework (Code Walkthrough)
Project 6 – Part 1
- Part 1 (this post) introduces the architecture and implements the core components.
- Part 2 will put the framework to work: datasets, training runs, and evaluation.
You’ve already seen the math in Section 1. Now you’ll see how that math becomes code in a modular, reusable form that mirrors how real frameworks like PyTorch and TensorFlow are built.
Before we write any code, here is the big picture of how the pieces fit together.
DenseLayer
Implements the math from Project 2
Stores weights, biases, z, a, x
Computes forward and backward
Updates its own parameters
SequentialModel
A container that stacks layers
Runs forward through all layers
Runs backward in reverse
Provides predict() and summary()
Trainer
Handles batching
Runs the training loop
Computes loss
Calls backward
Updates parameters
This is the same structure used in modern deep‑learning frameworks.
How Projects 1–5 Connect to This Framework
Project 2 → DenseLayer.forward()
You learned Wx + b. That is exactly the forward pass of a Dense layer.
Project 3 → activation functions + BCE
Sigmoid and its derivative now live in activations.py.
Project 4 → SequentialModel
You manually stacked layers for XOR. Now it is automated.
Projects 1–3 → Trainer
You wrote the learning loop three times. Now it works for any model.
Project 5 → matrix shapes
You learned how shapes flow through a network. Now you are using that knowledge to build layers.
This is the bridge between Section 1 (math‑first learning) and Section 2 (real neural network engineering).
activations.py
Activation functions are pure functions. They do not store state, so they live in their own file.
These functions introduce nonlinearity, which is the key to solving problems like XOR.
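A quick numpy check (an illustrative sketch, separate from the framework files) shows why nonlinearity matters: two stacked linear layers collapse into a single linear map, so without an activation function no amount of depth can separate XOR.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))        # a small batch of inputs
w1 = rng.normal(size=(3, 2))       # first "layer" weights, (out_dim, in_dim)
w2 = rng.normal(size=(1, 3))       # second "layer" weights

# Two linear layers applied in sequence...
two_layers = (x @ w1.T) @ w2.T

# ...equal one linear layer whose weight matrix is the product w2 @ w1.
one_layer = x @ (w2 @ w1).T

print(np.allclose(two_layers, one_layer))  # True
```

An activation between the two matmuls breaks this collapse, which is exactly what the functions below provide.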
Code
```python
import numpy as np

# Utility: numerical stability helper
def _clip(z, min_val=-500, max_val=500):
    return np.clip(z, min_val, max_val)

def sigmoid(z):
    z = _clip(z)  # avoid overflow in exp for large |z|
    return 1 / (1 + np.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1 - s)

# ReLU
def relu(z):
    return np.maximum(0, z)

def relu_deriv(z):
    return (z > 0).astype(float)
```
Other activation functions (tanh, leaky ReLU, ELU, softplus, softmax) are included in the GitHub repo.
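As one hedged example of what those repo versions might look like (the names `tanh`, `leaky_relu`, and the default `alpha` here are my guesses, not the repo's exact code), tanh and leaky ReLU follow the same function/derivative pattern as above:

```python
import numpy as np

def tanh(z):
    return np.tanh(z)

def tanh_deriv(z):
    # d/dz tanh(z) = 1 - tanh(z)^2
    return 1.0 - np.tanh(z) ** 2

def leaky_relu(z, alpha=0.01):
    # like ReLU, but lets a small gradient through for z <= 0
    return np.where(z > 0, z, alpha * z)

def leaky_relu_deriv(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)
```

Each pair plugs straight into the `DenseLayer(in_dim, out_dim, activation, activation_deriv)` interface shown later.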
(Figure: activation function curves and examples.)
losses.py
Loss functions measure how wrong the model is.
Their derivatives tell the model how to fix its mistakes.
Code
```python
import numpy as np

def mse(y_hat, y):
    return 0.5 * np.mean((y_hat - y) ** 2)

def mse_deriv(y_hat, y):
    # Gradients are summed over the batch inside DenseLayer, so the 1/N
    # factor from the mean is absorbed into the learning rate.
    return y_hat - y

def binary_cross_entropy(y_hat, y, eps=1e-10):
    y_hat = np.clip(y_hat, eps, 1 - eps)
    loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return np.mean(loss)

def binary_cross_entropy_deriv(y_hat, y, eps=1e-10):
    # dL/dy_hat. The output layer multiplies this by sigmoid'(z) = y_hat(1 - y_hat),
    # which cancels the denominator and leaves the clean (y_hat - y).
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return (y_hat - y) / (y_hat * (1 - y_hat))

def softmax_cross_entropy(logits, y_true, eps=1e-10):
    # logits: (batch, num_classes)
    # y_true: (batch,) integer labels
    shifted = logits - np.max(logits, axis=1, keepdims=True)  # stability shift
    exp_vals = np.exp(shifted)
    probs = exp_vals / np.sum(exp_vals, axis=1, keepdims=True)
    probs = np.clip(probs, eps, 1 - eps)
    correct_logprobs = -np.log(probs[np.arange(len(y_true)), y_true])
    return np.mean(correct_logprobs)

def softmax_cross_entropy_deriv(logits, y_true):
    # dL/dlogits with softmax and cross-entropy combined; pair this with an
    # identity output activation so it is not multiplied by another derivative.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp_vals = np.exp(shifted)
    probs = exp_vals / np.sum(exp_vals, axis=1, keepdims=True)
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(len(y_true)), y_true] = 1.0
    return probs - one_hot
```
When a sigmoid output is paired with BCE, the combined gradient with respect to the pre-activation z simplifies to (y_hat - y).
This is why logistic regression gradients were so clean in Project 3.
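You can sanity-check that simplification numerically (a standalone sketch, independent of the files above): perturb the pre-activation z and compare the finite-difference slope of BCE(sigmoid(z)) against sigmoid(z) - y.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y_hat, y, eps=1e-10):
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

z, y = 0.7, 1.0
analytic = sigmoid(z) - y  # the claimed combined gradient dL/dz

# Central finite difference of the composed loss L(z) = BCE(sigmoid(z), y)
h = 1e-6
numeric = (bce(sigmoid(z + h), y) - bce(sigmoid(z - h), y)) / (2 * h)

print(abs(analytic - numeric) < 1e-6)  # True
```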
layers.py
This is the heart of the framework.
A Dense layer is just the generalization of the linear models you built in Projects 2–4:
z = x @ W.T + b → a = activation(z)
Instead of one output, we compute out_dim outputs.
W has shape (out_dim, in_dim), which is why the code multiplies by W.T.
The output has dimension out_dim.
Before building DenseLayer, we define a simple Layer class.
Code
```python
import numpy as np

class Layer:
    def forward(self, x):
        raise NotImplementedError

    def backward(self, grad_output):
        raise NotImplementedError

    def apply_gradients(self, lr):
        pass

class DenseLayer(Layer):
    def __init__(self, in_dim, out_dim, activation, activation_deriv):
        # Xavier-style init for sigmoid, Kaiming-style for ReLU-like activations.
        # Compare by function name: `activation` is a function, not a string.
        if getattr(activation, "__name__", "") == "sigmoid":
            self.w = np.random.randn(out_dim, in_dim) * np.sqrt(1.0 / in_dim)
        else:
            self.w = np.random.randn(out_dim, in_dim) * np.sqrt(2.0 / in_dim)
        self.b = np.zeros((1, out_dim))
        self.activation = activation
        self.activation_deriv = activation_deriv
        self.grad_w_accum = np.zeros_like(self.w)
        self.grad_b_accum = np.zeros_like(self.b)
        self.x = None
        self.z = None
        self.a = None

    def forward(self, x):
        """
        x: (batch, in_dim)
        returns: (batch, out_dim)
        """
        self.x = x
        self.z = x @ self.w.T + self.b
        self.a = self.activation(self.z)
        return self.a

    def backward(self, grad_output):
        """
        grad_output: (batch, out_dim)
        returns: (batch, in_dim)
        """
        local_grad = self.activation_deriv(self.z)                  # (batch, out_dim)
        delta = grad_output * local_grad                            # (batch, out_dim)
        self.grad_w_accum += delta.T @ self.x                       # (out_dim, in_dim)
        self.grad_b_accum += np.sum(delta, axis=0, keepdims=True)   # (1, out_dim)
        return delta @ self.w                                       # (batch, in_dim)

    def apply_gradients(self, lr):
        self.w -= lr * self.grad_w_accum
        self.b -= lr * self.grad_b_accum
        self.grad_w_accum.fill(0)
        self.grad_b_accum.fill(0)
```
Backward pass:
Uses the same backprop math you derived for XOR.
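A finite-difference gradient check confirms those backward formulas. This is a self-contained sketch using plain numpy rather than the class itself: it recomputes `delta.T @ x` for the weights and compares it against numerically perturbing one weight entry.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3))   # batch of 4, in_dim 3
w = rng.normal(size=(2, 3))   # out_dim 2, in_dim 3
b = np.zeros((1, 2))
y = rng.normal(size=(4, 2))

def loss(w):
    a = sigmoid(x @ w.T + b)
    return 0.5 * np.mean((a - y) ** 2)

# Analytic gradient, mirroring DenseLayer.backward with an MSE upstream grad.
z = x @ w.T + b
a = sigmoid(z)
grad_output = (a - y) / a.size     # d(mean MSE)/da, including the 1/N factor
delta = grad_output * a * (1 - a)  # chain through sigmoid'
grad_w = delta.T @ x               # same formula as in backward()

# Numeric gradient for a single entry, w[0, 0].
h = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0, 0] += h
w_minus[0, 0] -= h
numeric = (loss(w_plus) - loss(w_minus)) / (2 * h)

print(abs(grad_w[0, 0] - numeric) < 1e-8)  # True
```

The 1/N factor is included here because this check differentiates the exact mean loss; the framework instead folds that constant into the learning rate.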
model.py
SequentialModel is where layers become a network.
It simply stores layers in order and runs them in sequence.
Code
```python
import numpy as np

class SequentialModel:
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        """
        x: (batch, in_dim)
        returns: (batch, out_dim)
        """
        a = x
        for layer in self.layers:
            a = layer.forward(a)
        return a

    def backward(self, grad_output):
        """
        grad_output: (batch, out_dim)
        """
        grad = grad_output
        for layer in reversed(self.layers):
            grad = layer.backward(grad)
        return grad

    def predict(self, X):
        """
        X: (N, in_dim)
        returns: (N, out_dim)
        """
        a = X
        for layer in self.layers:
            a = layer.forward(a)
        return a

    def summary(self):
        print("Model Summary:")
        print("==============")
        for i, layer in enumerate(self.layers):
            name = layer.__class__.__name__
            if hasattr(layer, "w"):
                print(f"Layer {i}: {name} | Weights: {layer.w.shape} | Biases: {layer.b.shape}")
            else:
                print(f"Layer {i}: {name}")
        print("==============")
```
Forward: runs each layer in order
Backward: runs each layer in reverse
Predict: a convenience wrapper around the forward pass for inference
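To see the shapes flow through such a stack, here is a plain-numpy trace (an illustrative sketch, not the framework code) of the 2 → 6 → 3 → 1 architecture used in the final workflow:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 2))          # (batch, in_dim)

shapes = [(6, 2), (3, 6), (1, 3)]    # (out_dim, in_dim) for each layer
for out_dim, in_dim in shapes:
    w = rng.normal(size=(out_dim, in_dim))
    b = np.zeros((1, out_dim))
    a = np.maximum(0, a @ w.T + b)   # forward: x @ W.T + b, then ReLU
    print(a.shape)
# (4, 6)
# (4, 3)
# (4, 1)
```

The batch dimension stays fixed at 4 while the feature dimension shrinks layer by layer, exactly the shape discipline from Project 5.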
trainer.py
The Trainer handles the learning loop.
It does not know anything about layers or math.
It simply calls model.forward(), computes loss, and calls model.backward().
Code
```python
import numpy as np

class Trainer:
    def __init__(self, model, loss_fn, loss_deriv, lr=0.001):
        self.model = model
        self.loss_fn = loss_fn
        self.loss_deriv = loss_deriv
        self.lr = lr
        self.loss_history = []

    def train(self, X, y, epochs=1000, batch_size=1, log_interval=100):
        n = len(X)
        for epoch in range(epochs):
            indices = np.random.permutation(n)  # reshuffle every epoch
            for start in range(0, n, batch_size):
                end = start + batch_size
                batch_idx = indices[start:end]
                X_batch = X[batch_idx]   # (batch, in_dim)
                y_batch = y[batch_idx]   # (batch, 1)

                # forward
                y_hat = self.model.forward(X_batch)  # (batch, 1)

                # loss
                batch_loss = self.loss_fn(y_hat, y_batch)
                self.loss_history.append(batch_loss)

                # backward
                grad_output = self.loss_deriv(y_hat, y_batch)  # (batch, 1)
                self.model.backward(grad_output)

                # update
                for layer in self.model.layers:
                    if hasattr(layer, "apply_gradients"):
                        layer.apply_gradients(self.lr)

            if epoch % log_interval == 0:
                # logs the loss of the last batch in this epoch
                print(f"Epoch {epoch}: Loss = {float(batch_loss):.6f}")

    def evaluate(self, X, y, classification=False):
        preds = self.model.predict(X)  # (N, 1) for XOR
        loss = self.loss_fn(preds, y)
        if classification:
            # binary classification
            preds_class = (preds > 0.5).astype(int)
            accuracy = np.mean(preds_class.flatten() == y.flatten())
            return loss, accuracy
        return loss

    def get_loss_history(self):
        return np.array(self.loss_history)
```
Connecting Everything Together
Here is the final workflow:
model = SequentialModel([
DenseLayer(2, 6, relu, relu_deriv),
DenseLayer(6, 3, relu, relu_deriv),
DenseLayer(3, 1, sigmoid, sigmoid_deriv)
])
Train:
trainer = Trainer(model, binary_cross_entropy, binary_cross_entropy_deriv, lr=0.01)
trainer.train(X_train, y_train, epochs=2000, batch_size=4)
Evaluate:
loss, acc = trainer.evaluate(X_test, y_test, classification=True)
Save (the save_model helper arrives with the saving/loading utilities in Part 2):
save_model(model, "model.pkl")
Predict:
prediction = model.predict(np.array([[5, 7]]))
This is essentially the same define → train → evaluate → save → predict workflow used in real deep‑learning frameworks.
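Putting it all together, here is a compact, self-contained version of that workflow trained on XOR. It is an illustrative sketch, not the framework files themselves: the 2-4-1 architecture, learning rate, and epoch count are example values, and the classes are trimmed to the essentials.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

class Dense:
    def __init__(self, in_dim, out_dim, rng):
        self.w = rng.normal(size=(out_dim, in_dim)) * np.sqrt(1.0 / in_dim)
        self.b = np.zeros((1, out_dim))

    def forward(self, x):
        self.x = x
        self.a = sigmoid(x @ self.w.T + self.b)
        return self.a

    def backward(self, grad_output, lr):
        delta = grad_output * self.a * (1 - self.a)   # chain through sigmoid'
        grad_in = delta @ self.w                      # gradient for previous layer
        self.w -= lr * (delta.T @ self.x)             # immediate SGD update
        self.b -= lr * np.sum(delta, axis=0, keepdims=True)
        return grad_in

def bce(y_hat, y, eps=1e-10):
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(42)
layers = [Dense(2, 4, rng), Dense(4, 1, rng)]

losses = []
for epoch in range(5000):
    a = X                                 # full-batch forward
    for layer in layers:
        a = layer.forward(a)
    losses.append(bce(a, y))
    # dL/dy_hat for BCE; the output layer's sigmoid' factor reduces it to a - y
    grad = (a - y) / np.clip(a * (1 - a), 1e-10, None)
    for layer in reversed(layers):        # backward in reverse order
        grad = layer.backward(grad, lr=0.5)

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")  # loss should drop
```

The structure mirrors the framework exactly: forward through the layers in order, backward in reverse, parameter updates per layer.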
What’s Coming in Part 2
Part 2 will cover:
building a dataset
training a real model
plotting loss curves
evaluating accuracy
saving and loading
running inference
comparing to PyTorch
This is where the framework becomes fun to use.