Building the Neural Network Framework (Code Walkthrough)

 Project 6 – Part 1


  • Part 1 (this post) introduces the architecture and implements the core components.
  • Part 2 trains, evaluates, and runs a real model.

You’ve already seen the math in Section 1. Now you’ll see how that math becomes code in a modular, reusable form that mirrors how real frameworks like PyTorch and TensorFlow are built.

Before we write any code, here is the big picture of how the pieces fit together.

DenseLayer

  • Implements the math from Project 2

  • Stores weights, biases, z, a, x

  • Computes forward and backward

  • Updates its own parameters

SequentialModel

  • A container that stacks layers

  • Runs forward through all layers

  • Runs backward in reverse

  • Provides predict() and summary()

Trainer

  • Handles batching

  • Runs the training loop

  • Computes loss

  • Calls backward

  • Updates parameters

This is the same structure used in modern deep‑learning frameworks.


How Projects 1–5 Connect to This Framework

Project 2 → DenseLayer.forward()
You learned Wx + b. That is exactly the forward pass of a Dense layer.

Project 3 → activation functions + BCE
Sigmoid and its derivative now live in activations.py.

Project 4 → SequentialModel
You manually stacked layers for XOR. Now it is automated.

Projects 1–3 → Trainer
You wrote the learning loop three times. Now it works for any model.

Project 5 → matrix shapes
You learned how shapes flow through a network. Now you are using that knowledge to build layers.

This is the bridge between Section 1 (math‑first learning) and Section 2 (real neural network engineering).


activations.py

Activation functions are pure functions. They do not store state, so they live in their own file.

These functions introduce nonlinearity, which is the key to solving problems like XOR.

Code

import numpy as np


# Utility: numerical stability helpers
def _clip(z, min_val=-500, max_val=500):
    return np.clip(z, min_val, max_val)


def sigmoid(z):
    z = _clip(z)
    return 1 / (1 + np.exp(-z))


def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1 - s)


# ReLU
def relu(z):
    return np.maximum(0, z)


def relu_deriv(z):
    return (z > 0).astype(float)

Other activation functions (tanh, leaky ReLU, ELU, softplus, softmax) are included in the GitHub repo.
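As one example of what those extra activations look like, here is a sketch of tanh and its derivative written in the same style as the functions above (the repo's versions may differ slightly):

```python
import numpy as np


def tanh(z):
    # NumPy's built-in tanh is already numerically stable
    return np.tanh(z)


def tanh_deriv(z):
    # d/dz tanh(z) = 1 - tanh(z)^2
    t = np.tanh(z)
    return 1 - t ** 2
```

Like sigmoid, tanh squashes its input, but into (-1, 1) instead of (0, 1), which keeps activations zero-centered.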

[Figure: activation functions and examples]


losses.py

Loss functions measure how wrong the model is.
Their derivatives tell the model how to fix its mistakes.

Code

import numpy as np


def mse(y_hat, y):
    return 0.5 * np.mean((y_hat - y) ** 2)


def mse_deriv(y_hat, y):
    # gradient of the *mean* loss, hence the 1/N factor
    return (y_hat - y) / len(y)


def binary_cross_entropy(y_hat, y, eps=1e-10):
    y_hat = np.clip(y_hat, eps, 1 - eps)
    loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return np.mean(loss)


def binary_cross_entropy_deriv(y_hat, y, eps=1e-10):
    # derivative with respect to y_hat; the output layer's sigmoid
    # derivative cancels the denominator, leaving (y_hat - y) / N at z
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return (y_hat - y) / (y_hat * (1 - y_hat) * len(y))


def softmax_cross_entropy(logits, y_true, eps=1e-10):
    # logits: (batch, num_classes)
    # y_true: (batch,) integer labels
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp_vals = np.exp(shifted)
    probs = exp_vals / np.sum(exp_vals, axis=1, keepdims=True)
    probs = np.clip(probs, eps, 1 - eps)
    correct_logprobs = -np.log(probs[np.arange(len(y_true)), y_true])
    return np.mean(correct_logprobs)


def softmax_cross_entropy_deriv(logits, y_true):
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp_vals = np.exp(shifted)
    probs = exp_vals / np.sum(exp_vals, axis=1, keepdims=True)
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(len(y_true)), y_true] = 1.0
    # gradient of the *mean* loss with respect to the logits
    return (probs - one_hot) / len(y_true)



With a sigmoid output layer, the BCE gradient with respect to z simplifies to (y_hat - y): the sigmoid derivative exactly cancels the denominator of the BCE derivative.
This is why logistic regression gradients were so clean in Project 3.
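You can verify that cancellation numerically in a few lines (the values here are arbitrary examples, and the 1/N mean factor is omitted for clarity):

```python
import numpy as np

z = np.array([[-2.0], [0.5], [3.0]])   # pre-activation values
y = np.array([[0.0], [1.0], [1.0]])    # targets
y_hat = 1 / (1 + np.exp(-z))           # sigmoid

# chain rule: dL/dz = dL/dy_hat * dy_hat/dz
dL_dyhat = (y_hat - y) / (y_hat * (1 - y_hat))
dyhat_dz = y_hat * (1 - y_hat)

print(np.allclose(dL_dyhat * dyhat_dz, y_hat - y))  # True
```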


layers.py

This is the heart of the framework.

A Dense layer is just the generalization of the linear models you built in Projects 2–4:

z = x W^T + b → a = activation(z)

Instead of one output, we compute out_dim outputs.
W has shape (out_dim, in_dim), so the output has dimension out_dim.
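A quick shape check makes this concrete (the dimensions are chosen to match the first layer of the workflow at the end of this post):

```python
import numpy as np

batch, in_dim, out_dim = 4, 2, 6
x = np.random.randn(batch, in_dim)    # (4, 2)
W = np.random.randn(out_dim, in_dim)  # (6, 2)
b = np.zeros((1, out_dim))            # (1, 6) -- broadcast over the batch

z = x @ W.T + b
print(z.shape)  # (4, 6)
```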

Before building DenseLayer, we define a simple Layer class.

Code

import numpy as np


class Layer:
    def forward(self, x):
        raise NotImplementedError

    def backward(self, grad_output):
        raise NotImplementedError

    def apply_gradients(self, lr):
        pass


class DenseLayer(Layer):
    def __init__(self, in_dim, out_dim, activation, activation_deriv):
        # Xavier-style init for sigmoid, Kaiming-style for ReLU.
        # activation is a function, so we compare its name, not the object.
        if getattr(activation, "__name__", "") == "sigmoid":
            self.w = np.random.randn(out_dim, in_dim) * np.sqrt(1.0 / in_dim)
        else:
            self.w = np.random.randn(out_dim, in_dim) * np.sqrt(2.0 / in_dim)
        self.b = np.zeros((1, out_dim))
        self.activation = activation
        self.activation_deriv = activation_deriv
        self.grad_w_accum = np.zeros_like(self.w)
        self.grad_b_accum = np.zeros_like(self.b)
        self.x = None
        self.z = None
        self.a = None

    def forward(self, x):
        """
        x: (batch, in_dim)
        returns: (batch, out_dim)
        """
        self.x = x
        self.z = x @ self.w.T + self.b
        self.a = self.activation(self.z)
        return self.a

    def backward(self, grad_output):
        """
        grad_output: (batch, out_dim)
        returns: (batch, in_dim)
        """
        local_grad = self.activation_deriv(self.z)                 # (batch, out_dim)
        delta = grad_output * local_grad                           # (batch, out_dim)
        self.grad_w_accum += delta.T @ self.x                      # (out_dim, in_dim)
        self.grad_b_accum += np.sum(delta, axis=0, keepdims=True)  # (1, out_dim)
        return delta @ self.w                                      # (batch, in_dim)

    def apply_gradients(self, lr):
        self.w -= lr * self.grad_w_accum
        self.b -= lr * self.grad_b_accum
        self.grad_w_accum.fill(0)
        self.grad_b_accum.fill(0)


Backward pass:
Uses the same backprop math you derived for XOR.
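If you want to convince yourself the backward rule is right, a finite-difference check is a useful habit. This sketch applies the same delta.T @ x rule to a standalone sigmoid layer with a toy loss (the variable names and the loss are illustrative, not part of the framework):

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(0)
x = np.random.randn(2, 3)          # (batch, in_dim)
W = np.random.randn(4, 3)          # (out_dim, in_dim)
b = np.zeros((1, 4))

def loss(W):
    a = sigmoid(x @ W.T + b)
    return 0.5 * np.sum(a ** 2)    # toy scalar loss, dL/da = a

# analytic gradient, using the same rule as DenseLayer.backward
a = sigmoid(x @ W.T + b)
delta = a * (a * (1 - a))          # grad_output * sigmoid'(z)
grad_W = delta.T @ x               # (out_dim, in_dim)

# numerical gradient for one weight via central differences
eps = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
numeric = (loss(Wp) - loss(Wm)) / (2 * eps)

print(abs(numeric - grad_W[0, 0]) < 1e-6)  # True
```

The two estimates agree to many decimal places; a large mismatch almost always means a bug in the backward pass.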


model.py

SequentialModel is where layers become a network.

It simply stores layers in order and runs them in sequence.

Code

class SequentialModel:
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        """
        x: (batch, in_dim)
        returns: (batch, out_dim)
        """
        a = x
        for layer in self.layers:
            a = layer.forward(a)
        return a

    def backward(self, grad_output):
        """
        grad_output: (batch, out_dim)
        """
        grad = grad_output
        for layer in reversed(self.layers):
            grad = layer.backward(grad)
        return grad

    def predict(self, X):
        """
        X: (N, in_dim)
        returns: (N, out_dim)
        """
        return self.forward(X)

    def summary(self):
        print("Model Summary:")
        print("==============")
        for i, layer in enumerate(self.layers):
            name = layer.__class__.__name__
            if hasattr(layer, "w"):
                print(f"Layer {i}: {name} | Weights: {layer.w.shape} | Biases: {layer.b.shape}")
            else:
                print(f"Layer {i}: {name}")
        print("==============")



Forward: runs each layer in order
Backward: runs each layer in reverse
Predict: forward pass without storing gradients


trainer.py

The Trainer handles the learning loop.
It does not know anything about layers or math.
It simply calls model.forward(), computes loss, and calls model.backward().

Code

import numpy as np


class Trainer:
    def __init__(self, model, loss_fn, loss_deriv, lr=0.001):
        self.model = model
        self.loss_fn = loss_fn
        self.loss_deriv = loss_deriv
        self.lr = lr
        self.loss_history = []

    def train(self, X, y, epochs=1000, batch_size=1, log_interval=100):
        n = len(X)
        for epoch in range(epochs):
            indices = np.random.permutation(n)
            for start in range(0, n, batch_size):
                end = start + batch_size
                batch_idx = indices[start:end]
                X_batch = X[batch_idx]    # (batch, in_dim)
                y_batch = y[batch_idx]    # (batch, 1)

                # forward
                y_hat = self.model.forward(X_batch)    # (batch, 1)

                # loss
                batch_loss = self.loss_fn(y_hat, y_batch)
                self.loss_history.append(batch_loss)

                # backward
                grad_output = self.loss_deriv(y_hat, y_batch)    # (batch, 1)
                self.model.backward(grad_output)

                # update
                for layer in self.model.layers:
                    if hasattr(layer, "apply_gradients"):
                        layer.apply_gradients(self.lr)

            if epoch % log_interval == 0:
                print(f"Epoch {epoch}: Loss = {float(batch_loss):.6f}")

    def evaluate(self, X, y, classification=False):
        preds = self.model.predict(X)    # (N, 1) for XOR
        loss = self.loss_fn(preds, y)
        if classification:
            # binary classification
            preds_class = (preds > 0.5).astype(int)
            accuracy = np.mean(preds_class.flatten() == y.flatten())
            return loss, accuracy
        return loss

    def get_loss_history(self):
        return np.array(self.loss_history)




Connecting Everything Together

Here is the final workflow:

model = SequentialModel([
    DenseLayer(2, 6, relu, relu_deriv),
    DenseLayer(6, 3, relu, relu_deriv),
    DenseLayer(3, 1, sigmoid, sigmoid_deriv)
])


Train:

  • trainer = Trainer(model, binary_cross_entropy, binary_cross_entropy_deriv, lr=0.01)

  • trainer.train(X_train, y_train, epochs=2000, batch_size=4)


Evaluate:

  • loss, acc = trainer.evaluate(X_test, y_test, classification=True)


Save:

  • save_model(model, "model.pkl") (a small helper included in the GitHub repo)


Predict:

  • prediction = model.predict(np.array([[5, 7]]))


This mirrors the workflow used in real deep‑learning frameworks.
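To see the whole pipeline run end to end, here is a condensed, single-file sketch that trains the classic XOR problem. It inlines simplified versions of the framework pieces (sigmoid everywhere, plain MSE, full-batch updates), so treat it as an illustration of the workflow rather than the framework code itself:

```python
import numpy as np

np.random.seed(42)

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

class DenseLayer:
    def __init__(self, in_dim, out_dim):
        self.w = np.random.randn(out_dim, in_dim) * np.sqrt(1.0 / in_dim)
        self.b = np.zeros((1, out_dim))

    def forward(self, x):
        self.x = x
        self.a = sigmoid(x @ self.w.T + self.b)
        return self.a

    def backward(self, grad_output):
        delta = grad_output * self.a * (1 - self.a)  # sigmoid'(z) = a(1-a)
        self.grad_w = delta.T @ self.x
        self.grad_b = np.sum(delta, axis=0, keepdims=True)
        return delta @ self.w

    def step(self, lr):
        self.w -= lr * self.grad_w
        self.b -= lr * self.grad_b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

layers = [DenseLayer(2, 6), DenseLayer(6, 1)]

def forward(X):
    a = X
    for layer in layers:
        a = layer.forward(a)
    return a

initial_loss = 0.5 * np.mean((forward(X) - y) ** 2)
for epoch in range(5000):
    y_hat = forward(X)
    grad = (y_hat - y) / len(X)        # MSE derivative
    for layer in reversed(layers):
        grad = layer.backward(grad)
    for layer in layers:
        layer.step(lr=2.0)

final_loss = 0.5 * np.mean((forward(X) - y) ** 2)
print(final_loss < initial_loss)                  # loss went down
print((forward(X) > 0.5).astype(int).flatten())   # typically [0 1 1 0]
```

The forward/loss/backward/update loop in the middle is exactly what Trainer.train() automates for any model and any batch size.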


What’s Coming in Part 2

Part 2 will cover:

  • building a dataset

  • training a real model

  • plotting loss curves

  • evaluating accuracy

  • saving and loading

  • running inference

  • comparing to PyTorch

This is where the framework becomes fun to use.


