Building the Neural Network Framework

 Project 6 – Part 1

Building the Neural Network Framework (Code Walkthrough)

  • Part 1 (this post) covers the architecture, the file structure, and the code for the core components.
  • Part 2 will put the framework to work: building a dataset, training a real model, and evaluating it.

You’ve already seen the math in Section 1. Now you’ll see how that math becomes code in a modular, reusable form that mirrors how real frameworks like PyTorch and TensorFlow are built.

Before we write any code, here is the big picture of how the pieces fit together.

DenseLayer

  • Implements the math from Project 2

  • Stores weights, biases, and the cached values z (pre-activation), a (activation), and x (input)

  • Computes forward and backward

  • Updates its own parameters

SequentialModel

  • A container that stacks layers

  • Runs forward through all layers

  • Runs backward in reverse

  • Provides predict() and summary()

Trainer

  • Handles batching

  • Runs the training loop

  • Computes loss

  • Calls backward

  • Updates parameters

This is the same structure used in modern deep‑learning frameworks.
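
Before the full walkthrough, here is a rough sketch of the interfaces these three pieces expose (signatures only; the real implementations follow below):

class DenseLayer:
    def forward(self, x): ...                 # z = Wx + b, a = activation(z)
    def backward(self, grad_output, lr): ...  # update W, b; return gradient for previous layer

class SequentialModel:
    def forward(self, x): ...                               # run every layer in order
    def backward(self, grad_output, activations, lr): ...   # run every layer in reverse
    def predict(self, X): ...                               # forward pass only

class Trainer:
    def train(self, X, y, epochs, batch_size): ...  # loop: forward -> loss -> backward
    def evaluate(self, X, y): ...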


How Projects 1–5 Connect to This Framework

Project 2 → DenseLayer.forward()
You learned Wx + b. That is exactly the forward pass of a Dense layer.

Project 3 → activation functions + BCE
Sigmoid and its derivative now live in activations.py.

Project 4 → SequentialModel
You manually stacked layers for XOR. Now it is automated.

Projects 1–3 → Trainer
You wrote the learning loop three times. Now it works for any model.

Project 5 → matrix shapes
You learned how shapes flow through a network. Now you are using that knowledge to build layers.

This is the bridge between Section 1 (math‑first learning) and Section 2 (real neural network engineering).


activations.py

Activation functions are pure functions. They do not store state, so they live in their own file.

These functions introduce nonlinearity, which is the key to solving problems like XOR.

Code

import numpy as np


# Utility: numerical stability helpers

def _clip(z, min_val=-500, max_val=500):
    return np.clip(z, min_val, max_val)


def sigmoid(z):
    z = _clip(z)
    return 1 / (1 + np.exp(-z))


def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1 - s)


# ReLU

def relu(z):
    return np.maximum(0, z)


def relu_deriv(z):
    return (z > 0).astype(float)

Other activation functions (tanh, leaky ReLU, ELU, softplus, softmax) are included in the GitHub repo.
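
For reference, here is a sketch of what two of those look like, following the same pattern as the functions above (the repo versions may differ in detail; the 0.01 leaky slope is just a common default):

import numpy as np


def tanh(z):
    return np.tanh(z)


def tanh_deriv(z):
    t = np.tanh(z)
    return 1 - t**2


def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)


def leaky_relu_deriv(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)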



losses.py

Loss functions measure how wrong the model is.
Their derivatives tell the model how to fix its mistakes.

Code

import numpy as np


# Mean Squared Error (Regression)

def mse(y_hat, y):
    """
    y_hat: scalar or array
    y: scalar or array
    """
    return 0.5 * np.mean((y_hat - y)**2)


def mse_deriv(y_hat, y):
    """
    d/dy_hat (0.5 * (y_hat - y)^2) = (y_hat - y)
    """
    return (y_hat - y)


# Binary Cross Entropy (Binary Classification)

def binary_cross_entropy(y_hat, y, eps=1e-10):
    """
    y_hat: predicted probability (0..1)
    y: true label (0 or 1)
    """
    y_hat = np.clip(y_hat, eps, 1 - eps)
    loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return np.mean(loss)  # ensures scalar


def binary_cross_entropy_deriv(y_hat, y, eps=1e-10):
    # Simplified gradient for sigmoid outputs: dL/dz = y_hat - y (see note below)
    return (y_hat - y)


# Softmax Cross Entropy (Multi-Class Classification)

def softmax_cross_entropy(logits, y_true, eps=1e-10):
    """
    logits: raw scores (vector)
    y_true: integer class index
    """
    # shift for numerical stability
    shifted = logits - np.max(logits)
    exp_vals = np.exp(shifted)
    probs = exp_vals / np.sum(exp_vals)

    # clip to avoid log(0)
    probs = np.clip(probs, eps, 1 - eps)
    return -np.log(probs[y_true])


def softmax_cross_entropy_deriv(logits, y_true):
    """
    Derivative of softmax + cross entropy:
    dL/dlogits = softmax(logits) - one_hot(y_true)
    """
    shifted = logits - np.max(logits)
    exp_vals = np.exp(shifted)
    probs = exp_vals / np.sum(exp_vals)

    # one-hot vector
    one_hot = np.zeros_like(probs)
    one_hot[y_true] = 1.0
    return probs - one_hot



When the output activation is a sigmoid, the combined gradient of BCE with respect to the pre-activation simplifies to (y_hat - y): the sigmoid derivative cancels the denominator of the BCE derivative.
This is why the logistic regression gradients were so clean in Project 3.
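
If you want to convince yourself numerically, here is a quick check (a standalone snippet, not part of the framework): the exact BCE derivative with respect to y_hat, multiplied by the sigmoid derivative, collapses to y_hat - y.

import numpy as np

z = np.array([-2.0, 0.3, 1.7])                   # pre-activations
y = np.array([0.0, 1.0, 1.0])                    # true labels

y_hat = 1 / (1 + np.exp(-z))                     # sigmoid output
dL_dyhat = (y_hat - y) / (y_hat * (1 - y_hat))   # exact dBCE/dy_hat
dyhat_dz = y_hat * (1 - y_hat)                   # sigmoid derivative

print(dL_dyhat * dyhat_dz)                       # chain rule: dL/dz
print(y_hat - y)                                 # identical: the sigmoid terms cancel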


layers.py

This is the heart of the framework.

A Dense layer is just the generalization of the linear models you built in Projects 2–4:

z = Wx + b → a = activation(z)

Instead of one output, we compute out_dim outputs:
W has shape (out_dim, in_dim) and b has shape (out_dim,),
so z = Wx + b and the layer output both have dimension out_dim.
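
A quick way to see the shapes (a throwaway check, not framework code):

import numpy as np

in_dim, out_dim = 2, 3
W = np.random.randn(out_dim, in_dim)   # (3, 2)
b = np.zeros(out_dim)                  # (3,)
x = np.array([0.5, -1.0])              # (2,)

z = W @ x + b
print(z.shape)                         # (3,) -- one pre-activation per output unit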

Before building DenseLayer, we define a simple Layer class.

Code

import numpy as np


# Base Layer Class (Optional but Recommended)

class Layer:
    """
    Base class for all layers.
    Provides a consistent interface for forward and backward passes.
    """
    def forward(self, x):
        raise NotImplementedError

    def backward(self, grad_output, lr):
        raise NotImplementedError


# Dense (Fully Connected) Layer

class DenseLayer(Layer):
    def __init__(self, in_dim, out_dim, activation, activation_deriv):
        # He initialization for ReLU-like activations
        self.w = np.random.randn(out_dim, in_dim) * np.sqrt(2.0 / in_dim)
        self.b = np.zeros(out_dim)

        self.activation = activation
        self.activation_deriv = activation_deriv

        # Cache for forward pass
        self.z = None
        self.a = None
        self.x = None

    def forward(self, x):
        """
        x: input vector (shape: in_dim)
        Returns: activation output
        """
        self.x = x
        self.z = self.w @ x + self.b
        self.a = self.activation(self.z)
        return self.a

    def backward(self, grad_output, lr):
        """
        grad_output: gradient from next layer (shape: out_dim)
        lr: learning rate
        Returns: gradient to pass to previous layer
        """
        # dL/dz = dL/da * da/dz
        local_grad = self.activation_deriv(self.z)
        delta = grad_output * local_grad  # shape: (out_dim,)

        # Gradients for weights and biases
        grad_w = np.outer(delta, self.x)  # (out_dim, in_dim)
        grad_b = delta                    # (out_dim,)

        # Gradient for previous layer: dL/dx
        # (computed before the update so it uses the current weights)
        grad_input = self.w.T @ delta

        # Update parameters
        self.w -= lr * grad_w
        self.b -= lr * grad_b

        return grad_input


class EmbeddingLayer(Layer):
    def __init__(self, vocab_size, embedding_dim):
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim

        # Small random initialization
        self.embeddings = np.random.randn(vocab_size, embedding_dim) * 0.01

        # Cache for backprop
        self.last_input_indices = None

    def forward(self, input_indices):
        """
        input_indices: array of token indices (shape: sequence_length)
        Returns: embeddings for each token
        """
        self.last_input_indices = input_indices
        return self.embeddings[input_indices]

    def backward(self, grad_output, lr):
        """
        grad_output: gradient wrt embedding output (shape: seq_len, embedding_dim)
        """
        for i, idx in enumerate(self.last_input_indices):
            self.embeddings[idx] -= lr * grad_output[i]

        # Embedding layers do not propagate gradients backward to earlier layers
        return None


Backward pass:
It uses the same backprop math you derived for XOR: multiply the upstream gradient by the activation derivative to get delta, then form the weight, bias, and input gradients from it.
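
If you want to sanity-check the backward pass, a finite-difference comparison works well. This is a sketch that assumes layers.py and activations.py are importable as written above; it is not part of the framework itself.

import numpy as np
from activations import sigmoid, sigmoid_deriv
from layers import DenseLayer

np.random.seed(0)
layer = DenseLayer(3, 2, sigmoid, sigmoid_deriv)
x = np.random.randn(3)

# Use L = sum(a) as a toy loss, so dL/da is a vector of ones
a = layer.forward(x)
grad_x = layer.backward(np.ones_like(a), lr=0.0)   # lr=0 keeps the weights fixed

# Numerical gradient of L with respect to x
eps = 1e-6
numeric = np.zeros_like(x)
for i in range(len(x)):
    x_plus, x_minus = x.copy(), x.copy()
    x_plus[i] += eps
    x_minus[i] -= eps
    numeric[i] = (layer.forward(x_plus).sum() - layer.forward(x_minus).sum()) / (2 * eps)

print(np.allclose(grad_x, numeric, atol=1e-6))   # True if backward matches the calculus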


model.py

SequentialModel is where layers become a network.

 It simply stores layers in order and runs them in sequence.

Code

import numpy as np


class SequentialModel:
    """
    A simple container that stacks layers in order.
    """
    def __init__(self, layers):
        self.layers = layers  # list of Layer instances

    # Forward Pass
    def forward(self, x):
        """
        x: input vector (numpy array)
        Returns: list of activations (including input)
        """
        activations = [x]
        for layer in self.layers:
            a = layer.forward(activations[-1])
            activations.append(a)
        return activations

    # Backward Pass
    def backward(self, grad_output, activations, lr):
        """
        grad_output: gradient from loss wrt final output
        activations: list returned from forward()
        """
        grad = grad_output

        # Traverse layers in reverse order
        # (each layer caches its own input, so `activations` is kept only for interface clarity)
        for i in reversed(range(len(self.layers))):
            grad = self.layers[i].backward(grad, lr)
        return grad

    # Prediction (no gradient)
    def predict(self, X):
        """
        X: array of input samples (shape: num_samples x input_dim)
        Returns: predictions for each sample
        """
        preds = []
        for x in X:
            a = x
            for layer in self.layers:
                a = layer.forward(a)
            preds.append(a.item() if a.size == 1 else a)
        return np.array(preds)

    # Optional: Model Summary (like Keras)
    def summary(self):
        print("Model Summary:")
        print("==============")
        for i, layer in enumerate(self.layers):
            name = layer.__class__.__name__
            if hasattr(layer, "w"):
                print(f"Layer {i}: {name} | Weights: {layer.w.shape} | Biases: {layer.b.shape}")
            else:
                print(f"Layer {i}: {name}")
        print("==============")



  • Forward: runs each layer in order
  • Backward: runs each layer in reverse
  • Predict: a forward pass with no backward pass and no parameter updates
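
As a quick usage sketch (assuming the modules are importable under the file names used in this post):

import numpy as np
from activations import relu, relu_deriv, sigmoid, sigmoid_deriv
from layers import DenseLayer
from model import SequentialModel

model = SequentialModel([
    DenseLayer(2, 4, relu, relu_deriv),
    DenseLayer(4, 1, sigmoid, sigmoid_deriv),
])

model.summary()
# Model Summary:
# ==============
# Layer 0: DenseLayer | Weights: (4, 2) | Biases: (4,)
# Layer 1: DenseLayer | Weights: (1, 4) | Biases: (1,)
# ==============

print(model.predict(np.array([[0.0, 1.0]])))   # one probability per input row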


trainer.py

The Trainer handles the learning loop.
It does not know anything about layers or math.
It simply calls model.forward(), computes loss, and calls model.backward().

Code

import numpy as np


class Trainer:
    """
    Handles the training loop for a SequentialModel.
    """
    def __init__(self, model, loss_fn, loss_deriv, lr=0.001):
        self.model = model
        self.loss_fn = loss_fn
        self.loss_deriv = loss_deriv
        self.lr = lr
        self.loss_history = []

    # Training Loop
    def train(self, X, y, epochs=1000, batch_size=1, log_interval=100):
        n = len(X)

        for epoch in range(epochs):
            # Shuffle indices for each epoch
            indices = np.random.permutation(n)

            # Mini-batch training
            # (parameters are updated once per sample; batch_size groups samples
            #  for shuffling and loss averaging, not for averaged gradient updates)
            for start in range(0, n, batch_size):
                end = start + batch_size
                batch_idx = indices[start:end]

                X_batch = X[batch_idx]
                y_batch = y[batch_idx]

                batch_loss = 0

                for i in range(len(X_batch)):
                    x_i = X_batch[i]
                    y_i = y_batch[i]

                    activations = self.model.forward(x_i)
                    y_hat = activations[-1]

                    # Compute loss
                    loss = self.loss_fn(y_hat, y_i)
                    batch_loss += loss

                    # Compute gradient wrt output
                    grad_output = self.loss_deriv(y_hat, y_i)

                    # Backprop
                    self.model.backward(grad_output, activations, self.lr)

                batch_loss /= len(X_batch)
                self.loss_history.append(batch_loss)

            if epoch % log_interval == 0:
                print(f"Epoch {epoch}: Loss = {float(batch_loss):.6f}")

    # Evaluation (Regression or Classification)
    def evaluate(self, X, y, classification=False):
        preds = self.model.predict(X)
        loss = np.mean([self.loss_fn(preds[i], y[i]) for i in range(len(y))])

        if classification:
            # Case 1: Binary classification, predictions shape (N,)
            if preds.ndim == 1:
                preds_class = (preds > 0.5).astype(int)

            # Case 2: Binary classification, predictions shape (N, 1)
            elif preds.ndim == 2 and preds.shape[1] == 1:
                preds_class = (preds[:, 0] > 0.5).astype(int)

            # Case 3: Multi-class classification
            else:
                preds_class = np.argmax(preds, axis=1)

            accuracy = np.mean(preds_class.flatten() == y.flatten())
            return loss, accuracy

        return loss

    def get_loss_history(self):
        return np.array(self.loss_history)




Connecting Everything Together

Here is the final workflow:

model = SequentialModel([
    DenseLayer(2, 6, relu, relu_deriv),
    DenseLayer(6, 3, relu, relu_deriv),
    DenseLayer(3, 1, sigmoid, sigmoid_deriv)
])


Train:

  • trainer = Trainer(model, binary_cross_entropy, binary_cross_entropy_deriv, lr=0.01)

  • trainer.train(X_train, y_train, epochs=2000, batch_size=4)


Evaluate:

  • loss, acc = trainer.evaluate(X_test, y_test, classification=True)


Save:

  • save_model(model, "model.pkl")   (the save/load helpers are covered in Part 2)


Predict:

  • prediction = model.predict(np.array([[5, 7]]))


This mirrors the build → train → evaluate → save → predict workflow used in real deep-learning frameworks.
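
Putting it all together on the XOR data from Project 4 (a sketch, assuming the modules are importable under the file names above; the learning rate and epoch count are illustrative and may need tuning):

import numpy as np
from activations import relu, relu_deriv, sigmoid, sigmoid_deriv
from losses import binary_cross_entropy, binary_cross_entropy_deriv
from layers import DenseLayer
from model import SequentialModel
from trainer import Trainer

# XOR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

model = SequentialModel([
    DenseLayer(2, 6, relu, relu_deriv),
    DenseLayer(6, 3, relu, relu_deriv),
    DenseLayer(3, 1, sigmoid, sigmoid_deriv),
])

trainer = Trainer(model, binary_cross_entropy, binary_cross_entropy_deriv, lr=0.1)
trainer.train(X, y, epochs=2000, batch_size=4, log_interval=500)

loss, acc = trainer.evaluate(X, y, classification=True)
print(f"loss={loss:.4f}  accuracy={acc:.2f}")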


What’s Coming in Part 2

Part 2 will cover:

  • building a dataset

  • training a real model

  • plotting loss curves

  • evaluating accuracy

  • saving and loading

  • running inference

  • comparing to PyTorch

This is where the framework becomes fun to use.


