Project 5: The Design Matrix

The Structure of Neural Networks

Kaggle Notebook
GitHub Repo

Why This Project Exists

Projects 1–4 taught you the operations of machine learning:
  • dot products
  • gradients
  • MSE
  • cross‑entropy
  • logistic regression
  • hidden layers

But they didn’t yet reveal the structural object that unifies all of them:

The Design Matrix 𝑋

This project shows:
  • why ML always uses a matrix
  • how neural networks generalize it
  • how this simplifies the system


What Is the Design Matrix?

A design matrix is how machine learning represents data.

Each row = one example
  • n = number of examples (rows of X)
Each column = one feature
  • d = number of features (columns of X)

When we say:
X ∈ ℝ^(n×d)

We mean:
  • X has n rows (one per data point)
  • X has d columns (one per feature)

Example:

import numpy as np

# 3 samples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
X, X.shape


This is the simplest possible design matrix.
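
A quick sanity check on the row/column convention: indexing a row gives one example, while indexing a column gives one feature across all examples.

X[0]     # first row  = first example            -> array([1., 2.])
X[:, 0]  # first column = first feature, all examples -> array([1., 3., 5.])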

From One Example to Many: Matrix Multiplication

Now suppose we want predictions for all n examples at once.

Instead of computing each prediction ŷ^(1), ŷ^(2), ŷ^(3) separately, we can compute them all at once. First, recall the single-example forms.

Scalar Form (One Example)

x1, x2 = 3.0, 4.0
w1, w2 = 2.0, -1.0
b = 0.5

y_hat_scalar = w1*x1 + w2*x2 + b
y_hat_scalar


Vector Dot Product

x = np.array([3.0, 4.0])
w = np.array([2.0, -1.0])
b = 0.5

y_hat_vector = w @ x + b
y_hat_vector
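
The vector form should agree with the scalar form above. A quick check, using the values already defined in the two snippets:

np.isclose(y_hat_scalar, y_hat_vector)   # True: both give 2.5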




Matrix Form (All n Examples at Once)

Stack all examples into the design matrix:

ŷ = Xw + b


X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

w = np.array([[2.0], [-1.0]])  # shape (d, 1)
b = 0.5

y_hat_matrix = X @ w + b
y_hat_matrix
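
Each row of y_hat_matrix is one prediction. Because the second row of X is the same [3.0, 4.0] example used above, its entry matches the single-example result:

y_hat_matrix[1]   # array([2.5]) -- same prediction as the scalar and vector forms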

We have already seen this in Logistic Regression

The forward pass is:

z = Xw + b
ŷ = σ(z)


Same structure — just with a sigmoid on top.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = X @ w + b
y_hat_logistic = sigmoid(z)
y_hat_logistic


Having seen the same pattern in linear and logistic regression, we can finally see the big idea:

A neural network layer is just a generalization of

Xw + b

Instead of one output, we compute h outputs:

π‘‹π‘Š + 𝑏

Where:
π‘Š is (𝑑 × β„Ž)

output is
(𝑛 × β„Ž)

Example code:

# X: (n=3 samples, d=2 features)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# W: (d=2 features, h=3 outputs)
W = np.array([[1.0, -1.0, 0.5],
              [0.5, 2.0, -1.5]])

b = np.array([0.1, -0.2, 0.3])

Z = X @ W + b
Z, Z.shape

# add an activation

A = sigmoid(Z)
A
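
The activation is applied elementwise, so it preserves the (n × h) shape:

A.shape   # (3, 3) -- still (n, h); sigmoid is applied to every entry of Z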

Building a Real Design Matrix (US House Prices Dataset)

Now we connect everything to a real dataset.

The accompanying Kaggle notebook uses a US housing prices dataset; we'll build a design matrix from a few of its columns.

Example code:

import pandas as pd

df = pd.read_csv("house_prices.csv")
df.head()

features = ["sqft_living", "bedrooms", "bathrooms"]
target = "price"

X = df[features].values # shape (n, d)
y = df[target].values.reshape(-1, 1)
X.shape, y.shape

# Now that we have a design matrix X from the dataset,
# this is how a neural network layer would consume it.

n, d = X.shape # n = number of examples, d = number of features
h = 3 # number of outputs (like 3 neurons in a layer)

# Random weight initialization (He-style scaling, sqrt(2/d), a common choice for ReLU layers)
W = np.random.randn(d, h) * np.sqrt(2.0 / d)

# Bias vector
b = np.zeros(h)

W.shape, b.shape

Z = X @ W + b

Z[:5], Z.shape

def relu(z):
    return np.maximum(0, z)

A = relu(Z)
A[:5]


Final Summary: The Unifying Structure

You’ve now seen the entire progression:
  • Single‑input linear regression
  • Multi‑input dot product
  • Matrix form (Xw + b)
  • Logistic regression (sigmoid on top)
  • Neural network layer (XW + b)

The entire field of supervised learning is built from:

  • Linear transformation: XW + b
  • Nonlinearity: f(·)


Stack these two ideas (sketched in the code after this list) and you get:
  • regression
  • classification
  • deep neural networks
  • transformers
  • everything
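
To make "stack these two ideas" concrete, here is a minimal sketch of a two-layer forward pass, reusing the design matrix X, the feature count d, and the relu helper from the housing example above. The hidden width of 4 and the new names (W1, b1, W2, b2, H) are illustrative choices, not part of the original code.

# Layer 1: linear map (n, d) -> (n, 4), then a nonlinearity
W1 = np.random.randn(d, 4) * np.sqrt(2.0 / d)
b1 = np.zeros(4)
H = relu(X @ W1 + b1)

# Layer 2: another linear map (n, 4) -> (n, 1), one prediction per example
W2 = np.random.randn(4, 1) * np.sqrt(2.0 / 4)
b2 = np.zeros(1)
y_hat = H @ W2 + b2   # shape (n, 1)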

This project completes your first-principles foundation.

Preparing for Project 6

Now that you’ve seen:
  • how X is built
  • how W and b are shaped
  • how XW + b works
  • how an activation is applied


…you’re ready for the next project.

In Section 2 - Project 6, you will replace this manual W and b with a real DenseLayer class that:
  • initializes weights
  • stores biases
  • performs forward passes
  • computes gradients
  • updates parameters through backpropagation

And the design matrix X you built here will plug directly into that class.
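
As a rough preview only (the real implementation is the subject of Project 6), a DenseLayer along these lines might look something like the sketch below, with the gradient and update steps left as stubs. The method names here are hypothetical.

class DenseLayer:
    # Hypothetical sketch; Project 6 builds the real version.
    def __init__(self, d, h):
        self.W = np.random.randn(d, h) * np.sqrt(2.0 / d)  # initialize weights
        self.b = np.zeros(h)                                # store biases

    def forward(self, X):
        # Consume the design matrix: (n, d) @ (d, h) + (h,) -> (n, h)
        return X @ self.W + self.b

    # backward() and update() -- gradients and backpropagation -- come in Project 6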
