Project 5: The Design Matrix
The Structure of Neural Networks
Kaggle Notebook
GitHub Repo
Why This Project Exists
Projects 1–4 taught you the operations of machine learning:
- dot products
- gradients
- MSE
- cross‑entropy
- logistic regression
- hidden layers
But they didn’t yet reveal the structural object that unifies all of them:
The Design Matrix X
This project shows:
- why ML always uses a matrix
- how neural networks generalize it
- how this simplifies the system
What Is the Design Matrix?
A design matrix is how machine learning represents data. Each row = one example, each column = one feature.
When we say X is an n × d matrix, where:
- n = number of examples (rows of X)
- d = number of features (columns of X)
We mean:
- X has n rows (one per data point)
- X has d columns (one per feature)
Example:

import numpy as np

# 3 samples, 2 features
X = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
])
X, X.shape
This is the simplest possible design matrix.
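To see row = example and column = feature directly, here is a small illustrative check on the array above:

# Row 0 is the first example; column 1 is the second feature across all examples
X[0]      # array([1., 2.])      -> features of example 0
X[:, 1]   # array([2., 4., 6.])  -> feature 1 for every example
X.shape   # (3, 2)               -> n = 3 examples, d = 2 features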
From One Example to Many: Matrix Multiplication
Now suppose we want predictions for all n examples at once, instead of computing ŷ^(1), ŷ^(2), ŷ^(3), … separately.
First, recall the scalar form for a single example:
x1, x2 = 3.0, 4.0
w1, w2 = 2.0, -1.0
b = 0.5
y_hat_scalar = w1*x1 + w2*x2 + b
y_hat_scalar
Vector Dot Product
x = np.array([3.0, 4.0])
w = np.array([2.0, -1.0])
b = 0.5
y_hat_vector = w @ x + b
y_hat_vector
Matrix Form (All n Examples at Once)
Stack all examples into the design matrix:
ŷ = Xw + b
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([[2.0], [-1.0]]) # shape (d,1)
b = 0.5
y_hat_matrix = X @ w + b
y_hat_matrix
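As a quick sanity check (a sketch using the arrays defined just above), row i of the matrix result should equal the dot-product prediction for example i:

import numpy as np

# Each row of X @ w + b is one example's dot-product prediction
for i in range(X.shape[0]):
    assert np.isclose(X[i] @ w[:, 0] + b, y_hat_matrix[i, 0])
print("matrix form matches the per-example predictions")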
We have already seen this in Logistic Regression
The forward pass is:
z = Xw + b
ŷ = σ(z)
Same structure — just with a sigmoid on top.
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = X @ w + b
y_hat_logistic = sigmoid(z)
y_hat_logistic
Now we can finally see the big idea:
A neural network layer is just a generalization of Xw + b.
Instead of one output, we compute h outputs:
XW + b
Where:
- W is (d × h)
- the output is (n × h)
Example code:

# X: (n=3 samples, d=2 features)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# W: (d=2 features, h=3 outputs)
W = np.array([[1.0, -1.0, 0.5],
              [0.5, 2.0, -1.5]])

b = np.array([0.1, -0.2, 0.3])
Z = X @ W + b
Z, Z.shape
# add an activation
A = sigmoid(Z)
A
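To make the h-outputs idea concrete, here is a small check (a sketch using the arrays above): column j of Z is just an ordinary Xw + b computed with the j-th column of W and the j-th bias.

import numpy as np

# Each output column is its own Xw + b
for j in range(W.shape[1]):
    assert np.allclose(Z[:, j], X @ W[:, j] + b[j])
print("a layer is h copies of Xw + b, computed in one matrix multiply")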
Building a Real Design Matrix (US House Prices Dataset)
Now we connect everything to a real dataset.
I have a Kaggle notebook for this that uses a US housing prices dataset. We'll build a design matrix from its feature columns.
Example code:
import pandas as pd
df = pd.read_csv("house_prices.csv")
df.head()
features = ["sqft_living", "bedrooms", "bathrooms"]
target = "price"
X = df[features].values # shape (n, d)
y = df[target].values.reshape(-1, 1)
X.shape, y.shape
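To tie this back to rows = examples (assuming the CSV loaded as above), each row of X holds one house's sqft_living, bedrooms, and bathrooms, and the same row index in y holds that house's price:

# Feature vector and target for the first house share the same row index
X[0]                      # [sqft_living, bedrooms, bathrooms] of house 0
y[0]                      # price of house 0
X.shape[0] == y.shape[0]  # one target per row of the design matrix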
# Now that we have a design matrix X from the dataset
# This is how a neural network layer would consume it
n, d = X.shape # n = number of examples, d = number of features
h = 3 # number of outputs (like 3 neurons in a layer)
# Random weight initialization
W = np.random.randn(d, h) * np.sqrt(2.0 / d)
# Bias vector
b = np.zeros(h)
W.shape, b.shape
Z = X @ W + b
Z[:5], Z.shape
def relu(z):
return np.maximum(0, z)
A = relu(Z)
A[:5]
Final Summary: The Unifying Structure
You’ve now seen the entire progression:
- Single‑input linear regression
- Multi‑input dot product
- Matrix form (Xw + b)
- Logistic regression (sigmoid on top)
- Neural network layer (XW + b)
The entire field of supervised learning is built from:
- Linear transformation XW + b
- Nonlinearity σ(·)
Stack these two ideas, as sketched below, and you get:
- regression
- classification
- deep neural networks
- transformers
- everything
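Here is a minimal sketch of that stacking, reusing X, d, h, and relu from the house-price section above; the second weight matrix W2 and the single-output layer size are illustrative choices, not part of the original project code.

import numpy as np

# Layer 1: linear transformation + nonlinearity
W1 = np.random.randn(d, h) * np.sqrt(2.0 / d)
b1 = np.zeros(h)
A1 = relu(X @ W1 + b1)        # shape (n, h)

# Layer 2: another linear transformation down to one output per example
W2 = np.random.randn(h, 1) * np.sqrt(2.0 / h)
b2 = np.zeros(1)
y_hat = A1 @ W2 + b2          # shape (n, 1): one prediction per row of X

y_hat[:5], y_hat.shape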
This project completes your first‑principles foundation.
Preparing for Project 6
Now that you’ve seen:
- how X is built
- how W and b are shaped
- how XW + b works
- how an activation is applied
…you’re ready for the next project.
In Section 2 - Project 6, you will replace this manual W and b with a real DenseLayer class that:
- initializes weights
- stores biases
- performs forward passes
- computes gradients
- updates parameters through backpropagation
And the design matrix X you built here will plug directly into that class; a rough sketch of its shape follows below.
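As a preview, here is a rough, forward-only sketch of what such a class might look like; the name DenseLayer comes from the project description above, but the details here are my assumptions, and gradient computation and parameter updates are deliberately left for Project 6.

import numpy as np

class DenseLayer:
    # Forward-only sketch: initializes weights, stores biases, computes X @ W + b
    def __init__(self, d, h):
        self.W = np.random.randn(d, h) * np.sqrt(2.0 / d)  # weight initialization
        self.b = np.zeros(h)                                # bias vector

    def forward(self, X):
        # X: design matrix of shape (n, d); output: (n, h)
        return X @ self.W + self.b

# Usage with the design matrix built above:
layer = DenseLayer(d=X.shape[1], h=3)
Z = layer.forward(X)
Z.shape   # (n, 3)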