Project 5: The Design Matrix

The Structure of Neural Networks

Kaggle Notebook
GitHub Repo

Why This Project Exists

Projects 1–4 taught you the operations of machine learning:
  • dot products
  • gradients
  • MSE
  • cross‑entropy
  • logistic regression
  • hidden layers

But they didn’t yet reveal the structural object that unifies all of them:

The Design Matrix 𝑋

This project shows:
  • why ML always uses a matrix
  • how neural networks generalize it
  • how this simplifies the system


What Is the Design Matrix?

A design matrix is how machine learning represents data.

Each row = one example
  • n = number of examples (rows of X)
Each column = one feature
  • d = number of features (columns of X)

When we say:
X ∈ ℝ^(n×d)

We mean:
  • X has n rows (one per data point)
  • X has d columns (one per feature)

Example:

import numpy as np

# 3 samples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
X, X.shape


This is the simplest possible design matrix.
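
A quick sanity check on the row/column convention: indexing a row gives one example, while indexing a column gives one feature across all examples.

X[0]     # first row  = first example            -> array([1., 2.])
X[:, 0]  # first column = first feature, all examples -> array([1., 3., 5.])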

From One Example to Many: Matrix Multiplication

Now suppose we want predictions for all n examples at once.

Instead of computing each prediction ŷ^(1), ŷ^(2), ŷ^(3) separately, we can compute them all at once. First, recall the single-example forms.

Scalar Form (One Example)

x1, x2 = 3.0, 4.0
w1, w2 = 2.0, -1.0
b = 0.5

y_hat_scalar = w1*x1 + w2*x2 + b
y_hat_scalar


Vector Dot Product

x = np.array([3.0, 4.0])
w = np.array([2.0, -1.0])
b = 0.5

y_hat_vector = w @ x + b
y_hat_vector
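
The vector form should agree with the scalar form above. A quick check, using the values already defined in the two snippets:

np.isclose(y_hat_scalar, y_hat_vector)   # True: both give 2.5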




Matrix Form (All n Examples at Once)

Stack all examples into the design matrix:

ŷ = Xw + b


X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

w = np.array([[2.0], [-1.0]])  # shape (d, 1)
b = 0.5

y_hat_matrix = X @ w + b
y_hat_matrix
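
Each row of y_hat_matrix is one prediction. Because the second row of X is the same [3.0, 4.0] example used above, its entry matches the single-example result:

y_hat_matrix[1]   # array([2.5]) -- same prediction as the scalar and vector forms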

We have already seen this in Logistic Regression

The forward pass is:

z = Xw + b
ŷ = σ(z)


Same structure — just with a sigmoid on top.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = X @ w + b
y_hat_logistic = sigmoid(z)
y_hat_logistic


Having seen the same pattern in linear and logistic regression, we can finally see the big idea:

A neural network layer is just a generalization of

Xw + b

Instead of one output, we compute h outputs:

π‘‹π‘Š + 𝑏

Where:
π‘Š is (𝑑 × β„Ž)

output is
(𝑛 × β„Ž)

Example code:

# X: (n=3 samples, d=2 features)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# W: (d=2 features, h=3 outputs)
W = np.array([[1.0, -1.0, 0.5],
              [0.5, 2.0, -1.5]])

b = np.array([0.1, -0.2, 0.3])

Z = X @ W + b
Z, Z.shape

# add an activation

A = sigmoid(Z)
A
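
The activation is applied elementwise, so it preserves the (n × h) shape:

A.shape   # (3, 3) -- still (n, h); sigmoid is applied to every entry of Z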

Building a Real Design Matrix (US House Prices Dataset)

Now we connect everything to a real dataset.

The accompanying Kaggle notebook uses a US housing prices dataset; we'll build a design matrix from a few of its columns.

Example code:

import pandas as pd

df = pd.read_csv("house_prices.csv")
df.head()

features = ["sqft_living", "bedrooms", "bathrooms"]
target = "price"

X = df[features].values # shape (n, d)
y = df[target].values.reshape(-1, 1)
X.shape, y.shape

# Now that we have a design matrix X from the dataset,
# this is how a neural network layer would consume it.

n, d = X.shape # n = number of examples, d = number of features
h = 3 # number of outputs (like 3 neurons in a layer)

# Random weight initialization (He-style scaling, sqrt(2/d), a common choice for ReLU layers)
W = np.random.randn(d, h) * np.sqrt(2.0 / d)

# Bias vector
b = np.zeros(h)

W.shape, b.shape

Z = X @ W + b

Z[:5], Z.shape

def relu(z):
    return np.maximum(0, z)

A = relu(Z)
A[:5]


Final Summary: The Unifying Structure

You’ve now seen the entire progression:
  • Single‑input linear regression
  • Multi‑input dot product
  • Matrix form (Xw + b)
  • Logistic regression (sigmoid on top)
  • Neural network layer (XW + b)

The entire field of supervised learning is built from:

  • Linear transformation: XW + b
  • Nonlinearity: f(·)


Stack these two ideas (sketched in the code after this list) and you get:
  • regression
  • classification
  • deep neural networks
  • transformers
  • everything
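
To make "stack these two ideas" concrete, here is a minimal sketch of a two-layer forward pass, reusing the design matrix X, the feature count d, and the relu helper from the housing example above. The hidden width of 4 and the new names (W1, b1, W2, b2, H) are illustrative choices, not part of the original code.

# Layer 1: linear map (n, d) -> (n, 4), then a nonlinearity
W1 = np.random.randn(d, 4) * np.sqrt(2.0 / d)
b1 = np.zeros(4)
H = relu(X @ W1 + b1)

# Layer 2: another linear map (n, 4) -> (n, 1), one prediction per example
W2 = np.random.randn(4, 1) * np.sqrt(2.0 / 4)
b2 = np.zeros(1)
y_hat = H @ W2 + b2   # shape (n, 1)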

This project completes your first-principles foundation.

Preparing for Project 6

Now that you’ve seen:
  • how X is built
  • how W and b are shaped
  • how XW + b works
  • how an activation is applied


…you’re ready for the next project.

In Section 2 - Project 6, you will replace this manual W and b with a real DenseLayer class that:
  • initializes weights
  • stores biases
  • performs forward passes
  • computes gradients
  • updates parameters through backpropagation

And the design matrix X you built here will plug directly into that class.
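
As a rough preview only (the real implementation is the subject of Project 6), a DenseLayer along these lines might look something like the sketch below, with the gradient and update steps left as stubs. The method names here are hypothetical.

class DenseLayer:
    # Hypothetical sketch; Project 6 builds the real version.
    def __init__(self, d, h):
        self.W = np.random.randn(d, h) * np.sqrt(2.0 / d)  # initialize weights
        self.b = np.zeros(h)                                # store biases

    def forward(self, X):
        # Consume the design matrix: (n, d) @ (d, h) + (h,) -> (n, h)
        return X @ self.W + self.b

    # backward() and update() -- gradients and backpropagation -- come in Project 6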
