Project 2 — Multi‑Feature Linear Regression
How Two Features Combine into One Prediction
Kaggle Notebook · GitHub Repo
In Project 1, the model used one feature:
- y_pred = w * x + b
Now we move to two features:
- x1 = square footage
- x2 = number of rooms
The model becomes:
y_pred = w1 * x1 + w2 * x2 + b
This is the simplest possible neural network:
- Inputs: x1 and x2
- Weights: w1 and w2
- Bias: b
- Output: y_pred
The key new idea is the dot product, which is simply:
- dot_product = w1 * x1 + w2 * x2
This dot product is what allows the model to combine two features into one number.
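To make this concrete, here is a tiny sketch (the weight and house values below are made up purely for illustration) showing that np.dot computes exactly w1 * x1 + w2 * x2:
import numpy as np

w = np.array([0.5, 10.0])    # illustrative weights w1, w2 (not trained values)
x = np.array([300.0, 6.0])   # one hypothetical house: 300 sq ft, 6 rooms
b = 20.0

dot_product = np.dot(w, x)   # w1*x1 + w2*x2 = 0.5*300 + 10.0*6 = 210.0
y_pred = dot_product + b     # 230.0
print(dot_product, y_pred)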
The Data
import numpy as np
# Feature 1: square footage
x1 = np.array([180, 200, 230, 260, 280, 300, 325, 375, 425, 480, 488, 510, 560, 600])
# Feature 2: number of rooms
x2 = np.array([4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8, 9, 10])
# Target: price
y = np.array([122, 120, 170, 180, 240, 238, 246, 320, 361, 370, 376, 390, 410, 470])
# Combine into matrix
X = np.column_stack([x1, x2])
Each house is represented by:
- x1 = square footage
- x2 = number of rooms
- y = price
Understanding the Dot Product
How Two Features Become One Number
Take a single house:
x = [x1, x2]
w = [w1, w2]
The model computes:
y_pred = w1 * x1 + w2 * x2 + b
The dot product part is:
dot_product = w1 * x1 + w2 * x2
What the dot product means in plain English
Imagine all houses plotted on a 3D graph:
One horizontal axis: x1 (square footage)
The other horizontal axis: x2 (number of rooms)
Vertical axis: y (house price)
Each house is a point (x1, x2) on the horizontal plane.
The dot product turns the 2D point (x1, x2) into a single number.
This is the key idea:
The dot product compresses two features into one combined feature.
Plane vs Line
In the original space (x1, x2, y):
The model forms a plane.
In the transformed space (dot_product, y_pred):
The model becomes a simple line.
The dot product is the transformation that makes this possible.
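Here is a short sketch of that equivalence, using the data arrays from above and made-up weights (the real weights are trained later): the plane view and the line view give the same predictions, so in (dot_product, y_pred) space the model is just a line with slope 1 and intercept b.
import numpy as np

x1 = np.array([180, 200, 230, 260, 280, 300, 325, 375, 425, 480, 488, 510, 560, 600])
x2 = np.array([4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8, 9, 10])
w1, w2, b = 0.5, 10.0, 20.0              # hypothetical values, for demonstration only

plane_view = w1 * x1 + w2 * x2 + b       # plane in the original (x1, x2, y) space
dot_product = w1 * x1 + w2 * x2          # one combined feature per house
line_view = 1.0 * dot_product + b        # a line with slope 1 and intercept b

print(np.allclose(plane_view, line_view))   # True: same predictions, simpler space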
The image has four plots arranged in a 2x2 grid. Here’s what each one represents.
1. Effect of Square Footage on Price
This plot shows:
X‑axis: x1 (square footage)
Y‑axis: price
2. Effect of Number of Rooms on Price
This plot shows:
X‑axis: x2 (number of rooms)
Y‑axis: price
3. Original Features Together
This plot shows both features in their original form.
You can think of each house as living in a 2D feature space (x1, x2).
If you add price as the vertical axis, the model becomes a plane in 3D.
4. Dot Product + Bias (Vertical Shift)
This is the most important plot.
It shows what happens after applying:
dot_product = w1 * x1 + w2 * x2
y_pred = dot_product + b
X‑axis: dot_product
Y‑axis: y_pred
All the points line up along a single line because the model is linear in this transformed space.
This plot visually demonstrates (see the sketch after this list):
- Two features → compressed into one number
- Bias shifts the line up or down
- The model becomes a simple line in dot‑product space
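One way to reproduce plot 4 is sketched below. It assumes matplotlib is available, and it uses a closed-form least-squares fit in place of the gradient-descent weights trained later in this post, purely so the example is self-contained.
import numpy as np
import matplotlib.pyplot as plt

x1 = np.array([180, 200, 230, 260, 280, 300, 325, 375, 425, 480, 488, 510, 560, 600])
x2 = np.array([4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8, 9, 10])
y = np.array([122, 120, 170, 180, 240, 238, 246, 320, 361, 370, 376, 390, 410, 470])

# Illustrative weights from a closed-form fit (the post trains its weights with gradient descent)
A = np.column_stack([x1, x2, np.ones_like(x1)])
w1, w2, b = np.linalg.lstsq(A, y, rcond=None)[0]

dot_product = w1 * x1 + w2 * x2    # two features compressed into one number
y_pred = dot_product + b           # bias shifts the line up or down

plt.scatter(dot_product, y, label="actual price")
plt.plot(dot_product, y_pred, color="red", label="y_pred = dot_product + b")
plt.xlabel("dot_product = w1*x1 + w2*x2")
plt.ylabel("price")
plt.legend()
plt.show()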
Scalar Gradient Descent (Project 1 Style)
w1 = 0.0
w2 = 0.0
b = 0.0
lr = 1e-7
n = len(y)

for epoch in range(20000):
    # Forward pass: one prediction per house
    y_hat = w1*x1 + w2*x2 + b
    error = y_hat - y.flatten()        # .flatten() keeps this working even if y is a column vector

    # Gradients of the mean squared error with respect to each parameter
    dw1 = (2/n) * np.sum(error * x1)
    dw2 = (2/n) * np.sum(error * x2)
    db = (2/n) * np.sum(error)

    # Update each parameter separately
    w1 -= lr * dw1
    w2 -= lr * dw2
    b -= lr * db

print(f"weight 1 : {w1}")
print(f"weight 2 : {w2}")
print(f"bias : {b}")
This version updates each weight separately.
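A quick way to trust those gradient formulas is to compare one of them with a finite-difference estimate. The sketch below assumes x1, x2, and y from the data cell are still in scope; the point (w1_c, w2_c, b_c) is an arbitrary place to check the gradient, not the trained values.
import numpy as np

def mse(w1_c, w2_c, b_c):
    # Mean squared error at a given parameter setting
    y_hat = w1_c * x1 + w2_c * x2 + b_c
    return np.mean((y_hat - y) ** 2)

w1_c, w2_c, b_c = 0.3, 5.0, 1.0   # arbitrary point at which to check the gradient
n = len(y)
eps = 1e-6

analytic_dw1 = (2 / n) * np.sum((w1_c * x1 + w2_c * x2 + b_c - y) * x1)
numeric_dw1 = (mse(w1_c + eps, w2_c, b_c) - mse(w1_c - eps, w2_c, b_c)) / (2 * eps)

print(analytic_dw1, numeric_dw1)   # the two numbers should agree closely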
Vectorized Gradient Descent (Neural‑Network Style)
# Vectorized gradient descent
X = np.column_stack([x1, x2])      # (n × 2)
y = y.reshape(-1, 1)               # (n × 1)

w = np.zeros((2, 1))               # (2 × 1)
b = 0.0
lr = 1e-7
n = len(y)

for epoch in range(20000):
    y_hat = X @ w + b              # (n×2)(2×1) → (n×1)
    error = y_hat - y              # (n×1)

    dw = (2/n) * (X.T @ error)     # (2×n)(n×1) → (2×1)
    db = (2/n) * np.sum(error)     # scalar

    w -= lr * dw
    b -= lr * db

print(f"weight : {w}")
print(f"bias : {b}")
This version updates all weights at once using matrix math.
This is exactly how a neural network layer works (see the sketch after this list):
- Inputs → dot product → add bias → output
- Compute loss
- Compute gradient vector
- Update all weights together
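Here is a minimal sketch of that forward step written as a one-neuron "layer" in plain numpy. dense_forward is a hypothetical helper name, not a library function, and the weights are made up for illustration.
import numpy as np

def dense_forward(X, w, b):
    """Inputs -> dot product -> add bias -> output, for every row of X at once."""
    return X @ w + b                # (n×2)(2×1) + scalar → (n×1)

X_demo = np.array([[300.0, 6.0],
                   [480.0, 8.0]])   # two example houses (sq ft, rooms)
w_demo = np.array([[0.5],
                   [10.0]])         # illustrative weights, not the trained ones
b_demo = 20.0

print(dense_forward(X_demo, w_demo, b_demo))   # one prediction per house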