Project 2 — Multi‑Feature Linear Regression

How Two Features Combine into One Prediction

Kaggle Notebook
GitHub Repo

In Project 1, the model used one feature:
  • y_pred = w * x + b

Now we move to two features:
  • x1 = square footage
  • x2 = number of rooms

The model becomes:

y_pred = w1 * x1 + w2 * x2 + b

This is the simplest possible neural network:

  • Inputs: x1 and x2
  • Weights: w1 and w2
  • Bias: b
  • Output: y_pred

The key new idea is the dot product, which is simply:
  • dot_product = w1 * x1 + w2 * x2

This dot product is what allows the model to combine two features into one number, as the short sketch below shows.
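
To see the arithmetic in isolation, here is a minimal sketch with one made-up house and made-up weights (illustration values only, not learned from the data):


import numpy as np

# One made-up house: 300 square feet and 6 rooms
x = np.array([300, 6])

# Made-up weights and bias, chosen only to show the arithmetic
w = np.array([0.5, 10.0])
b = 20.0

dot_product = np.dot(w, x)   # 0.5*300 + 10.0*6 = 210.0
y_pred = dot_product + b     # 210.0 + 20.0 = 230.0
print(dot_product, y_pred)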

The Data


import numpy as np
# Feature 1: square footage
x1 = np.array([180, 200, 230, 260, 280, 300, 325, 375, 425, 480, 488, 510, 560, 600])

# Feature 2: number of rooms
x2 = np.array([4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8, 9, 10])

# Target: price
y = np.array([122, 120, 170, 180, 240, 238, 246, 320, 361, 370, 376, 390, 410, 470])

# Combine the two features into an (n × 2) matrix: one row per house, one column per feature
X = np.column_stack([x1, x2])


Each house is represented by:
  • x1 = square footage
  • x2 = number of rooms
  • y = price
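
If you want to confirm the shapes before moving on, a quick check (assuming the data block above has been run) looks like this:


print(X.shape)   # (14, 2): 14 houses, 2 features each
print(y.shape)   # (14,): one price per house
print(X[0])      # [180 4]: the first house's square footage and room count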

Understanding the Dot Product

How Two Features Become One Number

Take a single house:

x = [x1, x2]
w = [w1, w2]

The model computes:

y_pred = w1 * x1 + w2 * x2 + b

The dot product part is:

dot_product = w1 * x1 + w2 * x2

What the dot product means in plain English

Imagine all houses plotted on a 3D graph:

One horizontal axis: x1 (square footage)
The other horizontal axis: x2 (number of rooms)
Vertical axis: y (house price)

Each house is a point (x1, x2) on the horizontal plane.

The dot product turns the 2D point (x1, x2) into a single number.

This is the key idea:

The dot product compresses two features into one combined feature.

Plane vs Line

In the original space (x1, x2, y):

The model forms a plane.

In the transformed space (dot_product, y_pred):

The model becomes a simple line.
The dot product is the transformation that makes this possible.
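
A minimal sketch of that transformation, reusing X from the data block with made-up weights (illustration only), shows every house collapsing to one number:


w = np.array([0.45, 12.0])   # made-up weights, not learned values
b = 5.0

dot_product = X @ w          # shape (14,): one number per house
y_pred = dot_product + b     # every prediction lies on the line y = dot_product + b

print(dot_product[:3])
print(y_pred[:3])


Plotted against each other, dot_product and y_pred always form a straight line with slope 1 and intercept b, no matter what the weights are.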

The image has four plots arranged in a 2x2 grid. Here’s what each one represents.

1. Effect of Square Footage on Price
This plot shows:

X‑axis: x1 (square footage)
Y‑axis: price

2. Effect of Number of Rooms on Price
This plot shows:

X‑axis: x2 (number of rooms)
Y‑axis: price

3. Original Features Together
This plot shows both features in their original form.

You can think of each house as living in a 2D feature space (x1, x2).
If you add price as the vertical axis, the model becomes a plane in 3D.

4. Dot Product + Bias (Vertical Shift)
This is the most important plot.

It shows what happens after applying:

dot_product = w1 * x1 + w2 * x2
y_pred = dot_product + b

X‑axis: dot_product
Y‑axis: y_pred

All the points line up along a single line because the model is linear in this transformed space.

This plot visually demonstrates:
  • Two features → compressed into one number
  • Bias shifts the line up or down
  • The model becomes a simple line in dot‑product space
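
If you want to reproduce that fourth panel yourself, a minimal matplotlib sketch (using x1, x2, y from the data block and placeholder weights, since the trained values come from the code below) could look like this:


import matplotlib.pyplot as plt

# Placeholder weights and bias; in practice, use the values learned by gradient descent below
w1, w2, b = 0.45, 12.0, 5.0

dot_product = w1 * x1 + w2 * x2   # one number per house
y_pred = dot_product + b          # the model's output in dot-product space

plt.scatter(dot_product, y, label="actual price")
plt.plot(dot_product, y_pred, color="red", label="y_pred = dot_product + b")
plt.xlabel("dot_product")
plt.ylabel("price")
plt.legend()
plt.show()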


Scalar Gradient Descent (Project 1 Style)


# Initialize all parameters at zero
w1 = 0.0
w2 = 0.0
b = 0.0
lr = 1e-7   # tiny learning rate; the unscaled square-footage values make the gradients large
n = len(y)


for epoch in range(20000):
    # Forward pass: a prediction for every house
    y_hat = w1*x1 + w2*x2 + b
    error = y_hat - y.flatten()   # flatten() keeps y 1-D even if it was reshaped to a column elsewhere

    # Gradients of the mean squared error with respect to each parameter
    dw1 = (2/n) * np.sum(error * x1)
    dw2 = (2/n) * np.sum(error * x2)
    db = (2/n) * np.sum(error)

    # Update each parameter separately
    w1 -= lr * dw1
    w2 -= lr * dw2
    b -= lr * db

print(f"weight 1 : {w1}")
print(f"weight 2 : {w2}")
print(f"bias : {b}")


This version updates each weight separately.
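
Once the loop finishes, the learned parameters can be used for a prediction; here is a small sketch with a made-up house that is not in the training data:


# Made-up new house: 350 square feet, 6 rooms
new_x1, new_x2 = 350, 6
new_price = w1 * new_x1 + w2 * new_x2 + b
print(f"predicted price: {new_price:.1f}")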

Vectorized Gradient Descent (Neural‑Network Style)


# Vectorized gradient descent

X = np.column_stack([x1, x2]) # (n × 2)
y = y.reshape(-1, 1) # (n × 1)

w = np.zeros((2, 1)) # (2 × 1)
b = 0.0
lr = 1e-7
n = len(y)

for epoch in range(20000):
    y_hat = X @ w + b # (n×2)(2×1) → (n×1)

    error = y_hat - y # (n×1)

    dw = (2/n) * (X.T @ error) # (2×n)(n×1) → (2×1)
    db = (2/n) * np.sum(error) # scalar

    w -= lr * dw
    b -= lr * db

print(f"weight : {w}")
print(f"bias : {b}")


This version updates all weights at once using matrix math.
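
Prediction works the same way: one matrix product covers the whole dataset, and a new house is just another row. A small sketch (the new house is made up for illustration):


y_hat_all = X @ w + b               # (14 × 1): predictions for every house at once

new_house = np.array([[350, 6]])    # made-up house as a 1 × 2 row
new_price = new_house @ w + b       # (1 × 1)
print(new_price.item())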


This is exactly how a neural network layer works (a small sketch follows this list):
  • Inputs → dot product → add bias → output
  • Compute loss
  • Compute gradient vector
  • Update all weights together
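
To make the analogy explicit, here is a minimal sketch that packages those steps as two plain NumPy functions; the names dense_forward and train_step are made up for illustration:


def dense_forward(X, w, b):
    # Inputs → dot product → add bias → output
    return X @ w + b

def train_step(X, y, w, b, lr):
    # One gradient descent step on the mean squared error
    y_hat = dense_forward(X, w, b)
    error = y_hat - y
    dw = (2 / len(y)) * (X.T @ error)
    db = (2 / len(y)) * np.sum(error)
    return w - lr * dw, b - lr * db


Calling train_step in a loop reproduces the vectorized code above; a real neural network layer does the same thing with more outputs and a nonlinearity applied after the bias.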
