Project 8: Recurrent Neural Networks
(A Beginner-Friendly Walkthrough)
Recurrent Neural Networks (RNNs) are among the earliest neural architectures designed to handle sequences. Unlike feed‑forward networks, which treat every input independently, an RNN carries information forward through time in a hidden state. This makes it well suited to sequential data such as weather, text, audio, or stock prices.
To introduce RNNs, let's build one entirely from scratch using NumPy. No deep‑learning frameworks. No shortcuts. Just the underlying math and the mechanics of recurrence.
By the end, you will understand how hidden state works, how backpropagation through time (BPTT) computes gradients, and how to wrap everything into clean, reusable classes.
Generating a Synthetic Weather Time-Series
To learn RNNs, we need a sequence with memory. Weather is a natural example: today’s temperature depends on previous days plus some randomness.
We generate a simple autoregressive process:
T_t = 0.7 * T_(t-1) + 0.2 * T_(t-2) + Normal(0, 0.5)
This produces a smooth, realistic temperature sequence with short‑term memory and noise. We will use the first 5 days to predict day 6.
___________________________________________________________________________
import numpy as np

temps = []
temps.append(20)  # seed day 0
temps.append(21)  # seed day 1

for t in range(2, 100):
    next_temp = (
        0.7 * temps[-1] +
        0.2 * temps[-2] +
        np.random.normal(0, 0.5)
    )
    temps.append(next_temp)

input_sq = temps[:5]
target = temps[5]
print(input_sq, target)
___________________________________________________________________________
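If you want to see what the generated series looks like, here is an optional quick plot (this assumes matplotlib is available; it is not needed for anything that follows):
___________________________________________________________________________
# Optional: visualize the synthetic temperature series
import matplotlib.pyplot as plt

plt.plot(temps)
plt.xlabel("Day")
plt.ylabel("Temperature")
plt.title("Synthetic autoregressive temperature series")
plt.show()
___________________________________________________________________________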
The Minimal RNN Cell (Scalar Version)
Before building a full vectorized RNN, we start with the smallest possible version: one input, one hidden unit, one output. This makes the mechanics completely transparent.
The RNN cell has five parameters:
W_xh: input to hidden
W_hh: hidden to hidden (the recurrent connection)
b_h: hidden bias
W_hy: hidden to output
b_y: output bias
____________________________________________________________________________________
hidden_size = 1
W_xh = np.random.randn(hidden_size) * 0.01 # input → hidden
W_hh = np.random.randn(hidden_size) * 0.01 # hidden → hidden
b_h = np.random.randn(hidden_size) # hidden bias
W_hy = np.random.randn(hidden_size) * 0.01 # hidden → output
b_y = np.random.randn() # output bias
print("\nWeights:")
print("W_xh:", W_xh)
print("W_hh:", W_hh)
print("b_h :", b_h, "\n")
print("W_hy:", W_hy)
print("b_y :", b_y)
___________________________________________________________________________
Forward pass at each timestep
This is where the RNN differs from a normal feed‑forward network.
For each timestep in the input sequence:
Compute the pre‑activation
Given input x_t and previous hidden state h_(t-1):
a_t = W_xh * x_t + W_hh * h_(t-1) + b_h
Apply the activation
h_t = tanh(a_t)
Store:
the raw activation (a_t)
the hidden state (h_t)
This loop is the “unrolling through time” that gives RNNs memory.
After the final timestep, compute the output:
y = W_hy * h_T + b_y
This recurrence is the core of the model: the RNN maintains a memory value (the hidden state) that is updated each day from the new input and the previous memory.
___________________________________________________________________________
# Forward pass with storage for BPTT
hs = [0.0] # h_0
raws = [] # a_t
h = 0.0
for x in input_sq:
    a = W_xh * x + W_hh * h + b_h
    h = np.tanh(a)
    raws.append(a)
    hs.append(h)
y_pred = W_hy * h + b_y
print("Final h:", h)
print("Prediction:", y_pred)
print("Target:", target)
___________________________________________________________________________
Loss Function
We use mean squared error:
L = (y_pred - target)^2
This measures how far the prediction is from the true next temperature.
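Its gradient with respect to the prediction follows directly:
dL/dy_pred = 2 * (y_pred - target)
This value is what starts the backward pass in the next step.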
Backpropagation Through Time (BPTT)
Training an RNN requires gradients to flow not only through layers but also through time. This is the purpose of BPTT.
Step A: Backprop through the output
Compute:
dL/dy
gradients for W_hy and b_y
gradient flowing into the final hidden state
Step B: Walk backward through each timestep
For each timestep t, in reverse order:
Compute derivative of tanh:
dtanh = (1 - tanh(a_t)^2) * dh_next
Accumulate gradients for:
W_xh
W_hh
b_h
Propagate gradient to the previous hidden state:
dh_next = dtanh * W_hh
This is the "through time" part. The gradient flows backward across all timesteps, not just through the final layer.
___________________________________________________________________________
# Loss
loss = (y_pred - target)**2
print("Loss:", loss)
# dL/dy
dL_dy = 2 * (y_pred - target)
# Output layer gradients
dW_hy = dL_dy * hs[-1]
db_y = dL_dy
# Gradient flowing into last hidden state
dh_next = dL_dy * W_hy
___________________________________________________________________________
Update the Parameters
Just like any neural network:
θ ← θ − η · ∇_θ L
We apply this to all weights and biases (the update itself is sketched right after the gradient code below):
- W_xh, W_hh, b_h
- W_hy, b_y
___________________________________________________________________________
# Initialize RNN parameter grads
dW_xh = 0.0
dW_hh = 0.0
db_h = 0.0
# Backprop through time
for t in reversed(range(len(input_sq))):
    a_t = raws[t]
    h_prev = hs[t]
    x_t = input_sq[t]

    # derivative of tanh
    da = (1 - np.tanh(a_t)**2) * dh_next

    # accumulate grads
    dW_xh += da * x_t
    dW_hh += da * h_prev
    db_h += da

    # propagate to previous h
    dh_next = da * W_hh
print("dW_xh:", dW_xh)
print("dW_hh:", dW_hh)
print("db_h :", db_h)
print("dW_hy:", dW_hy)
print("db_y :", db_y)
____________________________________________________________________________
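With all five gradients in hand, the update itself is one line per parameter. A minimal sketch for the scalar model, assuming a small learning rate (the 0.01 below is an arbitrary choice, not a tuned value):
___________________________________________________________________________
# Gradient descent update for the scalar RNN
lr = 0.01  # assumed learning rate, not tuned

W_xh -= lr * dW_xh
W_hh -= lr * dW_hh
b_h  -= lr * db_h
W_hy -= lr * dW_hy
b_y  -= lr * db_y
___________________________________________________________________________
Repeating the forward pass, BPTT, and this update over many sequences is exactly what the vectorized classes below automate.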
Wrapping Everything Into Clean Classes (Vectorized RNN)
Once the scalar version is understood, we scale up to a proper vectorized RNN.
RNNCell
Handles a single timestep.
Computes the hidden state.
Stores W_xh, W_hh, and b_h.
RNNPredictor
Unrolls the RNN across a sequence
Stores all hidden states and raw activations
Computes the output
Performs BPTT
Updates parameters
This structure mirrors real deep‑learning libraries such as PyTorch and TensorFlow.
____________________________________________________________________________
class RNNCell:
    def __init__(self, input_size, hidden_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.W_xh = np.random.randn(hidden_size, input_size) * 0.01
        self.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
        self.b_h = np.random.randn(hidden_size)

    def forward(self, x_t, h_prev):
        raw = self.W_xh @ x_t + self.W_hh @ h_prev + self.b_h
        h_t = np.tanh(raw)
        return h_t, raw


class RNNPredictor:
    def __init__(self, input_size, hidden_size):
        self.cell = RNNCell(input_size, hidden_size)
        self.W_hy = np.random.randn(1, hidden_size) * 0.01
        self.b_y = np.random.randn()

    def forward_sequence(self, sequence):
        h = np.zeros(self.cell.hidden_size)
        hs = [h]     # store hidden states
        raws = []    # store raw pre-activations

        for x in sequence:
            x_t = np.array([x])
            h, raw = self.cell.forward(x_t, h)
            hs.append(h)
            raws.append(raw)

        y_pred = self.W_hy @ h + self.b_y
        return y_pred, hs, raws

    def train_step(self, sequence, target, lr=0.0001):
        y_pred, hs, raws = self.forward_sequence(sequence)

        # ----- Loss -----
        loss = (y_pred - target)**2

        # ----- Gradients -----
        dL_dy = 2 * (y_pred - target)

        # Output layer grads
        dW_hy = dL_dy * hs[-1].reshape(1, -1)
        db_y = dL_dy

        # Backprop into last hidden state
        dh_next = (self.W_hy.T * dL_dy).flatten()

        # Initialize grads for RNN cell
        dW_xh = np.zeros_like(self.cell.W_xh)
        dW_hh = np.zeros_like(self.cell.W_hh)
        db_h = np.zeros_like(self.cell.b_h)

        # ----- BPTT -----
        for t in reversed(range(len(sequence))):
            raw = raws[t]
            h_prev = hs[t]
            dtanh = (1 - np.tanh(raw)**2) * dh_next

            x_t = np.array([sequence[t]])
            dW_xh += dtanh.reshape(-1, 1) @ x_t.reshape(1, -1)
            dW_hh += dtanh.reshape(-1, 1) @ h_prev.reshape(1, -1)
            db_h += dtanh

            dh_next = self.cell.W_hh.T @ dtanh

        # ----- Gradient Clipping -----
        clip_value = 1.0
        dW_xh = np.clip(dW_xh, -clip_value, clip_value)
        dW_hh = np.clip(dW_hh, -clip_value, clip_value)
        db_h = np.clip(db_h, -clip_value, clip_value)
        dW_hy = np.clip(dW_hy, -clip_value, clip_value)
        db_y = np.clip(db_y, -clip_value, clip_value)

        # ----- Update weights -----
        self.W_hy -= lr * dW_hy
        self.b_y -= lr * db_y
        self.cell.W_xh -= lr * dW_xh
        self.cell.W_hh -= lr * dW_hh
        self.cell.b_h -= lr * db_h

        return loss, y_pred
____________________________________________________________________________
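As noted above, this structure mirrors what deep‑learning frameworks provide out of the box. For orientation only, here is roughly the same architecture declared with PyTorch's built‑in modules (this assumes PyTorch is installed; the next section covers it properly, so feel free to skip this sketch for now):
____________________________________________________________________________
import torch
import torch.nn as nn

# nn.RNN plays the role of RNNCell plus the unrolling loop
rnn = nn.RNN(input_size=1, hidden_size=20, batch_first=True)
# nn.Linear plays the role of W_hy and b_y
head = nn.Linear(20, 1)

x = torch.randn(1, 5, 1)        # (batch=1, seq_len=5, features=1)
out, h_n = rnn(x)               # out: hidden state at every timestep, h_n: final hidden state
y_pred = head(h_n.squeeze(0))   # predict the next value from the final hidden state
print(y_pred.shape)             # torch.Size([1, 1])
____________________________________________________________________________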
Training the NumPy RNN
We create a dataset of sliding windows:
Each input sequence contains 5 consecutive temperatures
Each target is the next temperature
____________________________________________________________________________
def make_dataset(temps, seq_len=5):
    X = []
    y = []
    for i in range(len(temps) - seq_len):
        X.append(temps[i:i+seq_len])
        y.append(temps[i+seq_len])
    return np.array(X), np.array(y)

X, y = make_dataset(temps, seq_len=5)
print(X.shape, y.shape)
____________________________________________________________________________
Training loop:
Forward pass
Compute loss
BPTT
Gradient clipping
Gradient descent update
After training, we test the model by predicting day 6 from days 1 through 5.
____________________________________________________________________________
model = RNNPredictor(input_size=1, hidden_size=20)

for epoch in range(2500):
    total_loss = 0
    for seq, target in zip(X, y):
        loss, pred = model.train_step(seq, target)
        total_loss += loss
    if epoch % 250 == 0:
        print(f"epoch {epoch}, total_loss={total_loss}")
___________________________________________________________________________
test_seq = temps[:5]  # or any 5‑day window
pred, hs, raws = model.forward_sequence(test_seq)

print("Input:", test_seq)
print("Prediction:", pred)
print("True next value:", temps[5])

error = pred - temps[5]
print("Error:", error)
___________________________________________________________________________
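For a broader check than a single window, here is an optional sketch that averages the absolute prediction error over every sliding window built earlier (it only reuses the existing X, y, and model):
___________________________________________________________________________
# Optional: mean absolute error across all sliding windows (rough sanity check)
errors = []
for seq, true_next in zip(X, y):
    pred, _, _ = model.forward_sequence(seq)
    errors.append(abs(pred[0] - true_next))

print("Mean absolute error over all windows:", np.mean(errors))
___________________________________________________________________________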
Summary: What You Have Built
By completing this section, you have:
built an RNN from scratch
implemented recurrence manually
implemented BPTT manually
understood how hidden state flows through time
wrapped the model into clean, reusable classes
trained a real sequence model
This forms the foundation for the next section, where we move to PyTorch and explore more advanced options with RNNs.
Project 8, Section 2: Exploring More with RNNs in PyTorch