Project 8: Recurrent Neural Networks
(A Beginner-Friendly Walkthrough)
Recurrent Neural Networks (RNNs) are among the earliest neural architectures designed to handle sequences. Unlike feed‑forward networks, which treat every input independently, an RNN carries information forward through time in a hidden state. This makes it well suited to sequential data such as weather, text, audio, or stock prices.
To introduce RNNs, let's build one entirely from scratch using NumPy. No deep‑learning frameworks. No shortcuts. Just the underlying math and the mechanics of recurrence.
By the end, you will understand how hidden state works, how backpropagation through time (BPTT) computes gradients, and how to wrap everything into clean, reusable classes.
Generating a Synthetic Weather Time-Series
To learn RNNs, we need a sequence with memory. Weather is a natural example: today’s temperature depends on previous days plus some randomness.
We generate a simple autoregressive process:
T_t = 0.7 * T_(t-1) + 0.2 * T_(t-2) + Normal(0, 0.5)
This produces a smooth, realistic temperature sequence with short‑term memory and noise. We will use the first 5 days to predict day 6.
___________________________________________________________________________
import numpy as np

temps = []
temps.append(20)  # seed day 0
temps.append(21)  # seed day 1

for t in range(2, 100):
    next_temp = (
        0.7 * temps[-1] +
        0.2 * temps[-2] +
        np.random.normal(0, 0.5)
    )
    temps.append(next_temp)

input_sq = temps[:5]
target = temps[5]
print(input_sq, target)
___________________________________________________________________________
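If you want to see what the generated series looks like, here is an optional quick plot (this assumes matplotlib is available; it is not needed for anything that follows):
___________________________________________________________________________
# Optional: visualize the synthetic temperature series
import matplotlib.pyplot as plt

plt.plot(temps)
plt.xlabel("Day")
plt.ylabel("Temperature")
plt.title("Synthetic autoregressive temperature series")
plt.show()
___________________________________________________________________________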
The Minimal RNN Cell (Scalar Version)
Before building a full vectorized RNN, we start with the smallest possible version: one input, one hidden unit, one output. This makes the mechanics completely transparent.
The RNN cell has five parameters:
W_xh: input to hidden
W_hh: hidden to hidden (the recurrent connection)
b_h: hidden bias
W_hy: hidden to output
b_y: output bias
____________________________________________________________________________________
hidden_size = 1
W_xh = np.random.randn(hidden_size) * 0.01 # input → hidden
W_hh = np.random.randn(hidden_size) * 0.01 # hidden → hidden
b_h = np.random.randn(hidden_size) # hidden bias
W_hy = np.random.randn(hidden_size) * 0.01 # hidden → output
b_y = np.random.randn() # output bias
print("\nWeights:")
print("W_xh:", W_xh)
print("W_hh:", W_hh)
print("b_h :", b_h, "\n")
print("W_hy:", W_hy)
print("b_y :", b_y)
___________________________________________________________________________
Forward pass at each timestep
This is where the RNN differs from a normal feed‑forward network.
For each timestep in the input sequence:
Compute the pre‑activation
Given input x_t and previous hidden state h_(t-1):
a_t = W_xh * x_t + W_hh * h_(t-1) + b_h
Apply the activation
h_t = tanh(a_t)
Store:
the raw activation (a_t)
the hidden state (h_t)
This loop is the “unrolling through time” that gives RNNs memory.
After the final timestep, compute the output:
y = W_hy * h_T + b_y
This recurrence is the core of the model: the RNN maintains a memory value (the hidden state) that is updated each day from the new input and the previous memory.
___________________________________________________________________________
# Forward pass with storage for BPTT
hs = [0.0] # h_0
raws = [] # a_t
h = 0.0
for x in input_sq:
    a = W_xh * x + W_hh * h + b_h
    h = np.tanh(a)
    raws.append(a)
    hs.append(h)
y_pred = W_hy * h + b_y
print("Final h:", h)
print("Prediction:", y_pred)
print("Target:", target)
___________________________________________________________________________
Loss Function
We use mean squared error:
L = (y_pred - target)^2
This measures how far the prediction is from the true next temperature.
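Its gradient with respect to the prediction follows directly:
dL/dy_pred = 2 * (y_pred - target)
This value is what starts the backward pass in the next step.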
Backpropagation Through Time (BPTT)
Training an RNN requires gradients to flow not only through layers but also through time. This is the purpose of BPTT.
Step A: Backprop through the output
Compute:
dL/dy
gradients for W_hy and b_y
gradient flowing into the final hidden state
Step B: Walk backward through each timestep
For each timestep t, in reverse order:
Compute derivative of tanh:
dtanh = (1 - tanh(a_t)^2) * dh_next
Accumulate gradients for:
W_xh
W_hh
b_h
Propagate gradient to the previous hidden state:
dh_next = dtanh * W_hh
This is the "through time" part. The gradient flows backward across all timesteps, not just through the final layer.
___________________________________________________________________________
# Loss
loss = (y_pred - target)**2
print("Loss:", loss)
# dL/dy
dL_dy = 2 * (y_pred - target)
# Output layer gradients
dW_hy = dL_dy * hs[-1]
db_y = dL_dy
# Gradient flowing into last hidden state
dh_next = dL_dy * W_hy
___________________________________________________________________________
Update the Parameters
Just like any neural network:
θ ← θ − η · ∇_θ L
We apply this to all weights and biases (the update itself is sketched right after the gradient code below):
- W_xh, W_hh, b_h
- W_hy, b_y
___________________________________________________________________________
# Initialize RNN parameter grads
dW_xh = 0.0
dW_hh = 0.0
db_h = 0.0
# Backprop through time
for t in reversed(range(len(input_sq))):
    a_t = raws[t]
    h_prev = hs[t]
    x_t = input_sq[t]

    # derivative of tanh
    da = (1 - np.tanh(a_t)**2) * dh_next

    # accumulate grads
    dW_xh += da * x_t
    dW_hh += da * h_prev
    db_h += da

    # propagate to previous h
    dh_next = da * W_hh
print("dW_xh:", dW_xh)
print("dW_hh:", dW_hh)
print("db_h :", db_h)
print("dW_hy:", dW_hy)
print("db_y :", db_y)
____________________________________________________________________________
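With all five gradients in hand, the update itself is one line per parameter. A minimal sketch for the scalar model, assuming a small learning rate (the 0.01 below is an arbitrary choice, not a tuned value):
___________________________________________________________________________
# Gradient descent update for the scalar RNN
lr = 0.01  # assumed learning rate, not tuned

W_xh -= lr * dW_xh
W_hh -= lr * dW_hh
b_h  -= lr * db_h
W_hy -= lr * dW_hy
b_y  -= lr * db_y
___________________________________________________________________________
Repeating the forward pass, BPTT, and this update over many sequences is exactly what the vectorized classes below automate.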
Wrapping Everything Into Clean Classes (Vectorized RNN)
Once the scalar version is understood, we scale up to a proper vectorized RNN.
RNNCell
Handles a single timestep.
Computes the hidden state.
Stores W_xh, W_hh, and b_h.
RNNPredictor
Unrolls the RNN across a sequence
Stores all hidden states and raw activations
Computes the output
Performs BPTT
Updates parameters
This structure mirrors real deep‑learning libraries such as PyTorch and TensorFlow.
____________________________________________________________________________
class RNNCell:
    def __init__(self, input_size, hidden_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.W_xh = np.random.randn(hidden_size, input_size) * 0.01
        self.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
        self.b_h = np.random.randn(hidden_size)

    def forward(self, x_t, h_prev):
        raw = self.W_xh @ x_t + self.W_hh @ h_prev + self.b_h
        h_t = np.tanh(raw)
        return h_t, raw


class RNNPredictor:
    def __init__(self, input_size, hidden_size):
        self.cell = RNNCell(input_size, hidden_size)
        self.W_hy = np.random.randn(1, hidden_size) * 0.01
        self.b_y = np.random.randn()

    def forward_sequence(self, sequence):
        h = np.zeros(self.cell.hidden_size)
        hs = [h]     # store hidden states
        raws = []    # store raw pre-activations

        for x in sequence:
            x_t = np.array([x])
            h, raw = self.cell.forward(x_t, h)
            hs.append(h)
            raws.append(raw)

        y_pred = self.W_hy @ h + self.b_y
        return y_pred, hs, raws

    def train_step(self, sequence, target, lr=0.0001):
        y_pred, hs, raws = self.forward_sequence(sequence)

        # ----- Loss -----
        loss = (y_pred - target)**2

        # ----- Gradients -----
        dL_dy = 2 * (y_pred - target)

        # Output layer grads
        dW_hy = dL_dy * hs[-1].reshape(1, -1)
        db_y = dL_dy

        # Backprop into last hidden state
        dh_next = (self.W_hy.T * dL_dy).flatten()

        # Initialize grads for RNN cell
        dW_xh = np.zeros_like(self.cell.W_xh)
        dW_hh = np.zeros_like(self.cell.W_hh)
        db_h = np.zeros_like(self.cell.b_h)

        # ----- BPTT -----
        for t in reversed(range(len(sequence))):
            raw = raws[t]
            h_prev = hs[t]
            dtanh = (1 - np.tanh(raw)**2) * dh_next

            x_t = np.array([sequence[t]])
            dW_xh += dtanh.reshape(-1, 1) @ x_t.reshape(1, -1)
            dW_hh += dtanh.reshape(-1, 1) @ h_prev.reshape(1, -1)
            db_h += dtanh

            dh_next = self.cell.W_hh.T @ dtanh

        # ----- Gradient Clipping -----
        clip_value = 1.0
        dW_xh = np.clip(dW_xh, -clip_value, clip_value)
        dW_hh = np.clip(dW_hh, -clip_value, clip_value)
        db_h = np.clip(db_h, -clip_value, clip_value)
        dW_hy = np.clip(dW_hy, -clip_value, clip_value)
        db_y = np.clip(db_y, -clip_value, clip_value)

        # ----- Update weights -----
        self.W_hy -= lr * dW_hy
        self.b_y -= lr * db_y
        self.cell.W_xh -= lr * dW_xh
        self.cell.W_hh -= lr * dW_hh
        self.cell.b_h -= lr * db_h

        return loss, y_pred
____________________________________________________________________________
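As noted above, this structure mirrors what deep‑learning frameworks provide out of the box. For orientation only, here is roughly the same architecture declared with PyTorch's built‑in modules (this assumes PyTorch is installed; the next section covers it properly, so feel free to skip this sketch for now):
____________________________________________________________________________
import torch
import torch.nn as nn

# nn.RNN plays the role of RNNCell plus the unrolling loop
rnn = nn.RNN(input_size=1, hidden_size=20, batch_first=True)
# nn.Linear plays the role of W_hy and b_y
head = nn.Linear(20, 1)

x = torch.randn(1, 5, 1)        # (batch=1, seq_len=5, features=1)
out, h_n = rnn(x)               # out: hidden state at every timestep, h_n: final hidden state
y_pred = head(h_n.squeeze(0))   # predict the next value from the final hidden state
print(y_pred.shape)             # torch.Size([1, 1])
____________________________________________________________________________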
Training the NumPy RNN
We create a dataset of sliding windows:
Each input sequence contains 5 consecutive temperatures
Each target is the next temperature
____________________________________________________________________________
def make_dataset(temps, seq_len=5):
    X = []
    y = []
    for i in range(len(temps) - seq_len):
        X.append(temps[i:i+seq_len])
        y.append(temps[i+seq_len])
    return np.array(X), np.array(y)

X, y = make_dataset(temps, seq_len=5)
print(X.shape, y.shape)
____________________________________________________________________________
Training loop:
Forward pass
Compute loss
BPTT
Gradient clipping
Gradient descent update
After training, we test the model by predicting day 6 from days 1 through 5.
____________________________________________________________________________
model = RNNPredictor(input_size=1, hidden_size=20)

for epoch in range(2500):
    total_loss = 0
    for seq, target in zip(X, y):
        loss, pred = model.train_step(seq, target)
        total_loss += loss
    if epoch % 250 == 0:
        print(f"epoch {epoch}, total_loss={total_loss}")
___________________________________________________________________________
test_seq = temps[:5]  # or any 5‑day window
pred, hs, raws = model.forward_sequence(test_seq)

print("Input:", test_seq)
print("Prediction:", pred)
print("True next value:", temps[5])

error = pred - temps[5]
print("Error:", error)
___________________________________________________________________________
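For a broader check than a single window, here is an optional sketch that averages the absolute prediction error over every sliding window built earlier (it only reuses the existing X, y, and model):
___________________________________________________________________________
# Optional: mean absolute error across all sliding windows (rough sanity check)
errors = []
for seq, true_next in zip(X, y):
    pred, _, _ = model.forward_sequence(seq)
    errors.append(abs(pred[0] - true_next))

print("Mean absolute error over all windows:", np.mean(errors))
___________________________________________________________________________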
Summary: What You Have Built
By completing this section, you have:
built an RNN from scratch
implemented recurrence manually
implemented BPTT manually
understood how hidden state flows through time
wrapped the model into clean, reusable classes
trained a real sequence model
This forms the foundation for the next section, where we move to PyTorch and explore more advanced options with RNNs.
Project 8, Section 2: Exploring More with RNNs in PyTorch