Project 8: Recurrent Neural Networks 

(A Beginner-Friendly Walkthrough)


Kaggle Notebook
GitHub Repo

Recurrent Neural Networks (RNNs) are the first neural architecture designed to handle sequences. Unlike feed‑forward networks, which treat every input independently, an RNN carries information forward through time via a hidden state. This makes it well suited to sequential data such as weather measurements, text, audio, or stock prices.

To introduce RNNs, let's build one entirely from scratch using NumPy. No deep‑learning frameworks. No shortcuts. Just the underlying math and the mechanics of recurrence.

By the end, you will understand how the hidden state works, how backpropagation through time (BPTT) computes gradients, and how to wrap everything into clean, reusable classes.


Generating a Synthetic Weather Time-Series

To learn RNNs, we need a sequence with memory. Weather is a natural example: today’s temperature depends on previous days plus some randomness.

We generate a simple autoregressive process:

T_t = 0.7 * T_(t-1) + 0.2 * T_(t-2) + Normal(0, 0.5)

This produces a smooth, realistic temperature sequence with short‑term memory and noise. We will use the first 5 days to predict day 6.

___________________________________________________________________________


import numpy as np

temps = []

temps.append(20)   # seed day 0

temps.append(21)   # seed day 1


for t in range(2, 100):

    next_temp = (

        0.7 * temps[-1] +

        0.2 * temps[-2] +

        np.random.normal(0, 0.5)

    )

    temps.append(next_temp)


input_sq = temps[:5]

target = temps[5]

print(input_sq, target)

___________________________________________________________________________

The Minimal RNN Cell (Scalar Version)

Before building a full vectorized RNN, we start with the smallest possible version: one input, one hidden unit, one output. This makes the mechanics completely transparent.

The RNN cell has five parameters:

  • W_xh: input to hidden

  • W_hh: hidden to hidden (the recurrent connection)

  • b_h: hidden bias


  • W_hy: hidden to output

  • b_y: output bias

____________________________________________________________________________________


hidden_size = 1


W_xh = np.random.randn(hidden_size) * 0.01     # input → hidden

W_hh = np.random.randn(hidden_size) * 0.01    # hidden → hidden

b_h  = np.random.randn(hidden_size)    # hidden bias


W_hy = np.random.randn(hidden_size) * 0.01      # hidden → output

b_y  = np.random.randn()                 # output bias


print("\nWeights:")

print("W_xh:", W_xh)

print("W_hh:", W_hh)

print("b_h :", b_h, "\n")

print("W_hy:", W_hy)

print("b_y :", b_y)

___________________________________________________________________________


Forward pass at each timestep

This is where the RNN differs from a normal feed‑forward network.


For each timestep t in the input sequence:


  1. Compute the pre‑activation from the input x_t and the previous hidden state h_(t-1):

    • a_t = W_xh * x_t + W_hh * h_(t-1) + b_h


  2. Apply the activation:

    • h_t = tanh(a_t)


  3. Store the raw pre‑activation a_t and the hidden state h_t (both are needed later for BPTT).


This loop is the “unrolling through time” that gives RNNs memory: the network maintains a memory value (the hidden state) that is updated at each step from the new input and the previous memory.


After the final timestep, compute the output from the last hidden state h_T:

  • y = W_hy * h_T + b_y

___________________________________________________________________________


# Forward pass with storage for BPTT

hs = [0.0]   # h_0

raws = []    # a_t


h = 0.0

for x in input_sq:

    a = W_xh * x + W_hh * h + b_h

    h = np.tanh(a)

    raws.append(a)

    hs.append(h)


y_pred = W_hy * h + b_y


print("Final h:", h)

print("Prediction:", y_pred)

print("Target:", target)

___________________________________________________________________________

Loss Function

We use mean squared error:

  • L = (y_pred - target)^2

This measures how far the prediction is from the true next temperature.


Backpropagation Through Time (BPTT)

Training an RNN requires gradients to flow not only through layers but also through time. This is the purpose of BPTT.

Step A: Backprop through the output

Compute:

  • dL/dy

  • gradients for W_hy and b_y

  • gradient flowing into the final hidden state

Step B: Walk backward through each timestep

For each timestep t, in reverse order:

  1. Compute the derivative of tanh:

    • dtanh = (1 - tanh(a_t)^2) * dh_next

  2. Accumulate the gradients for W_xh, W_hh, and b_h.

  3. Propagate the gradient to the previous hidden state:

    • dh_next = dtanh * W_hh

This is the "through time" part. The gradient flows backward across all timesteps, not just through the final layer.

___________________________________________________________________________


# Loss

loss = (y_pred - target)**2

print("Loss:", loss)


# dL/dy

dL_dy = 2 * (y_pred - target)


# Output layer gradients

dW_hy = dL_dy * hs[-1]

db_y  = dL_dy


# Gradient flowing into last hidden state

dh_next = dL_dy * W_hy

___________________________________________________________________________


Update the Parameters

Just like any neural network:


θ ← θ − η · ∇θ L

That is, each parameter moves against its gradient, scaled by the learning rate η. We apply this to all weights and biases:


- W_xh, W_hh, b_h

- W_hy, b_y

The gradients come from the BPTT code below; a minimal update sketch follows it.
___________________________________________________________________________


# Initialize RNN parameter grads

dW_xh = 0.0

dW_hh = 0.0

db_h  = 0.0


# Backprop through time

for t in reversed(range(len(input_sq))):

    a_t = raws[t]

    h_prev = hs[t]

    x_t = input_sq[t]


    # derivative of tanh

    da = (1 - np.tanh(a_t)**2) * dh_next


    # accumulate grads

    dW_xh += da * x_t

    dW_hh += da * h_prev

    db_h  += da


    # propagate to previous h

    dh_next = da * W_hh


print("dW_xh:", dW_xh)

print("dW_hh:", dW_hh)

print("db_h :", db_h)

print("dW_hy:", dW_hy)

print("db_y :", db_y)

____________________________________________________________________________
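
The scalar walkthrough above stops at the gradients. Here is a minimal sketch of the corresponding update step, assuming a small illustrative learning rate lr (a value chosen for demonstration, not part of the original code):

___________________________________________________________________________


# Gradient descent update for the scalar RNN (illustrative sketch)
lr = 0.001   # assumed learning rate, picked for illustration

W_xh -= lr * dW_xh
W_hh -= lr * dW_hh
b_h  -= lr * db_h

W_hy -= lr * dW_hy
b_y  -= lr * db_y   # db_y is a length-1 array, so b_y becomes one too

___________________________________________________________________________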


Wrapping Everything Into Clean Classes (Vectorized RNN)

Once the scalar version is understood, we scale up to a proper vectorized RNN.

RNNCell

  • Handles a single timestep

  • Computes the hidden state

  • Stores W_xh, W_hh, and b_h

RNNPredictor

  • Unrolls the RNN across a sequence

  • Stores all hidden states and raw activations

  • Computes the output

  • Performs BPTT

  • Updates parameters

This structure mirrors real deep‑learning libraries such as PyTorch and TensorFlow; a short PyTorch comparison is sketched after the class code below.

____________________________________________________________________________


class RNNCell:

    def __init__(self, input_size, hidden_size):

        self.input_size = input_size

        self.hidden_size = hidden_size


        self.W_xh = np.random.randn(hidden_size, input_size) * 0.01

        self.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01

        self.b_h  = np.random.randn(hidden_size)


    def forward(self, x_t, h_prev):

        raw = self.W_xh @ x_t + self.W_hh @ h_prev + self.b_h

        h_t = np.tanh(raw)

        return h_t, raw



class RNNPredictor:

    def __init__(self, input_size, hidden_size):

        self.cell = RNNCell(input_size, hidden_size)


        self.W_hy = np.random.randn(1, hidden_size) * 0.01

        self.b_y  = np.random.randn()


    def forward_sequence(self, sequence):

        h = np.zeros(self.cell.hidden_size)

        hs = [h]      # store hidden states

        raws = []     # store raw pre-activations


        for x in sequence:

            x_t = np.array([x])

            h, raw = self.cell.forward(x_t, h)

            hs.append(h)

            raws.append(raw)


        y_pred = self.W_hy @ h + self.b_y

        return y_pred, hs, raws


    def train_step(self, sequence, target, lr=0.0001):

        y_pred, hs, raws = self.forward_sequence(sequence)


        # ----- Loss -----

        loss = (y_pred - target)**2


        # ----- Gradients -----

        dL_dy = 2 * (y_pred - target)  # gradient of squared error, shape (1,)


        # Output layer grads

        dW_hy = dL_dy * hs[-1].reshape(1, -1)

        db_y  = dL_dy


        # Backprop into last hidden state

        dh_next = (self.W_hy.T * dL_dy).flatten()


        # Initialize grads for RNN cell

        dW_xh = np.zeros_like(self.cell.W_xh)

        dW_hh = np.zeros_like(self.cell.W_hh)

        db_h  = np.zeros_like(self.cell.b_h)


        # ----- BPTT -----

        for t in reversed(range(len(sequence))):

            raw = raws[t]

            h_prev = hs[t]

        

            dtanh = (1 - np.tanh(raw)**2) * dh_next

        

            x_t = np.array([sequence[t]])

            dW_xh += dtanh.reshape(-1,1) @ x_t.reshape(1,-1)

            dW_hh += dtanh.reshape(-1,1) @ h_prev.reshape(1,-1)

            db_h  += dtanh

        

            dh_next = self.cell.W_hh.T @ dtanh

        

        # ----- Gradient Clipping -----

        clip_value = 1.0

        dW_xh = np.clip(dW_xh, -clip_value, clip_value)

        dW_hh = np.clip(dW_hh, -clip_value, clip_value)

        db_h  = np.clip(db_h,  -clip_value, clip_value)

        dW_hy = np.clip(dW_hy, -clip_value, clip_value)

        db_y  = np.clip(db_y,  -clip_value, clip_value)

        

        # ----- Update weights -----

        self.W_hy -= lr * dW_hy

        self.b_y  -= lr * db_y

        self.cell.W_xh -= lr * dW_xh

        self.cell.W_hh -= lr * dW_hh

        self.cell.b_h  -= lr * db_h



        return loss, y_pred

____________________________________________________________________________
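
For comparison, and as a small preview of Section 2, here is a rough sketch of the same single‑step computation using PyTorch's built‑in nn.RNNCell. This is illustrative only: it assumes PyTorch is installed and is not part of the NumPy implementation.

____________________________________________________________________________


import torch
import torch.nn as nn

# nn.RNNCell plays the role of our RNNCell:
# h_t = tanh(W_ih @ x_t + b_ih + W_hh @ h_prev + b_hh)
cell = nn.RNNCell(input_size=1, hidden_size=20)

x_t    = torch.randn(1, 1)     # (batch=1, input_size=1)
h_prev = torch.zeros(1, 20)    # (batch=1, hidden_size=20)
h_t    = cell(x_t, h_prev)     # next hidden state, shape (1, 20)

print(h_t.shape)

____________________________________________________________________________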

Training the NumPy RNN

We create a dataset of sliding windows:

  • Each input sequence contains 5 consecutive temperatures

  • Each target is the next temperature

____________________________________________________________________________


def make_dataset(temps, seq_len=5):

    X = []

    y = []

    for i in range(len(temps) - seq_len):

        X.append(temps[i:i+seq_len])

        y.append(temps[i+seq_len])

    return np.array(X), np.array(y)


X, y = make_dataset(temps, seq_len=5)

print(X.shape, y.shape)

____________________________________________________________________________


Training loop:

  1. Forward pass

  2. Compute loss

  3. BPTT

  4. Gradient clipping

  5. Gradient descent update

After training, we test the model by predicting day 6 from days 1 through 5.

____________________________________________________________________________

model = RNNPredictor(input_size=1, hidden_size=20)


for epoch in range(2500):

    total_loss = 0


    for seq, target in zip(X, y):

        loss, pred = model.train_step(seq, target)

        total_loss += loss


    if epoch % 250 == 0:

        print(f"epoch {epoch}, total_loss={total_loss}")

___________________________________________________________________________


test_seq = temps[:5]          # or any 5‑day window

pred, hs, raws = model.forward_sequence(test_seq)


print("Input:", test_seq)

print("Prediction:", pred)

print("True next value:", temps[5])


error = pred - temps[5]

print("Prediction error:", error)

___________________________________________________________________________

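As an optional check (not part of the original walkthrough), here is a short sketch that measures the average absolute error over every sliding window, reusing X and y from make_dataset:

___________________________________________________________________________


# Illustrative sanity check: mean absolute error across all sliding windows
abs_errors = []
for seq, true_next in zip(X, y):
    window_pred, _, _ = model.forward_sequence(seq)
    abs_errors.append(abs(window_pred.item() - true_next))

print("Mean absolute error over all windows:", np.mean(abs_errors))

___________________________________________________________________________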

Summary: What You Have Built

By completing this section, you have:

  • built an RNN from scratch

  • implemented recurrence manually

  • implemented BPTT manually

  • understood how hidden state flows through time

  • wrapped the model into clean, reusable classes

  • trained a real sequence model

This forms the foundation for the next section, where we move to PyTorch and explore more advanced options with RNNs.

Project 8, Section 2: PyTorch Version (Exploring More with RNNs)

