Project 7: Introduction to PyTorch





Tensors, Autograd, and Rebuilding the Project 6 Network

In Project 6, we built a neural network framework completely from scratch.


We implemented:


  • Dense layers

  • Activation functions

  • Forward and backward passes

  • A training loop

  • Gradient updates

  • Model saving and loading


By the end, we had a tiny version of PyTorch that worked exactly like a real deep‑learning library.

Project 7 is where we switch from building the tools… to using the tools.



How PyTorch Maps to Project 6

Everything we built manually in Project 6 now has a PyTorch equivalent:



import torch

import torch.nn as nn

import torch.optim as optim




Imports:

- torch.nn as nn ([PyTorch docs: torch.nn](https://docs.pytorch.org/docs/stable/nn.html))

    - the building blocks for networks: layers, activations, loss functions, and containers.

- torch.optim ([PyTorch docs: torch.optim](https://docs.pytorch.org/docs/stable/optim.html#module-torch.optim))

    - a package implementing various optimization algorithms.


Containers:

- nn.Module : the base class for all neural network modules.

- nn.Sequential() : a container that runs its modules in the order they are given ([PyTorch docs: Sequential](https://docs.pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential)).


Non-Linear Activations:

- nn.ReLU

- nn.Sigmoid


Loss Functions:

- nn.BCELoss() : creates a criterion that measures the binary cross-entropy between the target and the input probabilities.

- nn.BCEWithLogitsLoss() : combines a Sigmoid layer and BCELoss in one single class (see the sketch below).
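
As a quick sanity check (a small sketch of my own, not part of the project code, using the imports above), the two losses agree when you apply the sigmoid yourself before nn.BCELoss():

logits = torch.tensor([[0.5], [-1.2]])    # raw model outputs: any real numbers
targets = torch.tensor([[1.], [0.]])

loss_a = nn.BCEWithLogitsLoss()(logits, targets)
loss_b = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_a.item(), loss_b.item())       # the two values match (up to floating-point error)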



What Is a Tensor?

A tensor is PyTorch’s fundamental data structure.
It looks like a NumPy array, but with two superpowers:


1. Tensors can run on a GPU

This allows PyTorch to scale from XOR → CNNs → Transformers without changing your code.
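
For example, moving data to the GPU is a one-line change (a small sketch; it falls back to the CPU when no GPU is available):

device = "cuda" if torch.cuda.is_available() else "cpu"   # pick the GPU when it is there
x = torch.tensor([[0., 1.]]).to(device)                   # the same code now runs on either device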


2. Tensors track operations for autograd

If you set:

x = torch.tensor([1., 2., 3.], requires_grad=True)


PyTorch builds a computation graph behind the scenes.


Every operation is recorded so PyTorch can compute gradients automatically during backprop.


This is the key difference:


  • NumPy array: just numbers

  • PyTorch tensor: numbers + history of operations


This is why PyTorch can compute derivatives without us writing a single gradient formula.
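
Here is a tiny sketch (my own illustration, not part of the XOR code) of what that recorded history buys us:

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = (x ** 2).sum()   # y = 1 + 4 + 9 = 14; each operation is recorded in the graph
y.backward()         # PyTorch walks the graph backwards
print(x.grad)        # tensor([2., 4., 6.])  ->  dy/dx = 2x, computed for us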





# Dataset (Same as Project 6)

X = torch.tensor([

    [0., 0.],

    [0., 1.],

    [1., 0.],

    [1., 1.]

])


y = torch.tensor([

    [0.],

    [1.],

    [1.],

    [0.]

])



1. Set up a class that subclasses nn.Module.

2. Write the __init__ method.

3. Initialize the parent class with super().__init__().

4. Define the network:

self.name_of_network = nn.Sequential()

  • Sequential takes the layers and the activations as separate arguments.

  • Each nn.Linear() layer takes its input and output dimensions.

  • Each layer is followed by an activation function.


Since BCEWithLogitsLoss() has a built-in sigmoid layer, we can leave the sigmoid out of the model.


5. Define a forward method that runs an input through the network.


class XORNet_simple(nn.Module):  # nn.Module is the base class for all neural network modules

    def __init__(self):

        super().__init__()

        self.net = nn.Sequential(

            nn.Linear(2, 3),     # Input → Hidden

            nn.ReLU(),

            nn.Linear(3, 1),     # Hidden → Output

            # nn.Sigmoid() is left out: BCEWithLogitsLoss applies the sigmoid for us

        )


    def forward(self, x): 

        return self.net(x)

What happens when you use BCEWithLogitsLoss()


BCEWithLogitsLoss does two things in one:


1. Applies the sigmoid activation:

    - σ(z₂) = 1 / (1 + e^(−z₂))

2. Computes binary cross‑entropy:

    - loss = −[ y·log(σ(z₂)) + (1 − y)·log(1 − σ(z₂)) ]


So when you do:



loss_fn = nn.BCEWithLogitsLoss()

loss = loss_fn(model(X), y)


PyTorch internally performs:

  • sigmoid on your raw outputs

  • then BCE


You do not need to put a Sigmoid() in your model.

Adding a Sigmoid() to the model as well would be incorrect, because you'd be applying the sigmoid twice; that is exactly the mistake I made when I first set up this project.
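
Here is a quick sketch (my own illustration, not from the original setup) of what that mistake does to the outputs:

z = torch.tensor([-10., 0., 10.])   # raw pre-activation values
p = torch.sigmoid(z)                # what a Sigmoid() inside the model would emit
print(torch.sigmoid(p))             # ≈ tensor([0.5000, 0.6225, 0.7311])
# BCEWithLogitsLoss applies sigmoid again, so the effective predictions get
# squeezed into (0.5, 0.73) and the network can never output anything close to 0.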



loss_fn = nn.BCEWithLogitsLoss()  # numerically stable version of BCE

optimizer = optim.SGD(model.parameters(), lr=0.1)  # plain SGD, the same update rule as our scratch trainer

print(optimizer.param_groups)  # inspect the optimizer's settings (lr, momentum, which parameters it updates)



The Learning Loop


Just like in all the other projects, we have:

  • initialized the weights (handled for us by nn.Linear)

  • defined the model with a forward pass

  • defined the loss function (BCEWithLogitsLoss())

  • defined the update rule (the optimizer)




epochs = 3000


for epoch in range(epochs):

    optimizer.zero_grad()      # clear gradients from the previous step


    output = model(X)          # forward pass: raw logits

    loss = loss_fn(output, y)  # sigmoid + binary cross-entropy in one call


    loss.backward()            # backward pass: compute the gradients

    optimizer.step()           # update the weights

    

    if epoch % 200 == 0:

        print(f"Epoch {epoch}: Loss = {loss.item():.6f}")



for name, param in model.named_parameters():

    print(f"name {name} : params: {param.data}")

What the Model Actually Returns: Understanding Logits


A logit is the raw output of the final linear layer: the dot product plus bias, before any activation function.


  • Logits can be any real number. 

  • They are not between 0 and 1. 

  • They are not yet interpretable as probabilities.


This is intentional. PyTorch's loss functions want logits because computing the sigmoid and the cross-entropy together is more numerically stable than applying the sigmoid first.


Making Predictions

During inference, we apply sigmoid manually to convert logits into probabilities.



with torch.no_grad():

    logits = model(X)

    preds = torch.sigmoid(logits)


    print("\nPredictions:")

    for inp, pred in zip(X, preds):

        print(f"Input: {inp.tolist()} -> Prediction: {pred.item():.4f}")

