Project 7: Introduction to PyTorch





Tensors, Autograd, and Rebuilding the Project 6 Network

In Project 6, we built a neural network framework completely from scratch.


We implemented:


  • Dense layers

  • Activation functions

  • Forward and backward passes

  • A training loop

  • Gradient updates

  • Model saving and loading


By the end, we had a tiny version of PyTorch that worked exactly like a real deep‑learning library.

Project 7 is where we switch from building the tools… to using the tools.



How PyTorch Maps to Project 6

Everything we built manually in Project 6 now has a PyTorch equivalent:



import torch

import torch.nn as nn

import torch.optim as optim




Imports:

- torch.nn as nn ([PyTorch docs: torch.nn](https://docs.pytorch.org/docs/stable/nn.html))

    - the building blocks for networks: layers, activations, loss functions, and containers.

- torch.optim ([PyTorch docs: torch.optim](https://docs.pytorch.org/docs/stable/optim.html#module-torch.optim))

    - a package implementing various optimization algorithms.


Containers:

- nn.Module : the base class for all neural network modules.

- nn.Sequential() : a container that runs its modules in the order they are given ([PyTorch docs: Sequential](https://docs.pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential)).


Non-Linear Activations:

- nn.ReLU

- nn.Sigmoid


Loss Functions:

- nn.BCELoss() : creates a criterion that measures the binary cross-entropy between the target and the input probabilities.

- nn.BCEWithLogitsLoss() : combines a Sigmoid layer and BCELoss in one single class (see the sketch below).
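
As a quick sanity check (a small sketch of my own, not part of the project code, using the imports above), the two losses agree when you apply the sigmoid yourself before nn.BCELoss():

logits = torch.tensor([[0.5], [-1.2]])    # raw model outputs: any real numbers
targets = torch.tensor([[1.], [0.]])

loss_a = nn.BCEWithLogitsLoss()(logits, targets)
loss_b = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_a.item(), loss_b.item())       # the two values match (up to floating-point error)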



What Is a Tensor?

A tensor is PyTorch’s fundamental data structure.
It looks like a NumPy array, but with two superpowers:


1. Tensors can run on a GPU

This allows PyTorch to scale from XOR → CNNs → Transformers without changing your code.
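
For example, moving data to the GPU is a one-line change (a small sketch; it falls back to the CPU when no GPU is available):

device = "cuda" if torch.cuda.is_available() else "cpu"   # pick the GPU when it is there
x = torch.tensor([[0., 1.]]).to(device)                   # the same code now runs on either device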


2. Tensors track operations for autograd

If you set:

x = torch.tensor([1., 2., 3.], requires_grad=True)


PyTorch builds a computation graph behind the scenes.


Every operation is recorded so PyTorch can compute gradients automatically during backprop.


This is the key difference:


  • NumPy array: just numbers

  • PyTorch tensor: numbers + history of operations


This is why PyTorch can compute derivatives without us writing a single gradient formula.
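
Here is a tiny sketch (my own illustration, not part of the XOR code) of what that recorded history buys us:

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = (x ** 2).sum()   # y = 1 + 4 + 9 = 14; each operation is recorded in the graph
y.backward()         # PyTorch walks the graph backwards
print(x.grad)        # tensor([2., 4., 6.])  ->  dy/dx = 2x, computed for us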





# Dataset (Same as Project 6)

X = torch.tensor([

    [0., 0.],

    [0., 1.],

    [1., 0.],

    [1., 1.]

])


y = torch.tensor([

    [0.],

    [1.],

    [1.],

    [0.]

])



1. Set up a class that subclasses nn.Module.

2. Write the __init__ method.

3. Initialize the parent class with super().__init__().

4. Define the network:

self.name_of_network = nn.Sequential()

  • Sequential takes the layers and the activations as separate arguments.

  • Each nn.Linear() layer takes its input and output dimensions.

  • Each layer is followed by an activation function.


Since BCEWithLogitsLoss() has a built-in sigmoid layer, we can leave the sigmoid out of the model.


5. Define a forward method that runs an input through the network.


class XORNet_simple(nn.Module):  # nn.Module is the base class for all neural network modules

    def __init__(self):

        super().__init__()

        self.net = nn.Sequential(

            nn.Linear(2, 3),     # Input → Hidden

            nn.ReLU(),

            nn.Linear(3, 1),     # Hidden → Output

            # nn.Sigmoid() is left out: BCEWithLogitsLoss applies the sigmoid for us

        )


    def forward(self, x): 

        return self.net(x)

What happens when you use BCEWithLogitsLoss()


BCEWithLogitsLoss does two things in one:


1. Applies the sigmoid activation:

    - σ(z₂) = 1 / (1 + e^(−z₂))

2. Computes binary cross‑entropy:

    - loss = −[ y·log(σ(z₂)) + (1 − y)·log(1 − σ(z₂)) ]


So when you do:



loss_fn = nn.BCEWithLogitsLoss()

loss = loss_fn(model(X), y)


PyTorch internally performs:

  • sigmoid on your raw outputs

  • then BCE


You do not need to put a Sigmoid() in your model.

Adding a Sigmoid() to the model as well would be incorrect, because you'd be applying the sigmoid twice; that is exactly the mistake I made when I first set up this project.
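
Here is a quick sketch (my own illustration, not from the original setup) of what that mistake does to the outputs:

z = torch.tensor([-10., 0., 10.])   # raw pre-activation values
p = torch.sigmoid(z)                # what a Sigmoid() inside the model would emit
print(torch.sigmoid(p))             # ≈ tensor([0.5000, 0.6225, 0.7311])
# BCEWithLogitsLoss applies sigmoid again, so the effective predictions get
# squeezed into (0.5, 0.73) and the network can never output anything close to 0.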



loss_fn = nn.BCEWithLogitsLoss()  # numerically stable version of BCE

optimizer = optim.SGD(model.parameters(), lr=0.1)  # plain SGD, the same update rule as our scratch trainer

print(optimizer.param_groups)  # inspect the optimizer's settings (lr, momentum, which parameters it updates)



The Learning Loop


Just like in all the other projects, we have:

  • initialized the weights (handled for us by nn.Linear)

  • defined the model with a forward pass

  • defined the loss function (BCEWithLogitsLoss())

  • defined the update rule (the optimizer)




epochs = 3000


for epoch in range(epochs):

    optimizer.zero_grad()      # clear gradients from the previous step


    output = model(X)          # forward pass: raw logits

    loss = loss_fn(output, y)  # sigmoid + binary cross-entropy in one call


    loss.backward()            # backward pass: compute the gradients

    optimizer.step()           # update the weights

    

    if epoch % 200 == 0:

        print(f"Epoch {epoch}: Loss = {loss.item():.6f}")



for name, param in model.named_parameters():

    print(f"name {name} : params: {param.data}")

What the Model Actually Returns: Understanding Logits


A logit is the raw output of the final linear layer: the dot product plus bias, before any activation function.


  • Logits can be any real number. 

  • They are not between 0 and 1. 

  • They are not yet interpretable as probabilities.


This is intentional. PyTorch's loss functions want logits because computing the sigmoid and the cross-entropy together is more numerically stable than applying the sigmoid first.


Making Predictions

During inference, we apply sigmoid manually to convert logits into probabilities.



with torch.no_grad():

    logits = model(X)

    preds = torch.sigmoid(logits)


    print("\nPredictions:")

    for inp, pred in zip(X, preds):

        print(f"Input: {inp.tolist()} -> Prediction: {pred.item():.4f}")

