Project 7: Introduction to PyTorch
Tensors, Autograd, and Rebuilding the Project 6 Network
In Project 6, we built a neural network framework completely from scratch.
We implemented:
Dense layers
Activation functions
Forward and backward passes
A training loop
Gradient updates
Model saving and loading
By the end, we had a tiny version of PyTorch that worked exactly like a real deep‑learning library.
Project 7 is where we switch from building the tools… to using the tools.
How PyTorch Maps to Project 6
Everything you built manually now has a PyTorch equivalent:
import torch
import torch.nn as nn
import torch.optim as optim
Imports:
- torch.nn ([PyTorch docs: torch.nn](https://docs.pytorch.org/docs/stable/nn.html)): the building blocks for networks — layers, activations, and loss modules.
- torch.optim ([PyTorch docs: torch.optim](https://docs.pytorch.org/docs/stable/optim.html#module-torch.optim)): a package implementing various optimization algorithms.
Containers:
- nn.Module: base class for all neural network modules.
- nn.Sequential ([PyTorch docs: Sequential](https://docs.pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential)): runs a list of modules in the order they are given.
Non-Linear Activations:
- nn.ReLU
- nn.Sigmoid
Loss Functions:
- nn.BCELoss(): creates a criterion that measures the binary cross entropy between the target and the input probabilities.
- nn.BCEWithLogitsLoss(): combines a Sigmoid layer and BCELoss in one single class.
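All of these are modules: you instantiate them once and then call them like functions. A minimal sketch of the two activations applied to a toy tensor (the values are arbitrary, chosen only for illustration):

import torch
import torch.nn as nn

x = torch.tensor([-2.0, 0.0, 3.0])   # toy values, just for illustration

relu = nn.ReLU()
sigmoid = nn.Sigmoid()

print(relu(x))      # negatives clamp to 0 -> tensor([0., 0., 3.])
print(sigmoid(x))   # every value squashed into the range (0, 1)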
What Is a Tensor?
A tensor is PyTorch’s fundamental data structure.
It looks like a NumPy array, but with two superpowers:
1. Tensors can run on a GPU
This is what lets the same PyTorch code scale from XOR → CNNs → Transformers with only minimal changes.
2. Tensors track operations for autograd
If you set:
x = torch.tensor([1., 2., 3.], requires_grad=True)
PyTorch builds a computation graph behind the scenes.
Every operation is recorded so PyTorch can compute gradients automatically during backprop.
This is the key difference:
NumPy array: just numbers
PyTorch tensor: numbers + history of operations
This is why PyTorch can compute derivatives without us writing a single gradient formula.
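Here is a minimal sketch of both points, repeating the x defined above for completeness (the squared-sum expression and the CUDA check are illustrative choices, not part of the project code):

x = torch.tensor([1., 2., 3.], requires_grad=True)

# Autograd: every operation on x is recorded in the graph
s = (x ** 2).sum()
s.backward()        # PyTorch fills in x.grad automatically
print(x.grad)       # ds/dx = 2x -> tensor([2., 4., 6.])

# GPU: the same tensor code runs on a GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(x.detach().to(device).device)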
# Dataset (same as Project 6)
X = torch.tensor([
    [0., 0.],
    [0., 1.],
    [1., 0.],
    [1., 1.]
])

y = torch.tensor([
    [0.],
    [1.],
    [1.],
    [0.]
])
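Before building the model, it is worth confirming the tensors have the shapes the network expects: 4 samples with 2 inputs each, and one target per sample.

print(X.shape, X.dtype)   # torch.Size([4, 2]) torch.float32
print(y.shape, y.dtype)   # torch.Size([4, 1]) torch.float32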
Building the model takes five steps:
1. Set up a class that subclasses nn.Module.
2. Define __init__ for the class.
3. Inside __init__, call super().__init__() to initialize the parent class.
4. Define the network:
self.name_of_network = nn.Sequential(...)
- Sequential takes the layers and activations as separate arguments.
- Each nn.Linear() layer takes the input and output dimensions.
- Each layer is followed by an activation function.
- Since BCEWithLogitsLoss() has a built-in sigmoid, we can leave the Sigmoid out of the model.
5. Define a forward method.
class XORNet_simple(nn.Module):  # nn.Module: base class for all neural network modules
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 3),   # Input → Hidden
            nn.ReLU(),
            nn.Linear(3, 1),   # Hidden → Output
            # nn.Sigmoid()
        )

    def forward(self, x):
        return self.net(x)
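The snippets below call a model object, so create one first. A minimal sketch (printing the module is optional, but it shows the Linear/ReLU stack that Sequential built):

model = XORNet_simple()
print(model)   # lists the layers inside the Sequential container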
What Happens When You Use BCEWithLogitsLoss()
BCEWithLogitsLoss does two things in one:
1. Applies the sigmoid activation:
- σ(z₂) = 1 / (1 + e^(−z₂))
2. Computes binary cross-entropy:
- loss = −[ y·log(σ(z₂)) + (1 − y)·log(1 − σ(z₂)) ]
So when you do:
loss_fn = nn.BCEWithLogitsLoss()
loss = loss_fn(model(X), y)
PyTorch internally performs:
- sigmoid on your raw outputs
- then BCE
You do not need to put a Sigmoid() in your model.
Adding the sigmoid manually would be incorrect because you’d be applying sigmoid twice; this is exactly what I did when I first set up this project.
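A quick numerical check of that point, using the torch / nn imports from the top and a couple of arbitrary values (these are not outputs of the XOR model):

logits = torch.tensor([[2.0], [-1.0]])    # arbitrary raw scores, for illustration only
targets = torch.tensor([[1.0], [0.0]])

with_logits = nn.BCEWithLogitsLoss()(logits, targets)                  # sigmoid + BCE in one step
by_hand     = nn.BCELoss()(torch.sigmoid(logits), targets)             # same thing done manually
doubled     = nn.BCEWithLogitsLoss()(torch.sigmoid(logits), targets)   # sigmoid applied twice

print(with_logits.item(), by_hand.item())   # these two agree
print(doubled.item())                        # this one does not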
loss_fn = nn.BCEWithLogitsLoss() # stable version of BCE
optimizer = optim.SGD(model.parameters(), lr=0.1) # matches your scratch trainer
print(optimizer.param_groups)
The Learning Loop
Just like in all the other projects, we have:
- initialized the weights
- defined the model with a forward pass
- defined the loss function (BCEWithLogitsLoss())
- defined the update rule (the optimizer)
epochs = 3000
for epoch in range(epochs):
    optimizer.zero_grad()
    output = model(X)
    loss = loss_fn(output, y)
    loss.backward()
    optimizer.step()
    if epoch % 200 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.6f}")

for name, param in model.named_parameters():
    print(f"name {name} : params: {param.data}")
What the Model Actually Returns: Understanding Logits
A logit is the model's raw output: the dot product plus bias from the final Linear layer, before any activation function.
Logits can be any real number.
They are not between 0 and 1.
They are not yet interpretable as probabilities.
This is intentional: BCEWithLogitsLoss works on logits because computing the sigmoid and the loss together is more numerically stable during training.
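For intuition, plugging a few generic numbers into the sigmoid (these are not outputs of this model): σ(0) = 0.5, σ(4) ≈ 0.982, and σ(−4) ≈ 0.018. A large positive logit means "confidently class 1" and a large negative logit means "confidently class 0".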
Making Predictions
During inference, we apply sigmoid manually to convert logits into probabilities.
with torch.no_grad():
    logits = model(X)
    preds = torch.sigmoid(logits)

print("\nPredictions:")
for inp, pred in zip(X, preds):
    print(f"Input: {inp.tolist()} -> Prediction: {pred.item():.4f}")