What You'll Build
In this tutorial, you'll train an image classifier using PyTorch on a custom dataset. We'll cover the full pipeline: environment setup, data loading with transforms, defining a model (using transfer learning), writing a training loop, and evaluating results. This guide assumes basic Python familiarity and that you have PyTorch installed.
Prerequisites
- Python 3.9+
- PyTorch 2.x and torchvision installed (pip install torch torchvision)
- A dataset organized in ImageFolder format (subfolders = class labels)
- A CUDA-capable GPU is recommended but not required
Step 1: Organize Your Dataset
PyTorch's ImageFolder expects your data structured as:
data/
    train/
        cats/
            img1.jpg
            img2.jpg
            ...
        dogs/
            img1.jpg
            img2.jpg
            ...
    val/
        cats/
            ...
        dogs/
            ...
A typical split is 80% training, 20% validation. Ensure class balance where possible.
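If your images currently sit in one folder per class, a small script can create this split for you. The sketch below is one way to do it, assuming a source layout of data/raw/<class>/*.jpg and an 80/20 ratio — adjust the paths, extension glob, and ratio to your data:

import random
import shutil
from pathlib import Path

random.seed(42)  # make the split reproducible

raw_dir = Path("data/raw")  # assumed: one subfolder per class
for class_dir in raw_dir.iterdir():
    if not class_dir.is_dir():
        continue
    images = sorted(class_dir.glob("*.jpg"))
    random.shuffle(images)
    split = int(0.8 * len(images))  # 80% train, 20% val
    for subset, files in [("train", images[:split]), ("val", images[split:])]:
        dest = Path("data") / subset / class_dir.name
        dest.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy2(f, dest / f.name)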
Step 2: Define Data Transforms and Load Data
Transforms normalize inputs and apply augmentation during training to improve generalization:
from torchvision import transforms, datasets
from torch.utils.data import DataLoader
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

val_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])
train_dataset = datasets.ImageFolder("data/train", transform=train_transforms)
val_dataset = datasets.ImageFolder("data/val", transform=val_transforms)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False, num_workers=4)
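Before training, it's worth sanity-checking one batch to confirm shapes and labels look right:

images, labels = next(iter(train_loader))
print(images.shape)           # expected: torch.Size([32, 3, 224, 224])
print(labels[:8])             # integer class indices
print(train_dataset.classes)  # class names inferred from folder names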
The normalization values are ImageNet statistics — use these when doing transfer learning from ImageNet-pretrained weights.
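If you later train from scratch instead, you can compute your own per-channel statistics over the training set. A minimal sketch — the stats_dataset and stats_loader names are illustrative, and resizing to 224x224 before measuring is an assumption:

import torch

stats_dataset = datasets.ImageFolder(
    "data/train",
    transform=transforms.Compose([transforms.Resize((224, 224)),
                                  transforms.ToTensor()]))
stats_loader = DataLoader(stats_dataset, batch_size=64, num_workers=4)

n_pixels = 0
channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
for images, _ in stats_loader:
    n_pixels += images.numel() / 3                    # pixels per channel
    channel_sum += images.sum(dim=[0, 2, 3])
    channel_sq_sum += (images ** 2).sum(dim=[0, 2, 3])
mean = channel_sum / n_pixels
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()  # std = sqrt(E[x^2] - E[x]^2)
print(mean, std)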
Step 3: Load a Pretrained Model and Modify the Head
Transfer learning from ResNet-50 is a strong starting point. We freeze the backbone and replace the final fully connected layer to match our number of classes; the new layer's parameters are trainable by default, so only the head will be updated:
import torch
import torch.nn as nn
from torchvision import models
num_classes = len(train_dataset.classes)
model = models.resnet50(weights="IMAGENET1K_V2")
# Freeze all layers
for param in model.parameters():
    param.requires_grad = False
# Replace the final layer
model.fc = nn.Linear(model.fc.in_features, num_classes)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
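A quick check that only the new head is trainable (the exact counts depend on your model and num_classes):

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable params: {trainable:,} of {total:,}")
# With a frozen ResNet-50 backbone, only the new fc layer should count as trainable.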
Step 4: Define Loss Function and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
We pass only model.fc.parameters() to the optimizer, since the backbone is frozen. For longer training runs, consider a cosine-annealing or ReduceLROnPlateau scheduler.
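For example, ReduceLROnPlateau lowers the learning rate when a monitored metric stops improving. A sketch — it assumes you compute a val_loss each epoch, as in the validation helper shown in Step 5:

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2)
# ...inside the epoch loop, after validation:
# scheduler.step(val_loss)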
Step 5: Write the Training Loop
def train_epoch(model, loader, criterion, optimizer, device):
    model.train()
    total_loss, correct = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * images.size(0)
        correct += (outputs.argmax(1) == labels).sum().item()
    return total_loss / len(loader.dataset), correct / len(loader.dataset)

for epoch in range(10):
    loss, acc = train_epoch(model, train_loader, criterion, optimizer, device)
    scheduler.step()
    print(f"Epoch {epoch+1}: Loss={loss:.4f}, Acc={acc:.4f}")
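In practice you'll also want to track validation metrics each epoch, not just at the end. A helper mirroring train_epoch — a sketch to call right after train_epoch inside the loop:

def eval_epoch(model, loader, criterion, device):
    model.eval()
    total_loss, correct = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            total_loss += criterion(outputs, labels).item() * images.size(0)
            correct += (outputs.argmax(1) == labels).sum().item()
    return total_loss / len(loader.dataset), correct / len(loader.dataset)

# e.g., inside the epoch loop:
# val_loss, val_acc = eval_epoch(model, val_loader, criterion, device)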
Step 6: Evaluate and Save
model.eval()
correct = 0
with torch.no_grad():
    for images, labels in val_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(1)
        correct += (preds == labels).sum().item()
print(f"Val Accuracy: {correct / len(val_dataset):.4f}")
torch.save(model.state_dict(), "classifier.pth")
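To reload the weights later, rebuild the same architecture and load the state dict (a sketch; classifier.pth is the file saved above):

model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, num_classes)
model.load_state_dict(torch.load("classifier.pth", map_location=device))
model.eval()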
Next Steps
- Unfreeze the backbone after initial head training and fine-tune end-to-end with a lower learning rate (e.g., 1e-5); see the sketch after this list.
- Use mixed precision (torch.cuda.amp) to accelerate training on modern GPUs.
- Track experiments with Weights & Biases or MLflow for reproducibility.
- Export to ONNX for deployment: torch.onnx.export(model, ...).
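As a starting point for the first two items, here is a minimal sketch that unfreezes the backbone and fine-tunes end-to-end with mixed precision. It reuses model, criterion, device, and train_loader from above; the learning rates are illustrative rather than tuned, and on recent PyTorch releases torch.amp provides the same API as torch.cuda.amp:

from torch.cuda.amp import autocast, GradScaler

# Unfreeze every layer for end-to-end fine-tuning
for param in model.parameters():
    param.requires_grad = True

# Use a lower learning rate for the pretrained backbone than for the new head
optimizer = torch.optim.Adam([
    {"params": model.fc.parameters(), "lr": 1e-4},
    {"params": [p for name, p in model.named_parameters()
                if not name.startswith("fc.")], "lr": 1e-5},
])
scaler = GradScaler()

model.train()
for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    with autocast():                   # forward pass in mixed precision
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()      # scale loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()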