PyTorch Convolutional Neural Network Example - Xie Xianbin's Blog

This article walks through building a convolutional neural network (CNN) with PyTorch on a CPU to classify CIFAR-10 images.

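Introduction

We build a classic convolutional neural network (CNN) to classify the images in the CIFAR-10 dataset.
A CNN is a class of multi-layer neural network designed to detect complex features in data; it is most commonly used in computer vision applications.
The example CNN consists of the following 14 layers: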

Conv -> BatchNorm -> ReLU -> Conv -> BatchNorm -> ReLU -> MaxPool -> Conv -> BatchNorm -> ReLU -> Conv -> BatchNorm -> ReLU -> Linear.
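
To see how this layer sequence changes the size of the feature maps, the spatial size after each layer can be worked out with the standard output-size formula floor((n + 2p - k) / s) + 1. The following is a minimal plain-Python sketch, assuming the layer settings used in the script later in this post (5x5 kernels, stride 1, padding 1, and a 2x2 max pool with stride 2). The helper names conv_out and trace_sizes are illustrative and are not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(size=32):
    """Return the feature-map side length after each size-changing layer."""
    sizes = {}
    size = conv_out(size, kernel=5, stride=1, padding=1)  # first Conv
    sizes["conv1"] = size
    size = conv_out(size, kernel=5, stride=1, padding=1)  # second Conv
    sizes["conv2"] = size
    size = conv_out(size, kernel=2, stride=2, padding=0)  # MaxPool
    sizes["pool"] = size
    size = conv_out(size, kernel=5, stride=1, padding=1)  # third Conv
    sizes["conv4"] = size
    size = conv_out(size, kernel=5, stride=1, padding=1)  # fourth Conv
    sizes["conv5"] = size
    return sizes


sizes = trace_sizes()
# The last Conv has 24 output channels, so the Linear layer sees
# 24 * side * side input features.
fc_inputs = 24 * sizes["conv5"] ** 2
print(sizes)
print(fc_inputs)
```

BatchNorm and ReLU do not change the spatial size, which is why only the Conv and MaxPool layers appear in the trace.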

CIFAR-10 is a widely used computer vision dataset of 60,000 32x32 color images in 10 classes, with 6,000 images per class. 50,000 images are used for training and 10,000 for testing.

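Implementation

The full training workflow consists of the following steps:

Prepare the environment: install the required libraries
Load and preprocess the data: download the CIFAR-10 dataset and apply the necessary transforms (for example, converting images to PyTorch Tensors and normalizing them)
Define the neural network model: build a network based on convolutional layers
1. Convolutional neural network: typically composed of convolutional layers (Conv2d), activation functions (ReLU), pooling layers (MaxPool2d), and fully connected layers (Linear)
Define the loss function and optimizer: choose a suitable loss function and optimization algorithm
1. This example uses cross-entropy loss (CrossEntropyLoss) as the loss function and Adam as the optimizer
Train the model: iterate over the training set to train the model
Evaluate the model: measure the model's performance on the test set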
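
The normalization mentioned in the preprocessing step computes (x - mean) / std for each channel. A small plain-Python sketch, assuming a mean of 0.5 and a standard deviation of 0.5 as in the script in this post, shows how a raw 8-bit pixel value ends up in [-1, 1]. The function names normalize and unnormalize are illustrative only.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Mimic ToTensor followed by Normalize for a single 8-bit pixel value."""
    scaled = pixel / 255.0          # ToTensor: [0, 255] -> [0.0, 1.0]
    return (scaled - mean) / std    # Normalize: (x - mean) / std


def unnormalize(value, mean=0.5, std=0.5):
    """Invert the normalization, as is done before displaying an image."""
    return value * std + mean


low, mid, high = normalize(0), normalize(127.5), normalize(255)
print(low, mid, high)
print(unnormalize(high))
```

The inverse mapping is the same arithmetic as img / 2 + 0.5, which the script applies before plotting images.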

Project directory layout:

$ tree .
.
├── cnn-demo.pth
├── data # the dataset can be downloaded automatically
│   └── cifar-10-python.tar.gz
└── PyTorchTraining.py

Prepare the environment

Make sure PyTorch is installed. If it is not, install it with pip:

pip install torch torchvision matplotlib numpy==1.26.4

Download the dataset

Download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html and place it in the data directory.

Source code

PyTorchTraining.py

from torchvision.datasets import CIFAR10
from torchvision.transforms import transforms
from torch.utils.data import DataLoader

# Load and normalize the data.
# Define the transforms for the training and test sets.
# transforms.Compose chains several transforms and applies them in order.
transformations = transforms.Compose([
    transforms.ToTensor(),  # convert a PIL Image or numpy.ndarray to a FloatTensor and scale pixel values from [0, 255] to [0.0, 1.0]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # normalize the tensor; the arguments are (mean, std)
                                                            # each of the three RGB channels uses mean 0.5 and std 0.5,
                                                            # which maps pixel values from [0, 1] to [-1, 1]
])

# CIFAR10 has 50,000 training images; a batch size of 10 gives 5,000 batches per epoch.
batch_size = 10
number_of_labels = 10

# Load the CIFAR-10 training set.
# root: directory where the dataset is stored
# train: whether to load the training split
# transform: transforms applied to each sample
# download: download the dataset if it is not already present
# The first time this code runs, the CIFAR10 training set is downloaded locally.
train_set = CIFAR10(root="./data", train=True, transform=transformations, download=True)

# Create the training data loader, which reads the data in batches of batch_size.
# shuffle=True reshuffles the data every epoch, which helps reduce overfitting.
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=0)  # num_workers: number of subprocesses used for data loading
print("The number of images in a training set is: ", len(train_loader)*batch_size)

# Load the CIFAR-10 test set; note that train is set to False.
# The first time this code runs, the CIFAR10 test set is downloaded locally.
test_set = CIFAR10(root="./data", train=False, transform=transformations, download=True)

# Create the test data loader, which reads the data in batches of batch_size.
# Note that shuffle is set to False for the test loader.
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False, num_workers=0)
print("The number of images in a test set is: ", len(test_loader)*batch_size)

print("The number of batches per epoch is: ", len(train_loader))
# CIFAR-10 class names
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

import torch
import torch.nn as nn
import torchvision
import torch.nn.functional as F

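# Define the convolutional neural network model
class Network(nn.Module):
    """
    A simple convolutional neural network for CIFAR-10 image classification.

    Structure:
    - Conv layer 1: 3 input channels, 12 output channels, 5x5 kernel
    - Batch normalization 1
    - Conv layer 2: 12 input channels, 12 output channels, 5x5 kernel
    - Batch normalization 2
    - Max pooling layer: 2x2 pooling window
    - Conv layer 4: 12 input channels, 24 output channels, 5x5 kernel
    - Batch normalization 4
    - Conv layer 5: 24 input channels, 24 output channels, 5x5 kernel
    - Batch normalization 5
    - Fully connected layer: 24*10*10 input features, 10 output features (one per class)
    """
    def __init__(self):
        """
        Initialize the layers of the network.
        """
        super(Network, self).__init__()

        # First convolutional layer
        # input channels = 3 (RGB image)
        # output channels = 12
        # kernel size = 5x5
        # stride = 1
        # padding = 1
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=5, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(12) # batch normalization

        # Second convolutional layer
        # input channels = 12 (output channels of the previous layer)
        # output channels = 12
        # kernel size = 5x5
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=5, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(12) # batch normalization
        self.pool = nn.MaxPool2d(2,2) # max pooling layer, 2x2 window, stride 2
        self.conv4 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=5, stride=1, padding=1)
        self.bn4 = nn.BatchNorm2d(24) # batch normalization
        self.conv5 = nn.Conv2d(in_channels=24, out_channels=24, kernel_size=5, stride=1, padding=1)
        self.bn5 = nn.BatchNorm2d(24) # batch normalization

        # Fully connected (linear) layer
        # The number of input features follows from the size of the last feature map.
        # Each 5x5 conv with stride 1 and padding 1 shrinks height and width by 2:
        # 32x32 input -> conv1 30x30 -> conv2 28x28 -> pool 14x14 -> conv4 12x12 -> conv5 10x10
        # So the fully connected layer takes 24 (channels) * 10 * 10 = 2400 input features
        # output features = 10 (number of classes)
        self.fc1 = nn.Linear(24*10*10, 10)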

    def forward(self, input):
        """
        Define the forward pass of the network.

        Args:
            input (Tensor): input tensor of shape (batch size, channels, height, width).

        Returns:
            Tensor: output tensor of shape (batch size, number of classes).
        """
        output = F.relu(self.bn1(self.conv1(input))) # conv -> batch norm -> ReLU
        output = F.relu(self.bn2(self.conv2(output))) # conv -> batch norm -> ReLU
        output = self.pool(output) # max pooling
        output = F.relu(self.bn4(self.conv4(output))) # conv -> batch norm -> ReLU
        output = F.relu(self.bn5(self.conv5(output))) # conv -> batch norm -> ReLU
        output = output.view(-1, 24*10*10) # flatten to a 2-D tensor; -1 lets PyTorch infer the batch size
        output = self.fc1(output) # fully connected layer

        return output

# Instantiate the neural network model (it lives on the CPU until moved elsewhere)
model = Network()
print("Neural network model defined.")

from torch.optim import Adam

# Define the loss function (classification cross-entropy loss)
loss_fn = nn.CrossEntropyLoss()
# Define the optimizer (Adam)
# model.parameters(): the network parameters that the optimizer updates during training
# lr: learning rate, which controls the step size of each parameter update
# weight_decay: L2 penalty coefficient
optimizer = Adam(model.parameters(), lr=0.001, weight_decay=0.0001)
print("Loss function and optimizer defined.")

# Function to save the model
def saveModel():
    path = "./cnn-demo.pth"
    torch.save(model.state_dict(), path)
    print("")

# Function to test the model with the test dataset and return the accuracy (in percent) over the test images
def testAccuracy():

    model.eval()  # evaluation mode: BatchNorm uses its running statistics
    accuracy = 0.0
    total = 0.0
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            # run the model on the test set to predict labels
            outputs = model(images.to(device))
            # the label with the highest score will be our prediction
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            accuracy += (predicted == labels.to(device)).sum().item()

    # compute the accuracy over all test images
    accuracy = (100 * accuracy / total)
    return accuracy

# Train the model
# Training function. We simply loop over the data iterator, feed the inputs to the network, and optimize.
def train(num_epochs):

    print(f"Starting training for {num_epochs} epochs...")

    best_accuracy = 0.0

    # Define your execution device
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("The model will be running on", device, "device")
    # Move model parameters and buffers to the CPU or CUDA device
    model.to(device)

    for epoch in range(num_epochs):  # loop over the dataset multiple times
        # testAccuracy() switches the model to eval mode, so switch back to training mode each epoch
        model.train()
        running_loss = 0.0

        for i, (images, labels) in enumerate(train_loader, 0):

            # move the inputs to the execution device
            images = images.to(device)
            labels = labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()
            # predict classes using images from the training set
            outputs = model(images)
            # compute the loss based on model output and real labels
            loss = loss_fn(outputs, labels)
            # backpropagate the loss
            loss.backward()
            # adjust parameters based on the calculated gradients
            optimizer.step()

            # Let's print statistics every 1,000 batches
            running_loss += loss.item()     # extract the loss value
            if i % 1000 == 999:
                # print every 1,000 batches (five times per epoch)
                print('[%d/%d, %5d] loss: %.3f' %
                      (epoch + 1, num_epochs, i + 1, running_loss / 1000))
                # zero the loss
                running_loss = 0.0

        # Compute and print the accuracy for this epoch when tested over all 10000 test images
        accuracy = testAccuracy()
        print('For epoch', epoch+1,'the test accuracy over the whole test set is %d %%' % (accuracy))

        # we want to save the model if the accuracy is the best
        if accuracy > best_accuracy:
            saveModel()
            best_accuracy = accuracy
# Test the model on the test data
import matplotlib.pyplot as plt
import numpy as np

# Function to show the images
def imageshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# Function to test the model with a batch of images and show the predicted labels
def testBatch():
    # switch to evaluation mode so BatchNorm uses its running statistics
    model.eval()

    # get a batch of images from the test DataLoader
    images, labels = next(iter(test_loader))

    # show all images as one image grid
    imageshow(torchvision.utils.make_grid(images))

    # Show the real labels on the screen
    print('Real labels: ', ' '.join('%5s' % classes[labels[j]]
                               for j in range(batch_size)))

    # Let's see which labels the model predicts for these examples
    with torch.no_grad():
        outputs = model(images)

    # The model returns one score (logit) per class; the class with the highest score is the prediction
    _, predicted = torch.max(outputs, 1)

    # Let's show the predicted labels on the screen to compare with the real ones
    print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(batch_size)))


if __name__ == "__main__":

    # Let's build our model
    train(5)
    print('Finished Training')

    # Test which classes performed well
    # testAccuracy()

    # Let's load the model we just saved and test it on a batch of images
    model = Network()
    path = "cnn-demo.pth"
    model.load_state_dict(torch.load(path))

    # Print the model structure
    print("Model structure:")
    for name, module in model.named_children():
        print(f"  {name}: {module}")

    # Test with batch of images
    testBatch()

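Train the model

The script iterates over multiple epochs; in each epoch it traverses the training data and performs the forward pass, loss computation, backpropagation, and parameter update.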
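
The four steps named above (forward pass, loss, backpropagation, parameter update) can be illustrated without PyTorch. The sketch below is a deliberately tiny stand-in: it assumes a one-parameter model y = w * x, a squared-error loss, and plain gradient descent, rather than the CNN, cross-entropy loss, and Adam used in the script. All names are illustrative.

```python
def train_one_parameter(data, lr=0.1, num_epochs=20):
    """Fit y = w * x by gradient descent and return w plus the mean loss per epoch."""
    w = 0.0
    history = []
    for epoch in range(num_epochs):
        running_loss = 0.0
        for x, target in data:
            pred = w * x                      # forward pass
            loss = (pred - target) ** 2       # compute the loss
            grad = 2 * (pred - target) * x    # backpropagation: d(loss)/dw
            w -= lr * grad                    # parameter update
            running_loss += loss
        history.append(running_loss / len(data))
    return w, history


# Toy data generated by the rule target = 2 * x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, history = train_one_parameter(data)
print(w)
print(history[0], history[-1])
```

In the real script, loss.backward() computes the gradients for every parameter automatically and optimizer.step() applies the Adam update, but the order of the steps is the same.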

$ python3 PyTorchTraining.py
Files already downloaded and verified
The number of images in a training set is:  50000
Files already downloaded and verified
The number of images in a test set is:  10000
The number of batches per epoch is:  5000
Neural network model defined.
Loss function and optimizer defined.
Starting training for 5 epochs...
The model will be running on cpu device
[1/5,  1000] loss: 1.764
[1/5,  2000] loss: 1.472
[1/5,  3000] loss: 1.303
[1/5,  4000] loss: 1.168
[1/5,  5000] loss: 1.094
For epoch 1 the test accuracy over the whole test set is 64 %

[2/5,  1000] loss: 1.027
[2/5,  2000] loss: 1.001
[2/5,  3000] loss: 0.981
[2/5,  4000] loss: 0.983
[2/5,  5000] loss: 0.941
For epoch 2 the test accuracy over the whole test set is 67 %

[3/5,  1000] loss: 0.868
[3/5,  2000] loss: 0.865
[3/5,  3000] loss: 0.865
[3/5,  4000] loss: 0.868
[3/5,  5000] loss: 0.859
For epoch 3 the test accuracy over the whole test set is 69 %

[4/5,  1000] loss: 0.802
[4/5,  2000] loss: 0.785
[4/5,  3000] loss: 0.785
[4/5,  4000] loss: 0.792
[4/5,  5000] loss: 0.777
For epoch 4 the test accuracy over the whole test set is 70 %

[5/5,  1000] loss: 0.721
[5/5,  2000] loss: 0.719
[5/5,  3000] loss: 0.739
[5/5,  4000] loss: 0.744
[5/5,  5000] loss: 0.740
For epoch 5 the test accuracy over the whole test set is 71 %

Finished Training
Model structure:
  conv1: Conv2d(3, 12, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1))
  bn1: BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  conv2: Conv2d(12, 12, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1))
  bn2: BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  pool: MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  conv4: Conv2d(12, 24, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1))
  bn4: BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  conv5: Conv2d(24, 24, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1))
  bn5: BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  fc1: Linear(in_features=2400, out_features=10, bias=True)
Real labels:    cat  ship  ship plane  frog  frog   car  frog   cat   car
Predicted:    dog   car plane plane  deer   dog   car  deer   dog   car

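The predictions for this batch look poor: only 3 of the 10 labels match, even though the test accuracy reached 71%. A likely contributor is BatchNorm: if the reloaded model is not in evaluation mode (model.eval()), its BatchNorm layers normalize with the statistics of this single 10-image batch rather than the running statistics learned during training. I will look into this when time permits.
The model has been uploaded to https://huggingface.co/xiexianbin/cnn-demo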
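
One thing worth checking when batch predictions disagree with the measured test accuracy is the mode of the BatchNorm layers. In training mode they normalize with the statistics of the current batch, while in evaluation mode they use running statistics accumulated during training. The plain-Python sketch below imitates both behaviours for a single feature. The numbers and function names are made up for illustration, and the learnable scale and shift are omitted.

```python
import math


def batchnorm_train(values, eps=1e-5):
    """Normalize with the mean and (biased) variance of the current batch."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batchnorm_eval(values, running_mean, running_var, eps=1e-5):
    """Normalize with fixed running statistics, independent of the batch."""
    return [(v - running_mean) / math.sqrt(running_var + eps) for v in values]


batch = [4.0, 5.0, 6.0]
train_out = batchnorm_train(batch)
# Hypothetical running statistics that differ from this batch's own statistics
eval_out = batchnorm_eval(batch, running_mean=0.0, running_var=1.0)
print(train_out)
print(eval_out)
```

The same inputs produce very different activations in the two modes, so a model evaluated in the wrong mode can give different predictions from the ones measured in evaluation mode.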

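Conclusion

With the code and detailed comments above, you should be able to see how to build and train a simple convolutional neural network with PyTorch on a CPU for CIFAR-10 image classification. You can try changing the network structure or the hyperparameters (such as the learning rate, number of epochs, and batch size) to further improve the model's performance.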
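
When experimenting with the batch size, keep in mind that it also changes the number of parameter updates per epoch. A quick sketch, assuming the 50,000-image training set and a loader that keeps the last partial batch (the DataLoader default, drop_last=False); the function name batches_per_epoch is illustrative:

```python
import math


def batches_per_epoch(num_images, batch_size, drop_last=False):
    """Number of batches a loader yields for one pass over the data."""
    if drop_last:
        return num_images // batch_size
    return math.ceil(num_images / batch_size)


table = {bs: batches_per_epoch(50_000, bs) for bs in (10, 32, 64, 128)}
for bs, n in table.items():
    print(f"batch_size={bs:>3}: {n} batches per epoch")
```

With a batch size of 64, for example, an epoch has fewer than 1,000 batches, so the script's progress message (printed every 1,000 batches) would never appear and the logging interval would need adjusting as well.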
