
Saving and Loading Models in PyTorch | Complete Code Included



The goal of this article is to show how to save a model and load it again, so that we can continue training after the last epoch and make predictions. If you are reading this, I assume you are familiar with the basics of deep learning and PyTorch.

Have you ever spent hours or days training a model, only to have it stop midway? Or been unhappy with your model's performance and wanted to keep training it? For many reasons, we need a flexible way to save and load our models.

Free cloud services such as Kaggle and Google Colab have idle timeouts that disconnect or interrupt your notebook once they trigger. Unless you are training only a handful of epochs on a GPU, the process takes time, so being able to save your model is a huge advantage that can save the day. For flexibility, I will save both the latest checkpoint and the best checkpoint.

This article uses the common Fashion-MNIST dataset and walks through a complete pipeline, from importing the data to making predictions. (Training in this article is done on Kaggle.)

Step 1: Preparation

  1. On Kaggle, the notebook you are working in is called __notebook__.ipynb by default.

  2. Create two directories to store the checkpoint and the best model:


   
# uncomment if you want to create directory checkpoint, best_model
%mkdir checkpoint best_model
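If you prefer plain Python to the notebook magic (for example, when running this outside Jupyter), a minimal equivalent sketch using only the standard library:

# pure-Python equivalent of the %mkdir magic above
import os
for d in ('checkpoint', 'best_model'):
    os.makedirs(d, exist_ok=True)  # exist_ok avoids an error when the notebook is re-run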

Step 2: Import libraries and create helper functions

Import the libraries


   
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt
import torch
import shutil
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms
import numpy as np

   
# check if CUDA is available
use_cuda = torch.cuda.is_available()

The save function

save_ckp is created to save checkpoints, both the latest one and the best one. This gives us flexibility: you may be interested in the state of the latest checkpoint, or of the best one.


   
def save_ckp(state, is_best, checkpoint_path, best_model_path):
    """
    state: checkpoint we want to save
    is_best: is this the best checkpoint; min validation loss
    checkpoint_path: path to save checkpoint
    best_model_path: path to save best model
    """
    f_path = checkpoint_path
    # save checkpoint data to the path given, checkpoint_path
    torch.save(state, f_path)
    # if it is a best model, min validation loss
    if is_best:
        best_fpath = best_model_path
        # copy that checkpoint file to best path given, best_model_path
        shutil.copyfile(f_path, best_fpath)

In our case, we want to save a checkpoint that lets us continue training the model from where we left off. Here is the information we need:

  • epoch: the number of times all of the training vectors have been used to update the weights

  • valid_loss_min: the minimum validation loss. We need this so that when we continue training we can start from this value rather than from np.Inf.

  • state_dict: the model's learned parameters; it holds a parameter tensor for each layer.

  • optimizer: the optimizer's state also needs to be saved, especially when using Adam as the optimizer. Adam is an adaptive learning rate method, meaning it maintains individual learning rates for different parameters; if we want to continue training, we need this state.
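Putting these four pieces together, a checkpoint is just a plain Python dictionary. A minimal sketch of what we will hand to save_ckp (the same dict is built inside the training loop in Step 5):

# assemble the checkpoint dict described above (built once per epoch in Step 5)
checkpoint = {
    'epoch': epoch + 1,                    # next epoch to resume from
    'valid_loss_min': valid_loss,          # best validation loss so far
    'state_dict': model.state_dict(),      # per-layer parameter tensors
    'optimizer': optimizer.state_dict(),   # optimizer state (e.g. Adam's moments)
}
save_ckp(checkpoint, False, './checkpoint/current_checkpoint.pt', './best_model/best_model.pt')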

The load function


   
def load_ckp(checkpoint_fpath, model, optimizer):
    """
    checkpoint_fpath: path to the saved checkpoint
    model: model that we want to load checkpoint parameters into
    optimizer: optimizer we defined in previous training
    """
    # load check point
    checkpoint = torch.load(checkpoint_fpath)
    # initialize state_dict from checkpoint to model
    model.load_state_dict(checkpoint['state_dict'])
    # initialize optimizer from checkpoint to optimizer
    optimizer.load_state_dict(checkpoint['optimizer'])
    # initialize valid_loss_min from checkpoint to valid_loss_min
    valid_loss_min = checkpoint['valid_loss_min']
    # return model, optimizer, epoch value, min validation loss
    return model, optimizer, checkpoint['epoch'], valid_loss_min.item()

load_ckp is created to load a model. It takes:

  • the path of the saved checkpoint

  • the model instance that we want to load the state into

  • the optimizer
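One caveat the original code does not cover: if the checkpoint was saved on a GPU machine and you later load it on a CPU-only machine, torch.load needs a map_location argument, otherwise it will try to restore CUDA tensors. A minimal sketch of the change inside load_ckp:

# remap CUDA tensors onto the CPU while loading the checkpoint
checkpoint = torch.load(checkpoint_fpath, map_location=torch.device('cpu'))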

Step 3: Import the Fashion-MNIST dataset and create the data loaders


   
# Define a transform to normalize the data
# Fashion-MNIST images are single-channel, so we pass one mean/std value each
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('F_MNIST_data/', download=True, train=True, transform=transform)
# Download and load the test data
testset = datasets.FashionMNIST('F_MNIST_data/', download=True, train=False, transform=transform)
loaders = {
    'train': torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True),
    'test': torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True),
}
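As a quick sanity check, pull one batch from the loader; Fashion-MNIST images are 28x28 grayscale, so with batch_size=64 you should see the shapes below:

# inspect one batch to confirm the loaders work as expected
images, labels = next(iter(loaders['train']))
print(images.shape)   # torch.Size([64, 1, 28, 28])
print(labels.shape)   # torch.Size([64])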

Step 4: Define and create the model


   
# Define your network ( Simple Example )
class FashionClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        input_size = 784
        self.fc1 = nn.Linear(input_size, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 128)
        self.fc4 = nn.Linear(128, 64)
        self.fc5 = nn.Linear(64, 10)
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        x = x.view(x.shape[0], -1)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))
        x = self.dropout(F.relu(self.fc4(x)))
        x = F.log_softmax(self.fc5(x), dim=1)
        return x

   
# Create the network, define the criterion and optimizer
model = FashionClassifier()
# move model to GPU if CUDA is available
if use_cuda:
    model = model.cuda()
print(model)

Model structure output:


   
FashionClassifier(
  (fc1): Linear(in_features=784, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=256, bias=True)
  (fc3): Linear(in_features=256, out_features=128, bias=True)
  (fc4): Linear(in_features=128, out_features=64, bias=True)
  (fc5): Linear(in_features=64, out_features=10, bias=True)
  (dropout): Dropout(p=0.2)
)

Step 5: Train the network and save the model

The training function lets us set the epoch range, the resume state, and other parameters.

Define the loss function and the optimizer

Below we use the Adam optimizer with negative log-likelihood loss (NLLLoss); since the network outputs log-softmax class scores, this combination is equivalent to cross-entropy loss. We compute the loss and perform backpropagation.


   
# define loss function and optimizer
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
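A side note on this choice: NLLLoss applied to log-softmax outputs is mathematically identical to CrossEntropyLoss applied to raw logits, which is why we can speak of cross-entropy while the code uses NLLLoss. A quick sketch to verify (the logits and labels here are made-up values for illustration):

# NLLLoss(log_softmax(x)) gives the same value as CrossEntropyLoss(x)
logits = torch.randn(4, 10)               # fake batch: 4 samples, 10 classes
target = torch.tensor([1, 0, 4, 9])       # fake labels
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
ce = nn.CrossEntropyLoss()(logits, target)
print(torch.allclose(nll, ce))            # True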

Define the training method


   
def train(start_epochs, n_epochs, valid_loss_min_input, loaders, model, optimizer, criterion, use_cuda, checkpoint_path, best_model_path):
    """
    Keyword arguments:
    start_epochs -- epoch number to start training from
    n_epochs -- epoch number to stop training at
    valid_loss_min_input -- minimum validation loss seen so far (np.Inf for a fresh run)
    loaders -- dict holding the train/test DataLoaders
    model -- the network to train
    optimizer -- the optimizer
    criterion -- the loss function
    use_cuda -- whether to move batches to the GPU
    checkpoint_path -- path to save the latest checkpoint
    best_model_path -- path to save the best checkpoint
    returns trained model
    """
    # initialize tracker for minimum validation loss
    valid_loss_min = valid_loss_min_input
    for epoch in range(start_epochs, n_epochs + 1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        ###################
        # train the model #
        ###################
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # clear the gradients of all optimized variables
            optimizer.zero_grad()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # perform a single optimization step (parameter update)
            optimizer.step()
            # record the running average of the training loss
            train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))
        ######################
        # validate the model #
        ######################
        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['test']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # update the running average of the validation loss
            valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.data - valid_loss))
        # note: train_loss and valid_loss are already per-batch running averages,
        # so dividing by the dataset size again makes the printed values very small
        train_loss = train_loss / len(loaders['train'].dataset)
        valid_loss = valid_loss / len(loaders['test'].dataset)
        # print training/validation statistics
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch,
            train_loss,
            valid_loss
        ))
        # create checkpoint variable and add important data
        checkpoint = {
            'epoch': epoch + 1,
            'valid_loss_min': valid_loss,
            'state_dict': model.state_dict(),
            'optimizer': optimizer.state_dict(),
        }
        # save the latest checkpoint every epoch
        save_ckp(checkpoint, False, checkpoint_path, best_model_path)
        # save the checkpoint as the best model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(valid_loss_min, valid_loss))
            save_ckp(checkpoint, True, checkpoint_path, best_model_path)
            valid_loss_min = valid_loss
    # return trained model
    return model

Train the model

trained_model = train(1, 3, np.Inf, loaders, model, optimizer, criterion, use_cuda, "./checkpoint/current_checkpoint.pt", "./best_model/best_model.pt")

Output:


   
Epoch: 1 	Training Loss: 0.000010 	Validation Loss: 0.000044
Validation loss decreased (inf --> 0.000044). Saving model ...
Epoch: 2 	Training Loss: 0.000007 	Validation Loss: 0.000040
Validation loss decreased (0.000044 --> 0.000040). Saving model ...
Epoch: 3 	Training Loss: 0.000007 	Validation Loss: 0.000040
Validation loss decreased (0.000040 --> 0.000040). Saving model ...

Let's look at a few of the parameters we used above:

  • start_epochs: the value of the epoch to start training from

  • n_epochs: the value of the epoch to stop training at

  • valid_loss_min_input = np.Inf: the initial minimum validation loss; for a fresh run we start from infinity

  • checkpoint_path: the full path where the latest checkpoint state is saved during training

  • best_model_path: the full path where the best checkpoint state is saved during training

Verify that the model was saved

  • List all files in the best_model directory:

%ls ./best_model/

Output:

best_model.pt

  • List all files in the checkpoint directory:

%ls ./checkpoint/

Output:

current_checkpoint.pt

Step 6: Load the model

Reconstruct the model


   
model = FashionClassifier()
# move model to GPU if CUDA is available
if use_cuda:
    model = model.cuda()
print(model)

Output:


   
FashionClassifier(
  (fc1): Linear(in_features=784, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=256, bias=True)
  (fc3): Linear(in_features=256, out_features=128, bias=True)
  (fc4): Linear(in_features=128, out_features=64, bias=True)
  (fc5): Linear(in_features=64, out_features=10, bias=True)
  (dropout): Dropout(p=0.2)
)

Define the optimizer and the checkpoint file path


   
# define optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)
# define checkpoint saved path
ckp_path = "./checkpoint/current_checkpoint.pt"

Load the model using the load_ckp function


   
# load the saved checkpoint
model, optimizer, start_epoch, valid_loss_min = load_ckp(ckp_path, model, optimizer)

I printed out the values returned by load_ckp to make sure everything was correct.


   
  1. print( "model = ", model)
  2. print( "optimizer = ", optimizer)
  3. print( "start_epoch = ", start_epoch)
  4. print( "valid_loss_min = ", valid_loss_min)
  5. print( "valid_loss_min = {:.6f}".format(valid_loss_min))

Output:


   
model =  FashionClassifier(
  (fc1): Linear(in_features=784, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=256, bias=True)
  (fc3): Linear(in_features=256, out_features=128, bias=True)
  (fc4): Linear(in_features=128, out_features=64, bias=True)
  (fc5): Linear(in_features=64, out_features=10, bias=True)
  (dropout): Dropout(p=0.2)
)
optimizer =  Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.001
    weight_decay: 0
)
start_epoch =  4
valid_loss_min =  3.952759288949892e-05
valid_loss_min = 0.000040

With all the required information loaded, we can continue training from epoch = 4. Previously, we trained the model from epoch 1 to 3.

Step 7: Continue training and/or run inference

Continue training

We can keep using the train function to train our model, supplying the checkpoint values we got from the load_ckp function above.

trained_model = train(start_epoch, 6, valid_loss_min, loaders, model, optimizer, criterion, use_cuda, "./checkpoint/current_checkpoint.pt", "./best_model/best_model.pt")

Output:


   
Epoch: 4 	Training Loss: 0.000006 	Validation Loss: 0.000040
Epoch: 5 	Training Loss: 0.000006 	Validation Loss: 0.000037
Validation loss decreased (0.000040 --> 0.000037). Saving model ...
Epoch: 6 	Training Loss: 0.000006 	Validation Loss: 0.000036
Validation loss decreased (0.000037 --> 0.000036). Saving model ...

  • Note: the epochs now start at 4 and end at 6 (start_epoch = 4)

  • The validation loss carries on from the previous training checkpoint

  • At epoch 3, the minimum validation loss was 0.000040

  • Here, the minimum validation loss starts at 0.000040 rather than at inf (a resume-if-exists pattern is sketched below)
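In practice you may want a notebook that resumes automatically whenever a checkpoint exists, so a restarted Kaggle session picks up where it left off. A minimal sketch of that pattern, built from the helpers defined above (os.path.exists is the only new piece):

import os

ckp_path = './checkpoint/current_checkpoint.pt'
start_epoch, valid_loss_min = 1, np.Inf   # defaults for a fresh run
if os.path.exists(ckp_path):
    # a previous run left a checkpoint behind: resume from it
    model, optimizer, start_epoch, valid_loss_min = load_ckp(ckp_path, model, optimizer)
trained_model = train(start_epoch, 6, valid_loss_min, loaders, model, optimizer,
                      criterion, use_cuda,
                      './checkpoint/current_checkpoint.pt', './best_model/best_model.pt')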

Model inference

Before running inference, you must call model.eval() to set the dropout and batch normalization layers to evaluation mode. Failing to do so will yield inconsistent inference results.

trained_model.eval()

   
test_acc = 0.0
for samples, labels in loaders['test']:
    with torch.no_grad():
        # move to GPU if available
        if use_cuda:
            samples, labels = samples.cuda(), labels.cuda()
        output = trained_model(samples)
        # calculate accuracy
        pred = torch.argmax(output, dim=1)
        correct = pred.eq(labels)
        test_acc += torch.mean(correct.float())
print('Accuracy of the network on {} test images: {}%'.format(len(testset), round(test_acc.item() * 100.0 / len(loaders['test']), 2)))

Output:

Accuracy of the network on 10000 test images: 86.58%

Where to find the output/saved files of your Kaggle notebook:

In your Kaggle notebook, scroll down to the bottom of the page. The files saved by the steps above appear there.

Full code: https://www.kaggle.com/vortanasay/saving-loading-and-cont-training-model-in-pytorch


   

Reposted from: https://blog.csdn.net/weixin_38739735/article/details/114317581