Feidao's Blog

Understanding Convolutional Neural Networks in Depth


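A convolutional neural network (CNN) is built from convolutional layers, pooling layers, and fully connected layers. Its basic unit is the neuron. In the simplest case, a neuron takes three inputs, each with its own weight; it computes the weighted sum of the inputs, passes that sum through an activation function, and outputs y.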
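As a minimal sketch of such a neuron, the snippet below computes the weighted sum of three inputs and applies an activation. The sigmoid activation, the optional bias term, and the example numbers are assumptions chosen for illustration; the description above does not fix any of them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    # Weighted sum of the inputs, then the activation function
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Three inputs, three weights: z = 1*0.5 + 2*(-0.25) + 3*0 = 0
y = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.0])
print(y)  # 0.5, because sigmoid(0) = 0.5
```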

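What sets a CNN apart is the convolutional layer: a kernel slides over the input with a given stride, computing a sum of element-wise products at each position, and the results form the output feature map. Each kernel has the same depth as the input feature map, and the depth of the output feature map equals the number of kernels. If a bias is used, each kernel's bias is simply added to every element of that kernel's output channel.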
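To make the depth rule concrete, here is a naive pure-Python convolution, written for clarity rather than speed. The function name conv2d, the nested-list layout, and the toy values are illustrative assumptions, and padding is left out.

```python
def conv2d(image, kernels, biases, stride=1):
    """image: [C][H][W]; kernels: [K][C][F][F]; returns [K][H_out][W_out]."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    size = len(kernels[0][0])
    out_h = (height - size) // stride + 1
    out_w = (width - size) // stride + 1
    output = []
    for kernel, bias in zip(kernels, biases):
        # Every kernel must be as deep as the input feature map
        assert len(kernel) == channels, "kernel depth must match input depth"
        fmap = []
        for i in range(out_h):
            row = []
            for j in range(out_w):
                total = bias  # the bias is added once per output element
                for c in range(channels):
                    for di in range(size):
                        for dj in range(size):
                            total += (image[c][i * stride + di][j * stride + dj]
                                      * kernel[c][di][dj])
                row.append(total)
            fmap.append(row)
        output.append(fmap)
    return output


# Input of depth 2 and size 3x3, all ones
image = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
# Three kernels of depth 2 and size 2x2, all ones
kernels = [[[[1.0] * 2 for _ in range(2)] for _ in range(2)] for _ in range(3)]
biases = [0.0, 1.0, -1.0]

out = conv2d(image, kernels, biases)
print(len(out), len(out[0]), len(out[0][0]))     # 3 2 2: output depth = number of kernels
print(out[0][0][0], out[1][0][0], out[2][0][0])  # 8.0 9.0 7.0
```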

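Activation functions introduce nonlinearity, which gives the network the ability to solve nonlinear problems. The two main choices are sigmoid and ReLU. The gradient of the sigmoid becomes very small once it saturates, so deep networks that use it are prone to vanishing gradients. ReLU has a different failure mode: if a very large gradient passes through a neuron during backpropagation, the resulting update can shift its weights so far that the neuron's input is negative for essentially every sample. The derivative there is always zero, backpropagation can no longer update the weights, and the neuron becomes inactive ("dead").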
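The two failure modes can be seen directly from the derivatives. This is a small sketch using only the standard formulas sigmoid'(z) = s(1 - s) and ReLU'(z) = 1 for z > 0, else 0; the sample inputs and the ten-layer product are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


# The sigmoid gradient peaks at 0.25 and shrinks quickly as the input saturates
for z in (0.0, 5.0, 10.0):
    print(z, sigmoid_grad(z))

# Even in the best case, ten stacked sigmoids scale the gradient by at most 0.25 ** 10
print(0.25 ** 10)

# A ReLU whose input is always negative passes no gradient at all
print(relu_grad(-3.0), relu_grad(3.0))  # 0.0 1.0
```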

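The size of the matrix after a convolution is given by:

$$N = \frac{W - F + 2P}{S} + 1$$

Here the input image is W × W, F is the kernel size, P is the number of padding pixels, and S is the stride. When the kernel would otherwise run past the edge of the input, padding fills the border with zeros.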
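A small helper makes the formula easy to check. The function name conv_output_size is hypothetical, and the floor division is an assumption about the non-divisible case (it matches how PyTorch's nn.Conv2d rounds).

```python
def conv_output_size(w, f, p=0, s=1):
    """N = (W - F + 2P) / S + 1, rounded down when the division is not exact."""
    return (w - f + 2 * p) // s + 1


print(conv_output_size(32, 5))               # 28: a 5x5 kernel, no padding
print(conv_output_size(32, 5, p=2))          # 32: padding 2 keeps the size
print(conv_output_size(224, 11, p=2, s=4))   # 55: (224 - 11 + 4) / 4 + 1 = 55.25, rounded down
```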

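A pooling layer, as I understand it, simply shrinks the matrix: it downsamples the feature map to reduce the amount of computation. Max pooling and average pooling are the two usual choices. Pooling leaves the depth unchanged and changes only the height and width. The pool size and the stride are generally set to the same value.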
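Both pooling modes fit in a few lines of pure Python. This is an illustrative sketch for a single channel; pool2d and the 4 × 4 example map are made-up names and values.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Pool one channel given as a list of rows."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    reduce = max if mode == "max" else (lambda vals: sum(vals) / len(vals))
    return [
        [
            reduce([fmap[i * stride + di][j * stride + dj]
                    for di in range(size) for dj in range(size)])
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]

print(pool2d(fmap, mode="max"))  # [[6, 8], [14, 16]]
print(pool2d(fmap, mode="avg"))  # [[3.5, 5.5], [11.5, 13.5]]
```

With a pool size and stride of 2, the 4 × 4 map becomes 2 × 2: height and width are halved, and each channel is pooled independently, so the depth stays the same.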

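Computing the error is a genuinely involved process. Fortunately the computer does the calculation for us; it is the derivation that is tedious.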

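Backpropagation starts from a cross-entropy loss. For a multi-class problem (softmax output, where the output probabilities sum to 1), the loss is:

$$H = -\sum_i o_i^* \log(o_i)$$

For a binary problem (sigmoid output, where the output nodes are independent of one another), the loss, averaged over the N output nodes, is:

$$H = -\frac{1}{N} \sum_{i=1}^{N} \left[ o_i^* \log(o_i) + (1 - o_i^*) \log(1 - o_i) \right]$$

Here $o_i^*$ is the true label and $o_i$ is the predicted value.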
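Both losses are short enough to compute by hand. The sketch below uses the standard definitions of softmax cross-entropy and sigmoid (binary) cross-entropy; the function names and example values are illustrative assumptions.

```python
import math


def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(target, predicted):
    # H = -sum_i o*_i log(o_i); terms with a zero label contribute nothing
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def binary_cross_entropy(target, predicted):
    # H = -(1/N) sum_i [o*_i log(o_i) + (1 - o*_i) log(1 - o_i)]
    n = len(target)
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(target, predicted)) / n


probs = softmax([2.0, 1.0, 0.1])
print(sum(probs))                                   # the softmax outputs sum to 1
print(categorical_cross_entropy([1, 0, 0], probs))  # equals -log(probs[0])
print(binary_cross_entropy([1, 0], [0.5, 0.5]))     # equals log(2)
```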

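During backpropagation the weights are updated repeatedly, with the training data processed in batches. An optimizer such as SGD or Adam is normally needed as well; these can simply be called from the library. The purpose of the optimizer is to make the network converge faster.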
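The core of the simplest optimizer is a single update rule, w ← w − lr · grad. As a hedged sketch, the loop below applies that rule to a toy one-parameter loss (w − 3)², using its exact gradient rather than a mini-batch estimate; sgd_step, the loss, and the learning rate are all assumptions for illustration.

```python
def sgd_step(weights, grads, lr):
    # Move every weight a small step against its gradient
    return [w - lr * g for w, g in zip(weights, grads)]


w = [0.0]
for _ in range(100):
    grad = [2.0 * (w[0] - 3.0)]  # derivative of (w - 3) ** 2
    w = sgd_step(w, grad, lr=0.1)

print(w[0])  # very close to 3.0, the minimum of the toy loss
```

Optimizers such as Adam replace this plain step with one that also tracks running statistics of the gradients.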

To understand backpropagation in depth, it is worth working through the full derivation once more.
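As a small worked case, consider a single sigmoid neuron trained with binary cross-entropy. The chain rule gives dL/dz = y − t, so dL/dw_k = (y − t) · x_k and dL/db = y − t. The sketch below checks that result against a numerical finite-difference gradient; the weights, inputs, and target are made-up values, and a full network repeats the same chain rule layer by layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, t):
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(t * math.log(y) + (1 - t) * math.log(1 - y))


def gradients(w, b, x, t):
    # Chain rule: dL/dz = y - t for a sigmoid followed by cross-entropy
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    dz = y - t
    return [dz * xi for xi in x], dz


w, b, x, t = [0.2, -0.4, 0.1], 0.05, [1.0, 2.0, 3.0], 1.0
grad_w, grad_b = gradients(w, b, x, t)

# Compare each analytic gradient with a central finite difference
eps = 1e-6
differences = []
for k in range(len(w)):
    w_plus, w_minus = list(w), list(w)
    w_plus[k] += eps
    w_minus[k] -= eps
    numeric = (loss(w_plus, b, x, t) - loss(w_minus, b, x, t)) / (2 * eps)
    differences.append(abs(numeric - grad_w[k]))
    print(k, grad_w[k], numeric)
```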

LeNet

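The basic structure of LeNet is a 5×5 convolutional layer with padding 2, followed by a 2×2 pooling layer with stride 2, then another 5×5 convolutional layer, another 2×2 pooling layer with stride 2, and finally three fully connected layers. The implementation below takes 32×32 inputs, so its first convolution uses no padding.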
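Before reading the implementation, it can help to trace the feature-map size through the network with N = (W − F + 2P)/S + 1. This sketch assumes the 32 × 32 input and the unpadded layers used in this section's LeNet code; the helper out_size is a hypothetical name.

```python
def out_size(w, f, p=0, s=1):
    return (w - f + 2 * p) // s + 1


# (layer name, kernel size, padding, stride)
layers = [("conv1", 5, 0, 1), ("pool1", 2, 0, 2),
          ("conv2", 5, 0, 1), ("pool2", 2, 0, 2)]

size = 32
trace = []
for name, f, p, s in layers:
    size = out_size(size, f, p, s)
    trace.append((name, size))

print(trace)             # [('conv1', 28), ('pool1', 14), ('conv2', 10), ('pool2', 5)]
print(32 * size * size)  # 800 = 32 * 5 * 5, the input width of the first fully connected layer
```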


   
import torch
import torch.nn as nn
import torch.nn.functional as F


# Define the model as a class that inherits from nn.Module
class LeNet(nn.Module):
    def __init__(self):
        # Call the parent constructor first; this avoids a range of problems
        # when relying on functionality inherited from the parent class
        super(LeNet, self).__init__()
        # Convolutional layers
        # The first argument is the depth of the input feature map; 16 kernels of size 5x5
        self.conv1 = nn.Conv2d(3, 16, 5)
        # Pooling layer: 2x2 window, stride 2
        self.pool1 = nn.MaxPool2d(2, 2)
        # The previous layer has 16 kernels, so the input depth is 16; 32 kernels of size 5x5
        self.conv2 = nn.Conv2d(16, 32, 5)
        # Pooling layer
        self.pool2 = nn.MaxPool2d(2, 2)
        # Three fully connected layers
        # The feature map is flattened first, so fc1 takes 32 * 5 * 5 inputs; 120 is its number of nodes
        self.fc1 = nn.Linear(32 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        # The last layer has 10 outputs, matching the number of classes in the training set
        self.fc3 = nn.Linear(84, 10)

    # Forward pass
    def forward(self, x):
        # Output size after a convolution: N = (W - F + 2P) / S + 1, where the input is W x W,
        # F is the kernel size, P is the padding in pixels, and S is the stride
        # conv1: input(3, 32, 32) -> output(16, 28, 28); depth 16, size (32 - 5) / 1 + 1 = 28
        x = F.relu(self.conv1(x))
        # Depth unchanged; height and width are halved
        # output(16, 14, 14)
        x = self.pool1(x)
        # ReLU activation
        x = F.relu(self.conv2(x))  # output(32, 10, 10)
        x = self.pool2(x)  # output(32, 5, 5)
        # view flattens each sample into a vector; -1 is the first (batch) dimension, inferred automatically
        x = x.view(-1, 32 * 5 * 5)  # output(32*5*5)
        x = F.relu(self.fc1(x))  # output(120)
        x = F.relu(self.fc2(x))  # output(84)
        # fc3 has no activation. In theory a softmax should follow, but the cross-entropy
        # loss already applies softmax internally, so it is not defined here
        x = self.fc3(x)  # output(10)
        return x


# Batch of 32, depth 3, height 32, width 32
inputs = torch.rand([32, 3, 32, 32])
# Instantiate the model
model = LeNet()
# Print the model
print(model)
# Forward pass
output = model(inputs)

# nn.Conv2d
# def __init__(
#     self,
#     in_channels: int,  # depth of the input
#     out_channels: int,  # number of kernels, i.e. the depth of the output feature map
#     kernel_size: _size_2_t,  # kernel size
#     stride: _size_2_t = 1,  # stride
#     padding: Union[str, _size_2_t] = 0,  # zero padding around the border
#     dilation: _size_2_t = 1,
#     groups: int = 1,
#     bias: bool = True,  # bias, used by default
#     padding_mode: str = 'zeros',  # TODO: refine this type
#     device=None,
#     dtype=None
# ) -> None:

# nn.MaxPool2d (it has no channel arguments, because pooling does not change the depth)
# def __init__(
#     self,
#     kernel_size: _size_any_t,  # size of the pooling window
#     stride: Optional[_size_any_t] = None,  # stride, defaults to kernel_size
#     padding: _size_any_t = 0,
#     dilation: _size_any_t = 1,
#     return_indices: bool = False,
#     ceil_mode: bool = False
# ) -> None:

AlexNet

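AlexNet's structure is a little more complicated: an 11×11 convolutional layer with stride 4, then a 3×3 pooling layer, a convolutional layer, a pooling layer, three more convolutional layers, one more pooling layer, and finally three fully connected layers.

AlexNet's strengths are that it uses the ReLU activation function instead of the traditional sigmoid and tanh, and that it applies Dropout in the first two fully connected layers to reduce overfitting: during the forward pass in training, a random subset of neurons is deactivated.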
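The idea behind Dropout fits in a few lines. This is a minimal pure-Python sketch, not the library code: the function name dropout and the explicit random generator are illustrative, and the 1/(1 − p) rescaling of surviving activations mirrors what nn.Dropout does in training mode.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Zero each value with probability p; scale survivors so the expected sum is unchanged."""
    if not training or p == 0.0:
        return list(values)  # at inference time nothing is dropped
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10

dropped = dropout(activations, p=0.5, rng=rng)
print(dropped)                                 # a mix of 0.0 and 2.0
print(dropout(activations, training=False))    # unchanged in evaluation mode
```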

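Overfitting usually arises when there are too many feature dimensions, the model is overly complex with too many parameters, the training data are too few, or the data are too noisy. The model then fits the training data too closely at the expense of generalization.

One more practical note: make sure the image path is correct at prediction time. When writing an absolute path on Windows, write \\ rather than a single \, so that the backslash is not read as the start of an escape sequence.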
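The reason is that Python treats a backslash inside an ordinary string literal as the start of an escape sequence. The path below is hypothetical and chosen to show the problem; a raw string is an alternative to doubling every backslash.

```python
# Hypothetical Windows path, for illustration only
broken = "C:\temp\new.jpg"      # \t and \n are silently turned into a tab and a newline
escaped = "C:\\temp\\new.jpg"   # doubled backslashes: the intended path
raw = r"C:\temp\new.jpg"        # raw string: backslashes are kept as written

print(escaped == raw)                  # True
print("\t" in broken, "\n" in broken)  # True True: the path is corrupted
```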


   
import torch.nn as nn
import torch


class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False):
        super(AlexNet, self).__init__()
        # When a network has many layers, group them with nn.Sequential
        self.features = nn.Sequential(  # packs a series of layers into one module
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224] output[48, 55, 55]
            nn.ReLU(inplace=True),  # inplace saves memory, leaving room for a larger model
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[48, 27, 27]
            nn.Conv2d(48, 128, kernel_size=5, padding=2),  # output[128, 27, 27]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[128, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),  # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),  # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),  # output[128, 13, 13]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # output[128, 6, 6]
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),  # p is the fraction of neurons dropped
            nn.Linear(128 * 6 * 6, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),  # the first 2048 is the previous layer's output, the second is this layer's node count
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes),  # num_classes: the number of classes in the dataset
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)  # flatten everything except the batch dimension
        x = self.classifier(x)  # pass through the fully connected layers
        return x

    def _initialize_weights(self):
        for m in self.modules():  # iterate over every layer
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

VGG

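VGG is built around a substitution idea: a stack of two 3×3 kernels can replace one 5×5 kernel, and a stack of three 3×3 kernels can replace one 7×7 kernel. The stacked version covers the same receptive field while using fewer parameters.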
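The saving is easy to verify by counting weights. The sketch below assumes stride 1, the same number of input and output channels C in every layer, and no biases; C = 64 is just an example value.

```python
def conv_params(kernel, channels):
    # Weights of one conv layer with `channels` input and output channels (bias ignored)
    return kernel * kernel * channels * channels


def stacked_receptive_field(kernel, layers):
    # With stride 1, each extra layer widens the receptive field by (kernel - 1)
    return 1 + layers * (kernel - 1)


C = 64
# Two 3x3 layers vs. one 5x5 layer
print(stacked_receptive_field(3, 2), 2 * conv_params(3, C), conv_params(5, C))  # 5 73728 102400
# Three 3x3 layers vs. one 7x7 layer
print(stacked_receptive_field(3, 3), 3 * conv_params(3, C), conv_params(7, C))  # 7 110592 200704
```

In general the stacks cost 18C² and 27C² weights, against 25C² and 49C² for the single large kernels.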

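Take VGG-16 as an example: 2 convolutional layers with 64 kernels each, then a pooling layer; 2 convolutional layers with 128 kernels, then a pooling layer; 3 convolutional layers with 256 kernels, then a pooling layer; 3 convolutional layers with 512 kernels, then a pooling layer; 3 more convolutional layers with 512 kernels, then a pooling layer; and finally three fully connected layers. It is a fairly large network, and training takes a long time even on a GPU.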
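That description can be written down as a configuration list, where a number means a 3 × 3 convolution with that many kernels and "M" means a 2 × 2 max-pooling layer. This is a sketch of the encoding that make_features in this section's code consumes; the 224 × 224 input size is taken from the comments in that code.

```python
# VGG-16 as described above: numbers are conv kernel counts, "M" is a pooling layer
cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]

conv_layers = sum(1 for v in cfg if v != "M")
print(conv_layers + 3)  # 16: 13 convolutional layers plus 3 fully connected layers

# Each 2x2 pooling layer with stride 2 halves the height and width
size = 224
for v in cfg:
    if v == "M":
        size //= 2
print(size)  # 7, so the classifier receives 512 * 7 * 7 features
```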


   
import torch.nn as nn
import torch


class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=False):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),  # random deactivation
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes)
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.features(x)
        # N x 512 x 7 x 7
        x = torch.flatten(x, start_dim=1)  # flatten
        # N x 512*7*7
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                # nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


# make_features builds the feature-extraction part of the network
def make_features(cfg: list):  # cfg is the configuration list
    layers = []  # holds every layer that is created
    in_channels = 3  # the input is an RGB image, so the depth is 3
    for v in cfg:
        if v == "M":  # a pooling layer
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(True)]
            in_channels = v
    return nn.Sequential(*layers)


# Configuration dictionary used by vgg(); only VGG-16 is listed here, following the
# description above (numbers are kernel counts, "M" is a pooling layer)
cfgs = {
    "vgg16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
}


def vgg(model_name="vgg16", **kwargs):
    assert model_name in cfgs, "Warning: model number {} not in cfgs dict!".format(model_name)
    cfg = cfgs[model_name]
    model = VGG(make_features(cfg), **kwargs)
    return model

GoogLeNet

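GoogLeNet's strengths are that it introduces the Inception structure, uses 1×1 kernels for dimensionality reduction and projection, adds two auxiliary classifiers to help training, and drops the large fully connected layers in favor of an average pooling layer (a single linear classifier remains). GoogLeNet therefore has three output layers: the main one and the two auxiliary ones.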
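Two properties of the Inception block can be checked with plain arithmetic. In the sketch below, the channel numbers are copied from the first three Inception calls in this section's GoogLeNet code; the function names and the bias-free weight count are illustrative assumptions.

```python
# (in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj)
blocks = {
    "inception3a": (192, 64, 96, 128, 16, 32, 32),
    "inception3b": (256, 128, 128, 192, 32, 96, 64),
    "inception4a": (480, 192, 96, 208, 16, 48, 64),
}


def out_channels(cfg):
    # The four branches are concatenated along the channel dimension
    _, ch1x1, _, ch3x3, _, ch5x5, pool_proj = cfg
    return ch1x1 + ch3x3 + ch5x5 + pool_proj


for name, cfg in blocks.items():
    print(name, cfg[0], "->", out_channels(cfg))
# inception3a 192 -> 256
# inception3b 256 -> 480
# inception4a 480 -> 512

# Effect of the 1x1 reduction on the 5x5 branch of inception3a (weights only)
direct = 5 * 5 * 192 * 32                   # 5x5 conv applied to all 192 channels
reduced = 1 * 1 * 192 * 16 + 5 * 5 * 16 * 32  # 1x1 down to 16 channels, then 5x5
print(direct, reduced)  # 153600 15872
```

The output depth of each block equals the input depth of the next, and the 1 × 1 reduction cuts the weights of the 5 × 5 branch by roughly a factor of ten.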


   
import torch.nn as nn
import torch
import torch.nn.functional as F


class GoogLeNet(nn.Module):
    def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
        super(GoogLeNet, self).__init__()
        self.aux_logits = aux_logits

        self.conv1 = BasicConv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.conv2 = BasicConv2d(64, 64, kernel_size=1)
        self.conv3 = BasicConv2d(64, 192, kernel_size=3, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)

        if self.aux_logits:
            self.aux1 = InceptionAux(512, num_classes)
            self.aux2 = InceptionAux(528, num_classes)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_classes)
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.conv1(x)
        # N x 64 x 112 x 112
        x = self.maxpool1(x)
        # N x 64 x 56 x 56
        x = self.conv2(x)
        # N x 64 x 56 x 56
        x = self.conv3(x)
        # N x 192 x 56 x 56
        x = self.maxpool2(x)

        # N x 192 x 28 x 28
        x = self.inception3a(x)
        # N x 256 x 28 x 28
        x = self.inception3b(x)
        # N x 480 x 28 x 28
        x = self.maxpool3(x)
        # N x 480 x 14 x 14
        x = self.inception4a(x)
        # N x 512 x 14 x 14
        if self.training and self.aux_logits:  # the auxiliary classifier is skipped in eval mode
            aux1 = self.aux1(x)

        x = self.inception4b(x)
        # N x 512 x 14 x 14
        x = self.inception4c(x)
        # N x 512 x 14 x 14
        x = self.inception4d(x)
        # N x 528 x 14 x 14
        if self.training and self.aux_logits:  # the auxiliary classifier is skipped in eval mode
            aux2 = self.aux2(x)

        x = self.inception4e(x)
        # N x 832 x 14 x 14
        x = self.maxpool4(x)
        # N x 832 x 7 x 7
        x = self.inception5a(x)
        # N x 832 x 7 x 7
        x = self.inception5b(x)
        # N x 1024 x 7 x 7

        x = self.avgpool(x)
        # N x 1024 x 1 x 1
        x = torch.flatten(x, 1)
        # N x 1024
        x = self.dropout(x)
        x = self.fc(x)
        # N x 1000 (num_classes)
        if self.training and self.aux_logits:  # auxiliary outputs are returned only in training mode
            return x, aux2, aux1
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


class Inception(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()

        self.branch1 = BasicConv2d(in_channels, ch1x1, kernel_size=1)

        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, ch3x3red, kernel_size=1),
            BasicConv2d(ch3x3red, ch3x3, kernel_size=3, padding=1)  # keeps the output size equal to the input size
        )

        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, ch5x5red, kernel_size=1),
            # The official implementation actually uses a 3x3 kernel here rather than 5x5;
            # I have not changed it to match. See the issue below for details.
            # Please see https://github.com/pytorch/vision/issues/906 for details.
            BasicConv2d(ch5x5red, ch5x5, kernel_size=5, padding=2)  # keeps the output size equal to the input size
        )

        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BasicConv2d(in_channels, pool_proj, kernel_size=1)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        # Concatenate the four branches along the channel dimension
        outputs = [branch1, branch2, branch3, branch4]
        return torch.cat(outputs, 1)


class InceptionAux(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)
        self.conv = BasicConv2d(in_channels, 128, kernel_size=1)  # output[batch, 128, 4, 4]

        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        # aux1: N x 512 x 14 x 14, aux2: N x 528 x 14 x 14
        x = self.averagePool(x)
        # aux1: N x 512 x 4 x 4, aux2: N x 528 x 4 x 4
        x = self.conv(x)
        # N x 128 x 4 x 4
        x = torch.flatten(x, 1)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 2048
        x = F.relu(self.fc1(x), inplace=True)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 1024
        x = self.fc2(x)
        # N x num_classes
        return x


class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x

ResNet

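Of all the networks so far, I find this one the most impressive: in my training runs the accuracy passed 90% after a single epoch and peaked at around 94%. Transfer learning can also be used to speed up training.

First, its depth can go beyond a hundred layers. It is built on the idea of residual blocks, drops dropout, and uses Batch Normalization to speed up training.

In the 18-layer and 34-layer variants, the shortcut of the first residual block in conv2_x does not pass through a 1×1 convolutional layer, which diagrams usually mark with a solid line. In each of the later stages, the shortcut of the first residual block uses a 1×1 kernel to produce the dimensions we need.

In the 50-layer, 101-layer, and 152-layer variants, the first residual block of every stage uses a 1×1 kernel on the shortcut. Note that in conv2_x this 1×1 convolution changes only the depth and leaves the height and width unchanged, whereas in the following stages it changes the depth, the height, and the width.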
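Both statements follow from the condition used in _make_layer in this section's code: a 1 × 1 projection is added when stride != 1 or the input depth differs from channel * expansion. The sketch below replays that condition without PyTorch. The expansion of 4 and three convolutions per block for the deeper variants are assumptions taken from the standard ResNet design, since only BasicBlock (expansion 1) appears in the code here.

```python
def shortcut_plan(expansion, channels=(64, 128, 256, 512), strides=(1, 2, 2, 2)):
    """For the first block of each stage: (in depth, out depth, stride, needs 1x1 projection)."""
    in_channel = 64  # depth after the stem (conv1 + max pooling)
    plan = []
    for channel, stride in zip(channels, strides):
        out_channel = channel * expansion
        needs_projection = stride != 1 or in_channel != out_channel
        plan.append((in_channel, out_channel, stride, needs_projection))
        in_channel = out_channel
    return plan


def depth(blocks_num, convs_per_block):
    # Convolutions inside the residual blocks, plus the first conv layer and the final fc layer
    return sum(blocks_num) * convs_per_block + 2


# 18/34 layers: conv2_x needs no projection; later stages do
print(shortcut_plan(expansion=1))
# [(64, 64, 1, False), (64, 128, 2, True), (128, 256, 2, True), (256, 512, 2, True)]

# 50/101/152 layers: conv2_x projects with stride 1 (depth only); later stages use stride 2
print(shortcut_plan(expansion=4))
# [(64, 256, 1, True), (256, 512, 2, True), (512, 1024, 2, True), (1024, 2048, 2, True)]

print(depth([3, 4, 6, 3], 2), depth([3, 4, 6, 3], 3))  # 34 50
```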

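With transfer learning, a good result can be trained quickly, even when the dataset is small. The common approaches are to load the pretrained weights and train all parameters; to load the weights and train only the last few layers; or to load the weights, add one more fully connected layer on top of the original network, and train only that layer.

There is also ResNeXt, an improvement on ResNet. When I trained it, though, it felt a bit... Its core idea is grouped convolution (the groups parameter), which removes some of the parameters and lowers the error rate.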


   
import torch.nn as nn
import torch


# Residual block used by the 18-layer and 34-layer networks
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        # When a downsample branch is given, the shortcut goes through its 1x1 convolution
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        # Add the shortcut, then apply the activation
        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):
    def __init__(self,
                 block,
                 blocks_num,
                 num_classes=1000,
                 include_top=True,
                 groups=1,
                 width_per_group=64):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64

        self.groups = groups
        self.width_per_group = width_per_group

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        # The first block of a stage needs a 1x1 convolution on the shortcut
        # whenever the spatial size or the depth changes
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel,
                            channel,
                            downsample=downsample,
                            stride=stride,
                            groups=self.groups,
                            width_per_group=self.width_per_group))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel,
                                groups=self.groups,
                                width_per_group=self.width_per_group))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x


# [3, 4, 6, 3] is the number of residual blocks in each stage
def resnet34(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet34-333f7ec4.pth
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)

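MobileNetV1, V2, V3

The highlights of MobileNetV1 are its use of depthwise (DW) convolution and two added hyperparameters, α and β (a width multiplier and a resolution multiplier), both set by hand. Accuracy drops slightly, but the model has far fewer parameters.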
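Where the saving comes from can be shown by counting weights. The sketch assumes the usual pairing of a depthwise convolution with a 1 × 1 pointwise convolution, ignores biases, and uses made-up channel counts; the helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel, groups=1):
    # Each output channel only sees in_ch / groups input channels
    return kernel * kernel * (in_ch // groups) * out_ch


def depthwise_separable_params(in_ch, out_ch, kernel):
    depthwise = conv_params(in_ch, in_ch, kernel, groups=in_ch)  # one filter per input channel
    pointwise = conv_params(in_ch, out_ch, 1)                    # 1x1 conv mixes the channels
    return depthwise + pointwise


standard = conv_params(32, 64, 3)
separable = depthwise_separable_params(32, 64, 3)
print(standard, separable)  # 18432 2336
```

Here the depthwise step costs 288 weights and the pointwise step 2048, about one eighth of the 18432 weights of the standard 3 × 3 convolution.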

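MobileNetV2 adopts an inverted residual structure, which gives higher accuracy with a smaller model. MobileNetV3 goes a step further, adding an attention mechanism, updating the activation functions, and so on. A MobileNetV2 implementation follows.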


   
from torch import nn
import torch


def _make_divisible(ch, divisor=8, min_ch=None):
    # Round the channel count to the nearest multiple of divisor
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that rounding down does not reduce the value by more than 10%
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch


class ConvBNReLU(nn.Sequential):
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )


class InvertedResidual(nn.Module):
    def __init__(self, in_channel, out_channel, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        hidden_channel = in_channel * expand_ratio
        # The shortcut is used only when input and output have the same shape
        self.use_shortcut = stride == 1 and in_channel == out_channel

        layers = []
        if expand_ratio != 1:
            # 1x1 pointwise conv
            layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1))
        layers.extend([
            # 3x3 depthwise conv
            ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel),
            # 1x1 pointwise conv (linear)
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel),
        ])

        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_shortcut:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = _make_divisible(32 * alpha, round_nearest)
        last_channel = _make_divisible(1280 * alpha, round_nearest)

        inverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        features = []
        # conv1 layer
        features.append(ConvBNReLU(3, input_channel, stride=2))
        # build the inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * alpha, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # build the last several layers
        features.append(ConvBNReLU(input_channel, last_channel, 1))
        # combine the feature layers
        self.features = nn.Sequential(*features)

        # build the classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(last_channel, num_classes)
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

That officially wraps up convolutional neural networks. Next week I will try to get object detection figured out as well, haha.


Reposted from: https://blog.csdn.net/weixin_63967970/article/details/128721489