[Object Detection] League of Legends Gets Real-Time Object Detection with YOLOv5, Now with ONNX Inference






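This article focuses on the principles behind the model. For how to use the project, see the companion article: [Object Detection] League of Legends Gets Real-Time Object Detection with YOLOv5, Now with ONNX Inference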

Project link: https://github.com/oaifaye/dcmyolo




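| Model | Channel multiplier | Depth multiplier | Channels [bc] | Depth [Depth] |
|---|---|---|---|---|
| Yolov5n | 0.25 | 0.33 | 16 | 1 |
| Yolov5s | 0.50 | 0.33 | 32 | 1 |
| Yolov5m | 0.75 | 0.67 | 48 | 2 |
| Yolov5l | 1.00 | 1.00 | 64 | 3 |
| Yolov5x | 1.25 | 1.33 | 80 | 4 |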
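The last two columns follow from the two multipliers with the same formulas used in `YoloBody.__init__` (`base_channels = int(width * 64)`, `base_depth = max(round(depth * 3), 1)`). A small sketch that reproduces the table; the function name `scale_variant` is hypothetical, chosen for illustration.

```python
# Reproduce the channel count [bc] and depth [Depth] of each variant
# from its multipliers, using the formulas in YoloBody.__init__.
VARIANTS = {
    "n": (0.25, 0.33),
    "s": (0.50, 0.33),
    "m": (0.75, 0.67),
    "l": (1.00, 1.00),
    "x": (1.25, 1.33),
}


def scale_variant(wid_mul, dep_mul):
    base_channels = int(wid_mul * 64)        # channel count the network is built on
    base_depth = max(round(dep_mul * 3), 1)  # Bottleneck count per C3 in the neck
    return base_channels, base_depth


for name, (w, d) in VARIANTS.items():
    bc, depth = scale_variant(w, d)
    print(f"Yolov5{name}: bc={bc}, depth={depth}")
```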







[cls] number of classes = 3











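The original CSPDarknet used LeakyReLU as its activation function; it was later replaced with SiLU. SiLU, defined as SiLU(x) = x · sigmoid(x), combines the sigmoid with ReLU-like behavior. It is unbounded above, bounded below, smooth, and non-monotonic. SiLU is generally reported to work better than LeakyReLU in deep models, and it can be viewed as a smooth version of ReLU.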
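A short NumPy check of the properties listed above: SiLU(x) = x · σ(x) has a minimum of about −0.278 (near x ≈ −1.278), is non-monotonic, and approaches x for large positive x. The sampling range and grid step are arbitrary choices for the illustration.

```python
import numpy as np


def silu(x):
    """SiLU (also called Swish): x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


xs = np.linspace(-6.0, 6.0, 12001)  # step of 0.001
ys = silu(xs)

# Bounded below: the minimum is about -0.278, near x = -1.278
print("min:", ys.min(), "at x =", xs[ys.argmin()])
# Non-monotonic: it decreases before the minimum and increases after it
print("non-monotonic:", np.any(np.diff(ys) < 0) and np.any(np.diff(ys) > 0))
# Unbounded above: for large x the sigmoid factor tends to 1, so SiLU(x) ~ x
print("silu(20) =", silu(20.0))
```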





```python
import torch
import torch.nn as nn

# Conv and Bottleneck are defined elsewhere in the project.


class C3(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super(C3, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # act=FReLU(c2)
        self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])
        # self.m = nn.Sequential(*[CrossConv(c_, c_, 3, 1, g, 1.0, shortcut) for _ in range(n)])

    def forward(self, x):
        # Branch 1: cv1 followed by n Bottlenecks; branch 2: cv2 only.
        # Concatenate both branches along the channel axis, then fuse them with cv3.
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
```
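C3 relies on `Conv` and `Bottleneck`, which are not listed here. Below is a minimal sketch modeled on the standard YOLOv5 definitions; the project's own versions are not shown, so these are assumptions. `Conv` is Conv2d + BatchNorm2d + SiLU (the CBS unit) with "same" padding for odd kernels, and `Bottleneck` is a 1×1 Conv followed by a 3×3 Conv with an optional residual add. The signatures match how C3, SPPF and YoloBody call them: `Conv(c1, c2, k, s)` and `Bottleneck(c1, c2, shortcut, g, e)`.

```python
import torch
import torch.nn as nn


class Conv(nn.Module):
    # Conv2d -> BatchNorm2d -> SiLU; padding k // 2 keeps the size for stride 1
    def __init__(self, c1, c2, k=1, s=1, g=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class Bottleneck(nn.Module):
    # 1x1 Conv (reduce to c_) -> 3x3 Conv, plus a residual add when allowed
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2  # residual only if shapes match

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y


x = torch.randn(1, 64, 80, 80)
print(Bottleneck(64, 64)(x).shape)   # torch.Size([1, 64, 80, 80])
print(Conv(64, 128, 3, 2)(x).shape)  # torch.Size([1, 128, 40, 40])
```

A stride-2 3×3 Conv halves the spatial size, which is how `down_sample1` and `down_sample2` in YoloBody move from 80×80 to 40×40 and from 40×40 to 20×20.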

3. SPPF (Spatial Pyramid Pooling - Fast)






```python
import warnings

import torch
import torch.nn as nn

# Conv is defined elsewhere in the project.


class SPPF(nn.Module):
    # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher
    def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            y1 = self.m(x)
            y2 = self.m(y1)
            # Concatenate the input with three successively pooled maps, then fuse with a 1x1 Conv
            return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))
```
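The comment "equivalent to SPP(k=(5, 9, 13))" can be checked numerically: two stacked 5×5 stride-1 max pools cover a 9×9 window, and three cover 13×13. A minimal sketch using only `torch.nn.functional.max_pool2d`, which pads with negative infinity, so the border values match as well; the helper name `pool` is hypothetical.

```python
import torch
import torch.nn.functional as F


def pool(x, k):
    # Stride-1 max pooling that keeps the spatial size
    return F.max_pool2d(x, kernel_size=k, stride=1, padding=k // 2)


x = torch.randn(1, 8, 20, 20)

# SPP: three parallel pools with kernels 5, 9 and 13
spp = [pool(x, 5), pool(x, 9), pool(x, 13)]

# SPPF: one 5x5 pool applied serially; each pass widens the window by 4
y1 = pool(x, 5)
y2 = pool(y1, 5)
y3 = pool(y2, 5)

for a, b in zip(spp, (y1, y2, y3)):
    print(torch.equal(a, b))  # True
```

Since the serial version reuses each intermediate result, it does the work of three 5×5 pools instead of 5×5, 9×9 and 13×13 pools, which is where the "Fast" comes from.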





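PAN builds on FPN by adding a bottom-up path augmentation: features are not only upsampled and fused (top-down), but also downsampled again and fused a second time (bottom-up).

FPN improves detection mainly by fusing high-level and low-level features, which especially helps with small objects. The bottom-up path augmentation makes full use of the network's shallow features. These matter because shallow layers mostly encode edges, shapes and similar details, which are essential for precise localization. By adding this bottom-up enhancement on top of FPN, PAN lets the top-level feature maps also benefit from the rich positional information of the lower layers, which improves the detection of large objects.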




The head is largely unchanged: it is still a conventional YOLO head (in YOLOX terminology, a coupled head).



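24 = (4 + 1 + 3) × 3. For each feature point, the first 4 values are the box regression parameters: the x and y offsets of the box center relative to the anchor, and the predicted box width and height. The 5th value predicts whether the point contains an object, and the last 3 values predict the probability of each class. The factor of 3 comes from splitting the 9 preset anchors into 3 groups of 3, one group per feature map.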
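To turn a raw head output into per-anchor predictions, the 24 channels are split into 3 anchors × 8 values. A NumPy sketch, assuming the anchor-major channel layout used by YOLOv5's decoding code (a `view(bs, 3, 5 + num_classes, h, w)` followed by a permute); random numbers stand in for real network output.

```python
import numpy as np

num_anchors, num_classes = 3, 3
channels = num_anchors * (4 + 1 + num_classes)  # (4 box + 1 objectness + 3 classes) * 3 = 24

# Stand-in for the 80x80 head output: (batch, 24, 80, 80)
out = np.random.rand(1, channels, 80, 80).astype(np.float32)

# Split the channels into 3 anchors x 8 values, then move the 8 values to the last axis
pred = out.reshape(1, num_anchors, 4 + 1 + num_classes, 80, 80).transpose(0, 1, 3, 4, 2)
print(pred.shape)  # (1, 3, 80, 80, 8)

box = pred[..., 0:4]  # x, y offsets and w, h for every anchor at every grid cell
obj = pred[..., 4]    # objectness
cls = pred[..., 5:]   # per-class scores
```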




```python
import os

import torch
import torch.nn as nn

# Conv, C3, CSPDarknet and DetectBox are defined elsewhere in the project.


class YoloBody(nn.Module):
    def __init__(self, anchors_mask, num_classes, phi, anchors=None, input_shape=None, backbone_model_dir='', need_detect_box=False):
        super(YoloBody, self).__init__()
        depth_dict = {'n': 0.33, 's': 0.33, 'm': 0.67, 'l': 1.00, 'x': 1.33}
        width_dict = {'n': 0.25, 's': 0.50, 'm': 0.75, 'l': 1.00, 'x': 1.25}
        dep_mul, wid_mul = depth_dict[phi], width_dict[phi]
        base_channels = int(wid_mul * 64)  # 64 for 'l'
        base_depth = max(round(dep_mul * 3), 1)  # 3 for 'l'
        #-----------------------------------------------#
        # The input image is 3 x 640 x 640
        # The shape comments below assume base_channels = 64 ('l')
        #-----------------------------------------------#
        self.backbone = self._get_backbone(base_channels, base_depth, phi, backbone_model_dir)
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv_for_feat3 = Conv(base_channels * 16, base_channels * 8, 1, 1)
        self.conv3_for_upsample1 = C3(base_channels * 16, base_channels * 8, base_depth, shortcut=False)
        self.conv_for_feat2 = Conv(base_channels * 8, base_channels * 4, 1, 1)
        self.conv3_for_upsample2 = C3(base_channels * 8, base_channels * 4, base_depth, shortcut=False)
        self.down_sample1 = Conv(base_channels * 4, base_channels * 4, 3, 2)
        self.conv3_for_downsample1 = C3(base_channels * 8, base_channels * 8, base_depth, shortcut=False)
        self.down_sample2 = Conv(base_channels * 8, base_channels * 8, 3, 2)
        self.conv3_for_downsample2 = C3(base_channels * 16, base_channels * 16, base_depth, shortcut=False)
        # 256, 80, 80 => 3 * (5 + num_classes), 80, 80
        self.yolo_head_P3 = nn.Conv2d(base_channels * 4, len(anchors_mask[2]) * (5 + num_classes), 1)
        # 512, 40, 40 => 3 * (5 + num_classes), 40, 40
        self.yolo_head_P4 = nn.Conv2d(base_channels * 8, len(anchors_mask[1]) * (5 + num_classes), 1)
        # 1024, 20, 20 => 3 * (5 + num_classes), 20, 20
        self.yolo_head_P5 = nn.Conv2d(base_channels * 16, len(anchors_mask[0]) * (5 + num_classes), 1)
        self.need_detect_box = need_detect_box
        if need_detect_box:
            self.detectBox = DetectBox(anchors, num_classes, input_shape)

    def _get_backbone(self, channels, depth, phi, backbone_model_dir):
        """
        Initialize the backbone.

        Returns
        -------
        CSPDarknet built with backbone_model_dir/cspdarknet_<phi>_backbone.pth
        """
        backbone_model_path = os.path.join(backbone_model_dir, 'cspdarknet_' + phi + '_backbone.pth')
        return CSPDarknet(channels, depth, backbone_model_path)

    def forward(self, x):
        # backbone
        feat1, feat2, feat3 = self.backbone(x)
        # 1024, 20, 20 -> 512, 20, 20
        P5 = self.conv_for_feat3(feat3)
        # 512, 20, 20 -> 512, 40, 40
        P5_upsample = self.upsample(P5)
        # 512, 40, 40 concat 512, 40, 40 -> 1024, 40, 40
        P4 = torch.cat([P5_upsample, feat2], 1)
        # 1024, 40, 40 -> 512, 40, 40
        P4 = self.conv3_for_upsample1(P4)
        # 512, 40, 40 -> 256, 40, 40
        P4 = self.conv_for_feat2(P4)
        # 256, 40, 40 -> 256, 80, 80
        P4_upsample = self.upsample(P4)
        # 256, 80, 80 concat 256, 80, 80 -> 512, 80, 80
        P3 = torch.cat([P4_upsample, feat1], 1)
        # 512, 80, 80 -> 256, 80, 80
        P3 = self.conv3_for_upsample2(P3)
        # 256, 80, 80 -> 256, 40, 40
        P3_downsample = self.down_sample1(P3)
        # 256, 40, 40 concat 256, 40, 40 -> 512, 40, 40
        P4 = torch.cat([P3_downsample, P4], 1)
        # 512, 40, 40 -> 512, 40, 40
        P4 = self.conv3_for_downsample1(P4)
        # 512, 40, 40 -> 512, 20, 20
        P4_downsample = self.down_sample2(P4)
        # 512, 20, 20 concat 512, 20, 20 -> 1024, 20, 20
        P5 = torch.cat([P4_downsample, P5], 1)
        # 1024, 20, 20 -> 1024, 20, 20
        P5 = self.conv3_for_downsample2(P5)
        #---------------------------------------------------#
        # Third output feature map
        # y3 = (batch_size, 24, 80, 80)
        #---------------------------------------------------#
        out2 = self.yolo_head_P3(P3)
        #---------------------------------------------------#
        # Second output feature map
        # y2 = (batch_size, 24, 40, 40)
        #---------------------------------------------------#
        out1 = self.yolo_head_P4(P4)
        #---------------------------------------------------#
        # First output feature map
        # y1 = (batch_size, 24, 20, 20)
        #---------------------------------------------------#
        out0 = self.yolo_head_P5(P5)
        if self.need_detect_box:
            return self.detectBox([out0, out1, out2])
        return out0, out1, out2
```
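With a 640×640 input, the three heads sit at strides 8, 16 and 32. The sketch below derives the neck and head output shapes for any variant from the same width multipliers as YoloBody, and counts the total number of predictions per image. The helper `head_shapes` is hypothetical, and it assumes 3 anchors per cell and 3 classes as in this project.

```python
def head_shapes(phi="l", input_size=640, num_classes=3, num_anchors=3):
    """Return (channels, h, w) of the neck outputs P3/P4/P5 and of the matching head outputs."""
    width = {"n": 0.25, "s": 0.50, "m": 0.75, "l": 1.00, "x": 1.25}[phi]
    base = int(width * 64)
    shapes = []
    # P3, P4, P5: strides 8, 16, 32 with 4x, 8x, 16x the base channel count
    for stride, mult in ((8, 4), (16, 8), (32, 16)):
        size = input_size // stride
        shapes.append({
            "neck": (base * mult, size, size),
            "head": (num_anchors * (5 + num_classes), size, size),
        })
    return shapes


for s in head_shapes("l"):
    print(s)

# One prediction per anchor (3 per cell) per grid cell, summed over the three maps
total = sum(3 * s["head"][1] * s["head"][2] for s in head_shapes("l"))
print(total)  # 25200 = 3 * (80*80 + 40*40 + 20*20)
```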



The nine preset anchors (width, height in input-image pixels): 10,13, 16,30, 33,23,  30,61, 62,45, 59,119,  116,90, 156,198, 373,326


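(1,24,80,80) uses the first group: 10,13, 16,30, 33,23

(1,24,40,40) uses the second group: 30,61, 62,45, 59,119

(1,24,20,20) uses the third group: 116,90, 156,198, 373,326

The 80×80 map has the finest grid (stride 8) and is responsible for small objects, so it gets the smallest anchors; the 20×20 map (stride 32) gets the largest ones.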
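The anchors are given in input-image pixels. Dividing each group by its feature map's stride (640 / 80 = 8, 640 / 40 = 16, 640 / 20 = 32) expresses it in grid cells, the unit in which boxes are matched to anchors. A small sketch; `ANCHOR_GROUPS` and `anchors_in_cells` are hypothetical names.

```python
# The nine default anchors (w, h) in input-image pixels, grouped by stride
ANCHOR_GROUPS = {
    8: [(10, 13), (16, 30), (33, 23)],        # 80x80 map
    16: [(30, 61), (62, 45), (59, 119)],      # 40x40 map
    32: [(116, 90), (156, 198), (373, 326)],  # 20x20 map
}


def anchors_in_cells(stride):
    """Anchor sizes of one group expressed in grid cells of its feature map."""
    return [(w / stride, h / stride) for w, h in ANCHOR_GROUPS[stride]]


for stride in ANCHOR_GROUPS:
    print(stride, anchors_in_cells(stride))
```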







```python
#---------------------------------------------------#
# box: (num_boxes, 5) array of [x1, y1, x2, y2, class] in input-image pixels
# Normalize the ground-truth boxes to the range 0-1
#---------------------------------------------------#
box[:, [0, 2]] = box[:, [0, 2]] / self.input_shape[1]
box[:, [1, 3]] = box[:, [1, 3]] / self.input_shape[0]
#---------------------------------------------------#
# Convert to center format. Afterwards:
# indices 0 and 1 hold the center of the box,
# indices 2 and 3 hold its width and height,
# index 4 holds its class
#---------------------------------------------------#
box[:, 2:4] = box[:, 2:4] - box[:, 0:2]
box[:, 0:2] = box[:, 0:2] + box[:, 2:4] / 2
```
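A worked example of the conversion: a 200×200 box with top-left corner (100, 200) on a 640×640 input. The wrapper `xyxy_to_normalized_xywh` is a hypothetical name; its body repeats the four lines above, applied to a copy so the input is left untouched.

```python
import numpy as np


def xyxy_to_normalized_xywh(box, input_shape):
    """box: (N, 5) array of [x1, y1, x2, y2, class] in pixels; input_shape: (h, w)."""
    box = box.astype(np.float32).copy()
    box[:, [0, 2]] = box[:, [0, 2]] / input_shape[1]  # x coordinates / width
    box[:, [1, 3]] = box[:, [1, 3]] / input_shape[0]  # y coordinates / height
    box[:, 2:4] = box[:, 2:4] - box[:, 0:2]           # (x2, y2) -> (w, h)
    box[:, 0:2] = box[:, 0:2] + box[:, 2:4] / 2       # (x1, y1) -> center
    return box


boxes = np.array([[100, 200, 300, 400, 1]])
res = xyxy_to_normalized_xywh(boxes, (640, 640))
# center (0.3125, 0.46875), size (0.3125, 0.3125), class 1
print(res)
```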



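```python
def get_near_points(self, x, y, i, j):
    # (x, y): box center in grid units; (i, j): the cell that contains it.
    # Besides the cell itself, also return the two neighbouring cells
    # on the sides the center is closest to.
    sub_x = x - i
    sub_y = y - j
    if sub_x > 0.5 and sub_y > 0.5:    # lower-right quarter: right and bottom neighbours
        return [[0, 0], [1, 0], [0, 1]]
    elif sub_x < 0.5 and sub_y > 0.5:  # lower-left quarter: left and bottom
        return [[0, 0], [-1, 0], [0, 1]]
    elif sub_x < 0.5 and sub_y < 0.5:  # upper-left quarter: left and top
        return [[0, 0], [-1, 0], [0, -1]]
    else:                              # upper-right quarter: right and top
        return [[0, 0], [1, 0], [0, -1]]
```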
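A standalone version of `get_near_points` (same logic, without `self`) plus a hypothetical helper `responsible_cells` that drops neighbours outside the feature map, mirroring the bounds check in the matching loop. A neighbouring cell can take responsibility because the center lies within half a cell of the shared edge, so measured from that neighbour's top-left corner its offset still falls inside (−0.5, 1.5).

```python
import math


def get_near_points(x, y, i, j):
    sub_x, sub_y = x - i, y - j
    if sub_x > 0.5 and sub_y > 0.5:
        return [[0, 0], [1, 0], [0, 1]]
    elif sub_x < 0.5 and sub_y > 0.5:
        return [[0, 0], [-1, 0], [0, 1]]
    elif sub_x < 0.5 and sub_y < 0.5:
        return [[0, 0], [-1, 0], [0, -1]]
    else:
        return [[0, 0], [1, 0], [0, -1]]


def responsible_cells(x, y, in_w, in_h):
    """Grid cells (i, j) that become positives for a box centered at (x, y)."""
    i, j = math.floor(x), math.floor(y)
    cells = []
    for di, dj in get_near_points(x, y, i, j):
        ci, cj = i + di, j + dj
        if 0 <= ci < in_w and 0 <= cj < in_h:  # skip cells outside the map
            cells.append((ci, cj))
    return cells


print(responsible_cells(1.25, 3.75, 80, 80))  # [(1, 3), (0, 3), (1, 4)]
print(responsible_cells(0.25, 0.25, 80, 80))  # [(0, 0)]: both neighbours fall outside
```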


```python
#-------------------------------------------------------#
# Convert the normalized boxes to feature-map (grid) units,
# which gives the centers of the positive samples on this feature map
#-------------------------------------------------------#
batch_target[:, [0, 2]] = targets[:, [0, 2]] * in_w
batch_target[:, [1, 3]] = targets[:, [1, 3]] * in_h
```




```python
#-------------------------------------------------------#
# Excerpt from the target-building loop over feature layers l.
# anchors      : (9, 2) anchor sizes, in the same grid units as batch_target
# batch_target : (num_true_box, 5) boxes as x, y, w, h (grid units) and class
#
# wh : num_true_box, 2
# np.expand_dims(wh, 1) : num_true_box, 1, 2
# anchors : 9, 2
# np.expand_dims(anchors, 0) : 1, 9, 2
#
# ratios_of_gt_anchors: width/height ratios of each ground-truth box to each anchor
# ratios_of_gt_anchors : num_true_box, 9, 2
# ratios_of_anchors_gt: width/height ratios of each anchor to each ground-truth box
# ratios_of_anchors_gt : num_true_box, 9, 2
#
# ratios : num_true_box, 9, 4
# max_ratios: the largest of these ratios for each (ground-truth box, anchor) pair
# max_ratios : num_true_box, 9
#-------------------------------------------------------#
ratios_of_gt_anchors = np.expand_dims(batch_target[:, 2:4], 1) / np.expand_dims(anchors, 0)
ratios_of_anchors_gt = np.expand_dims(anchors, 0) / np.expand_dims(batch_target[:, 2:4], 1)
ratios = np.concatenate([ratios_of_gt_anchors, ratios_of_anchors_gt], axis=-1)
max_ratios = np.max(ratios, axis=-1)
for t, ratio in enumerate(max_ratios):
    # -------------------------------------------------------#
    # Discard anchors whose shape differs too much from the ground truth.
    # The threshold is 4 because the predicted size is
    # anchor * (2 * sigmoid(t)) ** 2, so the scale factor lies in (0, 4).
    # The best-matching anchor is always kept.
    # -------------------------------------------------------#
    over_threshold = ratio < self.threshold
    over_threshold[np.argmin(ratio)] = True
    for k, mask in enumerate(self.anchors_mask[l]):
        if not over_threshold[mask]:
            continue
        #----------------------------------------#
        # Grid cell that contains the box center, e.g.
        # x 1.25 => 1
        # y 3.75 => 3
        #----------------------------------------#
        i = int(np.floor(batch_target[t, 0]))
        j = int(np.floor(batch_target[t, 1]))
        offsets = self.get_near_points(batch_target[t, 0], batch_target[t, 1], i, j)
        for offset in offsets:
            local_i = i + offset[0]
            local_j = j + offset[1]
            if local_i >= in_w or local_i < 0 or local_j >= in_h or local_j < 0:
                continue
            # If this anchor at this cell is already taken, keep the box with the smaller ratio
            if box_best_ratio[l][k, local_j, local_i] != 0:
                if box_best_ratio[l][k, local_j, local_i] > ratio[mask]:
                    y_true[l][k, local_j, local_i, :] = 0
                else:
                    continue
            #----------------------------------------#
            # Class of the ground-truth box
            #----------------------------------------#
            c = int(batch_target[t, 4])
            #----------------------------------------#
            # Ground-truth x, y, w, h (grid units), objectness and one-hot class
            #----------------------------------------#
            y_true[l][k, local_j, local_i, 0] = batch_target[t, 0]
            y_true[l][k, local_j, local_i, 1] = batch_target[t, 1]
            y_true[l][k, local_j, local_i, 2] = batch_target[t, 2]
            y_true[l][k, local_j, local_i, 3] = batch_target[t, 3]
            y_true[l][k, local_j, local_i, 4] = 1
            y_true[l][k, local_j, local_i, c + 5] = 1
            #----------------------------------------#
            # Record the best ratio for this anchor at this cell
            #----------------------------------------#
            box_best_ratio[l][k, local_j, local_i] = ratio[mask]
```
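The shape filter on its own: a box may only be assigned to anchors whose width and height ratios (in either direction) stay below 4, and the best anchor is kept as a fallback. Because the ratio is scale-invariant, pixel sizes give the same mask as grid units, as long as boxes and anchors share one unit. The helper `anchor_ratio_mask` is hypothetical; the threshold of 4 matches the comment in the code above.

```python
import numpy as np

# Default anchors (w, h) in pixels
ANCHORS = np.array([[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
                    [59, 119], [116, 90], [156, 198], [373, 326]], dtype=np.float64)


def anchor_ratio_mask(gt_wh, anchors, threshold=4.0):
    """Return an (N, A) boolean mask: True where an anchor may be matched to a box.

    gt_wh: (N, 2) box sizes; anchors: (A, 2); both in the same unit.
    """
    r = np.expand_dims(gt_wh, 1) / np.expand_dims(anchors, 0)           # (N, A, 2)
    max_ratio = np.max(np.concatenate([r, 1.0 / r], axis=-1), axis=-1)  # (N, A)
    mask = max_ratio < threshold
    # Always keep the best-fitting anchor for each box
    mask[np.arange(len(gt_wh)), np.argmin(max_ratio, axis=1)] = True
    return mask


gt = np.array([[20.0, 20.0], [400.0, 400.0], [8.0, 300.0]])
m = anchor_ratio_mask(gt, ANCHORS)
print(m.astype(int))
```

The small 20×20 box matches the first five anchors, the 400×400 box only the two largest, and the thin 8×300 box fits no anchor within the threshold, so only its best anchor (30×61) is kept.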




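The range of the center offset is changed from the original (0, 1) to (−0.5, 1.5), and the width/height scale factor is bounded to (0, 4). The formulas are:

$$
\begin{aligned}
b_x &= 2\sigma(t_x) - 0.5 + c_x \\
b_y &= 2\sigma(t_y) - 0.5 + c_y \\
b_w &= p_w \left(2\sigma(t_w)\right)^2 \\
b_h &= p_h \left(2\sigma(t_h)\right)^2
\end{aligned}
$$

$b_x, b_y, b_w, b_h$: the final predicted box coordinates

$c_x, c_y$: the top-left corner of the grid cell containing the center point

$p_w, p_h$: the size of the anchor

$\sigma$: the sigmoid function; $t_x, t_y, t_w, t_h$: the raw network outputs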
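A minimal sketch of this decoding for a single anchor at a single cell, assuming the standard YOLOv5 formulas b_x = 2σ(t_x) − 0.5 + c_x and b_w = p_w(2σ(t_w))² (and likewise for y and h), with everything in grid units. The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode(t, cell_xy, anchor_wh):
    """t = (tx, ty, tw, th) raw outputs; cell_xy = (cx, cy) top-left of the cell;
    anchor_wh = (pw, ph). Everything is in grid units."""
    tx, ty, tw, th = t
    bx = 2.0 * sigmoid(tx) - 0.5 + cell_xy[0]
    by = 2.0 * sigmoid(ty) - 0.5 + cell_xy[1]
    bw = anchor_wh[0] * (2.0 * sigmoid(tw)) ** 2
    bh = anchor_wh[1] * (2.0 * sigmoid(th)) ** 2
    return bx, by, bw, bh


# Zero raw outputs land on the cell center with the anchor size unchanged
print(decode((0.0, 0.0, 0.0, 0.0), (10, 20), (1.25, 1.625)))  # (10.5, 20.5, 1.25, 1.625)

# Extreme raw outputs approach the limits: offsets in (-0.5, 1.5), scale in (0, 4)
lo = decode((-30.0, -30.0, -30.0, -30.0), (10, 20), (1.0, 1.0))
hi = decode((30.0, 30.0, 30.0, 30.0), (10, 20), (1.0, 1.0))
```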




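Location loss: the localization loss. This project uses GIoU loss (the official YOLOv5 implementation uses CIoU; for a detailed introduction to CIoU, see https://blog.csdn.net/xian0710830114/article/details/128177705). It is computed only for positive samples, from the first 4 predicted values.

Objectness loss: the confidence loss for whether a position contains an object, using BCE loss on the 5th predicted value. It is computed over all predictions, positive and negative.

Classes loss: the classification loss, using BCE loss on all values after the 5th. It is computed only for positive samples.

The total loss is a multi-task loss, a weighted sum of the three terms:

$$
L = \lambda_{loc} L_{loc} + \lambda_{obj} L_{obj} + \lambda_{cls} L_{cls}
$$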


```python
import torch

# Excerpt: methods of the project's loss class


def box_giou(self, b1, b2):
    """
    Inputs
    ----------
    b1: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh
    b2: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh
    Returns
    -------
    giou: tensor, shape=(batch, feat_w, feat_h, anchor_num)
    """
    # ----------------------------------------------------#
    # Top-left and bottom-right corners of the predicted boxes
    # ----------------------------------------------------#
    b1_xy = b1[..., :2]
    b1_wh = b1[..., 2:4]
    b1_wh_half = b1_wh / 2.
    b1_mins = b1_xy - b1_wh_half
    b1_maxes = b1_xy + b1_wh_half
    # ----------------------------------------------------#
    # Top-left and bottom-right corners of the ground-truth boxes
    # ----------------------------------------------------#
    b2_xy = b2[..., :2]
    b2_wh = b2[..., 2:4]
    b2_wh_half = b2_wh / 2.
    b2_mins = b2_xy - b2_wh_half
    b2_maxes = b2_xy + b2_wh_half
    # ----------------------------------------------------#
    # IoU between the ground-truth and predicted boxes
    # ----------------------------------------------------#
    intersect_mins = torch.max(b1_mins, b2_mins)
    intersect_maxes = torch.min(b1_maxes, b2_maxes)
    intersect_wh = torch.max(intersect_maxes - intersect_mins, torch.zeros_like(intersect_maxes))
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    b1_area = b1_wh[..., 0] * b1_wh[..., 1]
    b2_area = b2_wh[..., 0] * b2_wh[..., 1]
    union_area = b1_area + b2_area - intersect_area
    iou = intersect_area / union_area
    # ----------------------------------------------------#
    # Corners of the smallest box enclosing both boxes
    # ----------------------------------------------------#
    enclose_mins = torch.min(b1_mins, b2_mins)
    enclose_maxes = torch.max(b1_maxes, b2_maxes)
    enclose_wh = torch.max(enclose_maxes - enclose_mins, torch.zeros_like(intersect_maxes))
    # ----------------------------------------------------#
    # GIoU = IoU - (enclosing area - union area) / enclosing area
    # ----------------------------------------------------#
    enclose_area = enclose_wh[..., 0] * enclose_wh[..., 1]
    giou = iou - (enclose_area - union_area) / enclose_area
    return giou


def BCELoss(self, pred, target):
    epsilon = 1e-7
    # Keep predictions away from 0 and 1 so the logarithms stay finite
    pred = torch.clamp(pred, epsilon, 1.0 - epsilon)
    output = - target * torch.log(pred) - (1.0 - target) * torch.log(1.0 - pred)
    return output


def MSELoss(self, pred, target):
    return torch.pow(pred - target, 2)
```
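A NumPy sketch of how the three terms fit together, under simplifying assumptions: the GIoU formula mirrors `box_giou`, the BCE mirrors `BCELoss`, the location loss is taken as 1 − GIoU, and the weights `w_box`, `w_obj`, `w_cls` are hypothetical values borrowed from YOLOv5's default hyperparameter file (box 0.05, obj 1.0, cls 0.5), not this project's settings.

```python
import numpy as np


def giou_xywh(b1, b2):
    """GIoU between boxes given as (..., 4) arrays of center x, center y, w, h."""
    b1_min, b1_max = b1[..., :2] - b1[..., 2:4] / 2, b1[..., :2] + b1[..., 2:4] / 2
    b2_min, b2_max = b2[..., :2] - b2[..., 2:4] / 2, b2[..., :2] + b2[..., 2:4] / 2
    inter_wh = np.maximum(np.minimum(b1_max, b2_max) - np.maximum(b1_min, b2_min), 0.0)
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    union = b1[..., 2] * b1[..., 3] + b2[..., 2] * b2[..., 3] - inter
    enclose_wh = np.maximum(b1_max, b2_max) - np.minimum(b1_min, b2_min)
    enclose = enclose_wh[..., 0] * enclose_wh[..., 1]
    return inter / union - (enclose - union) / enclose


def bce(pred, target, eps=1e-7):
    pred = np.clip(pred, eps, 1.0 - eps)
    return -target * np.log(pred) - (1.0 - target) * np.log(1.0 - pred)


pred_box = np.array([1.0, 1.0, 2.0, 2.0])  # covers [0, 2] x [0, 2]
true_box = np.array([2.0, 2.0, 2.0, 2.0])  # covers [1, 3] x [1, 3]
# Intersection 1, union 7, enclosing box 3 x 3 = 9, so GIoU = 1/7 - 2/9 = -5/63
g = giou_xywh(pred_box, true_box)
loc_loss = 1.0 - g

# Objectness over one positive and one negative prediction; classes for the positive only
obj_loss = bce(np.array([0.8, 0.1]), np.array([1.0, 0.0])).mean()
cls_loss = bce(np.array([0.7, 0.2, 0.1]), np.array([1.0, 0.0, 0.0])).mean()

w_box, w_obj, w_cls = 0.05, 1.0, 0.5  # hypothetical weights
total = w_box * loc_loss + w_obj * obj_loss + w_cls * cls_loss
print(g, total)
```

Unlike plain IoU, GIoU stays informative when the boxes do not overlap: the enclosing-area term keeps pushing the prediction toward the target.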



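(1) Flexible configuration parameters yield models of different complexity: Yolov5n, Yolov5s, Yolov5m, Yolov5l, and Yolov5x.

(2) Mosaic data augmentation: Mosaic stitches four images together. This enriches the backgrounds around the detected objects, and each computation covers data from four images at once.

(3) The SiLU activation function.

(4) Multiple positive samples per target: in earlier YOLO versions, each ground-truth box corresponded to a single positive sample during training, i.e., only one anchor was responsible for predicting it. To speed up training, YOLOv5 increases the number of positive samples: each ground-truth box can be predicted by multiple anchors, across the neighbouring grid cells and every anchor whose shape ratio passes the threshold.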
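A deliberately simplified Mosaic sketch, assuming four images already resized to the same S×S size: the canvas is split at a point (cx, cy), each quadrant is copied from one image, and that image's boxes are clipped to the quadrant (boxes that become degenerate are dropped). Real implementations, including YOLOv5's, place each image around a random center on a larger canvas and apply further random transforms; `simple_mosaic` is a hypothetical name.

```python
import numpy as np


def simple_mosaic(images, boxes_list, cx, cy):
    """images: four (S, S, 3) arrays of the same size;
    boxes_list: four (N, 4) arrays of [x1, y1, x2, y2] in pixels; (cx, cy): split point."""
    s = images[0].shape[0]
    canvas = np.zeros_like(images[0])
    # Quadrants as (x1, y1, x2, y2): top-left, top-right, bottom-left, bottom-right
    regions = [(0, 0, cx, cy), (cx, 0, s, cy), (0, cy, cx, s), (cx, cy, s, s)]
    out_boxes = []
    for img, boxes, (x1, y1, x2, y2) in zip(images, boxes_list, regions):
        canvas[y1:y2, x1:x2] = img[y1:y2, x1:x2]      # copy this image's quadrant
        b = boxes.astype(np.float32).copy()
        b[:, [0, 2]] = np.clip(b[:, [0, 2]], x1, x2)  # clip boxes to the quadrant
        b[:, [1, 3]] = np.clip(b[:, [1, 3]], y1, y2)
        keep = (b[:, 2] - b[:, 0] > 1) & (b[:, 3] - b[:, 1] > 1)  # drop boxes cut away
        out_boxes.append(b[keep])
    return canvas, np.concatenate(out_boxes, axis=0)


imgs = [np.full((640, 640, 3), v, dtype=np.uint8) for v in (50, 100, 150, 200)]
boxes = [np.array([[100, 100, 200, 200]]), np.array([[400, 100, 500, 200]]),
         np.array([[100, 400, 200, 500]]), np.array([[100, 100, 200, 200]])]
canvas, b = simple_mosaic(imgs, boxes, cx=320, cy=320)
print(b)  # the fourth box lies outside the bottom-right quadrant and is dropped
```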


