
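[Object Detection Series] Training YOLOv3 and YOLOv4 on your own data (PyTorch version) + calling the trained model from OpenCV + accelerating inference with the OpenVINO engine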

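II. Principle: let's mainly go over the network structure, because without it the code is hard to follow. A diagram of the network is still the best summary. The red part can be seen as the backbone, for example darknet53. The blue part produces the output on the 13*13 layer, the orange part on the 26*26 layer, and the green part on the 52*52 layer. Where does 13*13 come from? The 416*416 input is downsampled by a factor of 32 (416 / 32 = 13). Many parameters in the cfg file correspond one-to-one to this diagram; for the mapping see https://blog.csdn.net/gbz3300255/article/details/106255335. Focus mainly on the inputs and outputs, since those are what you need to match against the code.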

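The three colored circles are there for multi-scale detection. Detection runs three times, each with a different receptive field: the 32x downsampled map has the largest receptive field and suits large objects, the 16x map suits medium-sized objects, and the 8x map has the smallest receptive field and suits small objects. The anchor box sizes are set in the cfg file. In the figure below, red marks the anchor centers.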




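Output: a tensor of shape [1, (13*13 + 26*26 + 52*52)*3, 85].

What is (13*13 + 26*26 + 52*52)*3? The sum inside the parentheses is the total number of detection centers (grid cells), and the factor 3 is the number of prior boxes (anchors) per center. So (13*13 + 26*26 + 52*52)*3 is the total number of predicted boxes.

What is 85? It is the dimension of the feature vector at one point of the 13*13, 26*26, or 52*52 feature map. The network detects 80 classes, so the box at each point carries 80 probability values, one confidence per class. Each box also has 4 position values (x, y, w, h) and 1 confidence value for the box itself (this confidence is the IoU between the predicted box and the ground truth). So the feature vector for each box has 1 + 4 + 80 = 85 dimensions.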
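The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name and its parameters are made up for illustration, and it assumes a square input and three anchors per grid cell as described in the text.

```python
def yolo_output_shape(input_size=416, strides=(32, 16, 8),
                      anchors_per_cell=3, num_classes=80):
    """Shape of the concatenated YOLOv3 output for one image."""
    # One grid per stride: 416/32 = 13, 416/16 = 26, 416/8 = 52.
    cells = sum((input_size // s) ** 2 for s in strides)
    # Each box carries 4 position values + 1 confidence + class scores.
    return (1, cells * anchors_per_cell, 5 + num_classes)


print(yolo_output_shape())               # (1, 10647, 85)
print(yolo_output_shape(num_classes=4))  # (1, 10647, 9)
```

With the 4-class dataset used later in this post, the last dimension shrinks from 85 to 9 while the number of boxes stays the same.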



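1. Prepare the dataset: first understand what kind of dataset YOLOv3 expects. Put simply, it wants one label file per image. A pile of images and the matching pile of label files make up the image set and the label set. Each label file has the same name as its image, and each line of it contains: class index, box center x, box center y, box width, box height. Note that the class index is simply 0, 1, 2, 3, 4, ... and the remaining values are floats obtained by dividing by the image width or height. The figure below is extremely handy.

To get started and see results quickly, I simply downloaded a ready-made dataset, CCTSDB, and used a program to read out its boxes and write them in the form shown in man007.txt in the figure above.

It isn't convenient to paste that code, but what it does is simple: go through the CCTSDB dataset, read each image and its corresponding json file, extract the classes and boxes, number the classes 0, 1, 2, ..., compute the box values as in the figure above, and write each box as one line, for example: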

0 0.669 0.5785714285714286 0.032 0.08285714285714285
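Since the conversion code itself is not included in the post, here is a minimal sketch of the normalization step, assuming the source annotation gives a box as pixel corners (xmin, ymin, xmax, ymax). The function name and the example numbers are hypothetical; the numbers were picked only so that the result has the same form as the line above.

```python
def to_yolo_label(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-coordinate box into one YOLO label line."""
    # Center and size, each divided by the image width or height.
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return "{} {} {} {} {}".format(class_id, x_center, y_center, width, height)


# Hypothetical 1000x700 image containing one box of class 0.
print(to_yolo_label(0, 653, 376, 685, 434, 1000, 700))
```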

1.1 Prepare the text files: train.txt, test.txt, val.txt, and the label text files.

train.txt:

BloodImage_00091
BloodImage_00156
BloodImage_00389
BloodImage_00030
BloodImage_00124
BloodImage_00278
BloodImage_00261

test.txt:

BloodImage_00258
BloodImage_00320
BloodImage_00120

val.txt:

BloodImage_00777
BloodImage_00951

A label file:

0 0.669 0.5785714285714286 0.032 0.08285714285714285
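The post does not say how the three lists were produced. One simple way, sketched here under the assumption that a random split is acceptable, is to shuffle the image names and slice them. The function name, the split fractions, and the seed are all illustrative choices.

```python
import random


def split_names(names, test_frac=0.2, val_frac=0.1, seed=0):
    """Randomly split image names into train, test and val lists."""
    names = sorted(names)                 # stable order before shuffling
    random.Random(seed).shuffle(names)    # reproducible shuffle
    n_test = int(len(names) * test_frac)
    n_val = int(len(names) * val_frac)
    test = names[:n_test]
    val = names[n_test:n_test + n_val]
    train = names[n_test + n_val:]
    return train, test, val


train, test, val = split_names(["BloodImage_%05d" % i for i in range(10)])
print(len(train), len(test), len(val))  # 7 2 1
```

Each list can then be written to its file with one name per line, for example with `"\n".join(train)`.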


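1.2 Prepare the rbc.data file. The file name is arbitrary; just remember to pass the same name when you give the program its input arguments. Its contents are as follows:

classes= 4
train= data/train.txt
valid= data/test.txt
names= data/rbc.names
backup=backup/
eval=coco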

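1.3 Prepare the rbc.names file. Again the file name is arbitrary; just remember to pass the same name when you give the program its input arguments. Its contents are as follows.

There are four classes; being lazy, I simply named them a, b, c, d. Change them to match your own classes.

a
b
c
d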


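1.4 Prepare the image data: put the training images in images and the test images in samples. The images in images correspond one-to-one to the text files in labels.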
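Because training depends on that one-to-one pairing, a quick sanity check before training can save time. The helper below is a sketch, not part of the original workflow: its name and the list of image extensions are assumptions, and it only compares file names.

```python
import os


def unmatched(image_files, label_files,
              image_exts=(".jpg", ".jpeg", ".png", ".bmp")):
    """Return (images without a label, labels without an image) by base name."""
    image_stems = {os.path.splitext(f)[0] for f in image_files
                   if os.path.splitext(f)[1].lower() in image_exts}
    label_stems = {os.path.splitext(f)[0] for f in label_files
                   if f.lower().endswith(".txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


print(unmatched(["a.jpg", "b.jpg"], ["a.txt", "c.txt"]))  # (['b'], ['c'])
```

In practice it could be called as `unmatched(os.listdir("data/images"), os.listdir("data/labels"))`, assuming those are the folders in use.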



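2. Modify the cfg file: decide which model you will use, then edit the matching cfg file. For example, I train with YOLOv3, so I find yolov3.cfg in the cfg folder and edit that. I only changed the number of classes and the filters value, since filters depends on the number of classes. From the network structure you can see that YOLOv3 needs this change in 3 places (each of the three [yolo] layers and the convolutional layer right before it). As for other settings such as the anchor sizes, if the original boxes differ a lot from the objects you want to detect, it is better to recompute a set of anchors by clustering.

classes = 4
#filters=3 * (5 + classes )
filters= 27 #3 * (5 + 4)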
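The filters rule is easy to get wrong when the class count changes, so here is a tiny sketch of it. The function name is made up; the formula is the one in the cfg comment above, with 3 anchors per scale.

```python
def yolo_filters(num_classes, anchors_per_scale=3):
    """filters for the conv layer before each [yolo] layer: 3 * (5 + classes)."""
    # 5 = 4 box values (x, y, w, h) + 1 confidence value.
    return anchors_per_scale * (5 + num_classes)


print(yolo_filters(4))   # 27
print(yolo_filters(80))  # 255
```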


kmean.py, a k-means script for recomputing the anchors:

# -*- coding: utf-8 -*-
import argparse
import os
import random

import numpy as np

# Command-line arguments
parser = argparse.ArgumentParser(description='Generate YOLOv3 anchor boxes with k-means')
parser.add_argument('--input_annotation_txt_dir', required=True, type=str,
                    help='directory holding the annotation txt files (avoid non-ASCII characters in the path)')
parser.add_argument('--output_anchors_txt', required=True, type=str,
                    help='output text file that stores the anchor boxes')
parser.add_argument('--input_num_anchors', default=6, type=int,
                    help='number of clusters (anchor boxes) to compute')
parser.add_argument('--input_cfg_width', required=True, type=int, help='width in the cfg file')
parser.add_argument('--input_cfg_height', required=True, type=int, help='height in the cfg file')


def IOU(annotation_array, centroids):
    '''
    IoU between one annotated box and every centroid, with all boxes
    aligned at the same center.
    annotation_array: (w, h) of one annotated box
    centroids: cluster centers, ndarray of shape (num, 2)
    '''
    similarities = []
    w, h = annotation_array
    for centroid in centroids:
        c_w, c_h = centroid
        if c_w >= w and c_h >= h:      # case 1: the centroid contains the box
            similarity = w * h / (c_w * c_h)
        elif c_w >= w and c_h <= h:    # case 2: the centroid is wider but shorter
            similarity = w * c_h / (w * h + (c_w - w) * c_h)
        elif c_w <= w and c_h >= h:    # case 3: the centroid is narrower but taller
            similarity = c_w * h / (w * h + (c_h - h) * c_w)
        else:                          # case 4: the box contains the centroid
            similarity = (c_w * c_h) / (w * h)
        similarities.append(similarity)
    # Convert the list to a 1-D ndarray of shape (num,)
    return np.array(similarities, np.float32)


def k_means(annotations_array, centroids, eps=0.00005, iterations=200000):
    '''
    k-means clustering with distance = 1 - IoU. Updates centroids in place.
    annotations_array: widths and heights of all N annotated boxes, ndarray of shape (N, 2)
    centroids: cluster centers, ndarray of shape (num, 2)
    '''
    N = annotations_array.shape[0]
    num = centroids.shape[0]
    # Loss and assignments from the previous iteration
    distance_sum_pre = -1
    assignments_pre = -1 * np.ones(N, dtype=np.int64)
    iteration = 0
    while True:
        iteration += 1
        # Distance (1 - IoU) from every box to every centroid, shape (N, num)
        distances_array = np.array(
            [1 - IOU(annotations_array[i], centroids) for i in range(N)], np.float32)
        # Index of the nearest centroid for every box
        assignments = np.argmin(distances_array, axis=1)
        # k-means loss: total distance from each box to its assigned centroid
        min_distances = distances_array[np.arange(N), assignments]
        distances_sum = np.sum(min_distances)
        # Compute the new centroids as the mean (w, h) of each cluster
        centroid_sums = np.zeros(centroids.shape, np.float32)
        for i in range(N):
            centroid_sums[assignments[i]] += annotations_array[i]
        for j in range(num):
            count = np.sum(assignments == j)
            if count > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = centroid_sums[j] / count
        # Change in the loss between two consecutive iterations
        diff = abs(distances_sum - distance_sum_pre)
        print("iteration: {},distance: {}, diff: {}, avg_IOU: {}\n".format(
            iteration, distances_sum, diff, np.mean(1 - min_distances)))
        # Three ways to leave the loop: the assignments stop changing,
        # the loss changes by less than eps, or the iteration limit is hit.
        if (assignments == assignments_pre).all():
            print("Stopping: cluster assignments did not change\n")
            break
        if diff < eps:
            print("Stopping: change in distance is below eps\n")
            break
        if iteration > iterations:
            print("Stopping: iteration limit reached\n")
            break
        # Remember this iteration
        distance_sum_pre = distances_sum
        assignments_pre = assignments.copy()


if __name__ == '__main__':
    args = parser.parse_args()
    # Number of clusters, i.e. number of anchor boxes
    num_clusters = args.input_num_anchors
    # Widths and heights of all annotated boxes
    annotations_w_h = []
    # Go through every annotation file (.txt) in the folder
    for name in os.listdir(args.input_annotation_txt_dir):
        if not name.endswith('.txt'):
            continue
        txt_path = os.path.join(args.input_annotation_txt_dir, name)
        with open(txt_path, 'r') as f:
            for line in f:
                fields = line.split()
                if len(fields) != 5:  # skip blank or malformed lines
                    continue
                # fields: class x_center y_center width height
                annotations_w_h.append((float(fields[3]), float(fields[4])))
    # Convert to an ndarray of shape (N, 2), where N is the number of boxes
    annotations_array = np.array(annotations_w_h, dtype=np.float32)
    N = annotations_array.shape[0]
    # Random initialization: pick num_clusters distinct boxes as starting centroids
    random_indices = random.sample(range(N), num_clusters)
    centroids = annotations_array[random_indices]
    # k-means clustering
    k_means(annotations_array, centroids, 0.00005, 200000)
    # Sort the centroids by width and write them to the output file
    anchors = centroids[np.argsort(centroids[:, 0])]
    with open(args.output_anchors_txt, 'w') as f_anchors:
        for anchor in anchors:
            f_anchors.write('%d,%d\n' % (int(anchor[0] * args.input_cfg_width),
                                          int(anchor[1] * args.input_cfg_height)))


python kmean.py --input_annotation_txt_dir data/labels --output_anchors_txt 123456.txt --input_num_anchors 9 --input_cfg_width 640 --input_cfg_height 320


12,15
14,20
18,25
24,32
24,18
33,44
39,28
59,49
115,72
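These values still have to be copied into the anchors entry of each [yolo] section of the cfg file. A small helper can turn the output file into a single line. This is a sketch: the function name is made up, and it assumes the cfg accepts the anchors as comma-separated width,height pairs in the order given.

```python
def anchors_cfg_line(lines):
    """Join 'w,h' lines into a single 'anchors = ...' cfg line."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:          # ignore blank lines
            continue
        w, h = (int(v) for v in line.split(","))
        pairs.append("{},{}".format(w, h))
    return "anchors = " + ",  ".join(pairs)


print(anchors_cfg_line(["12,15\n", "14,20\n", "18,25\n"]))
# anchors = 12,15,  14,20,  18,25
```

It could be fed the file written by the script, for example `anchors_cfg_line(open("123456.txt"))`.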





parser.add_argument('--batch-size', type=int, default=16)  # effective bs = batch_size * accumulate = 16 * 4 = 64


parser.add_argument('--single-cls', action='store_false', help='train as single-class dataset')


parser.add_argument('--adam', action='store_true', help='use adam optimizer')


#hyp['lr0'] *= 0.1  # reduce lr (i.e. SGD=5E-3, Adam=5E-4)




python train.py --data data/rbc.data --cfg cfg/yolov3.cfg --epochs 2000




parser.add_argument('--weights', type=str, default='weights/yolov3.weights', help='initial weights path')


python detect.py --cfg cfg/yolov3-tiny.cfg --weights weights/best.pt


RuntimeError: Error(s) in loading state_dict for Darknet



Change
model.load_state_dict(torch.load(weights, map_location=device)['model'])
to:
model.load_state_dict(torch.load(weights, map_location=device)['model'], False)
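The second argument here is strict, and passing False tells load_state_dict to skip missing or unexpected keys instead of raising. Skipped layers keep their initial values, so it is worth looking at which keys actually differ, and checking that the --cfg passed to detect.py matches the one used for training. The sketch below uses plain Python collections in place of real state dicts; the function name and the example key names are made up for illustration.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (keys the model needs but the checkpoint lacks,
               keys in the checkpoint that the model does not have)."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Hypothetical key names standing in for model.state_dict().keys()
# and the keys of the checkpoint's 'model' entry.
model_keys = ["layer.0.weight", "layer.1.weight"]
checkpoint_keys = ["layer.0.weight", "layer.2.weight"]
print(diff_state_dict_keys(model_keys, checkpoint_keys))
# (['layer.1.weight'], ['layer.2.weight'])
```

If both lists come back long, the checkpoint most likely belongs to a different network than the one built from the cfg.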



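2. One thing to note: the image resizing used below is not the same as YOLOv3's, and it should be changed. I was lazy and did not change it, and the result of not changing it is that no objects are detected. I use 1280*720 images and scale them to 512, which happens to give 512*288, a multiple of 32 in both dimensions. Remember: the goal of resizing is to make the width and height multiples of 32 without changing the aspect ratio of the original image (distortion is not allowed). So in general you scale by the row or column ratio and then pad along one direction up to a multiple of 32. I dashed off a piece of code for this and then realized I don't actually need it, but here it is anyway.


#include <algorithm>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Resize `in` so that it fits inside target_w x target_h without changing its
// aspect ratio, then pad each dimension up to the next multiple of 32.
void YoloResize(const cv::Mat& in, cv::Mat& out)
{
    const int target_w = 512;
    const int target_h = 512;
    const int w = in.cols;
    const int h = in.rows;
    // Use the smaller of the two ratios so that the whole image fits.
    const float scale = std::min((float)target_w / w, (float)target_h / h);
    // At least one of width and height now matches the target size.
    const int nw = cvRound(w * scale);
    const int nh = cvRound(h * scale);
    // Scale the image.
    cv::Mat resized;
    cv::resize(in, resized, cv::Size(nw, nh), 0, 0, cv::INTER_CUBIC);
    // Padding needed to reach a multiple of 32 in each direction.
    const int pad_w = (32 - nw % 32) % 32;
    const int pad_h = (32 - nh % 32) % 32;
    // Place the resized image in the middle of the output.
    // The gray fill value 128 is an assumption; use whatever training used.
    const int top = pad_h / 2;
    const int left = pad_w / 2;
    cv::copyMakeBorder(resized, out, top, pad_h - top, left, pad_w - left,
                       cv::BORDER_CONSTANT, cv::Scalar(128, 128, 128));
}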
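To see what sizes this kind of resizing produces, the scale-then-pad arithmetic can be tried on its own. The sketch below only computes sizes and does not touch any image; the function name and the defaults (target 512, stride 32) are assumptions taken from the numbers in the text.

```python
def letterbox_size(w, h, target=512, stride=32):
    """Return (new_w, new_h, pad_w, pad_h) for an aspect-preserving resize."""
    # The smaller ratio keeps the whole image inside target x target.
    scale = min(target / w, target / h)
    nw, nh = round(w * scale), round(h * scale)
    # Padding needed to reach the next multiple of stride.
    pad_w = (stride - nw % stride) % stride
    pad_h = (stride - nh % stride) % stride
    return nw, nh, pad_w, pad_h


print(letterbox_size(1280, 720))  # (512, 288, 0, 0)
print(letterbox_size(1000, 700))  # (512, 358, 0, 26)
```

The first call reproduces the 1280*720 case from the text, where no padding is needed; the second shows a case where 26 rows of padding bring the height up to 384.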



// This code is written at BigVision LLC.
// It is subject to the license terms in the LICENSE file found in this distribution and at http://opencv.org/license.html
#include <fstream>
#include <sstream>
#include <iostream>
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
using namespace cv;
using namespace dnn;
using namespace std;

// Initialize the parameters
float confThreshold = 0.5; // Confidence threshold
float nmsThreshold = 0.4;  // Non-maximum suppression threshold
int inpWidth = 512;        // Width of network's input image
int inpHeight = 192;       // Height of network's input image
vector<string> classes;

// Remove the bounding boxes with low confidence using non-maxima suppression
void postprocess(Mat& frame, const vector<Mat>& out);
// Draw the predicted bounding box
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame);
// Get the names of the output layers
vector<String> getOutputsNames(const Net& net);

int main(int argc, char** argv)
{
    // Load the class names
    string classesFile = "E:\\LL\\rbc.names";
    ifstream ifs(classesFile.c_str());
    string line;
    while (getline(ifs, line)) classes.push_back(line);
    // Give the configuration and weight files for the model
    String modelConfiguration = "E:\\LL\\yolov3_new.cfg";
    String modelWeights = "E:\\LL\\converted.weights";
    // Load the network
    Net net = readNetFromDarknet(modelConfiguration, modelWeights);
    net.setPreferableBackend(DNN_BACKEND_OPENCV);
    net.setPreferableTarget(DNN_TARGET_CPU);
    // Open a video file or an image file or a camera stream.
    string str, outputFile;
    //VideoCapture cap("E:\\SSS.mp4");
    VideoWriter video;
    Mat frame, blob;
    // Create a window
    static const string kWinName = "Deep learning object detection in OpenCV";
    namedWindow(kWinName, WINDOW_NORMAL);
    // Process frames.
    while (waitKey(1) != 27)
    {
        // get frame from the video
        //cap >> frame;
        frame = imread("E:\\LL\\1.jpg");
        // Stop the program if reached end of video
        if (frame.empty()) {
            //waitKey(3000);
            break;
        }
        // Create a 4D blob from a frame.
        cout << "inpWidth = " << inpWidth << endl;
        cout << "inpHeight = " << inpHeight << endl;
        blobFromImage(frame, blob, 1 / 255.0, cv::Size(inpWidth, inpHeight), Scalar(0, 0, 0), true, false);
        // Sets the input to the network
        net.setInput(blob);
        // Runs the forward pass to get output of the output layers
        vector<Mat> outs;
        net.forward(outs, getOutputsNames(net));
        // Remove the bounding boxes with low confidence
        postprocess(frame, outs);
        // Put efficiency information. The function getPerfProfile returns the overall time
        // for inference(t) and the timings for each of the layers(in layersTimes)
        vector<double> layersTimes;
        double freq = getTickFrequency() / 1000;
        double t = net.getPerfProfile(layersTimes) / freq;
        string label = format("Inference time for a frame : %.2f ms", t);
        putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 255));
        // Write the frame with the detection boxes
        Mat detectedFrame;
        frame.convertTo(detectedFrame, CV_8U);
        imshow(kWinName, frame);
        waitKey(100000);
    }
    //cap.release();
    return 0;
}

// Remove the bounding boxes with low confidence using non-maxima suppression
void postprocess(Mat& frame, const vector<Mat>& outs)
{
    vector<int> classIds;
    vector<float> confidences;
    vector<Rect> boxes;
    for (size_t i = 0; i < outs.size(); ++i)
    {
        // Scan through all the bounding boxes output from the network and keep only the
        // ones with high confidence scores. Assign the box's class label as the class
        // with the highest score for the box.
        float* data = (float*)outs[i].data;
        for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)
        {
            Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
            Point classIdPoint;
            double confidence;
            // Get the value and location of the maximum score
            minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
            if (confidence > confThreshold)
            {
                // x and width scale with the image width (cols),
                // y and height scale with the image height (rows)
                int centerX = (int)(data[0] * frame.cols);
                int centerY = (int)(data[1] * frame.rows);
                int width = (int)(data[2] * frame.cols);
                int height = (int)(data[3] * frame.rows);
                int left = centerX - width / 2;
                int top = centerY - height / 2;
                classIds.push_back(classIdPoint.x);
                confidences.push_back((float)confidence);
                boxes.push_back(Rect(left, top, width, height));
            }
        }
    }
    // Perform non maximum suppression to eliminate redundant overlapping boxes with
    // lower confidences
    vector<int> indices;
    NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); ++i)
    {
        int idx = indices[i];
        Rect box = boxes[idx];
        drawPred(classIds[idx], confidences[idx], box.x, box.y,
                 box.x + box.width, box.y + box.height, frame);
    }
}

// Draw the predicted bounding box
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
    // Draw a rectangle displaying the bounding box
    rectangle(frame, Point(left, top), Point(right, bottom), Scalar(255, 178, 50), 3);
    // Get the label for the class name and its confidence
    string label = format("%.2f", conf);
    if (!classes.empty())
    {
        CV_Assert(classId < (int)classes.size());
        label = classes[classId] + ":" + label;
    }
    // Display the label at the top of the bounding box
    int baseLine;
    Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
    top = max(top, labelSize.height);
    rectangle(frame, Point(left, top - round(1.5 * labelSize.height)),
              Point(left + round(1.5 * labelSize.width), top + baseLine), Scalar(255, 255, 255), FILLED);
    putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0, 0, 0), 1);
}

// Get the names of the output layers
vector<String> getOutputsNames(const Net& net)
{
    static vector<String> names;
    if (names.empty())
    {
        // Get the indices of the output layers, i.e. the layers with unconnected outputs
        vector<int> outLayers = net.getUnconnectedOutLayers();
        // Get the names of all the layers in the network
        vector<String> layersNames = net.getLayerNames();
        // Get the names of the output layers in names
        names.resize(outLayers.size());
        for (size_t i = 0; i < outLayers.size(); ++i)
            names[i] = layersNames[outLayers[i] - 1];
    }
    return names;
}
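The box decoding inside postprocess is where mistakes creep in most easily, so here is the same step for a single output row as a standalone Python sketch. It assumes a row laid out as [x, y, w, h, confidence, class scores...] with coordinates normalized to 0..1; the function name and the example row are made up for illustration.

```python
def decode_row(row, frame_w, frame_h, conf_threshold=0.5):
    """Turn one network output row into (class_id, score, (left, top, w, h))."""
    scores = row[5:]
    # Pick the class with the highest score.
    class_id = max(range(len(scores)), key=scores.__getitem__)
    confidence = scores[class_id]
    if confidence <= conf_threshold:
        return None
    # x and width scale with the frame width, y and height with its height.
    width = int(row[2] * frame_w)
    height = int(row[3] * frame_h)
    left = int(row[0] * frame_w) - width // 2
    top = int(row[1] * frame_h) - height // 2
    return class_id, confidence, (left, top, width, height)


row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05, 0.05]
print(decode_row(row, 1280, 720))  # (1, 0.8, (480, 180, 320, 360))
```

Rows that pass the threshold would then go through non-maximum suppression, as NMSBoxes does in the C++ code.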





https://www.cnblogs.com/jsxyhelu/p/11340822.html (bookmarked for later)



1. https://blog.csdn.net/zhangping1987/article/details/84942680   computing the anchors

2. https://blog.csdn.net/sinat_34054843/article/details/88046041   fix for the model loading error

3. https://codeload.github.com/zqfang/YOLOv3_CPP/zip/master   C++ code for YOLOv3

4. https://blog.csdn.net/sue_kong/article/details/104401008   handling errors when installing OpenCV 4.0

5. https://blog.csdn.net/zmdsjtu/article/details/81913927   running predictions in OpenCV with trained network weights

6. https://blog.csdn.net/hzqgangtiexia/article/details/80509211   on the learning rate

7. https://www.cnblogs.com/lvdongjie/p/11270447.html   also on the learning rate




