
[Keras Computer Vision] Object Detection with a Faster R-CNN Network in Practice (with Source Code and Dataset, Highly Detailed)


For the full source code, please like, follow, and bookmark, then leave a comment or send a private message~~~

1. The Concept of Object Detection

Object detection is a popular direction in computer vision and digital image processing, widely applied in robot navigation, intelligent video surveillance, industrial inspection, aerospace, and many other fields. By using computer vision to reduce reliance on human labor, it carries significant practical value, and it has therefore become a research hotspot in both theory and application in recent years. It is an important branch of image processing and computer vision, the core component of intelligent surveillance systems, and a foundational algorithm for the broader field of identity recognition, playing a vital role in downstream tasks such as face recognition, gait recognition, crowd counting, and instance segmentation.

The task of object detection is to find all objects of interest in an image and determine their locations and categories. Because objects vary in shape and pose, and imaging is affected by lighting, occlusion, and other factors, object detection has long been one of the most challenging problems in computer vision.

2. Evaluation Metrics for Object Detection Algorithms

Object detection must predict both the exact location and the category of each target. To decide whether a detection is correct, we first check whether the predicted class confidence reaches a threshold, and then whether the overlap between the predicted box and the ground-truth box exceeds a prescribed threshold. Overlap is usually measured by IoU (Intersection over Union): the ratio of the intersection area of the predicted and ground-truth boxes to the area of their union. A larger IoU means a higher overlap between the two boxes and a more accurate detection.
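
As a concrete illustration, a minimal IoU computation for two axis-aligned boxes in (x1, y1, x2, y2) format might look like the sketch below. The function name and box format here are illustrative; the project's own iou helper appears in the data generator code later in this post.

    def iou(box_a, box_b):
        # boxes are (x1, y1, x2, y2); returns intersection area / union area
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / float(area_a + area_b - inter + 1e-6)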

For a given class, precision is the fraction of predicted boxes that are correct, while recall is the fraction of ground-truth boxes that are correctly detected. The two metrics are computed as follows:

Precision = TP / (TP + FP)
Recall    = TP / (TP + FN)

where TP (true positives) is the number of correctly predicted positive boxes, FP (false positives) is the number of boxes incorrectly predicted as positive, and FN (false negatives) is the number of ground-truth positives that were missed.
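
For example, with hypothetical counts for one class (the TP/FP/FN values below are made-up numbers), precision and recall can be computed directly:

    TP, FP, FN = 8, 2, 4            # hypothetical counts for one class
    precision = TP / (TP + FP)      # 8 / 10 = 0.8
    recall = TP / (TP + FN)         # 8 / 12 ≈ 0.667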

AP (average precision) is, informally, the mean of the Precision values along the PR curve. For a continuous PR curve it is computed as an integral:

AP = ∫₀¹ P(R) dR

In practice, we do not integrate the raw PR curve directly. Instead, the curve is first smoothed: at every point, the Precision value is replaced by the maximum Precision found to the right of that point (the interpolated precision).
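
A minimal NumPy sketch of this smoothing and integration (all-point interpolation, as used in PASCAL VOC style evaluation) is shown below, assuming precision and recall arrays sorted by ascending recall:

    import numpy as np

    def average_precision(recall, precision):
        # append sentinel values at both ends of the curve
        r = np.concatenate(([0.0], recall, [1.0]))
        p = np.concatenate(([0.0], precision, [0.0]))
        # smooth: replace each precision value with the max precision to its right
        for i in range(len(p) - 2, -1, -1):
            p[i] = max(p[i], p[i + 1])
        # integrate the resulting step function where recall changes
        idx = np.where(r[1:] != r[:-1])[0]
        return np.sum((r[idx + 1] - r[idx]) * p[idx + 1])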

A performance comparison of deep convolutional object detection frameworks is given below:

Detection framework   mAP    FPS
R-FCN                 79.4   7
Faster R-CNN          76.4   5
SSD500                76.8   19
YOLO                  63.4   45
YOLO v2               78.6   40
YOLO v3               82.3   39

3. Hands-On Object Detection Project

The training set used is the PASCAL VOC dataset. Sample detection results are shown below.

The numbers indicate the relative coordinates of the detected objects within the image.

 

 

The classes are the 20 object categories of the PASCAL VOC dataset.

4. Code

The project structure is as follows:

The keras_frcnn folder contains the classes and functions used to implement Faster R-CNN.

The testing folder contains the test code.

 

Part of the code is shown below. For the complete code, please like, follow, and bookmark, then send a private message in the comments.


  
from keras.layers import Layer
import keras.backend as K

if K.backend() == 'tensorflow':
    import tensorflow as tf


class RoiPoolingConv(Layer):
    '''ROI pooling layer for 2D inputs.
    See Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,
    K. He, X. Zhang, S. Ren, J. Sun
    # Arguments
        pool_size: int
            Size of pooling region to use. pool_size = 7 will result in a 7x7 region.
        num_rois: number of regions of interest to be used
    # Input shape
        list of two 4D tensors [X_img, X_roi] with shape:
        X_img:
            `(1, channels, rows, cols)` if dim_ordering='th'
            or 4D tensor with shape:
            `(1, rows, cols, channels)` if dim_ordering='tf'.
        X_roi:
            `(1, num_rois, 4)` list of rois, with ordering (x, y, w, h)
    # Output shape
        3D tensor with shape:
        `(1, num_rois, channels, pool_size, pool_size)`
    '''
    def __init__(self, pool_size, num_rois, **kwargs):
        self.dim_ordering = K.common.image_dim_ordering()
        assert self.dim_ordering in {'tf', 'th'}, 'dim_ordering must be in {tf, th}'
        self.pool_size = pool_size
        self.num_rois = num_rois
        super(RoiPoolingConv, self).__init__(**kwargs)

    def build(self, input_shape):
        if self.dim_ordering == 'th':
            self.nb_channels = input_shape[0][1]
        elif self.dim_ordering == 'tf':
            self.nb_channels = input_shape[0][3]

    def compute_output_shape(self, input_shape):
        if self.dim_ordering == 'th':
            return None, self.num_rois, self.nb_channels, self.pool_size, self.pool_size
        else:
            return None, self.num_rois, self.pool_size, self.pool_size, self.nb_channels

    def call(self, x, mask=None):
        assert(len(x) == 2)
        img = x[0]    # feature map from the backbone
        rois = x[1]   # (1, num_rois, 4) boxes in feature-map coordinates
        input_shape = K.shape(img)
        outputs = []

        for roi_idx in range(self.num_rois):
            x = rois[0, roi_idx, 0]
            y = rois[0, roi_idx, 1]
            w = rois[0, roi_idx, 2]
            h = rois[0, roi_idx, 3]

            row_length = w / float(self.pool_size)
            col_length = h / float(self.pool_size)
            num_pool_regions = self.pool_size

            # NOTE: the RoiPooling implementation differs between theano and tensorflow
            # due to the lack of a resize op in theano. The theano implementation is much
            # less efficient and leads to long compile times.
            if self.dim_ordering == 'th':
                for jy in range(num_pool_regions):
                    for ix in range(num_pool_regions):
                        x1 = x + ix * row_length
                        x2 = x1 + row_length
                        y1 = y + jy * col_length
                        y2 = y1 + col_length

                        x1 = K.cast(x1, 'int32')
                        x2 = K.cast(x2, 'int32')
                        y1 = K.cast(y1, 'int32')
                        y2 = K.cast(y2, 'int32')

                        # ensure each pooling cell is at least 1 pixel wide/tall
                        x2 = x1 + K.maximum(1, x2 - x1)
                        y2 = y1 + K.maximum(1, y2 - y1)

                        new_shape = [input_shape[0], input_shape[1],
                                     y2 - y1, x2 - x1]
                        x_crop = img[:, :, y1:y2, x1:x2]
                        xm = K.reshape(x_crop, new_shape)
                        pooled_val = K.max(xm, axis=(2, 3))
                        outputs.append(pooled_val)
            elif self.dim_ordering == 'tf':
                x = K.cast(x, 'int32')
                y = K.cast(y, 'int32')
                w = K.cast(w, 'int32')
                h = K.cast(h, 'int32')
                # crop the RoI and resize it to a fixed pool_size x pool_size grid
                rs = tf.image.resize(img[:, y:y+h, x:x+w, :], (self.pool_size, self.pool_size))
                outputs.append(rs)

        final_output = K.concatenate(outputs, axis=0)
        final_output = K.reshape(final_output,
                                 (1, self.num_rois, self.pool_size, self.pool_size, self.nb_channels))

        if self.dim_ordering == 'th':
            final_output = K.permute_dimensions(final_output, (0, 1, 4, 2, 3))
        else:
            final_output = K.permute_dimensions(final_output, (0, 1, 2, 3, 4))
        return final_output

    def get_config(self):
        config = {'pool_size': self.pool_size,
                  'num_rois': self.num_rois}
        base_config = super(RoiPoolingConv, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
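
For reference, here is a minimal sketch of how this layer might be wired into a model. The input names and shapes are illustrative assumptions (a VGG16-style backbone with 512 channels, TensorFlow backend, channels-last ordering), not part of the project's code:

    from keras.layers import Input
    from keras.models import Model

    num_rois = 4
    feature_map = Input(shape=(None, None, 512))   # e.g. output of a VGG16 backbone
    rois = Input(shape=(num_rois, 4))              # (x, y, w, h) per RoI, in feature-map coordinates

    pooled = RoiPoolingConv(pool_size=7, num_rois=num_rois)([feature_map, rois])
    model = Model([feature_map, rois], pooled)     # output: (1, num_rois, 7, 7, 512)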

The data generator code is as follows:


  
from __future__ import absolute_import
import numpy as np
import cv2
import random
import copy
import threading
import itertools

from . import data_augment


def union(au, bu, area_intersection):
    area_a = (au[2] - au[0]) * (au[3] - au[1])
    area_b = (bu[2] - bu[0]) * (bu[3] - bu[1])
    area_union = area_a + area_b - area_intersection
    return area_union


def intersection(ai, bi):
    x = max(ai[0], bi[0])
    y = max(ai[1], bi[1])
    w = min(ai[2], bi[2]) - x
    h = min(ai[3], bi[3]) - y
    if w < 0 or h < 0:
        return 0
    return w * h


def iou(a, b):
    # a and b should be (x1, y1, x2, y2)
    if a[0] >= a[2] or a[1] >= a[3] or b[0] >= b[2] or b[1] >= b[3]:
        return 0.0
    area_i = intersection(a, b)
    area_u = union(a, b, area_i)
    return float(area_i) / float(area_u + 1e-6)


def get_new_img_size(width, height, img_min_side=600):
    # scale so that the shorter side equals img_min_side, preserving aspect ratio
    if width <= height:
        f = float(img_min_side) / width
        resized_height = int(f * height)
        resized_width = img_min_side
    else:
        f = float(img_min_side) / height
        resized_width = int(f * width)
        resized_height = img_min_side
    return resized_width, resized_height


class SampleSelector:
    def __init__(self, class_count):
        # ignore classes that have zero samples
        self.classes = [b for b in class_count.keys() if class_count[b] > 0]
        self.class_cycle = itertools.cycle(self.classes)
        self.curr_class = next(self.class_cycle)

    def skip_sample_for_balanced_class(self, img_data):
        class_in_img = False
        for bbox in img_data['bboxes']:
            cls_name = bbox['class']
            if cls_name == self.curr_class:
                class_in_img = True
                self.curr_class = next(self.class_cycle)
                break
        if class_in_img:
            return False
        else:
            return True


def calc_rpn(C, img_data, width, height, resized_width, resized_height, img_length_calc_function):
    downscale = float(C.rpn_stride)
    anchor_sizes = C.anchor_box_scales
    anchor_ratios = C.anchor_box_ratios
    num_anchors = len(anchor_sizes) * len(anchor_ratios)

    # calculate the output map size based on the network architecture
    (output_width, output_height) = img_length_calc_function(resized_width, resized_height)

    n_anchratios = len(anchor_ratios)

    # initialise empty output objectives
    y_rpn_overlap = np.zeros((output_height, output_width, num_anchors))
    y_is_box_valid = np.zeros((output_height, output_width, num_anchors))
    y_rpn_regr = np.zeros((output_height, output_width, num_anchors * 4))

    num_bboxes = len(img_data['bboxes'])

    num_anchors_for_bbox = np.zeros(num_bboxes).astype(int)
    best_anchor_for_bbox = -1 * np.ones((num_bboxes, 4)).astype(int)
    best_iou_for_bbox = np.zeros(num_bboxes).astype(np.float32)
    best_x_for_bbox = np.zeros((num_bboxes, 4)).astype(int)
    best_dx_for_bbox = np.zeros((num_bboxes, 4)).astype(np.float32)

    # get the GT box coordinates, and resize to account for image resizing
    # note: gta is stored as (x1, x2, y1, y2)
    gta = np.zeros((num_bboxes, 4))
    for bbox_num, bbox in enumerate(img_data['bboxes']):
        gta[bbox_num, 0] = bbox['x1'] * (resized_width / float(width))
        gta[bbox_num, 1] = bbox['x2'] * (resized_width / float(width))
        gta[bbox_num, 2] = bbox['y1'] * (resized_height / float(height))
        gta[bbox_num, 3] = bbox['y2'] * (resized_height / float(height))

    # rpn ground truth
    for anchor_size_idx in range(len(anchor_sizes)):
        for anchor_ratio_idx in range(n_anchratios):
            anchor_x = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][0]
            anchor_y = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][1]

            for ix in range(output_width):
                # x-coordinates of the current anchor box
                x1_anc = downscale * (ix + 0.5) - anchor_x / 2
                x2_anc = downscale * (ix + 0.5) + anchor_x / 2

                # ignore boxes that go across image boundaries
                if x1_anc < 0 or x2_anc > resized_width:
                    continue

                for jy in range(output_height):
                    # y-coordinates of the current anchor box
                    y1_anc = downscale * (jy + 0.5) - anchor_y / 2
                    y2_anc = downscale * (jy + 0.5) + anchor_y / 2

                    # ignore boxes that go across image boundaries
                    if y1_anc < 0 or y2_anc > resized_height:
                        continue

                    # bbox_type indicates whether an anchor should be a target
                    bbox_type = 'neg'

                    # this is the best IOU for the (x,y) coord and the current anchor
                    # note that this is different from the best IOU for a GT bbox
                    best_iou_for_loc = 0.0

                    for bbox_num in range(num_bboxes):
                        # get IOU of the current GT box and the current anchor box
                        curr_iou = iou([gta[bbox_num, 0], gta[bbox_num, 2], gta[bbox_num, 1], gta[bbox_num, 3]],
                                       [x1_anc, y1_anc, x2_anc, y2_anc])
                        # calculate the regression targets if they will be needed
                        if curr_iou > best_iou_for_bbox[bbox_num] or curr_iou > C.rpn_max_overlap:
                            cx = (gta[bbox_num, 0] + gta[bbox_num, 1]) / 2.0
                            cy = (gta[bbox_num, 2] + gta[bbox_num, 3]) / 2.0
                            cxa = (x1_anc + x2_anc) / 2.0
                            cya = (y1_anc + y2_anc) / 2.0

                            tx = (cx - cxa) / (x2_anc - x1_anc)
                            ty = (cy - cya) / (y2_anc - y1_anc)
                            tw = np.log((gta[bbox_num, 1] - gta[bbox_num, 0]) / (x2_anc - x1_anc))
                            th = np.log((gta[bbox_num, 3] - gta[bbox_num, 2]) / (y2_anc - y1_anc))

                        if img_data['bboxes'][bbox_num]['class'] != 'bg':
                            # all GT boxes should be mapped to an anchor box,
                            # so we keep track of which anchor box was best
                            if curr_iou > best_iou_for_bbox[bbox_num]:
                                best_anchor_for_bbox[bbox_num] = [jy, ix, anchor_ratio_idx, anchor_size_idx]
                                best_iou_for_bbox[bbox_num] = curr_iou
                                best_x_for_bbox[bbox_num, :] = [x1_anc, x2_anc, y1_anc, y2_anc]
                                best_dx_for_bbox[bbox_num, :] = [tx, ty, tw, th]

                            # we set the anchor to positive if the IOU is >0.7 (it does not matter
                            # if there was another better box, it just indicates overlap)
                            if curr_iou > C.rpn_max_overlap:
                                bbox_type = 'pos'
                                num_anchors_for_bbox[bbox_num] += 1
                                # we update the regression layer target if this IOU is
                                # the best for the current (x,y) and anchor position
                                if curr_iou > best_iou_for_loc:
                                    best_iou_for_loc = curr_iou
                                    best_regr = (tx, ty, tw, th)

                            # if the IOU is >0.3 and <0.7, it is ambiguous and not included in the objective
                            if C.rpn_min_overlap < curr_iou < C.rpn_max_overlap:
                                # gray zone between neg and pos
                                if bbox_type != 'pos':
                                    bbox_type = 'neutral'

                    # turn on or off outputs depending on IOUs
                    if bbox_type == 'neg':
                        y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
                        y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
                    elif bbox_type == 'neutral':
                        y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
                        y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
                    elif bbox_type == 'pos':
                        y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
                        y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
                        start = 4 * (anchor_ratio_idx + n_anchratios * anchor_size_idx)
                        y_rpn_regr[jy, ix, start:start + 4] = best_regr

    # we ensure that every bbox has at least one positive RPN region
    for idx in range(num_anchors_for_bbox.shape[0]):
        if num_anchors_for_bbox[idx] == 0:
            # no box with an IOU greater than zero ...
            if best_anchor_for_bbox[idx, 0] == -1:
                continue
            y_is_box_valid[
                best_anchor_for_bbox[idx, 0], best_anchor_for_bbox[idx, 1],
                best_anchor_for_bbox[idx, 2] + n_anchratios * best_anchor_for_bbox[idx, 3]] = 1
            y_rpn_overlap[
                best_anchor_for_bbox[idx, 0], best_anchor_for_bbox[idx, 1],
                best_anchor_for_bbox[idx, 2] + n_anchratios * best_anchor_for_bbox[idx, 3]] = 1
            start = 4 * (best_anchor_for_bbox[idx, 2] + n_anchratios * best_anchor_for_bbox[idx, 3])
            y_rpn_regr[
                best_anchor_for_bbox[idx, 0], best_anchor_for_bbox[idx, 1],
                start:start + 4] = best_dx_for_bbox[idx, :]

    y_rpn_overlap = np.transpose(y_rpn_overlap, (2, 0, 1))
    y_rpn_overlap = np.expand_dims(y_rpn_overlap, axis=0)

    y_is_box_valid = np.transpose(y_is_box_valid, (2, 0, 1))
    y_is_box_valid = np.expand_dims(y_is_box_valid, axis=0)

    y_rpn_regr = np.transpose(y_rpn_regr, (2, 0, 1))
    y_rpn_regr = np.expand_dims(y_rpn_regr, axis=0)

    pos_locs = np.where(np.logical_and(y_rpn_overlap[0, :, :, :] == 1, y_is_box_valid[0, :, :, :] == 1))
    neg_locs = np.where(np.logical_and(y_rpn_overlap[0, :, :, :] == 0, y_is_box_valid[0, :, :, :] == 1))

    num_pos = len(pos_locs[0])

    # one issue is that the RPN has many more negative than positive regions,
    # so we turn off some of the negative regions. We also limit it to 256 regions.
    num_regions = 256

    if len(pos_locs[0]) > num_regions // 2:
        val_locs = random.sample(range(len(pos_locs[0])), len(pos_locs[0]) - num_regions // 2)
        y_is_box_valid[0, pos_locs[0][val_locs], pos_locs[1][val_locs], pos_locs[2][val_locs]] = 0
        num_pos = num_regions // 2

    if len(neg_locs[0]) + num_pos > num_regions:
        val_locs = random.sample(range(len(neg_locs[0])), len(neg_locs[0]) - num_pos)
        y_is_box_valid[0, neg_locs[0][val_locs], neg_locs[1][val_locs], neg_locs[2][val_locs]] = 0

    y_rpn_cls = np.concatenate([y_is_box_valid, y_rpn_overlap], axis=1)
    y_rpn_regr = np.concatenate([np.repeat(y_rpn_overlap, 4, axis=1), y_rpn_regr], axis=1)

    return np.copy(y_rpn_cls), np.copy(y_rpn_regr)


class threadsafe_iter:
    """Takes an iterator/generator and makes it thread-safe by
    serializing call to the `next` method of given iterator/generator.
    """
    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.it)

    # Python 2 compatibility
    next = __next__


def threadsafe_generator(f):
    """A decorator that takes a generator function and makes it thread-safe."""
    def g(*a, **kw):
        return threadsafe_iter(f(*a, **kw))
    return g


def get_anchor_gt(all_img_data, class_count, C, img_length_calc_function, backend, mode='train'):
    # The following line is not useful with Python 3.5, it is kept for legacy reasons
    # all_img_data = sorted(all_img_data)

    sample_selector = SampleSelector(class_count)

    while True:
        if mode == 'train':
            np.random.shuffle(all_img_data)

        for img_data in all_img_data:
            try:
                if C.balanced_classes and sample_selector.skip_sample_for_balanced_class(img_data):
                    continue

                # read in image, and optionally add augmentation
                if mode == 'train':
                    img_data_aug, x_img = data_augment.augment(img_data, C, augment=True)
                else:
                    img_data_aug, x_img = data_augment.augment(img_data, C, augment=False)

                (width, height) = (img_data_aug['width'], img_data_aug['height'])
                (rows, cols, _) = x_img.shape

                assert cols == width
                assert rows == height

                # get image dimensions for resizing
                (resized_width, resized_height) = get_new_img_size(width, height, C.im_size)

                # resize the image so that the smallest side is length = 600px
                x_img = cv2.resize(x_img, (resized_width, resized_height), interpolation=cv2.INTER_CUBIC)

                try:
                    y_rpn_cls, y_rpn_regr = calc_rpn(C, img_data_aug, width, height,
                                                     resized_width, resized_height, img_length_calc_function)
                except Exception:
                    continue

                # Zero-center by mean pixel, and preprocess image
                x_img = x_img[:, :, (2, 1, 0)]  # BGR -> RGB
                x_img = x_img.astype(np.float32)
                x_img[:, :, 0] -= C.img_channel_mean[0]
                x_img[:, :, 1] -= C.img_channel_mean[1]
                x_img[:, :, 2] -= C.img_channel_mean[2]
                x_img /= C.img_scaling_factor

                x_img = np.transpose(x_img, (2, 0, 1))
                x_img = np.expand_dims(x_img, axis=0)

                y_rpn_regr[:, y_rpn_regr.shape[1] // 2:, :, :] *= C.std_scaling

                if backend == 'tf':
                    x_img = np.transpose(x_img, (0, 2, 3, 1))
                    y_rpn_cls = np.transpose(y_rpn_cls, (0, 2, 3, 1))
                    y_rpn_regr = np.transpose(y_rpn_regr, (0, 2, 3, 1))

                yield np.copy(x_img), [np.copy(y_rpn_cls), np.copy(y_rpn_regr)], img_data_aug

            except Exception as e:
                print(e)
                continue
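
A sketch of how this generator is typically consumed during training is shown below. The module names (pascal_voc_parser, vgg) and the helper get_data follow the conventional keras-frcnn project layout and are assumptions, not shown in this post:

    from keras_frcnn import config, data_generators
    from keras_frcnn.pascal_voc_parser import get_data   # assumed VOC annotation parser
    from keras_frcnn import vgg as nn                    # assumed backbone module with get_img_output_length

    C = config.Config()
    all_imgs, classes_count, class_mapping = get_data('/path/to/VOCdevkit')

    data_gen_train = data_generators.get_anchor_gt(all_imgs, classes_count, C,
                                                   nn.get_img_output_length,
                                                   backend='tf', mode='train')
    X, [y_rpn_cls, y_rpn_regr], img_data = next(data_gen_train)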

Writing this was not easy. If you found it helpful, please like, follow, and bookmark~~~


Reposted from: https://blog.csdn.net/jiebaoshayebuhui/article/details/128236772