Feidao's Blog

Paper Notes: Histograms of Oriented Gradients for Human Detection (HOG Features)


Preface

This paper introduced the now-famous HOG (Histogram of Oriented Gradients) feature.

Abstract

A linear SVM is used as the classifier for pedestrian detection.

adopting linear SVM based human detection as a test case

HOG features perform extremely well.

we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection

Introduction

  • HOG outperforms the other existing feature extractors.

a) We study the issue of feature sets for human detection, showing that locally normalized Histogram of Oriented Gradient (HOG) descriptors provide excellent performance relative to other existing feature sets

  • Two strong earlier methods, SIFT and shape contexts, are built on ideas similar to histograms of gradient orientations.

The proposed descriptors are reminiscent of edge orientation histograms [4,5], SIFT descriptors [12] and shape contexts [1], but they are computed on a dense grid of uniformly spaced cells and they use overlapping local contrast normalizations for improved performance

  • For simplicity and speed, the authors use a linear SVM to classify the extracted HOG features.

For simplicity and speed, we use linear SVM as a baseline classifier throughout the study.

Previous Work

  • See [6] for a survey. Papageorgiou et al [18] describe a pedestrian detector based on a polynomial SVM using rectified Haar wavelets as input descriptors

  • Depoortere et al give an optimized version of this

  • In contrast, our detector uses a simpler architecture with a single detection window, but appears to give significantly higher performance on pedestrian images

Overview of the Method

  • HOG is based on local histograms of gradient orientations.

The method is based on evaluating well-normalized local histograms of image gradient orientations in a dense grid

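  • Local object appearance and shape can often be characterized well by the distribution of local gradient magnitudes and orientations.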

The basic idea is that local object appearance and shape can often be characterized rather well by the distribution of local intensity gradients

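The image is divided into small cells, and for each cell the gradients and their orientations are computed and accumulated into a histogram. Several cells are then grouped into a block, and contrast normalization is applied over each block, which makes the feature more robust to changes in illumination.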

In practice this is implemented by dividing the image window into small spatial regions ("cells"), for each cell accumulating a local 1-D histogram of gradient directions or edge orientations over the pixels of the cell. The combined histogram entries form the representation. For better invariance to illumination, shadowing, etc., it is also useful to contrast-normalize the local responses before using them

  • The basic processing pipeline is shown in the figure below.

  • A well-known successful use of gradient-orientation histograms is the SIFT algorithm.

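  • The shape context work also studied cell and block layouts, but it initially used only edge pixel counts, without the orientation histogramming that makes the representation so effective.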

  • The advantage of HOG and SIFT is that they capture the local gradient and edge-orientation structure of objects.

Overview of Results

Implementation and Performance

1. Gamma/Colour Normalization
  • Representing colour as RGB or LAB makes little difference to the results, but using grayscale images lowers the final performance.

RGB and LAB colour spaces give comparable results, but restricting to grayscale reduces performance by 1.5% at 10^-4 FPPW (false positives per window)

For background on gamma correction, see the referenced blog post.
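The paper reports that power-law (square-root) gamma compression of each colour channel gives a small improvement at low FPPW. A minimal sketch of that step, assuming 8-bit input in [0, 255]; the function name `sqrt_gamma` is illustrative, not from the paper:

```python
import numpy as np


def sqrt_gamma(img):
    """Square-root (power-law) gamma compression of each colour channel.

    Assumes 8-bit intensities in [0, 255]; the result is rescaled to [0, 255].
    Works the same for grayscale (H x W) and colour (H x W x C) arrays,
    because the operation is applied element-wise.
    """
    x = np.asarray(img, dtype=float) / 255.0
    return np.sqrt(x) * 255.0


# Dark values are lifted much more than bright ones.
pixels = np.array([[0, 64, 255]], dtype=np.uint8)
print(sqrt_gamma(pixels))
```

Because the square root is applied per element, it compresses the dynamic range of each channel independently, which is what "per colour channel" means here.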

2. Gradient Computation
  • Detector performance is sensitive to how image gradients are computed; interestingly, the simplest scheme turns out to work best.

Detector performance is sensitive to the way in which gradients are computed, but the simplest scheme turns out to be the best

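  • The authors tried several derivative masks of different sizes for computing gradients, optionally preceded by Gaussian smoothing at several scales σ, and found that performance is best with σ = 0 (no smoothing) and the simple centred mask [-1, 0, 1].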

  • Using larger masks always seems to decrease performance

  • As the figure shows, Gaussian smoothing is best avoided.
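A minimal NumPy sketch of the gradient step with the centred [-1, 0, 1] mask and no smoothing. The colour handling (at each pixel, keep the channel with the largest gradient magnitude) follows the paper's description; the helper name `gradients` and the choice to leave border pixels at zero are assumptions of this sketch:

```python
import numpy as np


def gradients(img):
    """Centred [-1, 0, 1] derivatives with no Gaussian smoothing (sigma = 0).

    img: 2-D grayscale or H x W x C colour array. For colour input, each
    pixel keeps the channel with the largest gradient magnitude.
    Border pixels are left at zero in this sketch.
    Returns (magnitude, unsigned orientation in degrees, in [0, 180)).
    """
    img = np.asarray(img, dtype=float)
    if img.ndim == 2:
        img = img[..., None]                      # treat grayscale as 1 channel
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]        # horizontal [-1, 0, 1]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]        # vertical [-1, 0, 1]^T
    mag = np.hypot(gx, gy)
    best = mag.argmax(axis=2)[..., None]          # dominant channel per pixel
    gx = np.take_along_axis(gx, best, axis=2)[..., 0]
    gy = np.take_along_axis(gy, best, axis=2)[..., 0]
    mag = np.take_along_axis(mag, best, axis=2)[..., 0]
    theta = np.degrees(np.arctan2(gy, gx)) % 180.0
    return mag, theta


step = np.zeros((5, 5))
step[:, 3:] = 10.0          # vertical edge between columns 2 and 3
mag, theta = gradients(step)
print(mag[2], theta[2])
```

The mask is only three taps wide and there is no pre-smoothing, which matches the finding that fine-scale derivatives work best.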

3. Spatial Orientation Binning
  • Divide the image into cells and compute a histogram of gradient orientations for each local cell.

Each pixel calculates a weighted vote for an edge orientation histogram channel based on the orientation of the gradient element centred on it, and the votes are accumulated into orientation bins over local spatial regions that we call cells

  • Votes are interpolated bilinearly.

(2) To reduce aliasing, votes are interpolated bilinearly between the neighbouring bin centres in both orientation and position.
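To make the spatial half of this interpolation concrete, here is a sketch that splits one pixel's vote between the four nearest cell centres. It assumes continuous pixel coordinates and cell centres at ((j + 0.5) * cell, (i + 0.5) * cell); the helper `spatial_weights` is hypothetical. Combined with interpolation between orientation bins, this gives trilinear voting.

```python
import math


def spatial_weights(px, py, cell=8):
    """Bilinear weights of point (px, py) over the 4 nearest cell centres.

    Cell (i, j) (row i, column j) is assumed to be centred at
    ((j + 0.5) * cell, (i + 0.5) * cell). Returns {(i, j): weight};
    indices can fall outside the image, and a caller would drop them.
    """
    fx = px / cell - 0.5          # position in "cell-centre" units
    fy = py / cell - 0.5
    j0, i0 = math.floor(fx), math.floor(fy)
    ax, ay = fx - j0, fy - i0     # fractional distance past the lower centre
    return {
        (i0, j0): (1 - ay) * (1 - ax),
        (i0, j0 + 1): (1 - ay) * ax,
        (i0 + 1, j0): ay * (1 - ax),
        (i0 + 1, j0 + 1): ay * ax,
    }


# A point exactly between four cell centres splits its vote evenly.
print(spatial_weights(8.0, 8.0))
```

The four weights always sum to 1, so each pixel's total vote is preserved while being spread across neighbouring cells.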

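  • Note that each pixel's vote is weighted by its gradient magnitude. The vote could instead be another function of the magnitude, such as its square, its square root, or a clipped value, but in practice the raw magnitude works best.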

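  • The number of orientation bins has a large impact on performance: it improves significantly up to about 9 bins and changes little beyond that, so 9 bins is a good choice. The orientation range can be 0° to 360° (signed gradients) or 0° to 180° (unsigned gradients); for human detection the unsigned range works better.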
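A sketch of the per-cell orientation histogram with 9 unsigned bins over 0° to 180°, magnitude-weighted votes, and linear interpolation between the two nearest bin centres. The helper `cell_histogram` and the bin-centre layout (10°, 30°, ..., 170°) are assumptions; spatial interpolation between cells is omitted here:

```python
import numpy as np


def cell_histogram(mag, theta, n_bins=9):
    """Magnitude-weighted orientation histogram for one cell.

    theta is in degrees in [0, 180); bin centres are at 10, 30, ..., 170.
    Each vote is split linearly between the two nearest bin centres,
    wrapping around at 0/180 because orientations are unsigned.
    """
    width = 180.0 / n_bins
    pos = np.asarray(theta, dtype=float).ravel() / width - 0.5
    lo = np.floor(pos).astype(int)           # lower neighbouring bin
    frac = pos - lo                          # share that goes to the upper bin
    w = np.asarray(mag, dtype=float).ravel()
    hist = np.zeros(n_bins)
    np.add.at(hist, lo % n_bins, (1 - frac) * w)
    np.add.at(hist, (lo + 1) % n_bins, frac * w)
    return hist


# A 20-degree gradient sits halfway between the 10- and 30-degree centres.
print(cell_histogram([1.0], [20.0]))
```

`np.add.at` is used instead of `hist[idx] += ...` so that repeated bin indices within one cell accumulate correctly.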

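4. Normalization and Descriptor Blocks
  • Illumination and contrast strongly affect gradient strength, so local contrast normalization is needed, and it greatly improves performance. The normalization is done per block, where each block consists of several cells.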

(1) Gradient strengths vary over a wide range owing to local variations in illumination and foreground-background contrast, so effective local contrast normalization turns out to be essential for good performance

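  • Usually each cell contributes to several blocks. For example, for a 300x300 image with 3x3-pixel cells, the cell grid is 100x100; with blocks of 10x10 cells and a stride of 5 cells, there are (100 - 10) / 5 + 1 = 19 block positions per side, i.e. 19x19 blocks in total.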

(2) In fact, we typically overlap the blocks so that each scalar cell response contributes several components to the final descriptor vector, each normalized with respect to a different block
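The number of block positions along one side follows from (cells - block) / stride + 1 with integer division. A small helper (hypothetical name `blocks_per_side`) for checking examples like the 300x300 one:

```python
def blocks_per_side(image_px, cell_px, block_cells, stride_cells):
    """Number of block positions along one image dimension.

    Partial cells at the border are dropped; blocks slide in steps of
    stride_cells, so neighbouring blocks overlap when stride < block size.
    """
    n_cells = image_px // cell_px
    return (n_cells - block_cells) // stride_cells + 1


n = blocks_per_side(300, 3, 10, 5)
print(n, "x", n, "blocks")  # 19 x 19 blocks
```

With a stride of half the block size, as here, each interior cell is covered by 2 x 2 = 4 blocks and so appears four times in the final descriptor, each time normalized differently.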

  • The figure below shows how the amount of block overlap affects performance.

  • There are four main block normalization schemes: L2-norm, L2-Hys (L2-norm followed by clipping and renormalization), L1-norm, and L1-sqrt.


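The figure shows that L2-norm, L2-Hys, and L1-sqrt perform about equally well, while L1-norm is noticeably worse. Compared with the results without any normalization, local contrast normalization is clearly essential.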
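With v the unnormalized block vector and ε a small constant, the four schemes are: L2-norm, v / sqrt(‖v‖₂² + ε²); L2-Hys, which is L2-norm followed by clipping values at 0.2 and renormalizing; L1-norm, v / (‖v‖₁ + ε); and L1-sqrt, sqrt(v / (‖v‖₁ + ε)). A sketch, assuming non-negative histogram entries; the function name and the default ε are illustrative:

```python
import numpy as np


def normalize_block(v, scheme="L2-Hys", eps=1e-3, clip=0.2):
    """Normalize one block vector of (non-negative) histogram entries."""
    v = np.asarray(v, dtype=float)
    if scheme == "L2-norm":
        return v / np.sqrt(np.sum(v ** 2) + eps ** 2)
    if scheme == "L2-Hys":
        v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)
        v = np.minimum(v, clip)                  # clip large components
        return v / np.sqrt(np.sum(v ** 2) + eps ** 2)
    if scheme == "L1-norm":
        return v / (np.sum(np.abs(v)) + eps)
    if scheme == "L1-sqrt":
        return np.sqrt(v / (np.sum(np.abs(v)) + eps))
    raise ValueError(f"unknown scheme: {scheme}")


block = np.array([3.0, 4.0])
for name in ("L2-norm", "L2-Hys", "L1-norm", "L1-sqrt"):
    print(name, normalize_block(block, name))
```

The clipping in L2-Hys limits the influence of a few very strong gradients, which is why it behaves similarly to L1-sqrt, whose square root also compresses large values.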

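5. Detector Window and Context
  • The 64x128 detection window includes about 16 pixels of margin around the person on all four sides, and this context improves performance. Reducing the margin from 16 to 8 pixels (a 48x112 window) lowers performance. Keeping the 64x128 window but enlarging the person inside it, which again shrinks the margin, causes a similar loss, even though the person is then seen at a higher resolution.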
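With the paper's default R-HOG settings (8x8-pixel cells, blocks of 2x2 cells moved one cell at a time, 9 orientation bins), the descriptor length for a detection window can be worked out as below. The helper `hog_length` is hypothetical; a 48x112 window is included for comparison with the default 64x128 one.

```python
def hog_length(win_w, win_h, cell=8, block=2, stride=1, bins=9):
    """Length of an R-HOG descriptor for one detection window.

    cell: cell size in pixels; block: block side in cells;
    stride: block step in cells; bins: orientation bins per cell.
    """
    cells_x, cells_y = win_w // cell, win_h // cell
    blocks_x = (cells_x - block) // stride + 1
    blocks_y = (cells_y - block) // stride + 1
    return blocks_x * blocks_y * block * block * bins


print(hog_length(64, 128))  # 3780
print(hog_length(48, 112))  # 2340
```

For 64x128 this is 7 x 15 blocks of 4 cells x 9 bins, i.e. 3780 values per window.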
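6. Classifier
  • The authors use a soft-margin linear SVM and also try an SVM with a Gaussian kernel; as the results above show, the Gaussian kernel performs somewhat better, at the cost of a much higher run time.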
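The paper trains its linear SVM with SVMLight; as a swapped-in stand-in, the sketch below minimizes the soft-margin primal objective 0.5·‖w‖² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)) by full-batch subgradient descent in NumPy. The learning rate, number of epochs, helper names, and toy 2-D data are arbitrary assumptions. At detection time a window descriptor x is scored with w·x + b, and a positive score means "person".

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Soft-margin linear SVM via full-batch subgradient descent on
    0.5 * ||w||^2 + C * sum(max(0, 1 - y * (X @ w + b))). Labels are +1/-1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                     # points inside or past the margin
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def detect(descriptor, w, b):
    """Linear SVM score for one window descriptor; > 0 means 'person'."""
    return float(descriptor @ w + b)


X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
print(detect(np.array([2.5, 2.5]), w, b) > 0)  # True
```

Scoring a window is a single dot product with the 3780-dimensional descriptor, which is why a linear classifier keeps dense sliding-window detection fast compared with a kernel SVM.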

Discussion

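  • HOG outperforms wavelets and methods that smooth the image beforehand, because much of the most useful information in an image comes from abrupt edges, and blurring the image discards much of it. The scale of the gradient operator also matters: it should be neither too large nor too small.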

  • Local contrast normalization is also essential when extracting HOG features: it reduces the effect of varying illumination on image gradients.

Reimplementation

Because time was limited, I put together a rough version quickly, mainly to understand the algorithm; I will refine it when I have more time.

```python
import numpy as np
import matplotlib.pyplot as plt
from math import pi

cell_size = (8, 8)     # cell height and width, in pixels
block_size = (4, 4)    # block height and width, in cells
block_stride = 2       # step between neighbouring blocks, in cells
n_bins = 9             # unsigned orientation bins covering [0, pi)


def gamma(input_img, g=1 / 2.2):
    # Average the colour channels to get a grayscale image, then apply a
    # power-law gamma curve to the 8-bit intensities.
    out_img = np.asarray(input_img, dtype=float)
    if out_img.ndim == 3:
        out_img = out_img[..., :3].mean(axis=2)
    out_img = (out_img + 0.5) / 256.0
    out_img = np.power(out_img, g)
    return out_img * 256 - 0.5


def compute_Hog(img, show=False):
    img = np.asarray(img, dtype=float)
    height, width = img.shape
    # Centred 1-D derivative mask [-1, 0, 1] with no smoothing;
    # border pixels are left at zero.
    img_x = np.zeros_like(img)
    img_y = np.zeros_like(img)
    img_x[:, 1:-1] = img[:, 2:] - img[:, :-2]
    img_y[1:-1, :] = img[2:, :] - img[:-2, :]
    img_res = np.sqrt(img_x ** 2 + img_y ** 2)   # gradient magnitude
    img_theta = np.arctan2(img_y, img_x) % pi     # unsigned orientation in [0, pi)
    if show:
        plt.imshow(img_res, cmap='gray')
        plt.show()

    c_h, c_w = cell_size
    bin_width = pi / n_bins
    cell_all = []
    # Only whole cells are used; leftover border pixels are ignored.
    for y in range(0, height - c_h + 1, c_h):
        cell_row = []
        for x in range(0, width - c_w + 1, c_w):
            thetas = img_theta[y:y + c_h, x:x + c_w].ravel()
            amps = img_res[y:y + c_h, x:x + c_w].ravel()
            cell = np.zeros(n_bins)
            for theta, amp in zip(thetas, amps):
                if amp == 0:
                    continue
                # Magnitude-weighted vote, split linearly between the two
                # nearest bin centres (bins wrap around at 0 / pi).
                pos = theta / bin_width - 0.5
                cur = int(np.floor(pos))
                frac = pos - cur
                cell[cur % n_bins] += (1 - frac) * amp
                cell[(cur + 1) % n_bins] += frac * amp
            cell_row.append(cell)
        cell_all.append(cell_row)

    res = []
    n_rows, n_cols = len(cell_all), len(cell_all[0])
    for y in range(0, n_rows - block_size[0] + 1, block_stride):
        for x in range(0, n_cols - block_size[1] + 1, block_stride):
            block = []
            for b_y in range(y, y + block_size[0]):
                for b_x in range(x, x + block_size[1]):
                    block.extend(cell_all[b_y][b_x])
            # L2-norm: v / sqrt(||v||_2^2 + eps^2)
            block_np = np.array(block)
            block_np = block_np / np.sqrt(np.sum(block_np ** 2) + 1e-3 ** 2)
            res.extend(block_np)
    return np.array(res)


if __name__ == '__main__':
    img_path = 'F:\\DataSet\\MIT_persons_jpg\\per00001.jpg'
    img = plt.imread(img_path)
    img = gamma(img)
    feature = compute_Hog(img, show=True)
    print(feature.shape)
```


Reposted from: https://blog.csdn.net/qq_43409114/article/details/106416620