Paper Notes: Histograms of Oriented Gradients for Human Detection (HOG Features)_飞道的博客

2020-05-29 08:46

Preface

This paper introduced the well-known HOG feature.

Abstract

The paper uses an SVM as the classifier to perform pedestrian detection.

adopting linear SVM based human detection as a test case

HOG feature extraction performs very well.

we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection

Introduction

HOG performs better than the other existing feature sets.

a) We study the issue of feature sets for human detection, showing that locally normalized Histogram of Oriented Gradient (HOG) descriptors provide excellent performance relative to other existing feature sets

Two strong earlier methods, SIFT and shape contexts, are built on ideas similar to histograms of gradient orientations.

The proposed descriptors are reminiscent of edge orientation histograms [4,5], SIFT descriptors [12] and shape contexts [1], but they are computed on a dense grid of uniformly spaced cells and they use overlapping local contrast normalizations for improved performance

For simplicity and speed, the authors use a linear SVM to classify the features extracted by HOG.

For simplicity and speed, we use linear SVM as a baseline classifier throughout the study.

Previous Work

See [6] for a survey. Papageorgiou et al [18] describe a pedestrian detector based on a polynomial SVM using rectified Haar wavelets as input descriptors

Depoortere et al give an optimized version of this

In contrast, our detector uses a simpler architecture with a single detection window, but appears to give significantly higher performance on pedestrian images

Overview of the Method

HOG is based on local histograms of gradient orientations.

The method is based on evaluating well-normalized local histograms of image gradient orientations in a dense grid

The appearance and shape of a local object can be described well by the distribution of local gradients and their orientations.

The basic idea is that local object appearance and shape can often be characterized rather well by the distribution of local intensity gradients

The image is divided into small cells, and a histogram of gradient orientations is accumulated over the pixels of each cell. Several cells are then grouped into a block, and contrast normalization is applied over each block, which makes the feature more robust to changes in illumination.

In practice this is implemented by dividing the image window into small spatial regions (“cells”), for each cell accumulating a local 1-D histogram of gradient directions or edge orientations over the pixels of the cell. The combined histogram entries form the representation. For better invariance to illumination, shadowing, etc., it is also useful to contrast-normalize the local responses before using them

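[Figure: flow chart of the basic pipeline]

The well-known SIFT algorithm is a successful example of using gradient histograms.

Shape context is also built on an idea similar to cells and blocks, but it only counts edge pixels and does not use an orientation histogram, which makes it cheaper to run.

The strength of HOG and SIFT is that they capture the local gradient structure and orientation of an object.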

Overview of Results

Implementation and Performance

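1. Gamma Normalization

Using the RGB or LAB colour space makes little difference to the result, but restricting the input to grayscale lowers the final performance.

RGB and LAB colour spaces give comparable results, but restricting to grayscale reduces performance by 1.5% at 10^-4 FPPW

For more on gamma correction, see the referenced blog post.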
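
As a rough illustration of gamma normalization, the sketch below applies a power-law compression to an image. It assumes 8-bit pixel values in 0..255; the function name `gamma_compress` and the default exponent 1/2.2 (a common display gamma) are choices made for this example, not values taken from the paper.

```python
import numpy as np

def gamma_compress(img, g=1 / 2.2):
    """Power-law compression of an image with values in 0..255 (assumed range)."""
    scaled = np.asarray(img, dtype=float) / 255.0   # map to [0, 1]
    return np.power(scaled, g) * 255.0              # compress, then map back

pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)
compressed = gamma_compress(pixels)
print(compressed)
```

With an exponent below 1, dark values are lifted more than bright ones, which reduces the influence of strong illumination differences before gradients are computed.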

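2. Gradient Computation

The final performance is sensitive to how the image gradients are computed, but interestingly the simplest scheme works best.

Detector performance is sensitive to the way in which gradients are computed, but the simplest scheme turns out to be the best

The authors computed gradients using Gaussian smoothing combined with several discrete derivative masks, trying a range of mask sizes and smoothing scales σ. Performance was best with σ = 0 (no smoothing) and the simple [-1, 0, 1] mask.

Using larger masks always seems to decrease performance

The figure makes the same point: it is better not to use Gaussian smoothing.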
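
The centred [-1, 0, 1] mask with no smoothing can be sketched in a few lines of NumPy. The function name `centred_gradients` is hypothetical, and leaving the border pixels at 0 is an assumption of this sketch, not something the notes specify.

```python
import numpy as np

def centred_gradients(img):
    """Gradients with the 1-D mask [-1, 0, 1]; border pixels are left at 0."""
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal derivative
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical derivative
    magnitude = np.hypot(gx, gy)             # gradient strength
    orientation = np.arctan2(gy, gx)         # radians in [-pi, pi]
    return gx, gy, magnitude, orientation

# A horizontal ramp: intensity grows by 1 per column.
ramp = np.tile(np.arange(5.0), (4, 1))
gx, gy, mag, ori = centred_gradients(ramp)
```

On the ramp, every interior pixel has a horizontal derivative of 2 (the difference between its right and left neighbours) and a vertical derivative of 0.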

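3. Spatial Orientation Binning

Define a cell, then compute a histogram of gradient orientations for each local cell.

Each pixel calculates a weighted vote for an edge orientation histogram channel based on the orientation of the gradient element centred on it, and the votes are accumulated into orientation bins over local spatial regions that we call cells

The votes are interpolated bilinearly.

(2) To reduce aliasing, votes are interpolated bilinearly between the neighbouring bin centres in both orientation and position.

Note that each pixel's vote is weighted by its gradient magnitude. The weight could also be a clipped magnitude or the squared magnitude, but in practice the raw magnitude works best.

The number of bins has a large effect on the final performance, so it should be chosen carefully; experiments show that 9 bins is a good choice. The orientation range can be either [0, 360] or [0, 180] degrees, and the latter works somewhat better.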
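
The voting step for a single cell can be sketched as follows, assuming 9 unsigned bins over [0, 180) degrees with bin centres at 10, 30, ..., 170 degrees. The function name `orientation_histogram` is hypothetical, and the sketch interpolates only in orientation; interpolation in position between neighbouring cells is left out to keep it short.

```python
import numpy as np

def orientation_histogram(magnitude, orientation, n_bins=9):
    """Magnitude-weighted histogram over unsigned orientations in [0, pi)."""
    mag = np.asarray(magnitude, dtype=float).ravel()
    theta = np.asarray(orientation, dtype=float).ravel() % np.pi  # fold the sign
    bin_width = np.pi / n_bins
    pos = theta / bin_width - 0.5        # position measured from bin centres
    lo = np.floor(pos).astype(int)       # nearest bin centre below
    frac = pos - lo                      # fractional distance to that centre
    hist = np.zeros(n_bins)
    # Split each vote linearly between the two neighbouring bins (wrapping around).
    np.add.at(hist, lo % n_bins, (1.0 - frac) * mag)
    np.add.at(hist, (lo + 1) % n_bins, frac * mag)
    return hist

# One pixel at 20 degrees lies halfway between the centres of bins 0 and 1.
hist = orientation_histogram([1.0], [np.pi / 9])
```

Because each vote is split between two bins with weights that sum to 1, the histogram total always equals the sum of the gradient magnitudes in the cell.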

4. Normalization and Descriptor Blocks

Illumination and contrast strongly affect gradient strengths, so local contrast normalization is needed, and it greatly improves performance. The normalization is applied per block, where each block is made up of several cells.

(1) Gradient strengths vary over a wide range owing to local variations in illumination and foreground-background contrast, so effective local contrast normalization turns out to be essential for good performance

Usually each cell contributes to several blocks. For example, if an image is 300x300 pixels and each cell is 3x3 pixels, the cell grid is 100x100. If each block consists of 10x10 cells and the stride is 5 cells, there are (100 - 10) / 5 + 1 = 19 block positions per axis, so 19x19 blocks in total.

(2) In fact, we typically overlap the blocks so that each scalar cell response contributes several components to the final descriptor vector, each normalized with respect to a different block
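
The block arithmetic can be checked with a small helper. The name `count_blocks` is hypothetical; the formula counts only blocks that lie fully inside the cell grid. For the 300x300 example it gives 19 block positions per axis. The second configuration (64x128 window, 8x8-pixel cells, 2x2-cell blocks, stride of one cell, 9 bins) is the commonly cited default HOG setup and is included here as an assumption for illustration.

```python
def count_blocks(n_cells, block_cells, stride_cells):
    """Number of block positions along one axis (blocks fully inside the grid)."""
    if n_cells < block_cells:
        return 0
    return (n_cells - block_cells) // stride_cells + 1

# 300x300 image with 3x3-pixel cells gives 100 cells per axis.
blocks_per_axis = count_blocks(300 // 3, block_cells=10, stride_cells=5)
print(blocks_per_axis)  # 19, so 19x19 blocks

# Assumed default setup: 64x128 window, 8-pixel cells, 2x2-cell blocks, stride 1 cell.
blocks_x = count_blocks(64 // 8, block_cells=2, stride_cells=1)
blocks_y = count_blocks(128 // 8, block_cells=2, stride_cells=1)
descriptor_length = blocks_x * blocks_y * (2 * 2) * 9
print(blocks_x, blocks_y, descriptor_length)  # 7 15 3780
```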

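[Figure: effect of block overlap on performance]

There are four main block normalization schemes: L2-norm, L2-Hys (L2-norm followed by clipping and renormalizing), L1-norm, and L1-sqrt.

The figure shows that L2-norm, L2-Hys, and L1-sqrt perform about equally well, while L1-norm is noticeably worse. Compared with the results without any normalization, it is clear that local contrast normalization is necessary.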
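
The four schemes can be written compactly for a block vector v. This is a sketch: the function name `normalize_block` and the value of the small constant `eps` are assumptions, and the clipping threshold of 0.2 for L2-Hys is the value usually quoted for this scheme. L1-sqrt assumes non-negative entries, which holds for histograms.

```python
import numpy as np

def normalize_block(v, scheme="L2-Hys", eps=1e-3, clip=0.2):
    """Normalize one block descriptor with one of the four schemes."""
    v = np.asarray(v, dtype=float)
    if scheme == "L2-norm":
        return v / np.sqrt(np.sum(v ** 2) + eps ** 2)
    if scheme == "L2-Hys":
        v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)      # L2-norm first
        v = np.minimum(v, clip)                          # clip large values
        return v / np.sqrt(np.sum(v ** 2) + eps ** 2)   # renormalize
    if scheme == "L1-norm":
        return v / (np.sum(np.abs(v)) + eps)
    if scheme == "L1-sqrt":
        return np.sqrt(v / (np.sum(np.abs(v)) + eps))
    raise ValueError("unknown scheme: " + scheme)

block = np.array([3.0, 4.0, 0.0, 0.0])
l2 = normalize_block(block, "L2-norm")
l2_hys = normalize_block(block, "L2-Hys")
l1 = normalize_block(block, "L1-norm")
l1_sqrt = normalize_block(block, "L1-sqrt")
```

In the example, L2-Hys clips both non-zero entries to 0.2 before renormalizing, so the two end up equal: a single dominant gradient can no longer outweigh the rest of the block.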

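5. Detector Window and Context

The detection window includes about 16 pixels of margin around the person on all four sides, and this border provides context that improves the final performance. Reducing the margin from 16 to 8 pixels lowers performance. Keeping the detection window the same size but enlarging the person inside it (which again shrinks the margin) also lowers performance, even though the resolution of the person actually increases.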

6. Classifier

The authors tried both a soft-margin linear SVM and a Gaussian-kernel SVM; as the results show, the Gaussian kernel performs somewhat better.
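
To make the soft-margin linear SVM concrete, here is a minimal NumPy sketch trained by full-batch subgradient descent on the hinge loss. It is not the solver used in the paper; the function name `train_linear_svm`, the hyperparameters, and the toy 2-D data standing in for HOG descriptors are all assumptions for illustration.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimise 0.5*||w||^2 + C*sum(max(0, 1 - y*(X@w + b))), labels in {-1, +1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                     # samples violating the margin
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy, linearly separable data standing in for feature vectors.
X = np.array([[2, 2], [3, 3], [2.5, 3], [-2, -2], [-3, -3], [-2.5, -3]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```

A linear model like this scores a window with a single dot product, which is why it is so much faster at detection time than a kernel SVM.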

Discussion

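HOG outperforms wavelets and the various methods that smooth the image first, because the most useful information in an image comes from sharp edges. Smoothing or blurring the image throws much of that information away.

The size of the derivative mask also matters: it should be neither too large nor too small.

Local contrast normalization is also very important when extracting HOG features, since it reduces the effect of different lighting conditions on the image gradients.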

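Code Reproduction

Because time was short, this is a hastily written, fairly rough implementation, intended mainly for understanding the algorithm. I will improve it when I have more time.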

import numpy as np
import matplotlib.pyplot as plt

CELL_SIZE = (8, 8)    # cell height and width in pixels
BLOCK_SIZE = (4, 4)   # block height and width in cells
BLOCK_STRIDE = 2      # block step in cells
N_BINS = 9            # unsigned orientation bins over [0, 180) degrees


def gamma(input_img, g=1 / 2.2):
    """Convert to grayscale and apply gamma compression (expects values in 0..255)."""
    out_img = np.asarray(input_img, dtype=float)
    if out_img.ndim == 3:
        out_img = out_img[..., :3].mean(axis=2)  # average the colour channels
    out_img = (out_img + 0.5) / 256.0
    out_img = np.power(out_img, g)
    return out_img * 256 - 0.5


def compute_Hog(img):
    """Return the HOG descriptor of a 2-D grayscale image as a 1-D array."""
    img = np.asarray(img, dtype=float)
    height, width = img.shape

    # Centred [-1, 0, 1] derivative masks without smoothing; borders stay 0.
    img_x = np.zeros_like(img)
    img_y = np.zeros_like(img)
    img_x[:, 1:-1] = img[:, 2:] - img[:, :-2]
    img_y[1:-1, :] = img[2:, :] - img[:-2, :]
    img_res = np.sqrt(img_x ** 2 + img_y ** 2)      # gradient magnitude
    theta = np.arctan2(img_y, img_x) % np.pi        # unsigned orientation

    plt.imshow(img_res, cmap='gray')
    plt.show()

    # Per-cell orientation histograms; partial cells at the border are dropped.
    c_h, c_w = CELL_SIZE
    n_cy, n_cx = height // c_h, width // c_w
    bin_width = np.pi / N_BINS
    cells = np.zeros((n_cy, n_cx, N_BINS))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * c_h, (cy + 1) * c_h)
            cols = slice(cx * c_w, (cx + 1) * c_w)
            pos = theta[rows, cols].ravel() / bin_width
            amp = img_res[rows, cols].ravel()
            lo = np.floor(pos).astype(int)
            frac = pos - lo
            # Split each magnitude-weighted vote between the two nearest bins.
            np.add.at(cells[cy, cx], lo % N_BINS, (1 - frac) * amp)
            np.add.at(cells[cy, cx], (lo + 1) % N_BINS, frac * amp)

    # Overlapping blocks, each normalized with the L2-norm.
    b_h, b_w = BLOCK_SIZE
    res = []
    for y in range(0, n_cy - b_h + 1, BLOCK_STRIDE):
        for x in range(0, n_cx - b_w + 1, BLOCK_STRIDE):
            block = cells[y:y + b_h, x:x + b_w].ravel()
            block = block / np.sqrt(np.sum(block ** 2) + 1e-3 ** 2)
            res.append(block)
    return np.concatenate(res) if res else np.zeros(0)


if __name__ == '__main__':
    img_path = 'F:\\DataSet\\MIT_persons_jpg\\per00001.jpg'
    img = plt.imread(img_path)
    img = gamma(img)
    features = compute_Hog(img)
    print(features.shape)

Reposted from: https://blog.csdn.net/qq_43409114/article/details/106416620