Histograms of Oriented Gradients for Human Detection

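These are notes on the paper.

When reposting, please credit: http://blog.csdn.net/stdcoutzyx/article/details/40299383

Pedestrian detection with the HOG descriptor is a classic paper published at CVPR 2005 [1]. It extracts features with the HOG descriptor and then classifies them with an SVM.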

1. Definition

HOG descriptor: a locally normalized histogram of gradient orientations in dense overlapping grids.

2. Important Properties

  1. Fine-scale gradients
  2. Fine orientation binning
  3. Relatively coarse spatial binning
  4. High-quality local contrast normalization in overlapping descriptor blocks

3. Algorithms

Default detector properties:
  • RGB colour space with no gamma correction.
  • [-1,0,1] gradient filter with no smoothing.
  • Linear gradient voting into 9 orientation bins in 0°-180°.
  • 16×16 pixel blocks of four 8×8 pixel cells.
  • Gaussian spatial window with σ = 8 pixels.
  • L2-Hys (Lowe-style clipped L2 norm) block normalization.
  • Block spacing stride of 8 pixels (hence 4-fold coverage of each cell).
  • 64×128 detection window.
  • Linear SVM classifier.
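
The default parameters above fix the length of the feature vector. A small sketch of the arithmetic follows; the function name and argument names are made up for illustration.

```python
def hog_descriptor_length(win_w=64, win_h=128, cell=8, block_cells=2,
                          stride=8, bins=9):
    """Return (blocks_x, blocks_y, total_length) for a dense R-HOG grid."""
    block = cell * block_cells                    # block side in pixels
    blocks_x = (win_w - block) // stride + 1      # block positions per row
    blocks_y = (win_h - block) // stride + 1      # block positions per column
    per_block = block_cells * block_cells * bins  # values contributed by one block
    return blocks_x, blocks_y, blocks_x * blocks_y * per_block


print(hog_descriptor_length())  # (7, 15, 3780)
```

With a stride of 8 pixels the window holds 7×15 = 105 overlapping blocks of 36 values each, so the descriptor has 3780 dimensions.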

3.1 Process

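Basic concepts:

  • Cell: the smallest unit over which a gradient histogram is accumulated; 8×8 pixels in the paper.
  • Block: the unit over which histograms are normalized; 2×2 cells in the paper, i.e. 16×16 pixels.

The HOG + SVM pipeline consists of the following steps:

  • Normalize the image with gamma (power-law) compression.
  • Compute the gradient magnitude and orientation at every pixel.
  • Accumulate the gradient orientations within each cell into a histogram.
  • Normalize the cell histograms within each block; the block slides over the cells as a window, and blocks may overlap.
  • Concatenate the histograms of all blocks into the feature vector of the whole image (detection window).
  • Note that each cell may be normalized as part of several blocks, so it can appear several times in the final feature vector.
  • Classify the feature vectors with a linear SVM to obtain the final model.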
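
The feature extraction steps can be sketched roughly as below. This is a deliberately simplified version, not the paper's implementation: it assumes a grayscale image given as a list of rows, replicates border pixels, assigns each vote to a single bin (no interpolation), uses plain L2 normalization with no Gaussian window, and leaves out gamma compression and SVM training. The name `hog_descriptor` and the `eps` value are illustrative assumptions.

```python
import math


def hog_descriptor(img, cell=8, block_cells=2, bins=9, eps=1e-3):
    """Simplified HOG for a grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    cy, cx = h // cell, w // cell
    # Gradient per pixel, then magnitude-weighted histogram per cell.
    hists = [[[0.0] * bins for _ in range(cx)] for _ in range(cy)]
    for y in range(cy * cell):
        for x in range(cx * cell):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned
            b = int(angle / (180.0 / bins)) % bins            # hard assignment
            hists[y // cell][x // cell][b] += math.hypot(gx, gy)
    # Slide a block one cell at a time, L2-normalize it, concatenate.
    feature = []
    for by in range(cy - block_cells + 1):
        for bx in range(cx - block_cells + 1):
            v = [val
                 for dy in range(block_cells)
                 for dx in range(block_cells)
                 for val in hists[by + dy][bx + dx]]
            norm = math.sqrt(sum(val * val for val in v) + eps * eps)
            feature.extend(val / norm for val in v)
    return feature


# Synthetic 64x128 test pattern, only to show the output size.
img = [[(x * 7 + y * 3) % 256 for x in range(64)] for y in range(128)]
print(len(hog_descriptor(img)))  # 3780
```

Because the block moves one cell per step, every interior cell is normalized in four different blocks, which is where the repetition noted in the list comes from.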

3.2 Gamma / Colour Normalization

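  • Pixel representations compared: grayscale, RGB and LAB.
  • Optionally with power-law (gamma) equalization or log compression.
  • Best: RGB and LAB give comparable results and both beat grayscale; square-root gamma compression helps slightly, while log compression is too strong.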
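
A minimal sketch of the compression options, assuming 8-bit channel values. The rescaling to [0, 1] and the `log1p` form of the log compression are choices made for this example, not details taken from the paper.

```python
import math


def compress(pixels, mode="sqrt"):
    """Compress a list of 8-bit intensities; the result lies in [0, 1]."""
    if mode == "sqrt":   # square-root (power-law with gamma = 0.5)
        return [math.sqrt(p / 255.0) for p in pixels]
    if mode == "log":    # log compression, scaled so 255 maps to 1
        return [math.log1p(p) / math.log1p(255.0) for p in pixels]
    if mode == "none":   # plain rescaling, no compression
        return [p / 255.0 for p in pixels]
    raise ValueError("unknown mode: " + mode)


print(compress([0, 64, 255], "sqrt"))
```

The compression is applied to each colour channel separately before gradients are computed.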

3.3 Gradient Computation

The following schemes are compared:
  • Various 1-D point derivatives:

    • uncentred [-1,1]
    • centred [-1,0,1]
    • cubic-corrected [1,-8,0,8,-1]
    • each optionally combined with Gaussian smoothing
  • 3×3 Sobel masks.
  • 2×2 diagonal masks (the most compact centred 2-D derivative masks).

The simplest scheme turns out to be the best: the centred [-1,0,1] mask without Gaussian smoothing works best. Larger masks decrease performance, and smoothing damages it significantly.

For colour images, calculate separate gradients for each channel and take the one with the largest norm as the pixel’s gradient vector.
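
A sketch of the centred [-1,0,1] gradient and the per-pixel channel selection, assuming each channel is a list of rows. Replicating the border pixels is an assumption of this example, and the function names are illustrative.

```python
import math


def centred_gradient(img):
    """[-1, 0, 1] derivative in x and y; border pixels are replicated."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx[y][x] = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy[y][x] = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
    return gx, gy


def dominant_channel_gradient(channels):
    """Per pixel, keep the gradient of the channel with the largest norm."""
    grads = [centred_gradient(c) for c in channels]
    h, w = len(channels[0]), len(channels[0][0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx, gy = max(((g[0][y][x], g[1][y][x]) for g in grads),
                         key=lambda v: v[0] ** 2 + v[1] ** 2)
            mag[y][x] = math.hypot(gx, gy)
            # Fold the angle into [0, 180) for unsigned orientation.
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang
```

The returned magnitude and orientation maps are what the cell histograms are built from.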

3.4 Spatial / Orientation Binning

The histogram for each cell is built as follows:
  • Each pixel casts a weighted vote based on the orientation of the gradient element centred on it.

    • The vote weight is a function of the gradient magnitude at the pixel:

      • the magnitude itself
      • its square
      • its square root
    • The magnitude itself gives the best results.
  • Votes are accumulated into orientation bins over local spatial regions called cells.
    • There are two kinds of orientation bins:

      • 0°-180°: unsigned gradient
      • 0°-360°: signed gradient
    • The unsigned gradient works better for humans, because the wide range of clothing and background colours makes the sign of the contrast uninformative.
      • Sign information may be helpful in other object recognition tasks, such as cars and motorbikes.
    • With unsigned gradients, performance improves as the number of orientation bins increases up to about 9.
    • To reduce aliasing, votes are interpolated bilinearly between the neighbouring bin centres in both orientation and position.
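
A sketch of the orientation part of the interpolation: one pixel's vote is split linearly between the two nearest bin centres, wrapping around at the ends of the range. Interpolation over position (between neighbouring cells) is left out to keep the example short. The function name `vote` is illustrative, and bin centres at 10°, 30°, ..., 170° are assumed.

```python
import math


def vote(hist, angle_deg, magnitude, bins=9, span=180.0):
    """Split one magnitude-weighted vote between the two nearest bin centres."""
    width = span / bins                      # 20 degrees per bin
    # Position measured in bins, relative to the centre of bin 0.
    pos = (angle_deg % span) / width - 0.5
    lo = math.floor(pos)
    frac = pos - lo
    hist[lo % bins] += magnitude * (1.0 - frac)
    hist[(lo + 1) % bins] += magnitude * frac
    return hist


# 20 degrees lies halfway between the centres of bins 0 and 1.
print(vote([0.0] * 9, 20.0, 2.0))
```

An angle of 0° is split equally between the last bin and the first, because unsigned orientations wrap around at 180°.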

3.5 Normalization and Descriptor Blocks

  • Gradient strengths vary over a wide range owing to local variations in illumination and foreground-background contrast, so local contrast normalization is essential.
  • Cells are grouped into larger spatial blocks, and each block is contrast-normalized separately.
    • An alternative centre-surround style cell normalization scheme is also investigated.

      • Taking a cell as the centre, the surrounding cells are weighted with a Gaussian, and the total computed from the cell and its neighbours is used to normalize it. As a result each cell appears only once in the final descriptor, which is also why performance drops.
  • Overlapping the blocks seems redundant but improves performance significantly.
  • The paper uses two descriptor variants: R-HOG (rectangular blocks) and C-HOG (circular blocks).
  • Vertical (2×1 cell) and horizontal (1×2 cell) blocks are also considered.
  • It is useful to down-weight pixels near the edges of a block by applying a Gaussian spatial window to each pixel before accumulating orientation votes into cells.
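
A sketch of such a Gaussian spatial window for a 16×16 block with σ = 8 pixels. Placing the centre between the four middle pixels and leaving the peak unnormalized are assumptions of this example.

```python
import math


def gaussian_window(size=16, sigma=8.0):
    """Per-pixel weights for one block, largest at the block centre."""
    c = (size - 1) / 2.0  # centre falls between the four middle pixels
    return [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma ** 2))
             for x in range(size)]
            for y in range(size)]


w = gaussian_window()
print(round(w[0][0], 3), round(w[7][7], 3))  # corner weight, centre weight
```

Each pixel's gradient magnitude is multiplied by its window weight before the vote is added to the cell histogram, so pixels near the block border count less.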

3.6 Block Normalization schemes

Once several cells are grouped into a block, their histograms form a block vector v, which can be normalized in one of the following ways:
  • L2-norm
  • L2-Hys
  • L1-norm
  • L1-sqrt
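
The four schemes can be sketched as follows. The small constant `eps` guards against division by zero and its value here is an arbitrary choice; the 0.2 clipping threshold for L2-Hys follows Lowe's SIFT normalization. The entries of v are assumed to be non-negative, as histogram values are.

```python
import math


def l2_norm(v, eps=1e-3):
    """v / sqrt(||v||_2^2 + eps^2)"""
    n = math.sqrt(sum(x * x for x in v) + eps * eps)
    return [x / n for x in v]


def l2_hys(v, eps=1e-3, clip=0.2):
    """L2-norm, clip every entry at `clip`, then renormalize."""
    clipped = [min(x, clip) for x in l2_norm(v, eps)]
    return l2_norm(clipped, eps)


def l1_norm(v, eps=1e-3):
    """v / (||v||_1 + eps)"""
    n = sum(abs(x) for x in v) + eps
    return [x / n for x in v]


def l1_sqrt(v, eps=1e-3):
    """Square root of the L1-normalized vector."""
    return [math.sqrt(x) for x in l1_norm(v, eps)]


spiky = [10.0, 1.0, 1.0, 1.0]
print(l2_norm(spiky))
print(l2_hys(spiky))
```

Comparing the two printed vectors shows what the clipping does: the dominant entry is reduced and the smaller entries gain weight.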

4. Experiment

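The following groups of experiments are run:
  • Comparison with previous algorithms

    • Generalized Haar Wavelets
    • PCA-SIFT
    • Shape Contexts
  • Effect of gradient scale
    • Tests the effect of Gaussian smoothing when computing gradients.
  • Effect of the number of orientation bins
    • Examines how performance changes with the orientation binning.
  • Effect of normalization method
    • Examines different normalization methods at the block normalization step.
  • Effect of overlap
    • Examines different amounts of overlap when cells are grouped into blocks, with the cell size fixed at 8×8.
    • Note: the stride parameter here is the number of pixels the block moves at each step.
  • Effect of different block and cell sizes
  • Effect of different detection window sizes
  • Effect of the SVM parameters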

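Evaluation metrics:

  • Miss rate: of all samples that contain a pedestrian, the fraction classified as containing none, i.e. FalseNeg / (TruePos + FalseNeg), or 1 - Recall.
  • FPPW: False Positives Per Window, the number of windows without a pedestrian that are classified as containing one, averaged over the detection windows tested.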
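
Both metrics are simple ratios of counts. A small sketch follows; the function and argument names, and the counts in the example, are invented for illustration.

```python
def miss_rate(true_pos, false_neg):
    """Fraction of pedestrian samples that the detector misses."""
    return false_neg / (true_pos + false_neg)


def fppw(false_pos, windows_tested):
    """False positives per tested window that contains no pedestrian."""
    return false_pos / windows_tested


# Made-up counts, only to show the arithmetic.
print(miss_rate(true_pos=90, false_neg=10))
print(fppw(false_pos=5, windows_tested=50000))
```

Plotting miss rate against FPPW while sweeping the classifier threshold gives a detection error trade-off curve, where lower is better on both axes.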

5. Understanding the HOG Descriptor in Depth

The central idea of the HOG descriptor is that the appearance and shape of a local object in an image can be described well by the distribution of gradient or edge orientations.

The HOG descriptor has several advantages:
  • Because it operates on local cells of the image, it stays largely invariant to geometric and photometric transformations, since such transformations only show up over larger spatial regions.
  • Experiments show that with coarse spatial sampling, fine orientation sampling and strong local photometric normalization, small limb movements do not affect detection as long as the pedestrian keeps a roughly upright pose.

R-HOG looks very similar to the SIFT descriptor, but the two differ: R-HOG descriptors are computed in dense grids at a single scale without orientation alignment, whereas SIFT descriptors are computed at sparse, scale-invariant key image points and are rotated to align orientation. In addition, R-HOG descriptors are used in conjunction to encode spatial form information, whereas SIFT descriptors are used singly.

6. Reference

  • [1] Dalal N, Triggs B. Histograms of Oriented Gradients for Human Detection. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2005, vol. 1, pp. 886-893.
