Computing Fisher vectors and VLAD

This short tutorial shows how to compute Fisher vector and VLAD encodings with the VLFeat MATLAB interface.

These encodings serve a similar purpose: summarizing a number of local feature descriptors (e.g. SIFT) in a single vectorial statistic. Similarly to bag of visual words, they assign local descriptors to elements in a visual dictionary, obtained with vector quantization (k-means) in the case of VLAD, or with a Gaussian Mixture Model (GMM) in the case of Fisher vectors. However, rather than storing visual word occurrences only, these representations store statistics of the differences between the dictionary elements and the pooled local features.
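Concretely, in the VLAD case the statistic stored for each visual word is the sum of residuals of the descriptors assigned to it. In our own notation (a sketch, not taken from the original page), with cluster centers \mu_k and descriptors x_i hard-assigned by nearest neighbor,

v_k = \sum_{i : NN(x_i) = k} (x_i - \mu_k)

and the encoding stacks the per-word statistics v_k into a single vector.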

Fisher encoding

The Fisher encoding uses a GMM to construct the visual word dictionary. To exemplify constructing a GMM, consider a number of 2-dimensional data points (see also the GMM tutorial). In practice, these points would be a collection of SIFT or other local image descriptors. The following code fits a GMM to the points:

% generate random 2-D training descriptors
numFeatures = 5000 ;
dimension = 2 ;
data = rand(dimension, numFeatures) ;

% fit a GMM with 30 components to the data
numClusters = 30 ;
[means, covariances, priors] = vl_gmm(data, numClusters) ;

Next, we create another random set of vectors, which will be encoded using the Fisher vector representation and the GMM just obtained:

numDataToBeEncoded = 1000;
dataToBeEncoded = rand(dimension,numDataToBeEncoded);

The Fisher vector encoding of these vectors is obtained by calling the vl_fisher function with the output of vl_gmm:

encoding = vl_fisher(dataToBeEncoded, means, covariances, priors) ;

The vector encoding is the Fisher vector representation of the data dataToBeEncoded.
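As a quick sanity check (our addition, following the documented layout of one mean-deviation block and one covariance-deviation block per Gaussian component):

% the Fisher vector has dimension 2 * dimension * numClusters
assert(numel(encoding) == 2 * dimension * numClusters) ;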

Note that Fisher vectors support several normalization options that can substantially affect the performance of the representation.
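For instance, the following sketch requests the "improved" Fisher vector; the option name is taken from the vl_fisher documentation, so treat it as an assumption and check it against your VLFeat version:

% 'Improved' applies the signed square root followed by L2 normalization
encoding = vl_fisher(dataToBeEncoded, means, covariances, priors, 'Improved') ;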

VLAD encoding

The Vector of Locally Aggregated Descriptors (VLAD) is similar to the Fisher vector but (i) it does not store second-order information about the features and (ii) it typically uses k-means instead of a GMM to generate the feature vocabulary (although the latter is also an option).

Consider the same 2-D data matrix data used in the previous section to train the Fisher vector representation. To compute VLAD, we first need to obtain a visual word dictionary. This time, we use k-means:

numClusters = 30 ;
centers = vl_kmeans(data, numClusters) ;

Now consider the data dataToBeEncoded and use the vl_vlad function to compute the encoding. Differently from vl_fisher, vl_vlad requires the data-to-cluster assignments to be passed in. This allows using a fast vector quantization technique (e.g. a kd-tree) as well as switching from soft to hard assignment.

In this example, we use a kd-tree for quantization:

kdtree = vl_kdtreebuild(centers) ;
nn = vl_kdtreequery(kdtree, centers, dataToBeEncoded) ;

The vector nn now contains the index of the nearest center for each vector in the matrix dataToBeEncoded. The next step is to convert these indexes into an assignment matrix:

% one row per cluster, one column per descriptor; mark the nearest
% center of each descriptor with a 1 (hard assignment)
assignments = zeros(numClusters, numDataToBeEncoded) ;
assignments(sub2ind(size(assignments), nn, 1:length(nn))) = 1 ;
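As a quick sanity check (our addition), with hard assignments every column of the matrix should sum to exactly one:

assert(all(sum(assignments, 1) == 1)) ;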

It is now possible to encode the data using the vl_vlad function:

enc = vl_vlad(dataToBeEncoded, centers, assignments) ;
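Unlike the Fisher vector, enc stores only the first-order residuals, so its dimension is dimension * numClusters (a sanity check we added):

assert(numel(enc) == dimension * numClusters) ;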

Note that, similarly to Fisher vectors, VLAD supports several normalization options that can substantially affect the performance of the representation.
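For example, component-wise ("intra") normalization of each per-cluster block can be requested as follows; the option name is taken from the vl_vlad documentation, so treat it as an assumption and check it against your VLFeat version:

% L2-normalize each per-cluster block before the global L2 normalization
enc = vl_vlad(dataToBeEncoded, centers, assignments, 'NormalizeComponents') ;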

from: http://www.vlfeat.org/overview/encodings.html
