Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification

Introduction

Recently SVMs using spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite its popularity, these nonlinear SVMs have a complexity O(n^2 ~ n^3) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images.

Nonlinear SVMs are computationally very expensive.

In this paper we develop an extension of the SPM method, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes.

 

In recent years the bag-of-features (BoF) model has been extremely popular in image categorization. The method treats an image as a collection of unordered appearance descriptors extracted from local patches, quantizes them into discrete "visual words", and then computes a compact histogram representation for semantic image classification.

The method partitions an image into 2^l × 2^l segments at different scales l = 0, 1, 2, computes the BoF histogram within each of the 21 segments, and finally concatenates all the histograms to form a vector representation of the image. In the case where only the scale l = 0 is used, SPM reduces to BoF.
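The pyramid construction can be sketched in Python with NumPy. This is a minimal illustration assuming the descriptors have already been quantized to codeword indices (`words`) and that their pixel positions (`xy`) are known; the function name `spm_histograms` is hypothetical, and the per-level weighting used in the original SPM kernel is omitted, so it only shows the plain concatenation described above.

```python
import numpy as np


def spm_histograms(xy, words, width, height, num_words, levels=(0, 1, 2)):
    """Concatenate BoF histograms over a 2^l x 2^l grid for each level l.

    xy:    (M, 2) array of descriptor positions (x, y) in pixels.
    words: (M,) integer array of VQ codeword indices in [0, num_words).
    """
    feats = []
    for l in levels:
        cells = 2 ** l
        # Map every descriptor to its grid cell at this level.
        cx = np.minimum((xy[:, 0] * cells / width).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells / height).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)
                # BoF histogram of the codewords falling in this cell.
                feats.append(np.bincount(words[in_cell], minlength=num_words))
    return np.concatenate(feats)
```

With levels 0, 1, 2 there are 1 + 4 + 16 = 21 cells, so the output has 21 × K entries for a codebook of size K; keeping only level 0 gives the plain BoF histogram.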

Replace VQ with sparse coding (SC).

Furthermore, unlike the original SPM that performs spatial pooling by computing histograms, our approach, called ScSPM, uses max spatial pooling that is more robust to local spatial translations and more biologically plausible.

Use max pooling in place of histogram (average) pooling as the spatial pooling step.

After sparse coding, a linear classifier alone already achieves good results.

Despite such popularity, SPM has to run together with nonlinear kernels, such as the intersection kernel and the Chi-square kernel, in order to achieve good performance, which requires intensive computation and large storage.

Intersection kernel, chi-square kernel.
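For reference, the two kernels can be written on histogram vectors as below. The chi-square kernel has several variants in the literature; this sketch assumes the additive form k(h, h') = Σ_i 2 h_i h'_i / (h_i + h'_i), and the helper `gram` is hypothetical. Filling an n_test × n_train Gram matrix is what makes nonlinear testing cost grow with the training set size.

```python
import numpy as np


def intersection_kernel(h1, h2):
    # Histogram intersection: sum over bins of the smaller count.
    return float(np.minimum(h1, h2).sum())


def chi2_kernel(h1, h2, eps=1e-12):
    # Additive chi-square kernel: sum_i 2*h1_i*h2_i / (h1_i + h2_i).
    # eps avoids 0/0 for bins that are empty in both histograms.
    return float(np.sum(2.0 * h1 * h2 / (h1 + h2 + eps)))


def gram(X, Y, kernel):
    # One kernel evaluation per (test, train) pair: every test image
    # must be compared against every training image.
    return np.array([[kernel(x, y) for y in Y] for x in X])
```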

 

Linear SPM Using SIFT Sparse Codes

VQ

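In the training phase, the main task is to learn the basis vectors (codebook) V; in the testing phase, the coefficients U of each descriptor over that basis are computed.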
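The VQ training/testing split can be sketched with k-means, the usual way of learning a VQ codebook (an assumption here; the notes do not name the algorithm). `learn_codebook` alternates coding and centroid updates in the training phase, and `vq_encode` is the testing-phase step that turns each descriptor into a one-hot code. Both function names are hypothetical.

```python
import numpy as np


def vq_encode(X, V):
    """Hard-assignment VQ coding (testing phase).

    X: (M, D) descriptors, one per row; V: (K, D) codebook, one codeword per row.
    Returns U: (M, K) with a single 1 per row, so that X is approximated by U @ V.
    """
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    U = np.zeros((X.shape[0], V.shape[0]))
    U[np.arange(X.shape[0]), d2.argmin(axis=1)] = 1.0
    return U


def learn_codebook(X, K, iters=20, seed=0):
    """K-means (training phase): alternate VQ coding and centroid updates."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(X.shape[0], K, replace=False)].astype(float)
    for _ in range(iters):
        U = vq_encode(X, V)
        counts = U.sum(axis=0)
        used = counts > 0
        # Each used codeword moves to the mean of the descriptors assigned to it.
        V[used] = (U.T @ X)[used] / counts[used, None]
    return V
```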

Sparse coding adds a sparsity constraint on the codes to the loss function.
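Written out in the notation commonly used for this comparison (assumed here, since the original formulas are not reproduced in these notes): x_m is the m-th descriptor as a row vector, V is the K × D codebook with rows v_k, and u_m is the m-th row of U. VQ restricts each u_m to a single nonzero entry equal to 1; sparse coding relaxes that cardinality constraint into an L1 penalty weighted by λ. The norm bound on each v_k is needed because otherwise scaling V up and U down would shrink the L1 term without changing the reconstruction.

```latex
\begin{aligned}
&\text{VQ:} && \min_{U,V} \sum_{m=1}^{M} \lVert x_m - u_m V \rVert^2
  \quad \text{s.t. } \operatorname{Card}(u_m) = 1,\ \lVert u_m \rVert_1 = 1,\ u_m \succeq 0,\ \forall m \\
&\text{SC:} && \min_{U,V} \sum_{m=1}^{M} \lVert x_m - u_m V \rVert^2 + \lambda \lVert u_m \rVert_1
  \quad \text{s.t. } \lVert v_k \rVert \le 1,\ \forall k = 1, \dots, K
\end{aligned}
```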

As with VQ, the training phase learns an (overcomplete) basis and the testing phase computes the sparse codes.

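Advantages: smaller reconstruction error; the captured image features are more salient; image patches are reportedly sparse signals.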

Note: local sparse coding.

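Therefore, hard-voting VQ introduces large quantization errors; even with a nonlinear SVM the gain is limited, and the computational cost is high.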

 

In this work, we defined the pooling function F as a max pooling function on the absolute sparse codes.

This max pooling reportedly has a biological basis, and it is also more robust.
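Per codeword j, the pooling function F can be written as z_j = max{|u_1j|, |u_2j|, ..., |u_Mj|}, where u_mj is the j-th code of the m-th descriptor in the region (notation assumed here). The sketch below contrasts it with average pooling, which is what histogram pooling amounts to for one-hot VQ codes; both function names are hypothetical.

```python
import numpy as np


def max_pool(U):
    # U: (M, K) codes of the M descriptors in a region.
    # z_j = max(|u_1j|, ..., |u_Mj|): one value per basis vector.
    return np.abs(U).max(axis=0)


def average_pool(U):
    # Mean of the codes; with one-hot VQ codes this is the
    # normalized BoF histogram that plain SPM pools.
    return U.mean(axis=0)
```

The absolute value matters because sparse codes can be negative, whereas VQ codes are nonnegative counts.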

Similar to the construction of histograms in SPM, we do max pooling on a spatial pyramid constructed for an image.
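Combining the pyramid with max pooling, the ScSPM image feature can be sketched as max pooling of |U| over each cell of the same 2^l × 2^l grid that SPM uses. Assumptions: codes `U` are already computed for descriptors at positions `xy`, empty cells contribute zeros, and no normalization of the final vector is applied; the function name `scspm_feature` is hypothetical.

```python
import numpy as np


def scspm_feature(xy, U, width, height, levels=(0, 1, 2)):
    """Max-pool |codes| over a 2^l x 2^l grid per level, then concatenate.

    xy: (M, 2) descriptor positions in pixels; U: (M, K) sparse codes.
    """
    K = U.shape[1]
    A = np.abs(U)
    feats = []
    for l in levels:
        cells = 2 ** l
        cx = np.minimum((xy[:, 0] * cells / width).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells / height).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)
                # Empty cells contribute a zero vector (an assumption).
                feats.append(A[in_cell].max(axis=0) if in_cell.any() else np.zeros(K))
    return np.concatenate(feats)
```

For a codebook of K basis vectors the result has 21 × K entries, the same layout as the SPM histogram vector, and this vector is what the linear SVM consumes.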

 

Why it works:

This success is largely due to three factors: (1) SC has much smaller quantization error than VQ; (2) It is well known that image patches are sparse in nature, and thus sparse coding is particularly suitable for image data; (3) The computed statistics by max pooling are more salient and robust to local translations.

 

Implementation

1. Sparse Coding

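Solving the SC objective: it is convex in U when V is fixed and convex in V when U is fixed, but not jointly convex in both. The traditional approach is therefore to alternate, fixing one variable and solving for the other; the recently proposed feature-sign search algorithm computes the codes faster.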
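The alternating scheme can be sketched as follows. This is not feature-sign search: to keep the example short it swaps in ISTA (iterative soft-thresholding, a proximal gradient method) for the U-step and projected gradient descent for the V-step. Both subproblems are convex, so with step size 1/L each step does not increase the objective. Function names and the values of `lam`, `K`, and the iteration counts are illustrative assumptions; codes are rows of U and basis vectors are rows of V, so X ≈ U V.

```python
import numpy as np


def soft_threshold(Z, t):
    # Proximal operator of t * ||.||_1 (element-wise shrinkage).
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)


def sparse_codes(X, V, lam, iters=200, U0=None):
    """U-step: ISTA for min_U ||X - U V||_F^2 + lam * sum|U|, V fixed (convex)."""
    U = np.zeros((X.shape[0], V.shape[0])) if U0 is None else U0.copy()
    L = 2.0 * np.linalg.norm(V, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    for _ in range(iters):
        grad = 2.0 * (U @ V - X) @ V.T
        U = soft_threshold(U - grad / L, lam / L)
    return U


def update_basis(X, U, V, iters=50):
    """V-step: projected gradient for min_V ||X - U V||_F^2, ||v_k|| <= 1 (convex)."""
    L = 2.0 * np.linalg.norm(U, 2) ** 2 + 1e-12
    for _ in range(iters):
        V = V - 2.0 * U.T @ (U @ V - X) / L
        # Project every basis vector (row of V) back onto the unit ball.
        V = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1.0)
    return V


def objective(X, U, V, lam):
    return float(np.sum((X - U @ V) ** 2) + lam * np.abs(U).sum())


def learn_dictionary(X, K, lam, outer=20, seed=0):
    """Alternate the two convex subproblems; the joint problem is non-convex."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((K, X.shape[1]))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    U = np.zeros((X.shape[0], K))
    for _ in range(outer):
        U = sparse_codes(X, V, lam, U0=U)
        V = update_basis(X, U, V)
    return U, V
```

Once V has been learned offline, encoding a new image only requires `sparse_codes(X_new, V, lam)`, which matches the offline/online split noted in this section.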

Because the basis V is learned offline, the coding coefficients of new features can be computed in real time.

2. Multi-class Linear SVM

The linear SVM is trained with LBFGS.
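A minimal sketch of the multi-class linear SVM stage, assuming a one-against-all scheme and a squared (differentiable) hinge loss so that L-BFGS applies directly; SciPy's `scipy.optimize.minimize` with `method="L-BFGS-B"` stands in for a dedicated LBFGS solver. The constant `C`, the appended bias feature (regularized along with the weights here), and all function names are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def train_binary(Z, y, C=1.0):
    """0.5*||w||^2 + C * sum max(0, 1 - y_i w.z_i)^2, minimized with L-BFGS."""
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])  # append a bias feature

    def loss_grad(w):
        active = np.maximum(1.0 - y * (Zb @ w), 0.0)  # hinge terms
        loss = 0.5 * w @ w + C * np.sum(active ** 2)
        grad = w - 2.0 * C * Zb.T @ (y * active)
        return loss, grad

    res = minimize(loss_grad, np.zeros(Zb.shape[1]), jac=True, method="L-BFGS-B")
    return res.x


def train_one_vs_all(Z, labels, C=1.0):
    # One binary classifier per class: that class (+1) against all others (-1).
    classes = np.unique(labels)
    W = np.stack([train_binary(Z, np.where(labels == c, 1.0, -1.0), C) for c in classes])
    return classes, W


def predict(Z, classes, W):
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])
    return classes[np.argmax(Zb @ W.T, axis=1)]
```

Because the classifier is linear, testing costs one dot product per class, independent of the number of training images, in contrast to the O(n) kernel evaluations of a nonlinear SPM kernel.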

Date: 2024-12-15 16:11:52
