<转>卷积神经网络是如何学习到平移不变的特征

After some thought, I do not believe that pooling operations are responsible for the translation invariant property in CNNs. I believe that invariance (at least to translation) is due to the convolution filters (not specifically the pooling) and due to the fully-connected layer.

For instance, let‘s use the Fig. 1 as reference:

The blue volume represents the input image, while the green and yellow volumes represent layer 1 and layer 2 output activation volumes (see CS231n Convolutional Neural Networks for Visual Recognition if you are not familiar with these volumes). At the end, we have a fully-connected layer that is connected to all activation points of the yellow volume.

These volumes are build using a convolution plus a pooling operation. The pooling operation reduces the height and width of these volumes, while the increasing number of filters in each layer increases the volume depth.

For the sake of the argument, let‘s suppose that we have very "ludic" filters, as show in Fig. 2:

  • the first layer filters (which will generate the green volume) detect eyes, noses and other basic shapes (in real CNNs, first layer filters will match lines and very basic textures);
  • The second layer filters (which will generate the yellow volume) detect faces, legs and other objects that are aggregations of the first layer filters. Again, this is only an example: real life convolution filters may detect objects that have no meaning to humans.

Now suppose that there is a face at one of the corners of the image (represented by two red and a magenta point). The two eyes are detected by the first filter, and therefore will represent two activations at the first slice of the green volume. The same happens for the nose, except that it is detected for the second filter and it appears at the second slice. Next, the face filter will find that there are two eyes and a nose next to each other, and it generates an activation at the yellow volume (within the same region of the face at the input image). Finally, the fully-connected layer detects that there is a face (and maybe a leg and an arm detected by other filters) and it outputs that it has detected an human body.

Now suppose that the face has moved to another corner of the image, as shown in Fig. 3:

The same number of activations occurs in this example, however they occur in a different region of the green and yellow volumes. Therefore, any activation point at the first slice of the yellow volume means that a face was detected, INDEPENDENTLY of the face location. Then the fully-connected layer is responsible to "translate" a face and two arms to an human body. In both examples, an activation was received at one of the fully-connected neurons. However, in each example, the activation path inside the FC layer was different, meaning that a correct learning at the FC layer is essential to ensure the invariance property.

It must be noticed that the polling operation only "compresses" the activation volumes, if there was no polling in this example, an activation at the first slice of the yellow volume would still mean a face.

In conclusion, what makes a CNN invariant to object translation is the architecture of the neural network: the convolution filters and the fully-connected layer. Additionally, I believe that if a CNN is trained showing faces only at one corner, during the learning process, the fully-connected layer may become insensitive to faces in other corners.

source:

https://www.quora.com/How-is-a-convolutional-neural-network-able-to-learn-invariant-features/answer/Jean-Da-Rolt

时间: 2024-10-17 07:11:24

<转>卷积神经网络是如何学习到平移不变的特征的相关文章

深度学习笔记1(卷积神经网络)

深度学习笔记1(卷积神经网络) 在看完了UFLDL教程之后,决定趁热打铁,继续深度学习的学习,主要想讲点卷积神经网络,卷积神经网络是深度学习的模型之一,还有其它如AutoEncoding.Deep Belief Network.Restricted Boltzmann Machine和sparse coding等. 在UFLDL教程中提到了针对大型图像的处理,使用卷积和池化的概念.原因主要对于全连接网络,需要的参数就有很多.比如对于一副1000*1000的图像,hidden layer也为100

深度学习卷积神经网络大事件一览

深度学习(DeepLearning)尤其是卷积神经网络(CNN)作为近几年来模式识别中的研究重点,受到人们越来越多的关注,相关的参考文献也是层出不穷,连续几年都占据了CVPR的半壁江山,但是万变不离其宗,那些在深度学习发展过程中起到至关重要的推动作用的经典文献依然值得回味,这里依据时间线索,对CNN发展过程中出现的一些经典文献稍作总结,方便大家在研究CNN时追本溯源,在汲取最新成果的同时不忘经典. 首先这里给出CNN在发展过程中的一些具有里程碑意义的事件和文献: 对于CNN最早可以追溯到1986

《卷积神经网络的Python实现》PDF代码+《解析深度学习卷积神经网络原理与视觉实践》PDF分析

CNN正在革新几个应用领域,如视觉识别系统.自动驾驶汽车.医学发现.创新电子商务等.需要在专业项目或个人方案中利用复杂的图像和视频数据集来实现先进.有效和高效的CNN模型. 深度卷积网络DCNN是目前十分流行的深度神经网络架构,它的构造清晰直观,效果引人入胜,在图像.视频.语音.语言领域都有广泛应用. 深度学习,特别是深度卷积神经网络是人工智能的重要分支领域,卷积神经网络技术也被广泛应用于各种现实场景,在许多问题上都取得了超越人类智能的结果. <卷积神经网络的Python实现>作为深度学习领域

深度学习(一) 卷积神经网络CNN

Contents 图像数据集基础 全连接神经网络解决图片问题的弊端(前世) 卷积神经网络的今生 网络结构 卷积操作 池化操作 小结 图像数据集基础 数字图像划分为彩色图像.灰度图像.二值图像和索引图像几种.其中,像素是构成图像的基本单位,例如一张28×28像素的图片,即表示横向有28个像素点,纵向有28个像素点. 最常用的彩色图像和灰度图像: 彩色图像:每个像素由RGB三个分量来表示,即红绿蓝.每个分量介于(0,255).那么,对于一个28×28的彩色图像,便可以由三个表示RGB颜色分量的28×

卷积神经网络(CNN)基础介绍

本文是对卷积神经网络的基础进行介绍,主要内容包含卷积神经网络概念.卷积神经网络结构.卷积神经网络求解.卷积神经网络LeNet-5结构分析.卷积神经网络注意事项. 一.卷积神经网络概念 上世纪60年代.Hubel等人通过对猫视觉皮层细胞的研究,提出了感受野这个概念.到80年代.Fukushima在感受野概念的基础之上提出了神经认知机的概念,能够看作是卷积神经网络的第一个实现网络,神经认知机将一个视觉模式分解成很多子模式(特征),然后进入分层递阶式相连的特征平面进行处理,它试图将视觉系统模型化,使其

人工智能中卷积神经网络基本原理综述

人工智能Artificial Intelligence中卷积神经网络Convolutional Neural Network基本原理综述 人工智能(Artificial Intelligence,简称AI)的Deep Learning(深度学习)通过机器学习,把某一层的输出output当做下一层的输入input.在人工智能中,认为output是机器通过深度学习获得的某种"智慧".深度学习(Deep Learning)通过神经网络把海量数据分组,然后形成组合分层结果,这样就形成了神经网络

TensorFlow实战之实现AlexNet经典卷积神经网络

本文已同步本人另外一个博客(http://blog.csdn.net/qq_37608890/article/details/79371347) 本文根据最近学习TensorFlow书籍网络文章的情况,特将一些学习心得做了总结,详情如下.如有不当之处,请各位大拿多多指点,在此谢过. 一.AlexNet模型及其基本原理阐述 1.关于AlexNet 2012年,AlexKrizhevsky提出了深度卷积神经网络模型AlexNet,可以看作LeNet的一种更深更宽的版本.该模型包含了6亿3000万个连

卷积神经网络-解释1

[翻译] 神经网络的直观解释 2017/07/27 17:36 这篇文章原地址为An Intuitive Explanation of Convolutional Neural Networks,卷积神经网络的讲解非常通俗易懂. 什么是卷积神经网络?为什么它们很重要? 卷积神经网络(ConvNets 或者 CNNs)属于神经网络的范畴,已经在诸如图像识别和分类的领域证明了其高效的能力.卷积神经网络可以成功识别人脸.物体和交通信号,从而为机器人和自动驾驶汽车提供视力. 在上图中,卷积神经网络可以识

Spark MLlib Deep Learning Convolution Neural Network (深度学习-卷积神经网络)3.1

3.Spark MLlib Deep Learning Convolution Neural Network (深度学习-卷积神经网络)3.1 http://blog.csdn.net/sunbow0 Spark MLlib Deep Learning工具箱,是根据现有深度学习教程<UFLDL教程>中的算法,在SparkMLlib中的实现.具体Spark MLlib Deep Learning(深度学习)目录结构: 第一章Neural Net(NN) 1.源码 2.源码解析 3.实例 第二章D