Sparse Autoencoder (Part 2)

Gradient checking and advanced optimization

In this section, we describe a method for numerically checking the derivatives computed by your code to make sure that your implementation is correct. Carrying out the derivative checking procedure described here will significantly increase your confidence in the correctness of your code.

Suppose we want to minimize $J(\theta)$ as a function of $\theta$. For this example, suppose $J : \mathbb{R} \mapsto \mathbb{R}$, so that $\theta \in \mathbb{R}$. In this 1-dimensional case, one iteration of gradient descent is given by

$\theta := \theta - \alpha \frac{d}{d\theta} J(\theta).$

Suppose also that we have implemented some function $g(\theta)$ that purportedly computes $\frac{d}{d\theta} J(\theta)$, so that we implement gradient descent using the update $\theta := \theta - \alpha g(\theta)$.

Recall the mathematical definition of the derivative as

$\frac{d}{d\theta} J(\theta) = \lim_{\epsilon \to 0} \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}.$

Thus, at any specific value of $\theta$, we can numerically approximate the derivative as

$\frac{J(\theta + {\rm EPSILON}) - J(\theta - {\rm EPSILON})}{2 \times {\rm EPSILON}},$

where EPSILON is a small constant, say around $10^{-4}$.

Thus, given a function $g(\theta)$ that is supposedly computing $\frac{d}{d\theta} J(\theta)$, we can now numerically verify its correctness by checking that

$g(\theta) \approx \frac{J(\theta + {\rm EPSILON}) - J(\theta - {\rm EPSILON})}{2 \times {\rm EPSILON}}.$

The degree to which these two values should approximate each other will depend on the details of $J$. But assuming ${\rm EPSILON} = 10^{-4}$, you'll usually find that the left- and right-hand sides of the above will agree to at least 4 significant digits (and often many more).
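As a concrete illustration, here is a minimal Python sketch of this 1-D check. The functions J and g are hypothetical stand-ins for your own objective and your hand-coded derivative; here $J(\theta) = \theta^2$, so the true derivative is $2\theta$:

```python
EPSILON = 1e-4  # small constant for the two-sided difference

def J(theta):
    return theta ** 2       # hypothetical objective

def g(theta):
    return 2 * theta        # the derivative our code claims to compute

theta = 1.5
numeric = (J(theta + EPSILON) - J(theta - EPSILON)) / (2 * EPSILON)
print(g(theta), numeric)    # the two values should agree to ~4 significant digits
```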

Now consider the case where $\theta \in \mathbb{R}^n$ is a vector of parameters and $J : \mathbb{R}^n \mapsto \mathbb{R}$. Suppose we have a function $g_i(\theta)$ that purportedly computes $\frac{\partial}{\partial \theta_i} J(\theta)$; we'd like to check if $g_i$ is outputting correct derivative values. Let $\theta^{(i+)} = \theta + {\rm EPSILON} \times \vec{e}_i$, where

$\vec{e}_i = (0, 0, \ldots, 1, \ldots, 0)^\top$

is the $i$-th basis vector (a vector of the same dimension as $\theta$, with a "1" in the $i$-th position and "0"s everywhere else). So, $\theta^{(i+)}$ is the same as $\theta$, except its $i$-th element has been incremented by EPSILON. Similarly, let $\theta^{(i-)} = \theta - {\rm EPSILON} \times \vec{e}_i$ be the corresponding vector with the $i$-th element decreased by EPSILON. We can now numerically verify $g_i(\theta)$'s correctness by checking, for each $i$, that:

$g_i(\theta) \approx \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2 \times {\rm EPSILON}}.$

When the parameter is a vector, we verify the derivative with respect to each component in turn by perturbing that component alone while holding all the others fixed.
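The same check, written as a small Python sketch (the quadratic objective at the bottom is only a hypothetical example; substitute your own $J$):

```python
import numpy as np

EPSILON = 1e-4

def numerical_gradient(J, theta):
    """Approximate each partial derivative of J at theta by perturbing one
    component at a time while holding the others fixed."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = 1.0                          # i-th basis vector
        theta_plus = theta + EPSILON * e_i    # theta^(i+)
        theta_minus = theta - EPSILON * e_i   # theta^(i-)
        grad[i] = (J(theta_plus) - J(theta_minus)) / (2 * EPSILON)
    return grad

# Hypothetical example: J(theta) = sum(theta^2), whose gradient is 2*theta.
J = lambda t: np.sum(t ** 2)
theta = np.array([0.5, -1.0, 2.0])
print(numerical_gradient(J, theta))           # should be close to [1, -2, 4]
```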

When implementing backpropagation to train a neural network, in a correct implementation we will have that

$\nabla_{W^{(l)}} J(W,b) = \left( \frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)}$
$\nabla_{b^{(l)}} J(W,b) = \frac{1}{m} \Delta b^{(l)}.$

This result shows that the final block of pseudo-code in the Backpropagation Algorithm is indeed implementing gradient descent. To make sure your implementation of gradient descent is correct, it is usually very helpful to use the method described above to numerically compute the derivatives of $J(W,b)$, and thereby verify that your computations of $\frac{1}{m} \Delta W^{(l)} + \lambda W^{(l)}$ and $\frac{1}{m} \Delta b^{(l)}$ are indeed giving the derivatives you want.
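One convenient way to apply this to backpropagation is to flatten all of the parameters $(W, b)$ into a single vector and compare the backprop-computed gradient with the numerical one. The sketch below assumes two hypothetical callables, cost and backprop_gradients, standing in for your own implementation of $J(W,b)$ and of the quantities $\frac{1}{m} \Delta W^{(l)} + \lambda W^{(l)}$ and $\frac{1}{m} \Delta b^{(l)}$:

```python
import numpy as np

def check_backprop(cost, backprop_gradients, params, epsilon=1e-4):
    """cost(params) -> scalar J; backprop_gradients(params) -> gradient vector
    of the same shape as params, as produced by backpropagation."""
    analytic = backprop_gradients(params)
    numeric = np.zeros_like(params)
    for i in range(params.size):
        e_i = np.zeros_like(params)
        e_i[i] = 1.0
        numeric[i] = (cost(params + epsilon * e_i) -
                      cost(params - epsilon * e_i)) / (2 * epsilon)
    # Relative difference; a very small value indicates the two gradients agree.
    return np.linalg.norm(analytic - numeric) / np.linalg.norm(analytic + numeric)
```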

Autoencoders and Sparsity

An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. I.e., it uses $y^{(i)} = x^{(i)}$.

Here is an autoencoder (figure omitted):

We will write $a^{(2)}_j(x)$ to denote the activation of hidden unit $j$ when the network is given a specific input $x$. Further, let

$\hat\rho_j = \frac{1}{m} \sum_{i=1}^{m} \left[ a^{(2)}_j(x^{(i)}) \right]$

be the average activation of hidden unit $j$ (averaged over the training set). We would like to (approximately) enforce the constraint

$\hat\rho_j = \rho,$

where $\rho$ is a sparsity parameter, typically a small value close to zero (say $\rho = 0.05$). In other words, we would like the average activation of each hidden neuron $j$ to be close to 0.05 (say). To satisfy this constraint, the hidden unit's activations must mostly be near 0.
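A small numpy sketch of computing $\hat\rho_j$ for every hidden unit, assuming a single sigmoid hidden layer with hypothetical weights W1, biases b1, and a data matrix X holding one training example per column:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def average_activation(W1, b1, X):
    """rho_hat_j = (1/m) * sum_i a2_j(x^(i)), averaged over the m examples."""
    A2 = sigmoid(W1 @ X + b1[:, None])   # hidden activations, shape (s2, m)
    return A2.mean(axis=1)               # vector of rho_hat_j, shape (s2,)
```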

To achieve this, we will add an extra penalty term to our optimization objective that penalizes $\hat\rho_j$ deviating significantly from $\rho$. Many choices of the penalty term will give reasonable results. We will choose the following:

$\sum_{j=1}^{s_2} \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}.$

Here, $s_2$ is the number of neurons in the hidden layer, and the index $j$ is summing over the hidden units in our network. If you are familiar with the concept of KL divergence, this penalty term is based on it, and can also be written

$\sum_{j=1}^{s_2} {\rm KL}(\rho \,\|\, \hat\rho_j),$

where ${\rm KL}(\rho \,\|\, \hat\rho_j)$ is the KL divergence between a Bernoulli random variable with mean $\rho$ and a Bernoulli random variable with mean $\hat\rho_j$.

Our overall cost function is now

$J_{\rm sparse}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} {\rm KL}(\rho \,\|\, \hat\rho_j),$

where $J(W,b)$ is as defined previously, and $\beta$ controls the weight of the sparsity penalty term. The term $\hat\rho_j$ (implicitly) depends on $W,b$ also, because it is the average activation of hidden unit $j$, and the activation of a hidden unit depends on the parameters $W,b$.
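A minimal sketch of the penalty and the resulting sparse cost (j_wb stands in for the previously defined cost $J(W,b)$; the default values of rho and beta are only illustrative):

```python
import numpy as np

def kl_penalty(rho, rho_hat):
    """sum_j KL(rho || rho_hat_j) between Bernoulli distributions."""
    return np.sum(rho * np.log(rho / rho_hat) +
                  (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sparse_cost(j_wb, rho_hat, rho=0.05, beta=3.0):
    """J_sparse(W,b) = J(W,b) + beta * sum_j KL(rho || rho_hat_j)."""
    return j_wb + beta * kl_penalty(rho, rho_hat)
```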

Visualizing a Trained Autoencoder

Consider the case of training an autoencoder on $10 \times 10$ images, so that $n = 100$. Each hidden unit $i$ computes a function of the input:

$a^{(2)}_i = f\left( \sum_{j=1}^{100} W^{(1)}_{ij} x_j + b^{(1)}_i \right).$

We will visualize the function computed by hidden unit $i$---which depends on the parameters $W^{(1)}_{ij}$ (ignoring the bias term for now)---using a 2D image. In particular, we think of $a^{(2)}_i$ as some non-linear feature of the input $x$.

If we suppose that the input is norm constrained by $\|x\|^2 = \sum_{j=1}^{100} x_j^2 \leq 1$, then one can show (try doing this yourself) that the input which maximally activates hidden unit $i$ is given by setting pixel $x_j$ (for all 100 pixels, $j = 1, \ldots, 100$) to

$x_j = \frac{W^{(1)}_{ij}}{\sqrt{\sum_{j'=1}^{100} \left( W^{(1)}_{ij'} \right)^2}}.$

By displaying the image formed by these pixel intensity values, we can begin to understand what feature hidden unit $i$ is looking for.
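The sketch below forms these images for all hidden units at once, assuming a hypothetical weight matrix W1 of shape (s2, 100) learned on 10x10 inputs; each row is normalized by its Euclidean norm and reshaped into a 10x10 picture:

```python
import numpy as np

def max_activation_images(W1, image_shape=(10, 10)):
    """Element i of the returned list is the image whose pixel j equals
    W1[i, j] / sqrt(sum_j W1[i, j]^2), i.e. the norm-constrained input that
    maximally activates hidden unit i."""
    norms = np.sqrt(np.sum(W1 ** 2, axis=1, keepdims=True))
    X = W1 / norms
    return [row.reshape(image_shape) for row in X]
```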

When an autoencoder is run on an image, the hidden units toward the front generally capture low-level features such as edges, while hidden units further back capture features with deeper semantics.
