Autoencoders and Sparsity (Part 1)

An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. I.e., it uses $y^{(i)} = x^{(i)}$.

Here is an autoencoder:

The autoencoder tries to learn a function $h_{W,b}(x) \approx x$. In other words, it is trying to learn an approximation to the identity function, so as to output $\hat{x}$ that is similar to $x$. The identity function seems a particularly trivial function to be trying to learn; but by placing constraints on the network, such as by limiting the number of hidden units, we can discover interesting structure in the data.

Example and uses

As a concrete example, suppose the inputs $x$ are the pixel intensity values from a $10 \times 10$ image (100 pixels) so $n = 100$, and there are $s_2 = 50$ hidden units in layer $L_2$. Note that we also have $y \in \mathbb{R}^{100}$. Since there are only 50 hidden units, the network is forced to learn a compressed representation of the input. I.e., given only the vector of hidden unit activations $a^{(2)} \in \mathbb{R}^{50}$, it must try to reconstruct the 100-pixel input $x$. If the input were completely random (say, each $x_i$ comes from an IID Gaussian independent of the other features) then this compression task would be very difficult. But if there is structure in the data, for example, if some of the input features are correlated, then this algorithm will be able to discover some of those correlations. In fact, this simple autoencoder often ends up learning a low-dimensional representation very similar to PCA's.
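As a minimal sketch of the 100-input, 50-hidden-unit example above, the following pure-Python code runs one forward pass through an untrained autoencoder. The sigmoid activation, the small random initial weights, the random stand-in image, and the helper names (`affine`, `forward`, `rand_matrix`) are assumptions made for illustration; the text does not fix them.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def forward(x, W1, b1, W2, b2):
    """Encode x into hidden activations a2, then decode a2 into a reconstruction."""
    a2 = [sigmoid(z) for z in affine(W1, b1, x)]
    x_hat = [sigmoid(z) for z in affine(W2, b2, a2)]
    return a2, x_hat

rng = random.Random(0)
n, s2 = 100, 50  # 100 pixels in, 50 hidden units

def rand_matrix(rows, cols):
    # Small random weights (an assumed initialization, not from the text).
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

W1, b1 = rand_matrix(s2, n), [0.0] * s2   # input layer -> hidden layer
W2, b2 = rand_matrix(n, s2), [0.0] * n    # hidden layer -> output layer

x = [rng.random() for _ in range(n)]      # stand-in for pixel intensities in [0, 1]
a2, x_hat = forward(x, W1, b1, W2, b2)
print(len(a2), len(x_hat))  # 50 100
```

The reconstruction `x_hat` has to be produced from the 50 numbers in `a2` alone, which is the bottleneck the paragraph describes. Training would adjust the weights so that `x_hat` approaches `x`.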

Constraints

Our argument above relied on the number of hidden units being small. But even when the number of hidden units is large (perhaps even greater than the number of input pixels), we can still discover interesting structure, by imposing other constraints on the network. In particular, if we impose a sparsity constraint on the hidden units, then the autoencoder will still discover interesting structure in the data, even if the number of hidden units is large.

Recall that $a^{(2)}_j$ denotes the activation of hidden unit $j$ in the autoencoder. However, this notation doesn't make explicit which input led to that activation. Thus, we will write $a^{(2)}_j(x)$ to denote the activation of this hidden unit when the network is given a specific input $x$. Further, let

$$\hat\rho_j = \frac{1}{m} \sum_{i=1}^{m} \left[ a^{(2)}_j\left(x^{(i)}\right) \right]$$

be the average activation of hidden unit $j$ (averaged over the $m$ examples in the training set). We would like to (approximately) enforce the constraint

$$\hat\rho_j = \rho,$$

where $\rho$ is a sparsity parameter, typically a small value close to zero (say $\rho = 0.05$). In other words, we would like the average activation of each hidden neuron $j$ to be close to 0.05 (say). To satisfy this constraint, the hidden unit's activations must mostly be near 0.

To achieve this, we will add an extra penalty term to our optimization objective that penalizes $\hat\rho_j$ deviating significantly from $\rho$. Many choices of the penalty term will give reasonable results. We will choose the following:

$$\sum_{j=1}^{s_2} \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}.$$

Here, $s_2$ is the number of neurons in the hidden layer, and the index $j$ is summing over the hidden units in our network. If you are familiar with the concept of KL divergence, this penalty term is based on it, and can also be written

$$\sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where $\mathrm{KL}(\rho \,\|\, \hat\rho_j) = \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}$ is the Kullback-Leibler (KL) divergence between a Bernoulli random variable with mean $\rho$ and a Bernoulli random variable with mean $\hat\rho_j$. KL divergence is a standard function for measuring how different two distributions are.
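The penalty can be sketched in a few lines of Python. The function names and the tiny activation table below are hypothetical, chosen only to show that a hidden unit whose average activation matches the sparsity parameter contributes nothing, while a unit that is active most of the time is penalized heavily.

```python
import math

def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def average_activations(hidden_activations):
    """hidden_activations[i][j] is the activation of hidden unit j on example i.
    Returns the average activation of each hidden unit over the training set."""
    m = len(hidden_activations)
    return [sum(column) / m for column in zip(*hidden_activations)]

def sparsity_penalty(rho, rho_hats):
    """Sum of the per-unit KL terms over all hidden units."""
    return sum(kl_bernoulli(rho, rho_hat) for rho_hat in rho_hats)

# Three training examples, two hidden units (made-up activations).
activations = [[0.05, 0.9],
               [0.05, 0.7],
               [0.05, 0.8]]
rho = 0.05
rho_hats = average_activations(activations)  # roughly [0.05, 0.8]
print(sparsity_penalty(rho, rho_hats))
```

The divergence is zero when the two means are equal and grows without bound as the average activation approaches 0 or 1, which is why it works as a penalty for deviating from the target.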

Deviation and penalty

Loss function

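Without the sparsity constraint, the network's loss function is:

$$J(W,b) = \left[ \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}\left(x^{(i)}\right) - y^{(i)} \right\|^2 \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2$$

With the sparsity constraint, the loss function becomes:

$$J_{\text{sparse}}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where $J(W,b)$ is as defined previously, and $\beta$ controls the weight of the sparsity penalty term. The term $\hat\rho_j$ (implicitly) depends on $W,b$ also, because it is the average activation of hidden unit $j$, and the activation of a hidden unit depends on the parameters $W,b$.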
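As a rough sketch of how the pieces of the sparse objective combine, the code below adds a weighted KL penalty to a base cost. It assumes the base cost is the average squared reconstruction error plus an L2 weight-decay term; the function name `sparse_cost`, its arguments, and all numbers are hypothetical.

```python
import math

def kl_bernoulli(rho, rho_hat):
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def sparse_cost(xs, x_hats, weight_matrices, rho_hats, lam, beta, rho):
    """Base cost (reconstruction error + weight decay) plus beta * sparsity penalty."""
    m = len(xs)
    reconstruction = sum(
        0.5 * sum((xh - x) ** 2 for xh, x in zip(x_hat, x))
        for x_hat, x in zip(x_hats, xs)
    ) / m
    weight_decay = 0.5 * lam * sum(
        w * w for W in weight_matrices for row in W for w in row
    )
    sparsity = sum(kl_bernoulli(rho, rho_hat) for rho_hat in rho_hats)
    return reconstruction + weight_decay + beta * sparsity

# Made-up inputs, reconstructions and weights for a 2-1-2 network.
xs = [[1.0, 0.0], [0.0, 1.0]]
x_hats = [[0.8, 0.1], [0.2, 0.9]]
weights = [[[0.5, -0.5]], [[1.0], [1.0]]]

on_target = sparse_cost(xs, x_hats, weights, [0.05], lam=0.1, beta=3.0, rho=0.05)
too_active = sparse_cost(xs, x_hats, weights, [0.5], lam=0.1, beta=3.0, rho=0.05)
print(on_target, too_active)
```

When the average activation equals the target, the penalty vanishes and the sparse cost equals the base cost; otherwise the excess is scaled by `beta`.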

Computing the partial derivatives of the loss function

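Once the sparsity term is added, the error term of a hidden-layer node changes from

$$\delta^{(2)}_i = \left( \sum_{j=1}^{s_3} W^{(2)}_{ji} \delta^{(3)}_j \right) f'\left(z^{(2)}_i\right)$$

to

$$\delta^{(2)}_i = \left( \left( \sum_{j=1}^{s_3} W^{(2)}_{ji} \delta^{(3)}_j \right) + \beta \left( -\frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i} \right) \right) f'\left(z^{(2)}_i\right),$$

where the sum runs over the $s_3$ units of the output layer. Because the new term needs $\hat\rho_i$, a forward pass over all training examples is required to compute the average activations before backpropagation can be run on any single example.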
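The extra term in the hidden-layer error is the derivative of the KL penalty with respect to the average activation, scaled by the penalty weight. The sketch below checks that derivative against a finite difference and then applies it in a hidden-layer error computation. It assumes a sigmoid activation, so the activation derivative is `a * (1 - a)`; the function names and numbers are hypothetical.

```python
import math

def kl_bernoulli(rho, rho_hat):
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def kl_grad(rho, rho_hat):
    """Derivative of KL(rho || rho_hat) with respect to rho_hat."""
    return -rho / rho_hat + (1 - rho) / (1 - rho_hat)

def hidden_delta(W2, delta3, a2, rho_hats, rho, beta):
    """Error term of each hidden unit i.
    W2[j][i] is the weight from hidden unit i to output unit j."""
    deltas = []
    for i, (a, rho_hat) in enumerate(zip(a2, rho_hats)):
        backpropagated = sum(W2[j][i] * delta3[j] for j in range(len(delta3)))
        sparsity_term = beta * kl_grad(rho, rho_hat)
        deltas.append((backpropagated + sparsity_term) * a * (1 - a))
    return deltas

# Finite-difference check of the analytic derivative.
rho, rho_hat, eps = 0.05, 0.2, 1e-6
numeric = (kl_bernoulli(rho, rho_hat + eps) - kl_bernoulli(rho, rho_hat - eps)) / (2 * eps)
print(math.isclose(numeric, kl_grad(rho, rho_hat), abs_tol=1e-6))

# One output unit, two hidden units whose average activation is already on target.
print(hidden_delta([[1.0, 2.0]], [0.5], [0.5, 0.5], [0.05, 0.05], rho=0.05, beta=3.0))
```

When a unit's average activation already equals the target, the sparsity term is zero and the error reduces to the ordinary backpropagated value.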

Solving with gradient descent

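With the loss function and its partial derivatives in hand, gradient descent can be used to find the network's optimal parameters. One iteration of batch gradient descent proceeds as follows:

1. Set $\Delta W^{(l)} := 0$ and $\Delta b^{(l)} := 0$ for all layers $l$.
2. For $i = 1$ to $m$: use backpropagation to compute $\nabla_{W^{(l)}} J(W,b;x^{(i)},y^{(i)})$ and $\nabla_{b^{(l)}} J(W,b;x^{(i)},y^{(i)})$, then set $\Delta W^{(l)} := \Delta W^{(l)} + \nabla_{W^{(l)}} J(W,b;x^{(i)},y^{(i)})$ and $\Delta b^{(l)} := \Delta b^{(l)} + \nabla_{b^{(l)}} J(W,b;x^{(i)},y^{(i)})$.
3. Update the parameters: $W^{(l)} := W^{(l)} - \alpha \left[ \left( \frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)} \right]$ and $b^{(l)} := b^{(l)} - \alpha \left[ \frac{1}{m} \Delta b^{(l)} \right]$.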

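The formulas above show that the partial derivative of the loss function is really an accumulation: each training example adds its own contribution. This is because the loss function itself is the sum of the per-example losses, and by the sum rule of differentiation its partial derivative is the sum of the partial derivatives of those per-example losses. It follows that, within one batch update, the order in which training examples are fed to the network does not matter: every example undergoes the same operations, and the result for a later example does not depend on the result for an earlier one (the contributions are simply added, and addition is order-independent).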
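The order-independence claim can be illustrated with a toy model. The sketch below replaces the full network with a single weight `w` and the per-example loss `0.5 * (w*x - x)**2`, so `example_gradient` stands in for backpropagation; this stand-in, the function names, and the hyperparameter values are all assumptions for illustration.

```python
import math
import random

def example_gradient(w, x):
    """Gradient of 0.5 * (w*x - x)**2 with respect to w (stand-in for backprop)."""
    return (w * x - x) * x

def batch_gradient_step(w, xs, alpha, lam):
    """One batch update: accumulate per-example gradients, then update once."""
    delta_w = 0.0                          # reset the accumulator
    for x in xs:
        delta_w += example_gradient(w, x)  # each example adds its contribution
    return w - alpha * (delta_w / len(xs) + lam * w)

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(20)]
shuffled = xs[:]
rng.shuffle(shuffled)

w_in_order = batch_gradient_step(0.3, xs, alpha=0.1, lam=0.001)
w_shuffled = batch_gradient_step(0.3, shuffled, alpha=0.1, lam=0.001)
print(math.isclose(w_in_order, w_shuffled, rel_tol=1e-12))
```

The two results agree up to floating-point rounding because the weight is held fixed while the gradients are accumulated. This holds for batch updates only; if the weight were updated after every example, as in stochastic gradient descent, the order would matter.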

Reposted from: http://www.cnblogs.com/tornadomeet/archive/2013/03/19/2970101.html

Time: 2024-10-06 06:15:20
