Linear Decoders

Sparse Autoencoder Recap

In the sparse autoencoder, we had 3 layers of neurons: an input layer, a hidden layer and an output layer. In our previous description of autoencoders (and of neural networks), every neuron in the neural network used the same activation function. In these notes, we describe a modified version of the autoencoder in which some of the neurons use a different activation function. This will result in a model that is sometimes simpler to apply, and can also be more robust to variations in the parameters.

Recall that each neuron (in the output layer) computed the following:

z⁽³⁾ = W⁽²⁾a⁽²⁾ + b⁽²⁾
a⁽³⁾ = f(z⁽³⁾)

where a⁽³⁾ is the output. In the autoencoder, a⁽³⁾ is our approximate reconstruction of the input x = a⁽¹⁾.

Because we used a sigmoid activation function for f(z⁽³⁾), we needed to constrain or scale the inputs to be in the range [0,1], since the sigmoid function outputs numbers in the range [0,1].
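The range restriction can be illustrated with a minimal sketch in plain Python. The function name `sigmoid` and the sample pre-activation values are illustrative assumptions, not part of the original notes.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# However large or small the pre-activation z is, the output stays
# between 0 and 1, so a target such as 2.5 or -0.7 can never be
# reconstructed exactly by a sigmoid output unit.
pre_activations = [-10.0, -1.0, 0.0, 1.0, 10.0]
outputs = [sigmoid(z) for z in pre_activations]
```

Every entry of `outputs` lies strictly between 0 and 1, which is why inputs outside that range have to be rescaled before a sigmoid output layer can reconstruct them.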

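Motivation: when every neuron uses the same (sigmoid) activation function, the nonlinear mapping at the output can prevent the output from matching the input.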

Linear Decoder

One easy fix for this problem is to set a⁽³⁾ = z⁽³⁾. Formally, this is achieved by having the output nodes use an activation function that's the identity function f(z) = z, so that a⁽³⁾ = f(z⁽³⁾) = z⁽³⁾. In other words, the output nodes do not use the sigmoid function.

This particular activation function is called the linear activation function. Note however that in the hidden layer of the network, we still use a sigmoid (or tanh) activation function, so that the hidden unit activations are given by (say) a⁽²⁾ = σ(W⁽¹⁾x + b⁽¹⁾), where σ(·) is the sigmoid function, x is the input, and W⁽¹⁾ and b⁽¹⁾ are the weight and bias terms for the hidden units. It is only in the output layer that we use the linear activation function.

An autoencoder in this configuration--with a sigmoid (or tanh) hidden layer and a linear output layer--is called a linear decoder.

In this model, we have x̂ = a⁽³⁾ = z⁽³⁾ = W⁽²⁾a⁽²⁾ + b⁽²⁾. Because the output is now a linear function of the hidden unit activations, by varying W⁽²⁾, each output unit a⁽³⁾ can be made to produce values greater than 1 or less than 0 as well. This allows us to train the sparse autoencoder on real-valued inputs without needing to pre-scale every example to a specific range.
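As a rough sketch of this forward pass, the plain-Python example below uses a sigmoid hidden layer and an identity output layer. The helper names `affine` and `forward` and the toy parameter values are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, a, b):
    """Return W a + b for a weight matrix W (list of rows), vector a, bias b."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]

def forward(x, W1, b1, W2, b2):
    """Forward pass of a linear decoder: sigmoid hidden layer, identity output."""
    z2 = affine(W1, x, b1)
    a2 = [sigmoid(z) for z in z2]   # hidden activations, each in (0, 1)
    z3 = affine(W2, a2, b2)
    a3 = z3                         # linear output layer: a3 = z3
    return a2, a3

# Hypothetical toy parameters: 2 inputs, 1 hidden unit, 2 outputs.
W1 = [[0.0, 0.0]]
b1 = [0.0]
W2 = [[6.0], [-4.0]]
b2 = [0.0, 0.0]
x = [3.0, -2.0]                     # deliberately outside [0, 1]

a2, x_hat = forward(x, W1, b1, W2, b2)
# a2 == [0.5] and x_hat == [3.0, -2.0]
```

Here the single hidden activation is 0.5, yet the output reproduces an input with entries above 1 and below 0, which a sigmoid output layer could not do.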

Since we have changed the activation function of the output units, the gradients of the output units also change. Recall that for each output unit, we had set the error terms as follows:

δᵢ⁽³⁾ = ∂/∂zᵢ⁽³⁾ ½‖y − x̂‖² = −(yᵢ − x̂ᵢ) · f′(zᵢ⁽³⁾)

where y = x is the desired output, x̂ is the output of our autoencoder, and f(·) is our activation function. Because in the output layer we now have f(z) = z, that implies f′(z) = 1 and thus the above now simplifies to:

δᵢ⁽³⁾ = −(yᵢ − x̂ᵢ)    (output nodes)

Of course, when using backpropagation to compute the error terms for the hidden layer, we have:

δ⁽²⁾ = ((W⁽²⁾)ᵀ δ⁽³⁾) ∘ f′(z⁽²⁾)    (hidden nodes)

where ∘ denotes the element-wise product. Because the hidden layer is using a sigmoid (or tanh) activation f, the term f′(z⁽²⁾) in the equation above should still be the derivative of the sigmoid (or tanh) function.
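The two error terms can be checked numerically. The sketch below is a plain-Python illustration, not the tutorial's own code: all function names and parameter values are hypothetical, and the sparsity and weight-decay penalties are left out so that only the squared reconstruction error is differentiated. It relies on the standard backpropagation fact that the derivative of the cost with respect to a layer's bias equals that layer's error term.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, a, b):
    return [sum(w * v for w, v in zip(row, a)) + bi for row, bi in zip(W, b)]

def loss(x, W1, b1, W2, b2):
    """Squared reconstruction error 0.5 * ||x - x_hat||^2 (linear output layer)."""
    a2 = [sigmoid(z) for z in affine(W1, x, b1)]
    x_hat = affine(W2, a2, b2)
    return 0.5 * sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))

def error_terms(x, W1, b1, W2, b2):
    """Return (delta2, delta3) for one example, with target y = x."""
    a2 = [sigmoid(z) for z in affine(W1, x, b1)]
    x_hat = affine(W2, a2, b2)
    # Output layer: f(z) = z, so f'(z) = 1 and delta3 = -(y - x_hat).
    delta3 = [-(xi - xh) for xi, xh in zip(x, x_hat)]
    # Hidden layer: still sigmoid, so f'(z2) = a2 * (1 - a2).
    delta2 = []
    for j, a in enumerate(a2):
        back = sum(W2[i][j] * delta3[i] for i in range(len(delta3)))
        delta2.append(back * a * (1.0 - a))
    return delta2, delta3

def numeric_grad_bias(x, W1, b1, W2, b2, which, idx, eps=1e-6):
    """Central-difference derivative of the loss w.r.t. one bias entry."""
    def shifted(step):
        b1s, b2s = list(b1), list(b2)
        (b1s if which == 1 else b2s)[idx] += step
        return loss(x, W1, b1s, W2, b2s)
    return (shifted(eps) - shifted(-eps)) / (2 * eps)

# Hypothetical toy parameters: 2 inputs, 2 hidden units, 2 outputs.
x = [2.0, -1.0]
W1 = [[0.5, -0.3], [0.1, 0.8]]
b1 = [0.1, -0.2]
W2 = [[1.0, -2.0], [0.5, 0.3]]
b2 = [0.0, 0.1]

delta2, delta3 = error_terms(x, W1, b1, W2, b2)
num_delta2 = [numeric_grad_bias(x, W1, b1, W2, b2, 1, j) for j in range(2)]
num_delta3 = [numeric_grad_bias(x, W1, b1, W2, b2, 2, i) for i in range(2)]
```

If the analytic and finite-difference values agree to within a small tolerance, the output-layer term is using f′(z) = 1 and the hidden-layer term is using the sigmoid derivative, as described above.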

Time: 2024-11-10 05:22:38
