
other_techniques_for_regularization

A quick, rough translation, for reference only. Please do not repost.

www.cnblogs.com/santian/p/5457412.html

Dropout: Dropout is a radically different technique for regularization. Unlike L1 and L2 regularization, dropout doesn't rely on modifying the cost function. Instead, in dropout we modify the network itself. Let me describe the basic mechanics of how dropout works, before getting into why it works, and what the results are.

Suppose we're trying to train a network:

Dropout: dropout is a regularization technique entirely different from L1 and L2 regularization. It does not modify the cost function; instead it modifies the network itself. Before I describe how dropout works, why it works, and what results it produces, let us suppose we are training a network like the following.

In particular, suppose we have a training input x and corresponding desired output y. Ordinarily, we'd train by forward-propagating x through the network, and then backpropagating to determine the contribution to the gradient. With dropout, this process is modified. We start by randomly (and temporarily) deleting half the hidden neurons in the network, while leaving the input and output neurons untouched. After doing this, we'll end up with a network along the following lines. Note that the dropout neurons, i.e., the neurons which have been temporarily deleted, are still ghosted in:

In particular, suppose we have a training input x and the corresponding desired output y. Ordinarily, we would first feed x forward through the (randomly initialized) network, and then backpropagate to obtain each weight's contribution to the gradient, i.e. the partial derivatives with respect to the weights obtained from the error via the chain rule. With dropout the procedure is quite different: at the start of training we randomly (and temporarily) delete half of the hidden neurons, leaving the input and output layers untouched. After applying dropout we end up with a network like the one shown below; the neurons drawn with dashed lines are the ones that have been temporarily deleted.
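To make the mechanics concrete, here is a minimal sketch (my own illustration, not code from the original text) of a single forward pass through such a thinned network, assuming one hidden layer of sigmoid neurons; the function name forward_with_dropout, the weight names W1, b1, W2, b2, and the layer sizes are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_with_dropout(x, W1, b1, W2, b2, drop_prob=0.5):
    # Binary mask over the hidden layer: roughly half the hidden neurons
    # are temporarily "deleted" (zeroed); input and output layers untouched.
    mask = (rng.random(b1.shape) > drop_prob).astype(float)
    hidden = sigmoid(W1 @ x + b1) * mask   # dropped neurons output 0
    output = sigmoid(W2 @ hidden + b2)
    return output, mask

# Illustrative sizes: 784 inputs, 30 hidden neurons, 10 outputs.
x = rng.random(784)
W1, b1 = rng.standard_normal((30, 784)), np.zeros(30)
W2, b2 = rng.standard_normal((10, 30)), np.zeros(10)
out, mask = forward_with_dropout(x, W1, b1, W2, b2)
```

Backpropagation then uses the same mask, so that only the surviving hidden neurons receive gradient.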

We forward-propagate the input x through the modified network, and then backpropagate the result, also through the modified network. After doing this over a mini-batch of examples, we update the appropriate weights and biases. We then repeat the process, first restoring the dropout neurons, then choosing a new random subset of hidden neurons to delete, estimating the gradient for a different mini-batch, and updating the weights and biases in the network.

We forward-propagate the input x through the modified network, and then backpropagate the result, also through the modified network. After doing this for a mini-batch of examples, we update the corresponding weights and biases. We then repeat the process: first restore the dropped-out neurons, then choose a new random subset of hidden neurons to delete, estimate the gradient for a different mini-batch, and finally update the network's weights and biases.
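Putting the whole loop together, a self-contained sketch of mini-batch training with dropout might look like the following; the toy data, the 2-30-1 layer sizes, the quadratic cost, and the hyperparameters are all invented for illustration. Note that a fresh mask is drawn for every mini-batch, which implicitly restores the neurons that were dropped in the previous one.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 200 two-dimensional inputs with a simple binary target.
X = rng.random((200, 2))
Y = (X.sum(axis=1) > 1.0).astype(float).reshape(-1, 1)

# A tiny 2-30-1 sigmoid network (sizes chosen only for the example).
W1, b1 = rng.standard_normal((2, 30)), np.zeros(30)
W2, b2 = rng.standard_normal((30, 1)), np.zeros(1)

eta, batch_size, drop_prob = 0.5, 10, 0.5
for epoch in range(30):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], Y[idx]
        # A new random half of the hidden neurons is dropped for this batch.
        mask = (rng.random(30) > drop_prob).astype(float)
        # Forward pass through the thinned network.
        h = sigmoid(xb @ W1 + b1) * mask
        out = sigmoid(h @ W2 + b2)
        # Backward pass (quadratic cost), also through the thinned network.
        delta_out = (out - yb) * out * (1 - out)
        delta_h = (delta_out @ W2.T) * h * (1 - h) * mask
        # Update the weights and biases from this mini-batch's gradient.
        W2 -= eta * h.T @ delta_out / len(idx)
        b2 -= eta * delta_out.mean(axis=0)
        W1 -= eta * xb.T @ delta_h / len(idx)
        b1 -= eta * delta_h.mean(axis=0)
```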

By repeating this process over and over, our network will learn a set of weights and biases. Of course, those weights and biases will have been learnt under conditions in which half the hidden neurons were dropped out. When we actually run the full network that means that twice as many hidden neurons will be active. To compensate for that, we halve the weights outgoing from the hidden neurons.

By repeating this process over and over, our network learns a set of weights and biases. Of course, these parameters were learned under conditions in which half of the hidden neurons were dropped out (temporarily deleted). When we actually run the full network, twice as many hidden neurons are active. To compensate for this, we halve the weights going out of the hidden neurons.
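A minimal sketch of that compensation step, assuming dropout was applied to a single hidden layer: after training, the weights leaving the hidden neurons are scaled by the keep-probability, which is one half here. The function name and shapes are illustrative only.

```python
import numpy as np

def scale_outgoing_weights(W_hidden_out, drop_prob=0.5):
    # Halve (more generally, multiply by the keep-probability) the weights
    # going out of the hidden layer before running the full network.
    return W_hidden_out * (1.0 - drop_prob)

W2_trained = np.random.default_rng(2).standard_normal((30, 1))
W2_full_network = scale_outgoing_weights(W2_trained)  # halved when drop_prob = 0.5
```

Many modern implementations use "inverted dropout" instead, dividing the kept activations by the keep-probability during training so that no rescaling is needed when the full network is run.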

This dropout procedure may seem strange and ad hoc. Why would we expect it to help with regularization? To explain what's going on, I'd like you to briefly stop thinking about dropout, and instead imagine training neural networks in the standard way (no dropout). In particular, imagine we train several different neural networks, all using the same training data. Of course, the networks may not start out identical, and as a result after training they may sometimes give different results. When that happens we could use some kind of averaging or voting scheme to decide which output to accept. For instance, if we have trained five networks, and three of them are classifying a digit as a "3", then it probably really is a "3". The other two networks are probably just making a mistake. This kind of averaging scheme is often found to be a powerful (though expensive) way of reducing overfitting. The reason is that the different networks may overfit in different ways, and averaging may help eliminate that kind of overfitting.

Dropout may look strange and ad hoc: why should we expect it to help with regularization? To explain what is going on, stop thinking about dropout for a moment and instead imagine training neural networks in the ordinary way. In particular, suppose we train several different networks, all on the same training data. Because of random initialization and other factors, the trained networks may give different results. When that happens we can average the networks' outputs, or use some rule to decide which network's output to accept. For example, if we have trained five networks and three of them classify a digit as a "3", it is most likely really a "3"; the other two networks are probably just making a mistake. This kind of averaging scheme is often found to be very useful for reducing overfitting (although training several networks is, of course, expensive). The reason it works is that the different networks overfit in different ways, and averaging can cancel out that kind of overfitting.
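As a small illustration of the voting scheme just described (again my own example, not from the original text), a hypothetical majority_vote helper over the digit predictions of several independently trained networks:

```python
from collections import Counter

def majority_vote(predictions):
    # Return the most common predicted label (ties broken arbitrarily).
    return Counter(predictions).most_common(1)[0][0]

# Three of five networks say "3", so we accept "3".
print(majority_vote([3, 3, 3, 5, 8]))  # -> 3
```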

What's this got to do with dropout? Heuristically, when we dropout different sets of neurons, it's rather like we're training different neural networks. And so the dropout procedure is like averaging the effects of a very large number of different networks. The different networks will overfit in different ways, and so, hopefully, the net effect of dropout will be to reduce overfitting.

What does this have to do with dropout? Heuristically, we find that dropping out different sets of neurons is much like training several different neural networks. The dropout procedure is therefore like averaging the effects of a very large number of different networks. The different networks overfit in different ways, and so, to a large extent, dropout will reduce this overfitting.

A related heuristic explanation for dropout is given in one of the earliest papers to use the technique (ImageNet Classification with Deep Convolutional Neural Networks, by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton (2012)): "This technique reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. It is, therefore, forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons." In other words, if we think of our network as a model which is making predictions, then we can think of dropout as a way of making sure that the model is robust to the loss of any individual piece of evidence. In this, it's somewhat similar to L1 and L2 regularization, which tend to reduce weights, and thus make the network more robust to losing any individual connection in the network.

A heuristic explanation for dropout given in one of the earliest papers to use the technique (ImageNet Classification with Deep Convolutional Neural Networks, by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton (2012)) is that it reduces complex co-adaptations between neurons: because a neuron cannot rely on particular other neurons being present, it is forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons. In other words, if we think of our network as a model making predictions, dropout is a way of making sure the model remains robust when any individual piece of evidence is lost. In this respect its effect is similar to that of L1 and L2 regularization, which reduce the weights and thereby make the network more robust to losing any individual connection.

Of course, the true measure of dropout is that it has been very successful in improving the performance of neural networks. The original paper (Improving neural networks by preventing co-adaptation of feature detectors, by Geoffrey Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov (2012). Note that the paper discusses a number of subtleties that I have glossed over in this brief introduction.) introducing the technique applied it to many different tasks. For us, it's of particular interest that they applied dropout to MNIST digit classification, using a vanilla feedforward neural network along lines similar to those we've been considering. The paper noted that the best result anyone had achieved up to that point using such an architecture was 98.4 percent classification accuracy on the test set. They improved that to 98.7 percent accuracy using a combination of dropout and a modified form of L2 regularization. Similarly impressive results have been obtained for many other tasks, including problems in image and speech recognition, and natural language processing. Dropout has been especially useful in training large, deep networks, where the problem of overfitting is often acute.

Of course, what really makes dropout a powerful tool is that it has been very successful at improving the performance of neural networks. The original paper introducing the technique (Improving neural networks by preventing co-adaptation of feature detectors, by Geoffrey Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov (2012)) applied it to many different tasks. For us, the improvement dropout brings to MNIST handwritten-digit recognition is of particular interest: using a plain feedforward network of the kind we have been considering, the paper reports that the best previously achieved result with such an architecture was 98.4 percent accuracy, and that a combination of dropout and a modified form of L2 regularization improved this to 98.7 percent. Similarly striking results have been obtained on many other tasks, including image recognition, speech recognition, and natural language processing. Overfitting is a serious problem in large, deep networks, and dropout has proved especially useful when training them.
