[CV Study Notes] Early stopping, regularization, fine-tuning, and some other tricks worth knowing

Deep learning has quite a few tricks, and these tricks often turn out to be genuinely useful, so it is worth knowing some of them. The normalization and initialization covered in the previous post are tricks of this kind; below I summarize a few more tricks I took away from the Deep Learning Summer School, Montreal 2016. Corrections from passing experts are welcome.

Early stopping

Early stopping is easy to understand: training is halted just before the error on the validation set starts to rise. Incidentally, splitting the dataset into train, validation and test sets can itself be counted as a trick. A plot of training and validation error against training epochs makes the idea obvious: the training error keeps decreasing, while the validation error reaches a minimum and then turns upward, and that turning point is where training should stop.
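A minimal sketch of the stopping rule, assuming the per-epoch training step and the validation-error computation are supplied as callables (the names here are placeholders, not tied to any particular framework):

```python
import numpy as np

def train_with_early_stopping(fit_one_epoch, val_error_fn, max_epochs=200, patience=10):
    """fit_one_epoch() trains for one epoch; val_error_fn() returns the current validation error."""
    best_err, best_epoch, waited = np.inf, 0, 0
    for epoch in range(max_epochs):
        fit_one_epoch()
        err = val_error_fn()
        if err < best_err:             # validation error still improving: keep training
            best_err, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:     # validation error has been rising (or flat) too long: stop
                break
    return best_epoch, best_err
```

In practice one also saves the weights at the best epoch and restores them after stopping.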

L1 and L2 regularization

I first came across regularization while studying the regression models in PRML; I did not really understand it at the time, but now I can explain the underlying principle in detail. (I will write the rest of this article in English.)

At the very outset, let us take a closer look at L1 and L2 regularization. Assume the loss function of a linear regression model is the sum-of-squares error $E_D(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2$. In fact, L1 and L2 regularization can be seen as introducing a prior distribution over the parameters: L1 regularization corresponds to a Laplace prior, and L2 regularization to a Gaussian prior. Such tricks are used to reduce the complexity of a model.

L2 regularization

As before, we assume that the target variable $t$ is determined by the function $y(x, \mathbf{w})$ plus additive Gaussian noise $\epsilon$, that is, $t = y(x, \mathbf{w}) + \epsilon$, where $\epsilon$ is a zero-mean Gaussian random variable with precision $\beta$. Thus $p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(x, \mathbf{w}), \beta^{-1})$, and for $N$ i.i.d. observations the log-likelihood can be written as

$$\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = -\frac{\beta}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2 + \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi).$$

Maximizing this likelihood with respect to $\mathbf{w}$ is equivalent to minimizing the sum-of-squares error function $E_D(\mathbf{w})$ above. If a zero-mean Gaussian prior with precision $\alpha$, $p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$, is introduced for $\mathbf{w}$, we obtain the posterior distribution over $\mathbf{w}$ from Bayes' theorem (recall that posterior $\propto$ likelihood $\times$ prior):

$$p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta) \propto p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha).$$

Rewriting this in log form, maximizing the posterior (MAP estimation) is equivalent to minimizing

$$\frac{\beta}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2 + \frac{\alpha}{2}\mathbf{w}^{\mathrm{T}}\mathbf{w}.$$

At this point you can see that this is exactly the sum-of-squares error plus an L2 penalty, with regularization coefficient $\lambda = \alpha/\beta$. This derivation accounts for why L2 regularization can be interpreted as a Gaussian prior.
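As a small numerical check of this equivalence, here is a numpy sketch on synthetic data (the linear model and the value of $\lambda$ are chosen purely for illustration): the MAP / ridge solution $(X^{\mathrm{T}}X + \lambda I)^{-1}X^{\mathrm{T}}\mathbf{t}$ fits the same data as least squares but with the weights shrunk toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 5
X = rng.normal(size=(N, D))
w_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
t = X @ w_true + 0.3 * rng.normal(size=N)          # targets with additive Gaussian noise

lam = 1.0                                          # plays the role of alpha / beta above
w_ls    = np.linalg.solve(X.T @ X, X.T @ t)                       # maximum likelihood (least squares)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ t)     # MAP with a Gaussian prior (L2)

print("least squares:", np.round(w_ls, 3))
print("ridge / MAP:  ", np.round(w_ridge, 3))      # same pattern, but pulled toward zero
```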

L1 regularization

Generally, linear regression with L1 regularization is called the LASSO problem; the L1 penalty forces some coefficients to exactly zero and thus yields a sparse model. You can find a reference here.

[Figure: the classical two-dimensional picture of the contours of the error function together with the L1 constraint region (a diamond, left) and the L2 constraint region (a circle, right).]

From the two figures above (drawn in two dimensions) we can see that for L1 regularization the intersection point tends to lie on an axis, which forces some of the variables to be exactly zero. In addition, the right-hand figure shows that L2 regularization does not produce a sparse model.
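To see the sparsity effect numerically, here is a sketch of the LASSO solved with ISTA (proximal gradient descent with soft-thresholding); the data and the value of $\lambda$ are made up for illustration, and this is a toy solver rather than a production one.

```python
import numpy as np

def soft_threshold(v, thr):
    # proximal operator of the L1 norm: shrink toward zero, and clip small values to exactly zero
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def lasso_ista(X, t, lam, n_iters=500):
    w = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2         # 1 / Lipschitz constant of the gradient
    for _ in range(n_iters):
        grad = X.T @ (X @ w - t)                   # gradient of the squared-error term
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:3] = [3.0, -2.0, 1.5]                      # only three relevant features
t = X @ w_true + 0.1 * rng.normal(size=100)

print(np.round(lasso_ista(X, t, lam=5.0), 3))      # most coefficients come out exactly zero
```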

Fine-tuning

Fine-tuning is easy to understand: it can be interpreted as a way of initializing the weight parameters before running supervised training. The parameter values can be transferred from a well-trained network, for example one trained on ImageNet. Sometimes, however, you may use an autoencoder instead, an unsupervised method that seems less popular nowadays. An autoencoder learns the latent structure of the data in its hidden layers; its output is trained to be the same as its input.

In fact, the supervised stage of fine-tuning is just the regular training procedure of a feed-forward network.
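A sketch of what this looks like in practice, assuming PyTorch / torchvision (the post does not name a framework, so this is only one possible realization): load ImageNet-pretrained weights, replace the classification head, and continue ordinary supervised training, typically with a smaller learning rate for the pretrained layers.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Start from a well-trained network (ImageNet weights) instead of random initialization.
net = models.resnet18(weights="IMAGENET1K_V1")     # older torchvision: models.resnet18(pretrained=True)
net.fc = nn.Linear(net.fc.in_features, 10)         # new head for a hypothetical 10-class target task

optimizer = torch.optim.SGD([
    {"params": [p for n, p in net.named_parameters() if not n.startswith("fc")], "lr": 1e-4},
    {"params": net.fc.parameters(), "lr": 1e-2},   # larger learning rate for the freshly initialized head
], momentum=0.9)
# ...then run an ordinary supervised training loop on the new dataset.
```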

To close this section, here is a diagram of an autoencoder; you can find more detailed information on this website.
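For completeness, a minimal autoencoder sketch (again assuming PyTorch; the layer sizes are arbitrary). The reconstruction loss uses the input itself as the target, which is what forces the hidden layer to capture the latent structure of the data.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))       # output is trained to match the input

model = AutoEncoder()
x = torch.rand(32, 784)                            # a batch of fake inputs
loss = nn.functional.mse_loss(model(x), x)         # reconstruction error: input == target
```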

Data augmentation

If the training data set is not large enough, it is necessary to expand it by flipping, translating or rotating the images in order to avoid overfitting. Note, however, that according to the paper "Visualizing and Understanding Convolutional Networks", a CNN's output is fairly stable under translation but much less so under rotation, except for images with the appropriate symmetry, so not every transformation is equally safe.
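A sketch of simple label-preserving augmentations in numpy (horizontal flip, small translation, and rotation by a multiple of 90 degrees; arbitrary-angle rotation would need interpolation, e.g. via an image library):

```python
import numpy as np

def augment(img, rng):
    """img: an H x W (or H x W x C) array; returns a randomly flipped / shifted / rotated copy."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                         # horizontal flip (avoid for asymmetric content)
    shift = rng.integers(-4, 5)
    img = np.roll(img, shift, axis=1)              # small horizontal translation
    k = rng.integers(0, 4)
    img = np.rot90(img, k)                         # rotation by k * 90 degrees
    return img

rng = np.random.default_rng(0)
batch = [augment(np.random.rand(32, 32), rng) for _ in range(8)]   # 8 augmented copies of random "images"
```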

Pooling

I explained earlier in this blog why pooling works well; if you are interested, you may need some patience to look through the archive. In the paper "Visualizing and Understanding Convolutional Networks" you will also see that pooling is sometimes a good way to obtain invariance; I plan to write up my notes on that classic paper in the next article.
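As a quick illustration of the invariance pooling provides, here is a pure-numpy 2x2 max-pooling sketch: shifting a feature by one pixel within a pooling window leaves the pooled output unchanged.

```python
import numpy as np

def max_pool2x2(x):
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]                                  # crop to an even size
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))   # max over each 2x2 window

a = np.zeros((4, 4)); a[0, 0] = 1.0                # a feature detected at (0, 0)
b = np.zeros((4, 4)); b[0, 1] = 1.0                # the same feature shifted by one pixel
print(max_pool2x2(a))
print(max_pool2x2(b))                              # identical pooled maps: the shift is invisible after pooling
```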

Dropout and Drop layer

Dropout, proposed by Hinton et al., has become a prevalent technique for preventing overfitting. During training, dropout inactivates each node with a probability p that is fixed before training, which effectively makes the network thinner. But why does it work? There seem to be two main reasons: (1) for a batch of input data, the nodes of the thinner network can be trained better, because on average each of them sees more of the data; (2) dropout injects noise into the network, which also helps.
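A numpy sketch of (inverted) dropout; rescaling by 1/(1-p) at training time, so that nothing needs to change at test time, is a common convention rather than part of the original formulation.

```python
import numpy as np

def dropout(activations, p, rng, training=True):
    """Inverted dropout: each unit is zeroed with probability p during training."""
    if not training:
        return activations                         # at test time the full network is used
    mask = rng.random(activations.shape) >= p      # keep each unit with probability 1 - p
    return activations * mask / (1.0 - p)          # rescale so the expected activation is unchanged

rng = np.random.default_rng(0)
h = np.ones((2, 8))
print(dropout(h, p=0.5, rng=rng))                  # roughly half the units are inactivated
```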

Inspired by dropout and ResNet, Huang et al. proposed dropping layers (stochastic depth), which exploits the shortcut connections in ResNet. With some probability, just as in dropout, the normal path of a residual block is blocked and the information is forced to flow directly through the shortcut connection. This makes the network shorter during training and can be used to train very deep networks, even beyond 1000 layers.
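A schematic numpy sketch of the idea (not the authors' implementation): during training each residual block survives with some probability and is otherwise replaced by its shortcut; at test time the branch is always kept but scaled by its survival probability.

```python
import numpy as np

def residual_block(x, transform, survival_prob, rng, training=True):
    """Computes x + F(x), where F is sometimes dropped entirely during training (stochastic depth)."""
    if training:
        if rng.random() < survival_prob:
            return x + transform(x)                # normal residual path
        return x                                   # block dropped: information flows only through the shortcut
    return x + survival_prob * transform(x)        # test time: expected contribution of the branch

rng = np.random.default_rng(0)
x = np.ones(4)
out = residual_block(x, transform=lambda v: 0.1 * v, survival_prob=0.8, rng=rng)
```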

Depth is important

The chart below illustrates why depth is important:

As the number of layers increases, the classification accuracy rises. Moreover, when the features extracted by each layer are visualized with a deconvnet ("Visualizing and Understanding Convolutional Networks"), the deeper the layer, the more abstract the extracted features; and more abstract features strengthen robustness, i.e. invariance to transformations.
