Deep Learning - Overview of Convolutional Neural Networks

I finally passed all the Deeplearning.ai courses in March! I highly recommend them!

If you already know the basics, you may be interested in courses 4 & 5, which show many interesting use cases of CNNs and RNNs. Although I do think courses 1 & 2 are better structured than the others, and they gave me more insight into NNs.

I have uploaded the assignments of all the deep learning courses to my GitHub. You can find the assignments for CNN here. Hopefully they can give you some help when you struggle with the grader. For a new course, you indeed need extra patience to fight with the grader. Don't ask me how I know this ... >_<

I have finished the summary of the first course in my previous posts:

  1. Sigmoid and shallow NN.
  2. Forward & Backward Propagation
  3. Regularization

I will keep working on the others. Since I have been using CNNs at work recently, let's go through CNN first. Any feedback is absolutely welcome! And please correct me if I make any mistake.



When talking about CNNs, image applications are usually what come to mind first. But CNNs can actually be applied more generally to any data that fits certain assumptions. What assumptions? You will know later.

1. CNN Features

A CNN stands out from a traditional NN in 3 areas:

  • sparse interaction (connection)
  • parameter sharing
  • equivariant representation

Actually, the third feature is more of a consequence of the first two. Let's go through them one by one.

(Figure: fully connected NN vs. NN with sparse connections)

Sparse interaction: unlike in a fully connected neural network, in a convolution layer each output is only connected to a limited number of inputs, as in the figure above. For a hidden layer that takes \(m\) neurons as input and produces \(n\) neurons as output, a fully connected layer needs a weight matrix of size \(m*n\) to compute the outputs. When \(m\) is very large, that weight matrix can be huge. With sparse connections, only \(k\) inputs are connected to each output, which brings the computation cost down from \(O(m*n)\) to \(O(k*n)\), and the memory usage for the weights down from \(m*n\) to \(k*n\).

Parameter sharing is best understood together with sparse connection, because sparse connection on its own creates segmentation in the data: for example, in the plot above \(x_1\) and \(x_5\) are independent due to the sparse connection. With parameter sharing, however, the same weight matrix is reused at every position, which creates a hidden connectivity between them. In addition, it further reduces the memory for the weight matrix from \(k*n\) to \(k\). Especially when dealing with images, going from \(m*n\) to \(k\) is a huge improvement in memory usage.
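As a rough, illustrative count (the numbers below are made up for illustration, not taken from the course), the three regimes compare like this:

```python
# Illustrative parameter counts (made-up sizes): m inputs, n outputs, kernel size k.
m, n, k = 1_000_000, 1_000_000, 9   # e.g. ~1-megapixel input and output, 3x3 kernel

dense_weights  = m * n   # fully connected: one weight per (input, output) pair -> 10^12
sparse_weights = k * n   # sparse connection: each output sees only k inputs    -> 9 * 10^6
shared_weights = k       # parameter sharing: the same k weights reused everywhere -> 9

print(dense_weights, sparse_weights, shared_weights)
```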

Equivariant representation is a result of parameter sharing. Because the same weight matrix is used at every position across the input, the output shifts along with the input instead of changing arbitrarily: if \(g\) represents a translation (parallel shift) and \(f\) is the convolution, then \(f(g(x)) = g(f(x))\). This feature is very useful when we only care about the presence of a feature, not its position. On the other hand, it can be a big flaw of CNNs that they are not good at detecting position.
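A minimal NumPy/SciPy check of this property (my own sketch, assuming a circular boundary via `mode='wrap'` so that the shift commutes exactly with the convolution):

```python
import numpy as np
from scipy.ndimage import convolve1d

x = np.random.rand(16)            # a 1D input signal
w = np.array([1.0, 2.0, 1.0])     # a small kernel

g = lambda v: np.roll(v, 3)                     # g: translate the signal by 3 positions
f = lambda v: convolve1d(v, w, mode='wrap')     # f: convolution with circular padding

# Equivariance: convolving a shifted input equals shifting the convolved output.
print(np.allclose(f(g(x)), g(f(x))))            # True
```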

2. CNN Components

Given the above 3 features, let's talk about how to implement a CNN.

(1). Kernel

The kernel, or so-called filter, is the weight matrix in a CNN. At each position it multiplies the input element-wise and outputs the sum. A kernel usually has a size much smaller than the original input, so that we can take advantage of the reduced memory.
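A minimal sketch of that multiply-and-sum in NumPy (the function name is my own; note that, like most deep learning libraries, this is technically cross-correlation since the kernel is not flipped):

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Slide the kernel over x; at each position, multiply element-wise and sum."""
    n_h, n_w = x.shape
    k_h, k_w = kernel.shape
    out = np.zeros((n_h - k_h + 1, n_w - k_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k_h, j:j + k_w] * kernel)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])        # a simple vertical-edge kernel
print(conv2d_valid(x, k).shape)      # (3, 3): output shrinks by k - 1 in each dimension
```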

Below is a 2D input of a convolution layer. It can be a greyscale image, or a multivariate time series.

When the input is 3-dimensional, we call the 3rd dimension the channel (volume). The most common case is an RGB image input, where each channel is a 2D matrix representing one color. See below:

Please keep in mind that a kernel always has the same number of channels as the input! Therefore a single kernel reduces all dimensions of the input: the channels collapse to 1, and the spatial size shrinks (unless you use a 1*1 kernel). But we can use multiple kernels to capture different features. Below, we have 2 kernels (filters), each of dimension (3, 3, 3).

Dimension Cheatsheet of Kernel

  • Input dimension: ( n_w, n_h, n_channel ). When n_channel = 1, it is a 2D input.
  • Kernel dimension: ( n_k, n_k, n_channel ). A kernel is not always square; it can be ( n_k1, n_k2, n_channel ).
  • Output dimension: ( n_w - n_k + 1, n_h - n_k + 1, 1 ).
  • When we have n different kernels, the output dimension will be ( n_w - n_k + 1, n_h - n_k + 1, n ).
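A quick NumPy sketch (the helper name is mine) to check these shapes:

```python
import numpy as np

def conv_multi_kernel(x, kernels):
    """x: (n_h, n_w, n_channel); each kernel: (n_k, n_k, n_channel).
    Every kernel spans all input channels and produces one output channel."""
    n_h, n_w, _ = x.shape
    n_k = kernels[0].shape[0]
    out = np.zeros((n_h - n_k + 1, n_w - n_k + 1, len(kernels)))
    for c, ker in enumerate(kernels):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j, c] = np.sum(x[i:i + n_k, j:j + n_k, :] * ker)
    return out

rgb = np.random.rand(32, 32, 3)                          # an RGB input
kernels = [np.random.rand(3, 3, 3) for _ in range(2)]    # 2 kernels of shape (3, 3, 3)
print(conv_multi_kernel(rgb, kernels).shape)             # (30, 30, 2)
```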

(2). Stride

As we mentioned before, one key advantage of a CNN is speeding up computation through dimension reduction. Can we be more aggressive about this? Yes, we can use stride! Basically, stride means that when moving the kernel across the input, we skip positions by a certain length.

We can easily tell how stride works from the comparison below:

No stride

Stride = 2

Thanks to vdumoulin for such great animations. You can find more on his GitHub.

Stride can further speed up computation, but it loses some features in the output. We can think of it as downsampling the output.
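A small 1D sketch (my own helper) showing stride as "apply the kernel at every s-th position", which also previews the output-size formula given in the padding section:

```python
import numpy as np

def conv1d_strided(x, w, s):
    """Apply kernel w at every s-th valid position of x."""
    k = len(w)
    positions = range(0, len(x) - k + 1, s)   # stride s skips s - 1 positions per move
    return np.array([np.dot(x[p:p + k], w) for p in positions])

x = np.arange(10, dtype=float)
w = np.array([1., 1., 1.])
print(len(conv1d_strided(x, w, s=1)))   # 8 = floor((10 - 3) / 1) + 1
print(len(conv1d_strided(x, w, s=2)))   # 4 = floor((10 - 3) / 2) + 1
```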

(3). Padding

Both the kernel and the stride act as dimension reduction techniques, so for each convolution layer the output dimension will always be smaller than the input. However, if we want to build a deep convolutional network, we don't want the input size to shrink too fast. A small kernel can partly solve this problem, but in order to maintain a certain dimension we need zero padding. Basically, it means adding zeros around your input, like below:

Padding = 1

There are a few types of padding that are frequently used:

  • Valid padding: no padding at all, output = input - (k - 1)
  • Same padding: maintains the same size, output = input
  • Full padding: each input is visited k times, output = input + (k - 1)

To summarize, let \(s\) denote the stride, \(p\) the padding, \(n\) the input size, and \(k\) the kernel size (kernel and input are both square for simplicity). Then the output dimension will be:

\[\lfloor (n+2p-k)/s\rfloor +1\]
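The formula is easy to wrap in a tiny helper; the padding amounts below for valid/same/full are the usual choices for an odd kernel with stride 1 (the numbers are my own example):

```python
import math

def conv_output_size(n, k, p=0, s=1):
    """Output size = floor((n + 2p - k) / s) + 1."""
    return math.floor((n + 2 * p - k) / s) + 1

n, k = 28, 3
print(conv_output_size(n, k, p=0, s=1))             # valid:  26 = n - (k - 1)
print(conv_output_size(n, k, p=(k - 1) // 2, s=1))  # same:   28 = n (odd k, stride 1)
print(conv_output_size(n, k, p=k - 1, s=1))         # full:   30 = n + (k - 1)
print(conv_output_size(n, k, p=0, s=2))             # stride 2: 13
```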

(4). Pooling

I remember that in a recent CNN paper, the author said something along the lines of: I can't explain why I add the pooling layer, but a good CNN structure always comes with one.

Pooling also functions as a dimension reduction technique, but unlike the kernel, which reduces all dimensions, pooling keeps the channel dimension untouched. Therefore it can further accelerate computation.

Basically, pooling outputs a summary statistic of a local region of the input. This introduces a property stronger than equivariant representation: invariant representation.

The most commonly used pooling operations are max and average pooling. There are also L2 pooling, weighted average pooling, etc.
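A minimal max-pooling sketch in NumPy (non-overlapping 2*2 windows; the helper name is mine), showing that the channel dimension is preserved:

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling over spatial dims; channels are left untouched."""
    n_h, n_w, n_c = x.shape
    h, w = n_h // size, n_w // size
    return x[:h * size, :w * size, :].reshape(h, size, w, size, n_c).max(axis=(1, 3))

fmap = np.random.rand(32, 32, 8)   # a feature map with 8 channels
print(max_pool2d(fmap).shape)      # (16, 16, 8): spatial dims halved, channels preserved
```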

3. CNN Structure

(1). Intuition of CNN

In the Deep Learning book, the authors give a very interesting insight: convolution and pooling can be seen as an infinitely strong prior distribution. This prior says that each hidden unit depends only on a small local region of the input, all hidden units share the same weights, and the learned representation should be invariant to small translations.

In Bayesian statistics, a prior distribution is a subjective preference of the model based on experience, and the stronger the prior is, the larger its impact on the final model. So before we use a CNN, we have to make sure that our data fits the above assumptions.

(2). Classic structure

A classic convolutional neural network block has a convolution layer, a non-linear activation layer, and a pooling layer. For a deep NN, we can stack a few of these blocks together, like below:

The above plot is taken from Adit Deshpande's A Beginner's Guide To Understanding Convolutional Neural Networks, one of my favorite ML blogs.
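As a hedged sketch of that classic stack in Keras (the layer sizes and the 10-class head are arbitrary choices for illustration, not from the blog):

```python
import tensorflow as tf

# conv -> ReLU -> pool, stacked twice, then a small classifier head
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu',
                           input_shape=(32, 32, 3)),        # convolution + non-linearity
    tf.keras.layers.MaxPooling2D(2),                        # pooling
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),        # e.g. a 10-class classifier
])
model.summary()
```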

The interesting part of a deep CNN is that a deep hidden layer can receive more information from the input than a shallow one: although the direct connections are sparse, the deeper hidden neurons are still able to see nearly all of the input features, because their receptive field grows with depth.
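To make "deeper layers see more of the input" concrete, here is a small receptive-field calculation under the standard recurrence (my own helper; layers are (kernel_size, stride) pairs):

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, from input to output.
    Returns how many input positions one neuron in the last layer can see."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k - 1) input-space steps
        jump *= s              # stride multiplies the spacing between output positions
    return rf

# three 3x3 convs (stride 1) with a 2x2 pooling (stride 2) before the last one
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))   # 10
```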

(3). To be continued

The more I learn about NNs, the more I realize they are more flexible than I thought. It is like LEGO: convolution and pooling are just different building blocks with different assumptions. You need to analyze your data, select the tools that fit your assumptions, and try combining them to improve performance iteratively. Later I will open a new post to collect all the NN structures that I have read about.



References

1. Vincent Dumoulin, Francesco Visin - A guide to convolution arithmetic for deep learning

2. Adit Deshpande - A Beginner's Guide To Understanding Convolutional Neural Networks

3. Ian Goodfellow, Yoshua Bengio, Aaron Courville - Deep Learning

