NEGOUT: SUBSTITUTE FOR MAXOUT UNITS


Maxout [1] units are well-known and frequently used building blocks of deep neural networks. For those who do not know them, a brief explanation: a Maxout unit is a set of internal activation units competing with each other for each instance; the winner's activation is propagated as the output and the losers are kept silent. In the backpropagation phase, this means we update only the winning unit. It also means, implicitly, that we always prefer to back-propagate the gradient signal through the strongest path. This is an important property of Maxout units, especially for very deep models, which are prone to gradient instability.
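To make this concrete, here is a minimal numpy sketch of a maxout unit's forward pass (my own illustrative code, not taken from the paper); the winner index it returns is what decides which filter's weights receive gradient in the backward pass.

```python
import numpy as np

def maxout_forward(x, W, b):
    """One maxout unit over k competing affine filters (illustrative sketch).

    x: input vector, shape (d_in,)
    W: weights, shape (k, d_out, d_in) -- k competing affine maps
    b: biases, shape (k, d_out)
    """
    z = np.einsum('kod,d->ko', W, x) + b   # candidate activations, shape (k, d_out)
    winner = z.argmax(axis=0)              # index of the winning filter per output dim
    return z.max(axis=0), winner           # only the winner's value is propagated
```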

Although Maxout units have very good properties like the ones above (please refer to the paper for more details), I am sceptical of their ability to encode the underlying information and pass it on to the next layer. Here is a very simple example. Suppose we have two competing functions (filters) in a Maxout unit. One of these functions is receptive to edge structures whereas the other is receptive to corners. For one instance, the first filter might be the winner with a value of, say, ~3, which means the Maxout output is also ~3. For another instance, the other function is the winner with approximately the same value, ~3. If we assume that each NN layer is a classifier that takes the previous layer's output as a feature vector (not a very wrong assumption, I guess), then we are basically giving the same value to different detections in that particular feature dimension (the one corresponding to our Maxout unit). Ultimately, we cannot expect the next layer to be able to discern these cases from this signal; a small numeric sketch of this collapse follows the figure captions below.

Edge detector is the winner

Corner detector is the winner, but the result is the same.
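Here is that collapse in a couple of lines, with made-up numbers standing in for the two filter responses:

```python
# Hypothetical activations for the two-filter example above.
z_edge_A, z_corner_A = 3.1, 0.4    # instance A: edge filter wins
z_edge_B, z_corner_B = 0.2, 3.0    # instance B: corner filter wins

maxout_A = max(z_edge_A, z_corner_A)   # 3.1 -- edge detector is the winner
maxout_B = max(z_edge_B, z_corner_B)   # 3.0 -- corner detector is the winner
# The next layer sees nearly identical feature values (3.1 vs 3.0)
# for two structurally different detections.
```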

One can argue that we should evaluate a Maxout unit as a whole, as something reminiscent of an OR function over multiple filters. This is a valid argument that I cannot refute directly, but the problem indicated above still stands. Besides, why would we waste our expensive NN parameters if we could come up with a better encoding scheme for the competing units?

Here is one alternative approach for a better encoding of the competing functions, which we call NegOut. Let's assume we fix an ordering of the two competing functions as 1st and 2nd. If the winner is the 1st function, NegOut outputs the 1st function's value; otherwise it outputs the 2nd function's value with its sign flipped. NegOut comes with two assumptions. First, the competing functions are always non-negative (like ReLU outputs). Second, there are exactly 2 competing functions.

NegOut activation with different winners.
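Below is a minimal sketch of the NegOut forward rule, assuming the two competing activations are already non-negative (e.g. ReLU outputs); the function name and interface are my own, chosen for illustration.

```python
import numpy as np

def negout_forward(z1, z2):
    """NegOut over two competing, non-negative activations (illustrative sketch).

    z1, z2: activations of the 1st and 2nd competing filters, same shape.
    If the 1st filter wins, its value is emitted as-is; if the 2nd filter wins,
    its value is emitted with a flipped sign, so the next layer can tell the
    two winners apart while still seeing the winner's magnitude.
    """
    first_wins = z1 >= z2
    out = np.where(first_wins, z1, -z2)
    return out, first_wins
```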

Considering the backpropagation signal, the only difference from a Maxout unit is that we take the negative of the gradient signal for the 2nd competing unit whenever it is the winner.
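A matching sketch of that backward rule, continuing the hypothetical code above:

```python
import numpy as np

def negout_backward(grad_out, first_wins):
    """Backward pass for the negout_forward sketch above.

    grad_out: gradient arriving from the next layer, same shape as the output.
    As with Maxout, only the winner receives gradient; the only change is that
    the 2nd unit's gradient is negated when it wins, mirroring the sign flip
    in the forward pass.
    """
    grad_z1 = np.where(first_wins, grad_out, 0.0)
    grad_z2 = np.where(first_wins, 0.0, -grad_out)
    return grad_z1, grad_z2
```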

As you can see from the figure, the inherent property here is that different winning detectors produce different output values, where the value captures both the structural difference (through its sign) and the strength of the winner's activation (through its magnitude).

I performed some experiments on CIFAR-10 and MNIST, comparing a Maxout network with a NegOut network using exactly the same architectures described in the Maxout paper [1]. The table below summarizes the results observed in the initial runs, without any fine-tuning or hyper-parameter optimization yet. More comparisons on larger datasets are still in progress.

Results on CIFAR-10 and MNIST, averaged over 5 different runs.

NegOut gives better results on CIFAR-10, although it is slightly worse on MNIST. Again, note that no tuning has taken place for the NegOut network, whereas the Maxout network is optimized as described in the paper [1]. In addition, the NegOut network uses 2 competing units (as constrained by its nature) in the last FC layer, compared to the 5 competing units of the Maxout network. My expectation is that the gap will widen on larger models and datasets, since representational power matters more for better results as we scale up.

Here I have tried to give a basic sketch of my recent work, which is by no means complete. Further observations and experiments are still running. I also need to include LWTA [2] in the comparison to be fairer and to cover a wider range of competing-unit schemes. Please feel free to share your thoughts; any contribution is appreciated.

PS: Lately, I have devoted myself to analyzing the internal dynamics of neural networks with different architectures, layers, and activation functions. The aim is to look under the hood and analyze intuitively well-functioning ideas applied to deep neural networks. I expect to share more of my findings on my blog.

[1] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, Y. Bengio. Maxout Networks. arXiv preprint arXiv:1302.4389.

[2] R. K. Srivastava, J. Masci, F. Gomez, J. Schmidhuber. Understanding Locally Competitive Networks. arXiv preprint arXiv:1410.1165. http://arxiv.org/abs/1410.1165
