DEEPLY SUPERVISED NETS
Fig. 1: Illustration of the Deeply-Supervised Nets architecture and its cost functions.
Abstract
We propose deeply-supervised nets (DSN), a method that simultaneously minimizes classification error while improving the directness and transparency of the hidden-layer learning process. We focus on three aspects of traditional convolutional-neural-network-type (CNN-type) architectures: (1) transparency in the effect that intermediate layers have on the overall classification; (2) discriminativeness and robustness of learned features, especially in early network layers; (3) training effectiveness in the face of “exploding” and “vanishing” gradients. To combat these issues, we introduce “companion” objective functions at each individual hidden layer, in addition to the overall objective function at the output layer (a strategy distinct from layer-wise pre-training). We also analyze our algorithm using techniques extended from stochastic gradient methods. The advantages provided by our method are evident in our experimental results on benchmark datasets, showing state-of-the-art performance on MNIST, CIFAR-10, CIFAR-100, and SVHN.
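For concreteness, below is a minimal sketch of the companion-objective idea, written in PyTorch: each hidden block feeds an auxiliary classifier, and the training loss is the output-layer loss plus a weighted sum of the per-layer companion losses. The layer sizes, the companion weight alpha, and the use of plain cross-entropy for the companion terms are illustrative assumptions for this sketch, not necessarily the exact configuration used in the paper.

# Minimal deep-supervision sketch (illustrative; not the paper's exact setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeeplySupervisedNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Two convolutional blocks; sizes are illustrative.
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Linear(64 * 8 * 8, num_classes)    # output-layer classifier
        self.aux1 = nn.Linear(32 * 16 * 16, num_classes)  # companion classifier on block 1
        self.aux2 = nn.Linear(64 * 8 * 8, num_classes)    # companion classifier on block 2

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        out = self.head(h2.flatten(1))
        # Return the output-layer logits and the companion logits from each hidden block.
        return out, self.aux1(h1.flatten(1)), self.aux2(h2.flatten(1))

def dsn_loss(outputs, target, alpha=0.3):
    # Overall objective = output-layer loss + weighted sum of companion losses.
    out, aux1, aux2 = outputs
    loss = F.cross_entropy(out, target)
    loss = loss + alpha * (F.cross_entropy(aux1, target) + F.cross_entropy(aux2, target))
    return loss

# Usage example, assuming 32x32 RGB inputs (e.g., CIFAR-10-sized images).
model = DeeplySupervisedNet()
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
dsn_loss(model(x), y).backward()

Because every hidden block receives its own supervision signal, gradients reach the early layers directly through the companion classifiers rather than only through the full backward chain, which is the mechanism the abstract refers to for mitigating vanishing gradients and encouraging discriminative early-layer features.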
Experiments
Our results on several benchmark datasets are listed below:
Fig. 2: Test accuracy on the benchmark datasets.
Fig. 3: Visualization of convolutional feature maps learned by DSN.
Code, preprocessed data, and configuration files are available.
Publication
[1] Deeply-Supervised Nets [pdf] [arXiv version]
Chen-Yu Lee*, Saining Xie*, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu
(* indicates equal contribution) In Proceedings of AISTATS 2015.
An early version was presented at the Deep Learning and Representation Learning Workshop, NIPS 2014.
Disclosure
Deeply-supervised neural networks, Zhuowen Tu, Chen-Yu Lee, Saining Xie,
UCSD Docket No. SD2014-313, May 22, 2014