ABSTRACT:
1.Deeper neural networks are more difficult to train.
2.We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
3.We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
4.We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
5.On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8 times deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers.
INTRODUCTION:
1.Deep networks naturally integrate low/mid/high-level features and classifiers in an end-to-end multilayer fashion, and the "levels" of features can be enriched by the number of stacked layers (depth).
2.A well-known obstacle is the problem of vanishing/exploding gradients, which hampers convergence from the beginning. This problem, however, has been largely addressed by normalized initialization and intermediate normalization layers, which enable networks with tens of layers to start converging under stochastic gradient descent (SGD) with backpropagation (see the sketch after this list).
3.With the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly. Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error, as reported in prior work and thoroughly verified by our experiments.
4.There exists a solution by construction to the deeper model: the added layers are identity mappings, and the other layers are copied from the learned shallower model. The existence of this constructed solution indicates that a deeper model should produce no higher training error than its shallower counterpart, yet in practice solvers fail to find solutions that are comparably good or better.
5.In this paper, we address the degradation problem by introducing a deep residual learning framework. Instead of hoping each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping: denoting the desired underlying mapping as H(x), we let the stacked nonlinear layers fit F(x) := H(x) - x, so the original mapping is recast as F(x) + x (a minimal code sketch follows the advantages below).
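To make point 2 concrete, here is a minimal sketch, assuming PyTorch and using Kaiming initialization plus BatchNorm as representative (not prescribed) choices of "normalized initialization" and "intermediate normalization layers", of a plain convolutional stack set up so that it can start converging under SGD; the function name plain_stack and its parameters are illustrative only.

```python
# Minimal sketch: a plain (non-residual) stack of conv layers with
# normalized initialization and intermediate normalization layers,
# the two remedies point 2 credits for enabling convergence with SGD.
# Assumes PyTorch; Kaiming init and BatchNorm are representative choices.
import torch
import torch.nn as nn

def plain_stack(channels=64, num_layers=20):
    layers = []
    for _ in range(num_layers):
        conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        nn.init.kaiming_normal_(conv.weight, nonlinearity="relu")   # normalized initialization
        layers += [conv, nn.BatchNorm2d(channels), nn.ReLU(inplace=True)]  # intermediate normalization
    return nn.Sequential(*layers)

if __name__ == "__main__":
    net = plain_stack()
    x = torch.randn(2, 64, 32, 32)
    print(net(x).std())  # activations keep a reasonable scale even through 20 layers
```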
Advantages:
1) Our extremely deep residual nets are easy to optimize, but the counterpart “plain” nets (that simply stack layers) exhibit higher training error when the depth increases;
2) Our deep residual nets can easily enjoy accuracy gains from greatly increased depth, producing results substantially better than previous networks.
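As a concrete illustration of the residual mapping introduced in point 5 above, here is a minimal sketch of a basic residual block, again assuming PyTorch; the class name BasicBlock and the channel sizes are illustrative, not taken from the paper. The block computes F(x) with two 3x3 convolutions and batch normalization, then adds the identity shortcut x before the final ReLU, so its output is F(x) + x.

```python
# Minimal sketch of a basic residual block (assumes PyTorch).
# F(x) is two 3x3 conv + BN layers; the identity shortcut x is added
# before the final ReLU, so the block outputs F(x) + x as described above.
import torch
import torch.nn as nn

class BasicBlock(nn.Module):  # illustrative name, not from the paper
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                       # shortcut carries the input unchanged
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))    # F(x): the residual to be learned
        out = out + identity               # F(x) + x
        return self.relu(out)

# Usage example: a 56x56 feature map with 64 channels passes through with its shape unchanged.
if __name__ == "__main__":
    block = BasicBlock(64)
    y = block(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 64, 56, 56])
```

Because the shortcut adds the input directly, the stacked layers only need to learn the deviation from identity; if the optimal mapping is close to identity, driving F(x) toward zero is easier than fitting the identity with a stack of nonlinear layers, which is the intuition behind the two advantages listed above.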