Deep Residual Learning for Image Recognition (ResNet)

Contents

  • Main content
  • Code

He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.

@inproceedings{he2016deep,
title={Deep Residual Learning for Image Recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={770--778},
year={2016}}

Main content

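Depth has always been a key factor for CNNs. The authors observe that simply stacking more layers does not necessarily help and can even increase the error, and that this increase is not caused by overfitting (the training error rises as well).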

Let the input be \(x\), and write the output of an ordinary network as \(\mathcal{H}(x)\). The authors instead consider
\[
\tag{1}
\mathcal{F}(x):=\mathcal{H}(x)-x.
\]

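At first this is confusing: why is \(\mathcal{H}(x)-x\) even well defined? Doesn't it require the network's output to have the same size as its input? How could the network then do classification?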

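As the residual-block figure in the paper shows, \(\mathcal{H}(x)\) is not the output of the whole network but the output of a few stacked layers; in that figure a residual connection is applied every two layers. Letting those layers learn \(\mathcal{F}(x)\) allows the information in \(x\) to be passed on more easily. As the authors argue, if the earlier layers already do the job well, the later layers only need to become identity mappings. An identity mapping, however, is not necessarily easy for a stack of nonlinear layers to approximate, which is how adding depth can end up increasing the error. Learning the residual makes this easy: the later layers only need to set their weights to 0, and the output of every subsequent block is then \(x\) (here \(x\) is the output of some earlier layer). In principle, this ensures that going deeper does not make the result worse.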
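The zero-weight argument above can be checked with a minimal sketch in plain Python. The helper names (`relu`, `linear`, `residual_block`, `plain_block`) and the tiny fully connected layers are illustrative assumptions, not the post's PyTorch code, and biases and batch normalization are left out. With all weights set to zero, the residual block returns its input while the plain block collapses to zero.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def linear(weights, v):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(x + F(x)) with F(x) = W2 * relu(W1 * x)."""
    f = linear(w2, relu(linear(w1, x)))
    return relu([a + b for a, b in zip(x, f)])


def plain_block(x, w1, w2):
    """The same two layers without the shortcut: y = relu(W2 * relu(W1 * x))."""
    return relu(linear(w2, relu(linear(w1, x))))


# x is non-negative, as it would be after a preceding ReLU.
x = [0.5, 1.0, 2.0]
zeros = [[0.0] * 3 for _ in range(3)]

print(residual_block(x, zeros, zeros))  # the input passes through unchanged
print(plain_block(x, zeros, zeros))     # the signal is lost
```

The block is an exact identity only because \(x\) is non-negative here; the ReLU applied after the addition would clip any negative entries of \(x\).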

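One last issue remains: the size of \(x\) has to change somewhere in the network, so \(\mathcal{F}(x)\) and \(x\) are not guaranteed to have the same dimensions. One fix is to add a linear projection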
\[
\tag{2}
\mathcal{F}(x)+W_s x,
\]
The code uses a 1x1 convolution for this; padding with zeros is another option.
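As a small worked check of the spatial side of this shape matching, the sketch below applies the standard convolution output-size formula to the kernel, stride and padding values used in the code in this post (3x3 with padding 1 on the main path, 1x1 with the same stride on the shortcut). The function names `conv_out` and `block_shapes` are hypothetical helpers introduced only for this illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output length along one spatial axis: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def block_shapes(size, stride):
    """Spatial size produced by the main path and by the 1x1 shortcut."""
    # Main path: 3x3 conv (given stride, padding 1), then 3x3 conv (stride 1, padding 1).
    main = conv_out(conv_out(size, 3, stride, 1), 3, 1, 1)
    # Shortcut: 1x1 conv with the same stride and no padding.
    shortcut = conv_out(size, 1, stride, 0)
    return main, shortcut


for size in (8, 16, 32):
    main, shortcut = block_shapes(size, stride=2)
    print(size, "->", main, shortcut)
```

Both paths reduce to floor((size - 1) / stride) + 1, so the two tensors can be added element-wise once the 1x1 convolution has also matched the channel count.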

Code

"""
ResNet34 trained on CIFAR10
epochs=1000
lr=0.01 (the paper starts from 0.1; I tried that and the gradients blew up, possibly because of the network structure)
momentum=0.9
weight_decay=0.0001
"""

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import numpy as np
import os

class Residualblock(nn.Module):

    def __init__(self, in_channels, out_channels,
                 stride=1, shortcut=None):
        super(Residualblock, self).__init__()

        # Main path F(x): conv-BN-ReLU-conv-BN. As in the paper, the last
        # ReLU is applied after the addition in forward(), not here.
        self.longway = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, stride, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, 1, 1),
            nn.BatchNorm2d(out_channels)
        )

        self.shortway = shortcut

    def forward(self, x):

        residual = self.longway(x)
        identity = x if self.shortway is None else self.shortway(x)
        return nn.functional.relu(identity + residual)

class ResNet(nn.Module):

    def __init__(self, out_size=10, layers=None):
        """
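        :param out_size: number of output classes
        :param layers: number of extra blocks in each of the four stages, on top of
                       the first block that _make_layer always adds; the default
                       (2, 3, 5, 2) gives 3, 4, 6, 3 blocks, as in the paper's ResNet-34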
        """
        super(ResNet, self).__init__()

        if layers is None:
            layers = (2, 3, 5, 2)
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, 7, 2, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, 2, 1)
        )
        self.layer1 = self._make_layer(64, 64, layers[0])
        self.layer2 = self._make_layer(64, 128, layers[1], 2)
        self.layer3 = self._make_layer(128, 256, layers[2], 2)
        self.layer4 = self._make_layer(256, 512, layers[3], 2)

        # AdaptiveAvgPool2d maps an input (N, C, H, W) to (N, C, H*, W*);
        # here H*, W* = 1, 1
        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, out_size)

        # initialization copied directly from the PyTorch source code
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _make_layer(self, in_channels, out_channels,
                    block_nums, stride=1):

        shortcut = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, stride)
        )
        layer = [nn.Sequential(
            Residualblock(in_channels, out_channels, stride, shortcut)
        )]
        for block in range(block_nums):
            layer.append(
                Residualblock(out_channels, out_channels, 1)
            )
        return nn.Sequential(*layer)

    def forward(self, x):

        x = self.conv1(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avg_pool(x)

        x = torch.flatten(x, 1)  # flatten; equivalent to x.view(x.size(0), -1)
        out = self.fc(x)
        return out

class Train:

    def __init__(self, lr=0.01, momentum=0.9, weight_decay=0.0001):
        self.net = ResNet()
        self.criterion = nn.CrossEntropyLoss()
        self.opti = torch.optim.SGD(self.net.parameters(),
                                    lr=lr, momentum=momentum,
                                    weight_decay=weight_decay)
        self.gpu()
        self.generate_path()
        self.acc_rates = []
        self.errors = []

    def gpu(self):
        self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        if torch.cuda.device_count() > 1:
            print("Let's use %d GPUs" % torch.cuda.device_count())
            self.net = nn.DataParallel(self.net)
        self.net = self.net.to(self.device)

    def generate_path(self):
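        """
        Create the directories and file paths used to save results.
        :return:
        """
        # exist_ok=True handles each directory independently, so one that
        # already exists does not stop the others from being created.
        for folder in ('./paras', './logs', './infos'):
            os.makedirs(folder, exist_ok=True)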
        name = self.net.__class__.__name__
        paras = os.listdir('./paras')
        logs = os.listdir('./logs')
        infos = os.listdir('./infos')
        number = max((len(paras), len(logs), len(infos)))
        self.para_path = "./paras/{0}{1}.pt".format(
            name,
            number
        )

        self.log_path = "./logs/{0}{1}.txt".format(
            name,
            number
        )
        self.info_path = "./infos/{0}{1}.npy".format(
            name,
            number
        )

    def log(self, strings):
        """
        Append a message to the run log.
        :param strings: text to write
        :return:
        """
        # 'a' opens the file in append mode
        with open(self.log_path, 'a', encoding='utf8') as f:
            f.write(strings)

    def save(self):
        """
        Save the network parameters.
        :return:
        """
        torch.save(self.net.state_dict(), self.para_path)

    def derease_lr(self, multi=10):
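        """
        Decrease the learning rate.
        :param multi: factor to divide the learning rate by
        :return:
        """
        # param_groups is a list of dicts, not a method
        for group in self.opti.param_groups:
            group['lr'] /= multi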

    def train(self, trainloder, epochs=50):
        # exact sample count; len(trainloder) * batch_size overcounts
        # when the last batch is smaller than batch_size
        data_size = len(trainloder.dataset)
        part = int(trainloder.batch_size / 2)
        for epoch in range(epochs):
            running_loss = 0.
            total_loss = 0.
            acc_count = 0.
            if (epoch + 1) % int(epochs / 2) == 0:
                self.derease_lr()
                self.log(  # write to the log
                    "learning rate change!!!\n"
                )
            for i, data in enumerate(trainloder):
                imgs, labels = data
                imgs = imgs.to(self.device)
                labels = labels.to(self.device)
                out = self.net(imgs)
                loss = self.criterion(out, labels)
                _, pre = torch.max(out, 1)  # predicted class for each sample
                acc_count += (pre == labels).sum().item()  # count the correct predictions

                self.opti.zero_grad()
                loss.backward()
                self.opti.step()

                running_loss += loss.item()

                if (i+1) % part == 0:
                    strings = "epoch {0:<3} part {1:<5} loss: {2:<.7f}\n".format(
                        epoch, i, running_loss / part
                    )
                    self.log(strings)  # write to the log
                    total_loss += running_loss
                    running_loss = 0.
            self.acc_rates.append(acc_count / data_size)
            self.errors.append(total_loss / data_size)
            self.log(  # write to the log
                "Accuracy of the network on %d train images: %d %%\n" %(
                    data_size, acc_count / data_size * 100
                )
            )
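            self.save()  # save the network parameters
        # save some statistics for plotting later (np.save pickles the dict;
        # load it with np.load(path, allow_pickle=True).item())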
        np.save(self.info_path, {
            'acc_rates': np.array(self.acc_rates),
            'errors': np.array(self.errors)
        })

if __name__ == "__main__":

    root = "../../data"

    trainset = torchvision.datasets.CIFAR10(root=root, train=True,
                                          download=False,
                                          transform=transforms.Compose(
                                              [transforms.ToTensor(),
                                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
                                          ))

    train_loader = torch.utils.data.DataLoader(trainset, batch_size=128,
                                              shuffle=True, num_workers=0)

    dog = Train()
    dog.train(train_loader, epochs=1000)
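The `layers` argument of `ResNet` is easy to misread, so here is a small plain-Python sketch of how the default `(2, 3, 5, 2)` adds up to 34 layers. It assumes the paper's counting convention (the first convolution, two 3x3 convolutions per block, and the final fully connected layer, with the 1x1 shortcut convolutions not counted); `count_weighted_layers` is a hypothetical helper, not part of the script above.

```python
def count_weighted_layers(extra_blocks, convs_per_block=2):
    """Return the blocks per stage and the total depth of the network.

    `extra_blocks` mirrors the `layers` argument: _make_layer builds one
    block with a projection shortcut and then appends this many more.
    """
    blocks = [n + 1 for n in extra_blocks]
    # first conv + all convs inside the residual blocks + final fc layer
    depth = 1 + convs_per_block * sum(blocks) + 1
    return blocks, depth


blocks, depth = count_weighted_layers((2, 3, 5, 2))
print(blocks, depth)  # [3, 4, 6, 3] 34
```

Passing `(1, 1, 1, 1)` under the same convention gives 2 blocks per stage and a depth of 18.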

Original post: https://www.cnblogs.com/MTandHJ/p/12181564.html

Time: 2024-07-30 16:12:06
