Deep Learning Interview Question 17: VGGNet (1000-Class Image Classification)

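VGGNet explored how the depth of a convolutional neural network relates to its performance. It built networks 16 to 19 layers deep and showed that, up to a point, adding depth improves final performance and substantially lowers the error rate. The architecture is also easy to extend and generalizes well when transferred to other image data. VGG is still used today to extract image features.
VGGNet can be viewed as a deeper version of AlexNet: both consist of two main parts, convolutional layers and fully connected layers.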

VGGNet Network Structure

VGGNet has more layers than AlexNet and no longer uses large convolution kernels such as 11*11, 7*7 or 5*5; it uses only 3*3 kernels. The VGG-16 network structure is as follows:
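As a rough companion to the structure described above, the plain-Python sketch below walks an input through the 13 convolutions and 5 pooling steps of VGG-16 and prints each output shape. The names `VGG16_LAYERS` and `trace_shapes` are illustrative and not from the original article; the sketch assumes 3*3 SAME convolutions with stride 1 and 2*2 max pooling with stride 2.

```python
# Plain-Python sketch: trace tensor shapes through VGG-16 (no TensorFlow needed).
# Integers are output channel counts of 3*3 SAME convolutions;
# "M" is a 2*2 max pool with stride 2. Names here are illustrative only.
VGG16_LAYERS = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def trace_shapes(layers, height=224, width=224, channels=3):
    """Return a list of (layer_name, (height, width, channels)) tuples."""
    shapes = [("input", (height, width, channels))]
    conv_index = 0
    pool_index = 0
    for layer in layers:
        if layer == "M":
            # 2*2 pooling with stride 2 halves height and width.
            height, width = height // 2, width // 2
            pool_index += 1
            name = "pool%d" % pool_index
        else:
            # A stride-1 SAME convolution keeps height and width;
            # only the channel count changes.
            channels = layer
            conv_index += 1
            name = "conv%d" % conv_index
        shapes.append((name, (height, width, channels)))
    return shapes


shapes = trace_shapes(VGG16_LAYERS)
for name, shape in shapes:
    print("%-7s %s" % (name, "*".join(str(d) for d in shape)))

# The last pooling output is flattened before the fully connected layers
# (4096, 4096 and 1000 units).
final_h, final_w, final_c = shapes[-1][1]
flat_size = final_h * final_w * final_c
print("flatten", flat_size)
```

The 13 convolutional layers plus the 3 fully connected layers give the 16 weight layers in the name VGG-16.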

The corresponding code is:

# Written against the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
import tensorflow as tf
import numpy as np

# Input: a batch of 224*224 RGB images
x = tf.placeholder(tf.float32, [None, 224, 224, 3])
# Layer 1: 64 kernels of size 3*3*3, stride 1, SAME padding
w1 = tf.Variable(tf.random_normal([3, 3, 3, 64]), dtype=tf.float32, name='w1')
conv1 = tf.nn.relu(tf.nn.conv2d(x, w1, [1, 1, 1, 1], 'SAME'))
# Output: 224*224*64

# Layer 2: 64 kernels of size 3*3*64, stride 1, SAME padding
w2 = tf.Variable(tf.random_normal([3, 3, 64, 64]), dtype=tf.float32, name='w2')
conv2 = tf.nn.relu(tf.nn.conv2d(conv1, w2, [1, 1, 1, 1], 'SAME'))
# Output: 224*224*64

# Pooling 1: 2*2 max pooling, stride 2
pool1 = tf.nn.max_pool(conv2, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')
# Output: 112*112*64

# Layer 3: 128 kernels of size 3*3*64, stride 1, SAME padding
w3 = tf.Variable(tf.random_normal([3, 3, 64, 128]), dtype=tf.float32, name='w3')
conv3 = tf.nn.relu(tf.nn.conv2d(pool1, w3, [1, 1, 1, 1], 'SAME'))
# Output: 112*112*128

# Layer 4: 128 kernels of size 3*3*128, stride 1, SAME padding
w4 = tf.Variable(tf.random_normal([3, 3, 128, 128]), dtype=tf.float32, name='w4')
conv4 = tf.nn.relu(tf.nn.conv2d(conv3, w4, [1, 1, 1, 1], 'SAME'))
# Output: 112*112*128

# Pooling 2: 2*2 max pooling, stride 2
pool2 = tf.nn.max_pool(conv4, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')
# Output: 56*56*128

# Layer 5: 256 kernels of size 3*3*128, stride 1, SAME padding
w5 = tf.Variable(tf.random_normal([3, 3, 128, 256]), dtype=tf.float32, name='w5')
conv5 = tf.nn.relu(tf.nn.conv2d(pool2, w5, [1, 1, 1, 1], 'SAME'))
# Output: 56*56*256

# Layer 6: 256 kernels of size 3*3*256, stride 1, SAME padding
w6 = tf.Variable(tf.random_normal([3, 3, 256, 256]), dtype=tf.float32, name='w6')
conv6 = tf.nn.relu(tf.nn.conv2d(conv5, w6, [1, 1, 1, 1], 'SAME'))
# Output: 56*56*256

# Layer 7: 256 kernels of size 3*3*256, stride 1, SAME padding
w7 = tf.Variable(tf.random_normal([3, 3, 256, 256]), dtype=tf.float32, name='w7')
conv7 = tf.nn.relu(tf.nn.conv2d(conv6, w7, [1, 1, 1, 1], 'SAME'))
# Output: 56*56*256

# Pooling 3: 2*2 max pooling, stride 2
pool3 = tf.nn.max_pool(conv7, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')
# Output: 28*28*256

# Layer 8: 512 kernels of size 3*3*256, stride 1, SAME padding
w8 = tf.Variable(tf.random_normal([3, 3, 256, 512]), dtype=tf.float32, name='w8')
conv8 = tf.nn.relu(tf.nn.conv2d(pool3, w8, [1, 1, 1, 1], 'SAME'))
# Output: 28*28*512

# Layer 9: 512 kernels of size 3*3*512, stride 1, SAME padding
w9 = tf.Variable(tf.random_normal([3, 3, 512, 512]), dtype=tf.float32, name='w9')
conv9 = tf.nn.relu(tf.nn.conv2d(conv8, w9, [1, 1, 1, 1], 'SAME'))
# Output: 28*28*512

# Layer 10: 512 kernels of size 3*3*512, stride 1, SAME padding
w10 = tf.Variable(tf.random_normal([3, 3, 512, 512]), dtype=tf.float32, name='w10')
conv10 = tf.nn.relu(tf.nn.conv2d(conv9, w10, [1, 1, 1, 1], 'SAME'))
# Output: 28*28*512

# Pooling 4: 2*2 max pooling, stride 2
pool4 = tf.nn.max_pool(conv10, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')
# Output: 14*14*512

# Layer 11: 512 kernels of size 3*3*512, stride 1, SAME padding
w11 = tf.Variable(tf.random_normal([3, 3, 512, 512]), dtype=tf.float32, name='w11')
conv11 = tf.nn.relu(tf.nn.conv2d(pool4, w11, [1, 1, 1, 1], 'SAME'))
# Output: 14*14*512

# Layer 12: 512 kernels of size 3*3*512, stride 1, SAME padding
w12 = tf.Variable(tf.random_normal([3, 3, 512, 512]), dtype=tf.float32, name='w12')
conv12 = tf.nn.relu(tf.nn.conv2d(conv11, w12, [1, 1, 1, 1], 'SAME'))
# Output: 14*14*512

# Layer 13: 512 kernels of size 3*3*512, stride 1, SAME padding
w13 = tf.Variable(tf.random_normal([3, 3, 512, 512]), dtype=tf.float32, name='w13')
conv13 = tf.nn.relu(tf.nn.conv2d(conv12, w13, [1, 1, 1, 1], 'SAME'))
# Output: 14*14*512

# Pooling 5: 2*2 max pooling, stride 2
pool5 = tf.nn.max_pool(conv13, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')
# Output: 7*7*512

# Flatten each example to a vector of length 25088 (7*7*512)
pool_l5_shape = pool5.get_shape()
num = pool_l5_shape[1].value * pool_l5_shape[2].value * pool_l5_shape[3].value
flatten = tf.reshape(pool5, [-1, num])
# Output: [batch, 25088]

# Layer 14: fully connected to 4096 neurons
fcW1 = tf.Variable(tf.random_normal([num, 4096]), dtype=tf.float32, name='fcW1')
fc1 = tf.nn.relu(tf.matmul(flatten, fcW1))

# Layer 15: fully connected to 4096 neurons
fcW2 = tf.Variable(tf.random_normal([4096, 4096]), dtype=tf.float32, name='fcW2')
fc2 = tf.nn.relu(tf.matmul(fc1, fcW2))

# Layer 16: fully connected to 1000 neurons, followed by softmax
fcW3 = tf.Variable(tf.random_normal([4096, 1000]), dtype=tf.float32, name='fcW3')
out = tf.matmul(fc2, fcW3)
out = tf.nn.softmax(out)

session = tf.Session()
session.run(tf.global_variables_initializer())
result = session.run(out, feed_dict={x: np.ones([1, 224, 224, 3], np.float32)})
# Print the shape of the final output: (1, 1000)
print(np.shape(result))
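The weight shapes in the listing above make it easy to count parameters by hand. The sketch below is a minimal, TensorFlow-free calculation with hypothetical helper names. It counts the weights exactly as the listing creates them (no bias terms) and, as a separate assumption, the total if every layer also had one bias per output unit. The last lines compare three stacked 3*3 convolutions with a single 7*7 convolution, which covers the same 7*7 receptive field, at an assumed width of 512 channels.

```python
# (in_channels, out_channels) of the 13 convolutions, in order.
CONV_CHANNELS = [(3, 64), (64, 64), (64, 128), (128, 128),
                 (128, 256), (256, 256), (256, 256),
                 (256, 512), (512, 512), (512, 512),
                 (512, 512), (512, 512), (512, 512)]
# (in_units, out_units) of the 3 fully connected layers.
FC_SIZES = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]


def conv_weights(kernel, in_channels, out_channels, bias=False):
    """Weights of one kernel*kernel convolution, optionally with biases."""
    count = kernel * kernel * in_channels * out_channels
    return count + (out_channels if bias else 0)


def fc_weights(in_units, out_units, bias=False):
    """Weights of one fully connected layer, optionally with biases."""
    return in_units * out_units + (out_units if bias else 0)


def total_weights(bias):
    """Return (convolutional total, fully connected total)."""
    conv = sum(conv_weights(3, i, o, bias) for i, o in CONV_CHANNELS)
    fc = sum(fc_weights(i, o, bias) for i, o in FC_SIZES)
    return conv, fc


conv_plain, fc_plain = total_weights(bias=False)   # as in the listing
conv_biased, fc_biased = total_weights(bias=True)  # assumption: biases added
print("no bias  :", conv_plain, fc_plain, conv_plain + fc_plain)
print("with bias:", conv_biased, fc_biased, conv_biased + fc_biased)

# Three stacked 3*3 convolutions versus one 7*7 convolution, both C -> C.
C = 512
stacked_3x3 = 3 * conv_weights(3, C, C)  # 27 * C * C
single_7x7 = conv_weights(7, C, C)       # 49 * C * C
print("3 x (3*3):", stacked_3x3, " 1 x (7*7):", single_7x7)
```

Most of the weights sit in the three fully connected layers, and the first of them alone accounts for about 103 million.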


The paper also discusses other network configurations.
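For orientation, the configurations the paper labels A to E can be written down compactly. The sketch below is a hedged transcription of the paper's configuration table (the A-LRN variant is left out); the dictionary layout and the name `weight_layers` are illustrative, so check the entries against the paper before relying on them.

```python
# Integers are output channels of 3*3 convolutions, a tuple (1, n) marks a
# 1*1 convolution with n output channels, and "M" is a 2*2 max pool.
VGG_CONFIGS = {
    "A": [64, "M", 128, "M", 256, 256, "M",
          512, 512, "M", 512, 512, "M"],
    "B": [64, 64, "M", 128, 128, "M", 256, 256, "M",
          512, 512, "M", 512, 512, "M"],
    "C": [64, 64, "M", 128, 128, "M", 256, 256, (1, 256), "M",
          512, 512, (1, 512), "M", 512, 512, (1, 512), "M"],
    "D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
          512, 512, 512, "M", 512, 512, 512, "M"],
    "E": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
          512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}
NUM_FC_LAYERS = 3  # every configuration ends with FC-4096, FC-4096, FC-1000


def weight_layers(config):
    """Count layers that carry weights: convolutions plus fully connected."""
    convs = sum(1 for layer in config if layer != "M")
    return convs + NUM_FC_LAYERS


depths = {name: weight_layers(cfg) for name, cfg in VGG_CONFIGS.items()}
for name in sorted(depths):
    print(name, depths[name])
```

Configuration D is the VGG-16 built in the code above, and E is the 19-layer variant usually called VGG-19.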


References

Andrew Ng, Deep Learning (course)

VGGNet: Very Deep Convolutional Networks for Large-Scale Image Recognition

张平, 《图解深度学习与神经网络：从张量到TensorFlow实现》

《深度学习核心技术与实践》

大话CNN经典模型：VGGNet

https://my.oschina.net/u/876354/blog/1634322


Original article: https://www.cnblogs.com/mfryf/p/11381314.html

Time: 2024-10-03 07:45:50
