Section 22: Using slim, the image classification model library in TensorFlow

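After TensorFlow 1.0, Google introduced a library called slim. TF-Slim is a new lightweight, high-level API for TensorFlow. It was introduced in 2016, and its main purpose is so-called "code slimming". Like the tf.contrib.layers module introduced earlier in this TensorFlow series, it re-wraps many common TensorFlow functions so that code becomes more concise. It is especially well suited to building deep neural networks with complex structures, and it can be used to define, train, and evaluate complex models.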

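Why cover this topic here? TensorFlow's models repository provides a large amount of network-architecture code written with slim. It also provides checkpoint files of models trained with that code, which we can use as pretrained models. To use them, we need to know how to use the slim library.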

I. Getting the slim code from models

To use the code in models, first verify that your TensorFlow installation includes the slim module. Then download the models code from GitHub:

1. Verify the slim library

Before using slim, check that the local tf.contrib.slim module works by running the following command:

python -c "import tensorflow.contrib.slim as slim; eval = slim.evaluation.evaluate_once"

If no error is raised, TF-Slim is working.

2. Download the models repository

To use TF-Slim for image classification, you also have to install the TF-Slim image models library, which is not part of the core TF library. To do this, check out the tensorflow/models repository as follows:

cd $HOME/workspace
git clone https://github.com/tensorflow/models/

This will put the TF-Slim image models library in $HOME/workspace/models/research/slim. (It will also create a directory called models/inception, which contains an older version of slim; you can safely ignore this.)

To verify that this has worked, execute the following commands; it should run without raising any errors.

cd $HOME/workspace/models/research/slim
python -c "from nets import cifarnet; mynet = cifarnet.cifarnet"

I am using Windows, so I downloaded the repository directly from https://github.com/tensorflow/models/.

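II. Directory structure of slim in models

slim is located under \models-master\research\slim and contains 5 folders:

  • datasets: code for handling datasets.
  • deployment: deployment utilities. Supports distributed training across machines by creating clones, and can run computation synchronously or asynchronously on multiple CPUs and GPUs.
  • nets: the various network models.
  • preprocessing: image preprocessing functions for the different networks.
  • scripts: example scripts for running the network models. These scripts only work on systems that provide a shell.

This section focuses on three of these folders: datasets, nets and preprocessing.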

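1. The datasets module

datasets contains code for commonly used image training datasets. The main supported datasets are cifar10, flowers, mnist and imagenet.

Each code file is named after its dataset, and you can use the code to download or access that dataset's data. Taking imagenet as an example, the following function fetches the ImageNet labels from the web: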

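    from datasets import imagenet
    imagenet_map = imagenet.create_readable_names_for_imagenet_labels()

The code above returns a dictionary that maps label ids to the human-readable names of ImageNet's 1000 classes. slim also reserves id 0 for an extra 'background' class.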

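2. The nets module

This folder contains the various network models.

Each network model file is named after its model, and the code in these files follows a broadly similar structure. Take inception_resnet_v2 as an example: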

# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains the definition of the Inception Resnet V2 architecture.

As described in http://arxiv.org/abs/1602.07261.

  Inception-v4, Inception-ResNet and the Impact of Residual Connections
    on Learning
  Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

slim = tf.contrib.slim

def block35(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None):
  """Builds the 35x35 resnet block."""
  with tf.variable_scope(scope, 'Block35', [net], reuse=reuse):
    with tf.variable_scope('Branch_0'):
      tower_conv = slim.conv2d(net, 32, 1, scope='Conv2d_1x1')
    with tf.variable_scope('Branch_1'):
      tower_conv1_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1')
      tower_conv1_1 = slim.conv2d(tower_conv1_0, 32, 3, scope='Conv2d_0b_3x3')
    with tf.variable_scope('Branch_2'):
      tower_conv2_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1')
      tower_conv2_1 = slim.conv2d(tower_conv2_0, 48, 3, scope='Conv2d_0b_3x3')
      tower_conv2_2 = slim.conv2d(tower_conv2_1, 64, 3, scope='Conv2d_0c_3x3')
    mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_1, tower_conv2_2])
    up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None,
                     activation_fn=None, scope='Conv2d_1x1')
    scaled_up = up * scale
    if activation_fn == tf.nn.relu6:
      # Use clip_by_value to simulate bandpass activation.
      scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0)

    net += scaled_up
    if activation_fn:
      net = activation_fn(net)
  return net

def block17(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None):
  """Builds the 17x17 resnet block."""
  with tf.variable_scope(scope, 'Block17', [net], reuse=reuse):
    with tf.variable_scope('Branch_0'):
      tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1')
    with tf.variable_scope('Branch_1'):
      tower_conv1_0 = slim.conv2d(net, 128, 1, scope='Conv2d_0a_1x1')
      tower_conv1_1 = slim.conv2d(tower_conv1_0, 160, [1, 7],
                                  scope='Conv2d_0b_1x7')
      tower_conv1_2 = slim.conv2d(tower_conv1_1, 192, [7, 1],
                                  scope='Conv2d_0c_7x1')
    mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2])
    up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None,
                     activation_fn=None, scope='Conv2d_1x1')

    scaled_up = up * scale
    if activation_fn == tf.nn.relu6:
      # Use clip_by_value to simulate bandpass activation.
      scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0)

    net += scaled_up
    if activation_fn:
      net = activation_fn(net)
  return net

def block8(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None):
  """Builds the 8x8 resnet block."""
  with tf.variable_scope(scope, 'Block8', [net], reuse=reuse):
    with tf.variable_scope('Branch_0'):
      tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1')
    with tf.variable_scope('Branch_1'):
      tower_conv1_0 = slim.conv2d(net, 192, 1, scope='Conv2d_0a_1x1')
      tower_conv1_1 = slim.conv2d(tower_conv1_0, 224, [1, 3],
                                  scope='Conv2d_0b_1x3')
      tower_conv1_2 = slim.conv2d(tower_conv1_1, 256, [3, 1],
                                  scope='Conv2d_0c_3x1')
    mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2])
    up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None,
                     activation_fn=None, scope='Conv2d_1x1')

    scaled_up = up * scale
    if activation_fn == tf.nn.relu6:
      # Use clip_by_value to simulate bandpass activation.
      scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0)

    net += scaled_up
    if activation_fn:
      net = activation_fn(net)
  return net

def inception_resnet_v2_base(inputs,
                             final_endpoint='Conv2d_7b_1x1',
                             output_stride=16,
                             align_feature_maps=False,
                             scope=None,
                             activation_fn=tf.nn.relu):
  """Inception model from  http://arxiv.org/abs/1602.07261.

  Constructs an Inception Resnet v2 network from inputs to the given final
  endpoint. This method can construct the network up to the final inception
  block Conv2d_7b_1x1.

  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    final_endpoint: specifies the endpoint to construct the network up to. It
      can be one of ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
      'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3', 'MaxPool_5a_3x3',
      'Mixed_5b', 'Mixed_6a', 'PreAuxLogits', 'Mixed_7a', 'Conv2d_7b_1x1']
    output_stride: A scalar that specifies the requested ratio of input to
      output spatial resolution. Only supports 8 and 16.
    align_feature_maps: When true, changes all the VALID paddings in the network
      to SAME padding so that the feature maps are aligned.
    scope: Optional variable_scope.
    activation_fn: Activation function for block scopes.

  Returns:
    tensor_out: output tensor corresponding to the final_endpoint.
    end_points: a set of activations for external use, for example summaries or
                losses.

  Raises:
    ValueError: if final_endpoint is not set to one of the predefined values,
      or if the output_stride is not 8 or 16, or if the output_stride is 8 and
      we request an end point after 'PreAuxLogits'.
  """
  if output_stride != 8 and output_stride != 16:
    raise ValueError('output_stride must be 8 or 16.')

  padding = 'SAME' if align_feature_maps else 'VALID'

  end_points = {}

  def add_and_check_final(name, net):
    end_points[name] = net
    return name == final_endpoint

  with tf.variable_scope(scope, 'InceptionResnetV2', [inputs]):
    with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                        stride=1, padding='SAME'):
      # 149 x 149 x 32
      net = slim.conv2d(inputs, 32, 3, stride=2, padding=padding,
                        scope='Conv2d_1a_3x3')
      if add_and_check_final('Conv2d_1a_3x3', net): return net, end_points

      # 147 x 147 x 32
      net = slim.conv2d(net, 32, 3, padding=padding,
                        scope='Conv2d_2a_3x3')
      if add_and_check_final('Conv2d_2a_3x3', net): return net, end_points
      # 147 x 147 x 64
      net = slim.conv2d(net, 64, 3, scope='Conv2d_2b_3x3')
      if add_and_check_final('Conv2d_2b_3x3', net): return net, end_points
      # 73 x 73 x 64
      net = slim.max_pool2d(net, 3, stride=2, padding=padding,
                            scope='MaxPool_3a_3x3')
      if add_and_check_final('MaxPool_3a_3x3', net): return net, end_points
      # 73 x 73 x 80
      net = slim.conv2d(net, 80, 1, padding=padding,
                        scope='Conv2d_3b_1x1')
      if add_and_check_final('Conv2d_3b_1x1', net): return net, end_points
      # 71 x 71 x 192
      net = slim.conv2d(net, 192, 3, padding=padding,
                        scope='Conv2d_4a_3x3')
      if add_and_check_final('Conv2d_4a_3x3', net): return net, end_points
      # 35 x 35 x 192
      net = slim.max_pool2d(net, 3, stride=2, padding=padding,
                            scope='MaxPool_5a_3x3')
      if add_and_check_final('MaxPool_5a_3x3', net): return net, end_points

      # 35 x 35 x 320
      with tf.variable_scope('Mixed_5b'):
        with tf.variable_scope('Branch_0'):
          tower_conv = slim.conv2d(net, 96, 1, scope='Conv2d_1x1')
        with tf.variable_scope('Branch_1'):
          tower_conv1_0 = slim.conv2d(net, 48, 1, scope='Conv2d_0a_1x1')
          tower_conv1_1 = slim.conv2d(tower_conv1_0, 64, 5,
                                      scope='Conv2d_0b_5x5')
        with tf.variable_scope('Branch_2'):
          tower_conv2_0 = slim.conv2d(net, 64, 1, scope='Conv2d_0a_1x1')
          tower_conv2_1 = slim.conv2d(tower_conv2_0, 96, 3,
                                      scope='Conv2d_0b_3x3')
          tower_conv2_2 = slim.conv2d(tower_conv2_1, 96, 3,
                                      scope='Conv2d_0c_3x3')
        with tf.variable_scope('Branch_3'):
          tower_pool = slim.avg_pool2d(net, 3, stride=1, padding='SAME',
                                       scope='AvgPool_0a_3x3')
          tower_pool_1 = slim.conv2d(tower_pool, 64, 1,
                                     scope='Conv2d_0b_1x1')
        net = tf.concat(
            [tower_conv, tower_conv1_1, tower_conv2_2, tower_pool_1], 3)

      if add_and_check_final('Mixed_5b', net): return net, end_points
      # TODO(alemi): Register intermediate endpoints
      net = slim.repeat(net, 10, block35, scale=0.17,
                        activation_fn=activation_fn)

      # 17 x 17 x 1088 if output_stride == 8,
      # 33 x 33 x 1088 if output_stride == 16
      use_atrous = output_stride == 8

      with tf.variable_scope('Mixed_6a'):
        with tf.variable_scope('Branch_0'):
          tower_conv = slim.conv2d(net, 384, 3, stride=1 if use_atrous else 2,
                                   padding=padding,
                                   scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_1'):
          tower_conv1_0 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
          tower_conv1_1 = slim.conv2d(tower_conv1_0, 256, 3,
                                      scope='Conv2d_0b_3x3')
          tower_conv1_2 = slim.conv2d(tower_conv1_1, 384, 3,
                                      stride=1 if use_atrous else 2,
                                      padding=padding,
                                      scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_2'):
          tower_pool = slim.max_pool2d(net, 3, stride=1 if use_atrous else 2,
                                       padding=padding,
                                       scope='MaxPool_1a_3x3')
        net = tf.concat([tower_conv, tower_conv1_2, tower_pool], 3)

      if add_and_check_final('Mixed_6a', net): return net, end_points

      # TODO(alemi): register intermediate endpoints
      with slim.arg_scope([slim.conv2d], rate=2 if use_atrous else 1):
        net = slim.repeat(net, 20, block17, scale=0.10,
                          activation_fn=activation_fn)
      if add_and_check_final('PreAuxLogits', net): return net, end_points

      if output_stride == 8:
        # TODO(gpapan): Properly support output_stride for the rest of the net.
        raise ValueError('output_stride==8 is only supported up to the '
                         'PreAuxlogits end_point for now.')

      # 8 x 8 x 2080
      with tf.variable_scope('Mixed_7a'):
        with tf.variable_scope('Branch_0'):
          tower_conv = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
          tower_conv_1 = slim.conv2d(tower_conv, 384, 3, stride=2,
                                     padding=padding,
                                     scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_1'):
          tower_conv1 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
          tower_conv1_1 = slim.conv2d(tower_conv1, 288, 3, stride=2,
                                      padding=padding,
                                      scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_2'):
          tower_conv2 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
          tower_conv2_1 = slim.conv2d(tower_conv2, 288, 3,
                                      scope='Conv2d_0b_3x3')
          tower_conv2_2 = slim.conv2d(tower_conv2_1, 320, 3, stride=2,
                                      padding=padding,
                                      scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_3'):
          tower_pool = slim.max_pool2d(net, 3, stride=2,
                                       padding=padding,
                                       scope='MaxPool_1a_3x3')
        net = tf.concat(
            [tower_conv_1, tower_conv1_1, tower_conv2_2, tower_pool], 3)

      if add_and_check_final('Mixed_7a', net): return net, end_points

      # TODO(alemi): register intermediate endpoints
      net = slim.repeat(net, 9, block8, scale=0.20, activation_fn=activation_fn)
      net = block8(net, activation_fn=None)

      # 8 x 8 x 1536
      net = slim.conv2d(net, 1536, 1, scope='Conv2d_7b_1x1')
      if add_and_check_final('Conv2d_7b_1x1', net): return net, end_points

    raise ValueError('final_endpoint (%s) not recognized' % final_endpoint)

def inception_resnet_v2(inputs, num_classes=1001, is_training=True,
                        dropout_keep_prob=0.8,
                        reuse=None,
                        scope='InceptionResnetV2',
                        create_aux_logits=True,
                        activation_fn=tf.nn.relu):
  """Creates the Inception Resnet V2 model.

  Args:
    inputs: a 4-D tensor of size [batch_size, height, width, 3].
      Dimension batch_size may be undefined. If create_aux_logits is false,
      also height and width may be undefined.
    num_classes: number of predicted classes. If 0 or None, the logits layer
      is omitted and the input features to the logits layer (before  dropout)
      are returned instead.
    is_training: whether is training or not.
    dropout_keep_prob: float, the fraction to keep before final layer.
    reuse: whether or not the network and its variables should be reused. To be
      able to reuse 'scope' must be given.
    scope: Optional variable_scope.
    create_aux_logits: Whether to include the auxilliary logits.
    activation_fn: Activation function for conv2d.

  Returns:
    net: the output of the logits layer (if num_classes is a non-zero integer),
      or the non-dropped-out input to the logits layer (if num_classes is 0 or
      None).
    end_points: the set of end_points from the inception model.
  """
  end_points = {}

  with tf.variable_scope(scope, 'InceptionResnetV2', [inputs],
                         reuse=reuse) as scope:
    with slim.arg_scope([slim.batch_norm, slim.dropout],
                        is_training=is_training):

      net, end_points = inception_resnet_v2_base(inputs, scope=scope,
                                                 activation_fn=activation_fn)

      if create_aux_logits and num_classes:
        with tf.variable_scope('AuxLogits'):
          aux = end_points['PreAuxLogits']
          aux = slim.avg_pool2d(aux, 5, stride=3, padding='VALID',
                                scope='Conv2d_1a_3x3')
          aux = slim.conv2d(aux, 128, 1, scope='Conv2d_1b_1x1')
          aux = slim.conv2d(aux, 768, aux.get_shape()[1:3],
                            padding='VALID', scope='Conv2d_2a_5x5')
          aux = slim.flatten(aux)
          aux = slim.fully_connected(aux, num_classes, activation_fn=None,
                                     scope='Logits')
          end_points['AuxLogits'] = aux

      with tf.variable_scope('Logits'):
        # TODO(sguada,arnoegw): Consider adding a parameter global_pool which
        # can be set to False to disable pooling here (as in resnet_*()).
        kernel_size = net.get_shape()[1:3]
        if kernel_size.is_fully_defined():
          net = slim.avg_pool2d(net, kernel_size, padding='VALID',
                                scope='AvgPool_1a_8x8')
        else:
          net = tf.reduce_mean(net, [1, 2], keep_dims=True, name='global_pool')
        end_points['global_pool'] = net
        if not num_classes:
          return net, end_points
        net = slim.flatten(net)
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='Dropout')
        end_points['PreLogitsFlatten'] = net
        logits = slim.fully_connected(net, num_classes, activation_fn=None,
                                      scope='Logits')
        end_points['Logits'] = logits
        end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions')

    return logits, end_points
inception_resnet_v2.default_image_size = 299

def inception_resnet_v2_arg_scope(weight_decay=0.00004,
                                  batch_norm_decay=0.9997,
                                  batch_norm_epsilon=0.001,
                                  activation_fn=tf.nn.relu):
  """Returns the scope with the default parameters for inception_resnet_v2.

  Args:
    weight_decay: the weight decay for weights variables.
    batch_norm_decay: decay for the moving average of batch_norm momentums.
    batch_norm_epsilon: small float added to variance to avoid dividing by zero.
    activation_fn: Activation function for conv2d.

  Returns:
    a arg_scope with the parameters needed for inception_resnet_v2.
  """
  # Set weight_decay for weights in conv2d and fully_connected layers.
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      weights_regularizer=slim.l2_regularizer(weight_decay),
                      biases_regularizer=slim.l2_regularizer(weight_decay)):

    batch_norm_params = {
        'decay': batch_norm_decay,
        'epsilon': batch_norm_epsilon,
        'fused': None,  # Use fused batch norm if possible.
    }
    # Set activation_fn and parameters for batch_norm.
    with slim.arg_scope([slim.conv2d], activation_fn=activation_fn,
                        normalizer_fn=slim.batch_norm,
                        normalizer_params=batch_norm_params) as scope:
      return scope
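The shape comments in the code above (149 x 149 x 32, 147 x 147 x 32, and so on) follow from TensorFlow's usual output-size rules: VALID padding gives floor((n - k) / s) + 1 and SAME padding gives ceil(n / s). The pure-Python sketch below is not part of slim. It traces a 299 x 299 input (default_image_size) through the stem and the two stride-2 reductions, and it assumes align_feature_maps=False and output_stride=16. Layers with stride 1 and SAME padding, such as the Mixed_5b branches and the repeated block35/block17/block8 units, leave the spatial size unchanged, so they are left out.

```python
import math


def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer under TensorFlow's padding rules."""
    if padding == 'VALID':
        return (size - kernel) // stride + 1
    if padding == 'SAME':
        return math.ceil(size / stride)
    raise ValueError('padding must be VALID or SAME')


# (scope name, kernel, stride, padding), transcribed from inception_resnet_v2_base
# with align_feature_maps=False (so `padding` is 'VALID') and output_stride=16.
LAYERS = [
    ('Conv2d_1a_3x3', 3, 2, 'VALID'),
    ('Conv2d_2a_3x3', 3, 1, 'VALID'),
    ('Conv2d_2b_3x3', 3, 1, 'SAME'),
    ('MaxPool_3a_3x3', 3, 2, 'VALID'),
    ('Conv2d_3b_1x1', 1, 1, 'VALID'),
    ('Conv2d_4a_3x3', 3, 1, 'VALID'),
    ('MaxPool_5a_3x3', 3, 2, 'VALID'),
    ('Mixed_6a', 3, 2, 'VALID'),   # every branch reduces with 3x3, stride 2
    ('Mixed_7a', 3, 2, 'VALID'),   # same reduction again
]


def trace(size, layers=LAYERS):
    """Return [(scope name, output height/width)] for a square input."""
    sizes = []
    for name, kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


if __name__ == '__main__':
    for name, size in trace(299):
        print('%-15s %d x %d' % (name, size, size))
```

The final 8 x 8 matches the "8 x 8 x 2080" comment before Mixed_7a.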

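The interface of this network is as follows:

  • inception_resnet_v2.default_image_size: the default input image size (299).
  • inception_resnet_v2_base: implements the base structure of inception_resnet_v2. It returns the raw feature output of the network up to final_endpoint, together with end_points. By default it is called from inside inception_resnet_v2, and its internals are rarely changed. If you want a custom output layer, call this function directly and attach your own layers in place of inception_resnet_v2.
  • inception_resnet_v2: the implementation of the full network. It returns two values. The first is the prediction logits. The second is end_points, a dictionary of intermediate activations (including the auxiliary head output AuxLogits) meant for external use such as summaries and losses.
  • inception_resnet_v2_arg_scope: returns an arg_scope holding the model's default layer parameters (weight decay, batch-normalization settings, activation function). When you use or modify the model from outside, build it inside this scope so that its layers get the same defaults as the original model.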

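3. The preprocessing module

This module contains several image preprocessing files, which are also named after the models. slim places the preprocessing functions commonly used by a family of models into a single file named after that family, and the functions in each file have a similar structure.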
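As an illustration of what these files do, the sketch below reproduces only the pixel arithmetic of two common conventions. As far as I know, slim's inception_preprocessing converts an image to floats in [0, 1] and then maps it to [-1, 1], and vgg_preprocessing subtracts per-channel RGB means. The function names are hypothetical, and the mean values are my recollection of the VGG constants, so treat them as assumptions. The real files also crop, resize and (for training) randomly distort the image, and that is not shown here.

```python
# Per-channel RGB means assumed to match slim's vgg_preprocessing constants.
VGG_RGB_MEANS = (123.68, 116.78, 103.94)


def inception_style_scale(pixel_values):
    """Map 8-bit pixel values in [0, 255] to floats in [-1, 1]."""
    # Equivalent to: convert to [0, 1], subtract 0.5, multiply by 2.
    return [(value / 255.0 - 0.5) * 2.0 for value in pixel_values]


def vgg_style_center(rgb_pixel):
    """Subtract the per-channel mean from one (R, G, B) pixel."""
    return tuple(channel - mean for channel, mean in zip(rgb_pixel, VGG_RGB_MEANS))


if __name__ == '__main__':
    print(inception_style_scale([0, 255]))              # [-1.0, 1.0]
    print(vgg_style_center((123.68, 116.78, 103.94)))   # (0.0, 0.0, 0.0)
```

Because each network expects the input distribution it was trained on, you should use the preprocessing file that matches the model whose checkpoint you load.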

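III. Dataset handling in slim

1. Prepare the dataset

As part of this library, we've included scripts to download several popular image datasets (Flowers, CIFAR-10, MNIST and ImageNet) and convert them to slim format.

2. Download the dataset and convert it to TFRecord format

TFRecord is the dataset format recommended by TensorFlow and is tightly integrated with the framework. TensorFlow provides a set of interfaces for reading TFRecord files. The format exists mainly for very large sample sets, where data has to be read from disk while training is running. You convert the raw files to TFRecord once and then read them with multiple threads at run time. This takes load off the main training thread and makes training more efficient. For details on the TFRecord format, see the article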

Section 12: Several ways to read data in TensorFlow, and how to use queues

For each dataset, we'll need to download the raw data and convert it to TensorFlow's native TFRecord format. Each TFRecord contains a TF-Example protocol buffer. Below we demonstrate how to do this for the Flowers dataset.

$ DATA_DIR=/tmp/data/flowers
$ python download_and_convert_data.py     --dataset_name=flowers     --dataset_dir="${DATA_DIR}"

There are two key parameters here: the dataset (flowers in this example) and the download path (here /tmp/data/flowers).

When the script finishes you will find several TFRecord files created in $DATA_DIR.

These represent the training and validation data, sharded over 5 files each. You will also find the $DATA_DIR/labels.txt file which contains the mapping from integer labels to class names.

You can use the same script to create the mnist and cifar10 datasets. However, for ImageNet, you have to follow the instructions in the slim README. Note that you first have to sign up for an account at image-net.org. Also, the download can take several hours, and could use up to 500GB.

Let's look at the code that actually runs. Open download_and_convert_data.py; its content is as follows:

# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Downloads and converts a particular dataset.

Usage:
```shell

$ python download_and_convert_data.py     --dataset_name=mnist     --dataset_dir=/tmp/mnist

$ python download_and_convert_data.py     --dataset_name=cifar10     --dataset_dir=/tmp/cifar10

$ python download_and_convert_data.py     --dataset_name=flowers     --dataset_dir=/tmp/flowers
```
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from datasets import download_and_convert_cifar10
from datasets import download_and_convert_flowers
from datasets import download_and_convert_mnist

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string(
    'dataset_name',
    None,
    'The name of the dataset to convert, one of "cifar10", "flowers", "mnist".')

tf.app.flags.DEFINE_string(
    'dataset_dir',
    None,
    'The directory where the output TFRecords and temporary files are saved.')

def main(_):
  if not FLAGS.dataset_name:
    raise ValueError('You must supply the dataset name with --dataset_name')
  if not FLAGS.dataset_dir:
    raise ValueError('You must supply the dataset directory with --dataset_dir')

  if FLAGS.dataset_name == 'cifar10':
    download_and_convert_cifar10.run(FLAGS.dataset_dir)
  elif FLAGS.dataset_name == 'flowers':
    download_and_convert_flowers.run(FLAGS.dataset_dir)
  elif FLAGS.dataset_name == 'mnist':
    download_and_convert_mnist.run(FLAGS.dataset_dir)
  else:
    raise ValueError(
        'dataset_name [%s] was not recognized.' % FLAGS.dataset_name)

if __name__ == '__main__':
  tf.app.run()

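  • The program runs through tf.app.run(), which parses the command-line arguments and passes them to FLAGS. Running the command above is therefore equivalent to setting FLAGS.dataset_name='flowers' and FLAGS.dataset_dir='/tmp/data/flowers'.
  • main() is then executed, and it calls download_and_convert_flowers.run(FLAGS.dataset_dir). That function downloads and extracts the dataset, converts it to TFRecord format, and then deletes the raw dataset files.

download_and_convert_flowers.run is defined in download_and_convert_flowers.py. Its code is as follows: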
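tf.app.flags is essentially a thin wrapper around command-line parsing. The sketch below reproduces the same flag handling and dispatch with the standard library's argparse, so you can try the logic without TensorFlow installed. It is an analogue, not the script itself. The converter functions are hypothetical stand-ins that only report what the real download_and_convert_*.run functions would do, and a dictionary replaces the if/elif chain.

```python
import argparse


def _fake_run(name):
    """Hypothetical stand-in for download_and_convert_<name>.run."""
    def run(dataset_dir):
        return 'convert %s into %s' % (name, dataset_dir)
    return run


CONVERTERS = {name: _fake_run(name) for name in ('cifar10', 'flowers', 'mnist')}


def main(argv=None):
    parser = argparse.ArgumentParser(description='Download and convert a dataset.')
    parser.add_argument('--dataset_name', default=None,
                        help='one of "cifar10", "flowers", "mnist"')
    parser.add_argument('--dataset_dir', default=None,
                        help='where the output TFRecords and temporary files go')
    flags = parser.parse_args(argv)

    if not flags.dataset_name:
        raise ValueError('You must supply the dataset name with --dataset_name')
    if not flags.dataset_dir:
        raise ValueError('You must supply the dataset directory with --dataset_dir')
    if flags.dataset_name not in CONVERTERS:
        raise ValueError('dataset_name [%s] was not recognized.' % flags.dataset_name)
    # Dictionary dispatch replaces the if/elif chain of the original main().
    return CONVERTERS[flags.dataset_name](flags.dataset_dir)


if __name__ == '__main__':
    print(main(['--dataset_name=flowers', '--dataset_dir=/tmp/data/flowers']))
```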

def run(dataset_dir):
  """Runs the download and conversion operation.

  Args:
    dataset_dir: The dataset directory where the dataset is stored.
  """
  if not tf.gfile.Exists(dataset_dir):
    tf.gfile.MakeDirs(dataset_dir)

  if _dataset_exists(dataset_dir):
    print('Dataset files already exist. Exiting without re-creating them.')
    return

  dataset_utils.download_and_uncompress_tarball(_DATA_URL, dataset_dir)
  photo_filenames, class_names = _get_filenames_and_classes(dataset_dir)
  class_names_to_ids = dict(zip(class_names, range(len(class_names))))

  # Divide into train and test:
  random.seed(_RANDOM_SEED)
  random.shuffle(photo_filenames)
  training_filenames = photo_filenames[_NUM_VALIDATION:]
  validation_filenames = photo_filenames[:_NUM_VALIDATION]

  # First, convert the training and validation sets.
  _convert_dataset('train', training_filenames, class_names_to_ids,
                   dataset_dir)
  _convert_dataset('validation', validation_filenames, class_names_to_ids,
                   dataset_dir)

  # Finally, write the labels file:
  labels_to_class_names = dict(zip(range(len(class_names)), class_names))
  dataset_utils.write_label_file(labels_to_class_names, dataset_dir)

  _clean_up_temporary_files(dataset_dir)
  print('\nFinished converting the Flowers dataset!')

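Here is a rough outline of how the code runs:

  • Check whether the dataset_dir folder exists; create it if it does not.
  • Check whether all the TFRecord files already exist under dataset_dir; if they do, exit.
  • Download the dataset from _DATA_URL and extract it into dataset_dir.
  • Get the full paths of all images and the class names. The image folders are named after their classes, so each full path already contains the class.
  • Build a dictionary mapping each class name to an integer id (class_names_to_ids).
  • Shuffle the filenames, then split them into a validation set and a training set.
  • Write each training sample into the TFRecord files in TF-Example format.
  • Write each validation sample into the TFRecord files in TF-Example format. Each sample is encoded by the following function:
      def image_to_tfexample(image_data, image_format, height, width, class_id):
        return tf.train.Example(features=tf.train.Features(feature={
            'image/encoded': bytes_feature(image_data),
            'image/format': bytes_feature(image_format),
            'image/class/label': int64_feature(class_id),
            'image/height': int64_feature(height),
            'image/width': int64_feature(width),
        }))
  • Generate the label file labels.txt. Each line has the format label:class_name, followed by a newline (\n).
  • Delete the dataset .tgz file and the extracted files.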
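The splitting and label-file steps in the outline above can be sketched in plain Python. These functions are my own re-implementation, written to mirror run() and the labels.txt format described above, and are not the code in slim's dataset_utils. The 350 validation images come from the text. The seed value 0 is an assumption about _RANDOM_SEED. A local random.Random(seed) is used so the global random state is not touched, and it yields the same shuffle as seeding the module-level generator with the same value.

```python
import os
import random

_NUM_VALIDATION = 350   # flowers: 350 validation images, the rest for training
_RANDOM_SEED = 0        # assumption: the seed used by the conversion script


def split_train_validation(photo_filenames, class_names,
                           num_validation=_NUM_VALIDATION, seed=_RANDOM_SEED):
    """Shuffle deterministically and split, following the steps of run()."""
    class_names_to_ids = dict(zip(class_names, range(len(class_names))))
    filenames = list(photo_filenames)       # do not mutate the caller's list
    random.Random(seed).shuffle(filenames)  # fixed seed -> same split every run
    training = filenames[num_validation:]
    validation = filenames[:num_validation]
    return training, validation, class_names_to_ids


def write_label_file(labels_to_class_names, dataset_dir, filename='labels.txt'):
    """Write one 'label:class_name' line per class."""
    path = os.path.join(dataset_dir, filename)
    with open(path, 'w') as f:
        for label in sorted(labels_to_class_names):
            f.write('%d:%s\n' % (label, labels_to_class_names[label]))
    return path


def read_label_file(dataset_dir, filename='labels.txt'):
    """Parse labels.txt back into {label: class_name}."""
    labels_to_class_names = {}
    with open(os.path.join(dataset_dir, filename)) as f:
        for line in f:
            line = line.rstrip('\n')
            if line:
                label, class_name = line.split(':', 1)
                labels_to_class_names[int(label)] = class_name
    return labels_to_class_names
```

The fixed seed matters: if the split changed between runs, images could move between the training and validation sets and the validation accuracy would no longer be meaningful.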

3. Reading data from the TFRecord files with slim

Now that the TFRecord files have been created, we can read data from them.

import matplotlib.pyplot as plt
import tensorflow as tf

from datasets import download_and_convert_flowers
from datasets import flowers

# Directory containing the TFRecord files
DATA_DIR = './datasets/data/flowers'

slim = tf.contrib.slim

def read_flower_image_and_label(dataset_dir=DATA_DIR):
    '''
    Download the flower_photos.tgz dataset,
    split it into a training set and a validation set,
    and convert it to TFRecord format: 5 training files (3320 images),
    5 validation files (350 images), and a label file that maps each
    integer label to its class name.

    args:
        dataset_dir: directory where the dataset is stored
    return:
        image, label: one randomly read image and its label
    '''
    download_and_convert_flowers.run(dataset_dir=dataset_dir)

    # Read the data from the TFRecord files with slim.
    # Select the train split
    dataset = flowers.get_split(split_name='train', dataset_dir=dataset_dir)

    # Create a data provider
    provider = slim.dataset_data_provider.DatasetDataProvider(dataset)

    # provider.get randomly fetches one sample; it returns two tensors
    [image, label] = provider.get(['image', 'label'])

    return image, label

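The code above imports the required modules, creates a provider, and calls get to obtain the image and label tensors. At this point no data has actually been read. We have only built the graph, and the data becomes available only after a session starts the queue threads.

Next, we start a session and read the data.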

if __name__ == '__main__':
    # Build the ops that read one image and its label
    image, label = read_flower_image_and_label()

    # Start a session and read the data
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Create a coordinator to manage the threads
        coord = tf.train.Coordinator()

        # Start the QueueRunners; only now do the filenames start being enqueued
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        img, lab = sess.run([image, label])
        plt.imshow(img)
        plt.title('Original image')
        plt.show()

        # Stop the threads
        coord.request_stop()
        coord.join(threads)
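The Coordinator/QueueRunner machinery above follows a familiar producer/consumer pattern. Reader threads fill a bounded queue, the main thread takes samples from it, and a shared stop signal shuts everything down. The sketch below shows that pattern with Python's standard threading and queue modules. It is an analogue written for illustration, not TensorFlow code: threading.Event stands in for coord.request_stop(), and Thread.join stands in for coord.join(threads). The function names are hypothetical.

```python
import queue
import threading


def reader_thread(sample_queue, stop_event, samples):
    """Producer: keep pushing samples until told to stop (like a QueueRunner)."""
    for sample in samples:
        while not stop_event.is_set():
            try:
                sample_queue.put(sample, timeout=0.1)
                break
            except queue.Full:
                continue  # queue is full: re-check the stop flag, then retry
        if stop_event.is_set():
            return


def read_samples(samples, num_to_read, num_threads=2, capacity=4):
    """Consumer: start readers, take num_to_read samples, then shut down."""
    sample_queue = queue.Queue(maxsize=capacity)
    stop_event = threading.Event()  # plays the role of tf.train.Coordinator
    threads = [threading.Thread(target=reader_thread,
                                args=(sample_queue, stop_event, samples))
               for _ in range(num_threads)]
    for t in threads:
        t.start()          # ~ tf.train.start_queue_runners
    try:
        # Each thread reads the whole list, so num_to_read must not exceed
        # num_threads * len(samples), or this loop would block forever.
        return [sample_queue.get() for _ in range(num_to_read)]
    finally:
        stop_event.set()   # ~ coord.request_stop()
        for t in threads:
            t.join()       # ~ coord.join(threads)


if __name__ == '__main__':
    print(read_samples(['a.jpg', 'b.jpg', 'c.jpg'], 4))
```

Forgetting the stop and join steps is the plain-Python equivalent of leaving out coord.request_stop() and coord.join(threads): the reader threads stay blocked on a full queue, and the program does not exit cleanly.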

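IV. Training models in slim

slim provides shared training code for its models, so you no longer need to dig into the model code. Training, fine-tuning and evaluation can all be done from the command line.

For Linux users, slim's scripts folder also provides complete shell scripts that cover the whole workflow: downloading a model, training, training from pretrained weights, fine-tuning and evaluation. On Windows, you can copy the commands and run them one at a time in the command prompt.

1. Training from scratch

The training code lives in train_image_classifier.py under slim. From the directory containing that file, we train an Inception_v3 model on the flowers dataset. Run the following on the command line:

python train_image_classifier.py  --train_dir=./log/train_logs --dataset_name=flowers --dataset_split_name=train --dataset_dir=./datasets/data/flowers --model_name=inception_v3

2. Training from a pretrained model

Pretraining here means training further on top of a model someone else has already trained, to obtain the model you want. This can save a great deal of time, because many high-quality models were trained on very large sample sets. GitHub provides many models trained on ImageNet, which can be downloaded from https://github.com/tensorflow/models/tree/master/research/slim/#Pretrained.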

Neural nets work best when they have many parameters, making them powerful function approximators. However, this means they must be trained on very large datasets. Because training models from scratch can be a very computationally intensive process requiring days or even weeks, we provide various pre-trained models, as listed below. These CNNs have been trained on the ILSVRC-2012-CLS image classification dataset.

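The table in the slim README lists each model, the corresponding TensorFlow model file, the link to the model checkpoint, and the top 1 and top 5 accuracy (on the imagenet test set). Note that the VGG and ResNet V1 parameters have been converted from their original Caffe formats, whereas the Inception and ResNet V2 parameters have been trained internally at Google. Also be aware that these accuracies were computed by evaluating using a single image crop. Some academic papers report higher accuracy by using multiple crops at multiple scales.

After downloading a pretrained model, add a checkpoint_path argument to the command from the previous subsection:

--checkpoint_path=<path to the downloaded model>

The model in checkpoint_path is used only to initialize the parameters. The checkpoint itself is not modified during training, and the newly produced model is saved under the --train_dir path.
Note: when training from a pretrained model this way, your samples must match the original model's input size and number of output classes. The downloaded models were trained on the ImageNet classes: 1000 outputs, or 1001 for the Inception checkpoints, which add a background class. If you do not want that many classes, use the fine-tuning method below.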

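3. Fine-tuning

The pretrained models above were all trained on ImageNet and output 1000 classes. To use a pretrained model on our own dataset, we need to fine-tune it.

During fine-tuning, the last layer of the original model is removed and replaced with a classification layer that matches your dataset. For example, to train on the flowers dataset, the 1000 outputs must be replaced with 5 outputs, because flowers has 5 classes.

The steps are as follows:

  • Use --checkpoint_exclude_scopes to specify which layers' weights should not be loaded from the pretrained checkpoint.
  • Use --trainable_scopes to specify which layers' parameters are trained. When --trainable_scopes is given, all parameters that are not listed are frozen during training.

Example: fine-tune the inception_v3 model so that it can be trained on the flowers dataset. Extract the downloaded inception_v3.ckpt into a folder named inception_v3 in the current directory, open cmd, change to the slim folder, and run: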

python train_image_classifier.py --train_dir=./log/in3 --dataset_dir=./datasets/data/flowers --dataset_name=flowers --dataset_split_name=train --model_name=inception_v3 --checkpoint_path=./inception_v3/inception_v3.ckpt --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits

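In this example, the model in --checkpoint_path is loaded and used to initialize the weights. --checkpoint_exclude_scopes keeps the last layers (Logits and AuxLogits) from being initialized from the checkpoint. --trainable_scopes specifies that only these newly added layers are trained. The other parameters are frozen during training and keep the well-trained values of the original model, while the new layers keep optimizing their parameters over the iterations.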
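The two flags take comma-separated scope prefixes. The sketch below shows in plain Python how such prefixes split a model's variables into "restored from the checkpoint" and "trained". It assumes prefix matching on variable names, which is how I understand train_image_classifier.py treats these flags. The helper names and the variable names are hypothetical, chosen to look like InceptionV3 variables.

```python
def parse_scopes(flag_value):
    """Turn a comma-separated flag value into a list of scope prefixes."""
    if not flag_value:
        return []
    return [scope.strip() for scope in flag_value.split(',') if scope.strip()]


def select_variables(var_names, checkpoint_exclude_scopes, trainable_scopes):
    """Return (names restored from the checkpoint, names that get trained)."""
    exclusions = parse_scopes(checkpoint_exclude_scopes)
    restored = [name for name in var_names
                if not any(name.startswith(prefix) for prefix in exclusions)]
    scopes = parse_scopes(trainable_scopes)
    if not scopes:  # no --trainable_scopes: every variable is trained
        trained = list(var_names)
    else:
        trained = [name for name in var_names
                   if any(name.startswith(prefix) for prefix in scopes)]
    return restored, trained


# Hypothetical variable names, shaped like those in an InceptionV3 graph.
VARS = [
    'InceptionV3/Conv2d_1a_3x3/weights',
    'InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/weights',
    'InceptionV3/AuxLogits/Conv2d_2b_1x1/weights',
    'InceptionV3/Logits/Conv2d_1c_1x1/weights',
]

if __name__ == '__main__':
    scopes = 'InceptionV3/Logits,InceptionV3/AuxLogits'
    restored, trained = select_variables(VARS, scopes, scopes)
    print(restored)  # the first two names: loaded from inception_v3.ckpt
    print(trained)   # the last two names: the new layers being trained
```

Passing the same scopes to both flags is what makes this a "retrain only the head" setup: the excluded layers are exactly the ones left to train from fresh initial values.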

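During fine-tuning, you can also add the following to the command above:

--max_number_of_steps=500

to set the number of training steps. If no step count is given, training runs indefinitely. For more parameters, see the train_image_classifier.py source. The scripts folder also contains examples of using a model to classify images.

4. Evaluating the model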

To evaluate the performance of a model (whether pretrained or your own), you can use the eval_image_classifier.py script, as shown below.

The example below evaluates the model fine-tuned above on the validation split of the flowers dataset.

python eval_image_classifier.py --alsologtostderr --checkpoint_path=./log/in3 --dataset_dir=./datasets/data/flowers --dataset_name=flowers --dataset_split_name=validation --model_name=inception_v3

./log/in3 is the training directory of the fine-tuning run above. Training saves checkpoints there as model.ckpt-<step>, and when --checkpoint_path is a directory, eval_image_classifier.py evaluates the latest checkpoint in it.
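Accuracy figures such as the top 1 and top 5 numbers quoted for the pretrained models count a prediction as correct when the true label is among the k highest-scoring classes. Below is a small pure-Python sketch of that metric. It is independent of eval_image_classifier.py and uses made-up scores.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            correct += 1
    return correct / len(labels)


if __name__ == '__main__':
    scores = [[0.1, 0.7, 0.2],   # made-up class scores for three samples
              [0.5, 0.3, 0.2],
              [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(top_k_accuracy(scores, labels, 1))  # 1 of 3 correct
    print(top_k_accuracy(scores, labels, 2))  # 2 of 3 correct
```

Top-5 accuracy is always at least as high as top-1, which is why both numbers are reported for ImageNet models with 1000 classes.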

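5. Exporting the model

A trained model can be packaged for use on various platforms, including iOS, Android and Linux. This is done with the open-source build tool bazel. For details, see: https://github.com/tensorflow/models/tree/master/research/slim/#Export

Original article: https://www.cnblogs.com/zyly/p/9145081.html

Time: 2024-10-27 00:44:49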
