Training SSD on Your Own Dataset

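The SSD demo explains in detail how to train and evaluate SSD for object detection on the VOC dataset. This post explains how to train and evaluate SSD on your own dataset. It covers:
1 Dataset annotation
2 Dataset conversion
3 Training with SSD
4 Testing with SSD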

1 Dataset annotation

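Annotation is done with BBox-Label-Tool, a simple, easy-to-use tool written in Python. Each label file it produces has the following format:

object_number
x1min y1min x1max y1max
x2min y2min x2max y2max
...

2 Dataset conversion

Caffe trains from LMDB data, and the SSD framework provides scripts that convert VOC-format data to LMDB. In practice, the data labeled with BBox-Label-Tool is therefore first converted to the VOC format and then to LMDB.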
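Before converting, it can help to read the label format from section 1 programmatically. The following is a minimal sketch, not part of SSD or BBox-Label-Tool: the helper name parse_bbox_label is hypothetical, and it assumes whitespace-separated pixel coordinates with one object per line after the count line. It returns one (xmin, ymin, xmax, ymax) tuple per object and checks the count on the first line.

```python
def parse_bbox_label(text):
    """Parse one BBox-Label-Tool file (assumed format): the first line is the
    object count, then one 'xmin ymin xmax ymax' line per object."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    count = int(lines[0])
    boxes = []
    for line in lines[1:]:
        # int(float(...)) also tolerates coordinates written as "109.0";
        # a line without exactly four values raises ValueError on unpacking.
        xmin, ymin, xmax, ymax = (int(float(v)) for v in line.split())
        boxes.append((xmin, ymin, xmax, ymax))
    if len(boxes) != count:
        raise ValueError("header says %d objects, found %d" % (count, len(boxes)))
    return boxes


if __name__ == "__main__":
    sample = "2\n109 3 199 204\n10 20 30 40\n"
    print(parse_bbox_label(sample))
```

Because splitlines() and split() treat "\r\n" like "\n", the same helper works on label files saved with Windows line endings.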

2.1 VOC data format


(1) The Annotations directory holds the labels as XML files:

<?xml version="1.0" ?>
<annotation>
    <folder>VOC2007</folder>
    <filename>1.jpg</filename>
    <source>
        <database>My Database</database>
        <annotation>VOC2007</annotation>
        <image>flickr</image>
        <flickrid>NULL</flickrid>
    </source>
    <owner>
        <flickrid>NULL</flickrid>
        <name>idaneel</name>
    </owner>
    <size>
        <width>320</width>
        <height>240</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>door</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>109</xmin>
            <ymin>3</ymin>
            <xmax>199</xmax>
            <ymax>204</ymax>
        </bndbox>
    </object>
</annotation>

Example VOC XML annotation
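To check that a generated file has this structure, it can be read back with Python's standard xml.etree.ElementTree. This is a sketch, not part of SSD; the function name read_voc_objects is hypothetical, and it only extracts the fields shown above (image size, object name and bounding box).

```python
import xml.etree.ElementTree as ET


def read_voc_objects(xml_text):
    """Return ((width, height), [(name, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    objects = []
    # findall only looks at direct children, so the nested
    # <source><annotation> element is never mistaken for an object.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = tuple(int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.find("name").text.strip(), coords))
    return (width, height), objects


if __name__ == "__main__":
    sample = ("<annotation><size><width>320</width><height>240</height>"
              "<depth>3</depth></size><object><name>door</name><bndbox>"
              "<xmin>109</xmin><ymin>3</ymin><xmax>199</xmax><ymax>204</ymax>"
              "</bndbox></object></annotation>")
    print(read_voc_objects(sample))
```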





(2) The Main directory under ImageSets holds the lists of images used for training and for testing.




(3) The JPEGImages directory holds all of the images.






(4) The label directory holds the bounding-box coordinate files produced by BBox-Label-Tool; these are the label files to be converted.
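The scripts in 2.2 refer to these directories by relative paths (label, JPEGImages/, Annotations/, ImageSets/Main), so all of them must exist in the directory the scripts are run from. A minimal sketch for creating that layout, assuming a single dataset root; make_voc_skeleton and VOC_SUBDIRS are hypothetical names:

```python
import os

# Relative directories that createXml.py and createTest.py expect to find
# in the directory they are run from.
VOC_SUBDIRS = (
    "Annotations",                      # VOC XML labels (createXml.py output)
    os.path.join("ImageSets", "Main"),  # trainval.txt / test.txt (createTest.py output)
    "JPEGImages",                       # all images
    "label",                            # BBox-Label-Tool txt files (conversion input)
)


def make_voc_skeleton(root):
    """Create the directories under root; existing ones are left untouched."""
    created = []
    for sub in VOC_SUBDIRS:
        path = os.path.join(root, sub)
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

Using exist_ok=True makes the call safe to repeat on a directory that is already partly set up.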





2.2 Converting the labels to VOC format
The bounding-box files produced by BBox-Label-Tool are converted to the VOC format in two steps: (1) convert the bounding boxes stored as txt files into VOC-style XML files; (2) generate the training and test splits. Both steps are implemented in Python. createXml.py converts txt to xml; run it as ./createXml.py %classname%

#!/usr/bin/env python
# createXml.py: convert BBox-Label-Tool txt labels into VOC-style XML files.

import os
import sys
import cv2
from itertools import islice
from xml.dom.minidom import Document

labels = 'label'
imgpath = 'JPEGImages/'
xmlpath_new = 'Annotations/'
foldername = 'VOC2007'

try:
    labelName = sys.argv[1]
except IndexError:
    print('Please input class name')
    print('./createXml.py dog')
    sys.exit(1)

def insertObject(doc, datas):
    # datas = [xmin, ymin, xmax, ymax] as strings from one label line
    obj = doc.createElement('object')
    name = doc.createElement('name')
    name.appendChild(doc.createTextNode(labelName))
    obj.appendChild(name)
    pose = doc.createElement('pose')
    pose.appendChild(doc.createTextNode('Unspecified'))
    obj.appendChild(pose)
    truncated = doc.createElement('truncated')
    truncated.appendChild(doc.createTextNode(str(0)))
    obj.appendChild(truncated)
    difficult = doc.createElement('difficult')
    difficult.appendChild(doc.createTextNode(str(0)))
    obj.appendChild(difficult)
    bndbox = doc.createElement('bndbox')

    xmin = doc.createElement('xmin')
    xmin.appendChild(doc.createTextNode(str(datas[0])))
    bndbox.appendChild(xmin)
    ymin = doc.createElement('ymin')
    ymin.appendChild(doc.createTextNode(str(datas[1])))
    bndbox.appendChild(ymin)
    xmax = doc.createElement('xmax')
    xmax.appendChild(doc.createTextNode(str(datas[2])))
    bndbox.appendChild(xmax)
    ymax = doc.createElement('ymax')
    # The line was already split on whitespace, so no character is trimmed here.
    ymax.appendChild(doc.createTextNode(str(datas[3])))
    bndbox.appendChild(ymax)
    obj.appendChild(bndbox)
    return obj

def create():
    for root, _, files in os.walk(labels):
        for each in files:
            if not each.endswith('.txt'):
                continue
            # Skip the first line (object count); keep the non-empty box lines.
            with open(os.path.join(root, each), 'r') as fidin:
                boxes = [line.split() for line in islice(fidin, 1, None) if line.strip()]
            if not boxes:
                continue  # image without objects: no XML is written
            pictureName = each.replace('.txt', '.jpg')
            imageFile = imgpath + pictureName
            img = cv2.imread(imageFile)  # read each image once, not once per object
            if img is None:
                print('Cannot read image: ' + imageFile)
                continue
            imgSize = img.shape  # (height, width, channels)

            doc = Document()
            annotation = doc.createElement('annotation')
            doc.appendChild(annotation)

            folder = doc.createElement('folder')
            folder.appendChild(doc.createTextNode(foldername))
            annotation.appendChild(folder)

            filename = doc.createElement('filename')
            filename.appendChild(doc.createTextNode(pictureName))
            annotation.appendChild(filename)

            source = doc.createElement('source')
            database = doc.createElement('database')
            database.appendChild(doc.createTextNode('My Database'))
            source.appendChild(database)
            source_annotation = doc.createElement('annotation')
            source_annotation.appendChild(doc.createTextNode(foldername))
            source.appendChild(source_annotation)
            image = doc.createElement('image')
            image.appendChild(doc.createTextNode('flickr'))
            source.appendChild(image)
            flickrid = doc.createElement('flickrid')
            flickrid.appendChild(doc.createTextNode('NULL'))
            source.appendChild(flickrid)
            annotation.appendChild(source)

            owner = doc.createElement('owner')
            flickrid = doc.createElement('flickrid')
            flickrid.appendChild(doc.createTextNode('NULL'))
            owner.appendChild(flickrid)
            name = doc.createElement('name')
            name.appendChild(doc.createTextNode('idaneel'))
            owner.appendChild(name)
            annotation.appendChild(owner)

            size = doc.createElement('size')
            width = doc.createElement('width')
            width.appendChild(doc.createTextNode(str(imgSize[1])))
            size.appendChild(width)
            height = doc.createElement('height')
            height.appendChild(doc.createTextNode(str(imgSize[0])))
            size.appendChild(height)
            depth = doc.createElement('depth')
            depth.appendChild(doc.createTextNode(str(imgSize[2])))
            size.appendChild(depth)
            annotation.appendChild(size)

            segmented = doc.createElement('segmented')
            segmented.appendChild(doc.createTextNode(str(0)))
            annotation.appendChild(segmented)

            for datas in boxes:
                annotation.appendChild(insertObject(doc, datas))

            xmlName = each.replace('.txt', '.xml')
            with open(xmlpath_new + xmlName, 'w') as f:
                f.write(doc.toprettyxml(indent='    '))

if __name__ == '__main__':
    create()

createXml.py
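For comparison, here is a shorter sketch of the same XML generation using xml.etree.ElementTree instead of minidom. It is an illustration under assumptions: the function name build_voc_xml is hypothetical, and it omits the source and owner elements on the assumption that your downstream tools only need filename, size and object; keep them if your tools require them.

```python
import xml.etree.ElementTree as ET


def build_voc_xml(filename, width, height, depth, class_name, boxes, folder="VOC2007"):
    """Return a VOC-style annotation string for one image (one class only,
    like createXml.py). boxes is a list of (xmin, ymin, xmax, ymax)."""
    ann = ET.Element("annotation")
    ET.SubElement(ann, "folder").text = folder
    ET.SubElement(ann, "filename").text = filename
    size = ET.SubElement(ann, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    ET.SubElement(ann, "segmented").text = "0"
    for box in boxes:
        obj = ET.SubElement(ann, "object")
        ET.SubElement(obj, "name").text = class_name
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(ann, encoding="unicode")
```

The returned string can be written to Annotations/<image name>.xml; unlike toprettyxml it is not indented, which does not matter to an XML parser.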


createTest.py generates the files listing the training and test images; run it as:

./createTest.py %startID% %endID% %testNumber%


#!/usr/bin/env python
# createTest.py: randomly split image IDs start..end into test and trainval lists.

import random
import sys

try:
    start = int(sys.argv[1])
    end = int(sys.argv[2])
    test = int(sys.argv[3])
except (IndexError, ValueError):
    print('Please input picture range')
    print('./createTest.py 1 1500 500')
    sys.exit(1)

# All image IDs from start to end; the end ID is included.
allFile = list(range(start, end + 1))
testIds = sorted(random.sample(allFile, test))
testSet = set(testIds)

with open('ImageSets/Main/test.txt', 'w') as testFile:
    for i in testIds:
        testFile.write(str(i) + '\n')

with open('ImageSets/Main/trainval.txt', 'w') as trainFile:
    for i in allFile:
        if i not in testSet:
            trainFile.write(str(i) + '\n')

createTest.py
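The split can also be written as functions that are easy to test. This sketch (split_ids and write_split are hypothetical names) assumes, like createTest.py, that image IDs are consecutive integers from start to end inclusive, and takes an optional seed so that the same split can be reproduced later.

```python
import os
import random


def split_ids(start, end, test_count, seed=None):
    """Split IDs start..end (inclusive) into sorted (trainval, test) lists."""
    ids = list(range(start, end + 1))
    if not 0 <= test_count <= len(ids):
        raise ValueError("test_count must be between 0 and %d" % len(ids))
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    test = sorted(rng.sample(ids, test_count))
    test_set = set(test)
    trainval = [i for i in ids if i not in test_set]
    return trainval, test


def write_split(trainval, test, main_dir="ImageSets/Main"):
    """Write one ID per line, the same layout createTest.py produces."""
    for name, ids in (("trainval.txt", trainval), ("test.txt", test)):
        with open(os.path.join(main_dir, name), "w") as f:
            f.writelines("%d\n" % i for i in ids)
```

For example, split_ids(1, 1500, 500, seed=0) gives 1000 trainval IDs and 500 test IDs with no overlap.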


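Note: because BBox-Label-Tool is fairly simple, it can only label one class at a time, so the conversion script also converts one class per run. This limitation should be addressed in a later improvement.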

2.3 Converting VOC data to LMDB

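SSD provides the scripts data/VOC0712/create_list.sh and data/VOC0712/create_data.sh for converting VOC data to LMDB; both are written specifically for the data in the VOC0712 directory. To leave the VOC0712 data untouched, we copied and modified the two scripts for our own dataset, replacing every VOC0712-related setting with our own directory. For our dataset, VOC0712 becomes indoor. The steps are:
(1) Create an indoor directory under $HOME/data/VOCdevkit and put the converted VOC dataset there;
(2) Create an indoor directory under $CAFFE_ROOT/examples;
(3) Create an indoor directory under $CAFFE_ROOT/data, copy create_list.sh, create_data.sh and labelmap_voc.prototxt from data/VOC0712 into it, and rename them create_list_indoor.sh, create_data_indoor.sh and labelmap_indoor.prototxt;
(4) Edit the two new create scripts, mainly replacing the VOC0712-related settings with indoor. The modified scripts are: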

#!/bin/bash

root_dir=$HOME/data/VOCdevkit/
sub_dir=ImageSets/Main
bash_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

for dataset in trainval test
do
  dst_file=$bash_dir/$dataset.txt
  if [ -f $dst_file ]
  then
    rm -f $dst_file
  fi
  for name in indoor
  do
    if [[ $dataset == "test" && $name == "VOC2012" ]]
    then
      continue
    fi
    echo "Create list for $name $dataset..."
    dataset_file=$root_dir/$name/$sub_dir/$dataset.txt

    img_file=$bash_dir/$dataset"_img.txt"
    cp $dataset_file $img_file
    sed -i "s/^/$name\/JPEGImages\//g" $img_file
    sed -i "s/$/.jpg/g" $img_file

    label_file=$bash_dir/$dataset"_label.txt"
    cp $dataset_file $label_file
    sed -i "s/^/$name\/Annotations\//g" $label_file
    sed -i "s/$/.xml/g" $label_file

    paste -d' ' $img_file $label_file >> $dst_file

    rm -f $label_file
    rm -f $img_file
  done
  # Generate image name and size information.
  if [ $dataset == "test" ]
  then
    $bash_dir/../../build/tools/get_image_size $root_dir $dst_file $bash_dir/$dataset"_name_size.txt"
  fi

  # Shuffle trainval file.
  if [ $dataset == "trainval" ]
  then
    rand_file=$dst_file.random
    cat $dst_file | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' > $rand_file
    mv $rand_file $dst_file
  fi
done

create_list_indoor.sh
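For each ID in ImageSets/Main/trainval.txt or test.txt, create_list_indoor.sh writes one line pairing the image path with its annotation path, relative to $HOME/data/VOCdevkit, and shuffles only the trainval list. The following Python sketch (make_list_lines is a hypothetical name) reproduces that line format, which can help when checking the generated data/indoor/trainval.txt and test.txt by hand.

```python
import random


def make_list_lines(dataset_name, ids, shuffle=False, seed=None):
    """Build '<name>/JPEGImages/<id>.jpg <name>/Annotations/<id>.xml' lines,
    the format create_list_indoor.sh writes to data/<name>/<subset>.txt."""
    lines = ["{0}/JPEGImages/{1}.jpg {0}/Annotations/{1}.xml".format(dataset_name, i.strip())
             for i in ids if i.strip()]
    if shuffle:  # the shell script shuffles the trainval list with perl
        random.Random(seed).shuffle(lines)
    return lines
```

Passing the lines of an open ImageSets/Main/test.txt as ids works directly, since trailing newlines are stripped.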


cur_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )
root_dir=$cur_dir/../..

cd $root_dir

redo=1
data_root_dir="$HOME/data/VOCdevkit"
dataset_name="indoor"
mapfile="$root_dir/data/$dataset_name/labelmap_indoor.prototxt"
anno_type="detection"
db="lmdb"
min_dim=0
max_dim=0
width=0
height=0

extra_cmd="--encode-type=jpg --encoded"
if [ $redo ]
then
  extra_cmd="$extra_cmd --redo"
fi
for subset in test trainval
do
  python $root_dir/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir $root_dir/data/$dataset_name/$subset.txt $data_root_dir/$dataset_name/$db/$dataset_name"_"$subset"_"$db examples/$dataset_name
done

create_data_indoor.sh
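Before running create_data_indoor.sh, it can help to confirm that every path in the list files actually exists under $HOME/data/VOCdevkit. This is a hypothetical pre-check, not part of SSD; find_missing is an illustrative name, and it assumes each line holds exactly an image path and a label path separated by whitespace.

```python
import os


def find_missing(root_dir, list_lines):
    """Return referenced paths (relative to root_dir) that are not files.
    A line that is not exactly 'image_path label_path' is returned as-is."""
    missing = []
    for line in list_lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) != 2:
            missing.append(line)
            continue
        for rel_path in parts:
            if not os.path.isfile(os.path.join(root_dir, rel_path)):
                missing.append(rel_path)
    return missing

# Example (run from $CAFFE_ROOT):
# with open("data/indoor/trainval.txt") as f:
#     print(find_missing(os.path.expanduser("~/data/VOCdevkit"), f))
```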

(5) Edit labelmap_indoor.prototxt so that its classes match your dataset. Note that label 0 must be kept for the background class:

item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "door"
  label: 1
  display_name: "door"
}

labelmap_indoor.prototxt
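With more classes, the file can be generated instead of edited by hand. This sketch (make_labelmap is a hypothetical helper) keeps label 0 for background and numbers the remaining classes from 1 in the order given; for ["door"] it produces exactly the file above.

```python
def make_labelmap(class_names):
    """Return labelmap prototxt text with background as label 0."""
    items = [("none_of_the_above", 0, "background")]
    items += [(name, label, name) for label, name in enumerate(class_names, start=1)]
    blocks = []
    for name, label, display in items:
        blocks.append('item {\n  name: "%s"\n  label: %d\n  display_name: "%s"\n}'
                      % (name, label, display))
    return "\n".join(blocks) + "\n"

# Usage: open("data/indoor/labelmap_indoor.prototxt", "w").write(make_labelmap(["door"]))
```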


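After making the changes above, build the LMDB data by running the following in the $CAFFE_ROOT directory:

  ./data/indoor/create_list_indoor.sh

  ./data/indoor/create_data_indoor.sh

When the commands finish, the converted LMDB databases are in $HOME/data/VOCdevkit/indoor/lmdb, and symbolic links to them are created in $CAFFE_ROOT/examples/indoor.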

3 Training SSD on your own dataset

Training starts from the pretrained VGGNet model provided with the SSD demo, VGG_ILSVRC_16_layers_fc_reduced.caffemodel; save it under $CAFFE_ROOT/models/VGGNet. Copy examples/ssd/ssd_pascal.py to ssd_pascal_indoor.py and adapt it to your dataset. The main changes are:
(1) point train_data and test_data to your own LMDB databases:
    train_data = "examples/indoor/indoor_trainval_lmdb"
    test_data = "examples/indoor/indoor_test_lmdb"
(2) set num_test_image to the number of test images in your dataset;
(3) set num_classes to the number of label classes in your dataset + 1 (the extra class is background).
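Both numbers can be derived from files created earlier instead of being typed in. A sketch under two assumptions: the test list has one entry per non-empty line, and labelmap_indoor.prototxt contains one item block per class, background included, so the item count already equals num_classes. count_ids and count_classes are hypothetical names.

```python
import re


def count_ids(list_path):
    """Number of non-empty lines, e.g. in ImageSets/Main/test.txt."""
    with open(list_path) as f:
        return sum(1 for line in f if line.strip())


def count_classes(labelmap_text):
    """Number of 'item {' blocks; includes the background item (label 0)."""
    return len(re.findall(r"^[ \t]*item[ \t]*\{", labelmap_text, flags=re.M))

# num_test_image = count_ids(os.path.expanduser("~/data/VOCdevkit/indoor/ImageSets/Main/test.txt"))
# num_classes = count_classes(open("data/indoor/labelmap_indoor.prototxt").read())
```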

For my dataset, ssd_pascal_indoor.py reads:

from __future__ import print_function
import caffe
from caffe.model_libs import *
from google.protobuf import text_format

import math
import os
import shutil
import stat
import subprocess
import sys

# Add extra layers on top of a "base" network (e.g. VGGNet or Inception).
def AddExtraLayers(net, use_batchnorm=True):
    use_relu = True

    # Add additional convolutional layers.
    from_layer = net.keys()[-1]
    # TODO(weiliu89): Construct the name using the last layer to avoid duplication.
    out_layer = "conv6_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 1, 0, 1)

    from_layer = out_layer
    out_layer = "conv6_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 512, 3, 1, 2)

    for i in range(7, 9):
      from_layer = out_layer
      out_layer = "conv{}_1".format(i)
      ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1)

      from_layer = out_layer
      out_layer = "conv{}_2".format(i)
      ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 1, 2)

    # Add global pooling layer.
    name = net.keys()[-1]
    net.pool6 = L.Pooling(net[name], pool=P.Pooling.AVE, global_pooling=True)

    return net

### Modify the following parameters accordingly ###
# The directory which contains the caffe code.
# We assume you are running the script at the CAFFE_ROOT.
caffe_root = os.getcwd()

# Set true if you want to start training right after generating all files.
run_soon = True
# Set true if you want to load from most recently saved snapshot.
# Otherwise, we will load from the pretrain_model defined below.
resume_training = True
# If true, Remove old model files.
remove_old_models = False

# The database file for training data. Created by data/indoor/create_data_indoor.sh
train_data = "examples/indoor/indoor_trainval_lmdb"
# The database file for testing data. Created by data/indoor/create_data_indoor.sh
test_data = "examples/indoor/indoor_test_lmdb"
# Specify the batch sampler.
resize_width = 300
resize_height = 300
resize = "{}x{}".format(resize_width, resize_height)
batch_sampler = [
        {
                'sampler': {
                        },
                'max_trials': 1,
                'max_sample': 1,
        },
        {
                'sampler': {
                        'min_scale': 0.3,
                        'max_scale': 1.0,
                        'min_aspect_ratio': 0.5,
                        'max_aspect_ratio': 2.0,
                        },
                'sample_constraint': {
                        'min_jaccard_overlap': 0.1,
                        },
                'max_trials': 50,
                'max_sample': 1,
        },
        {
                'sampler': {
                        'min_scale': 0.3,
                        'max_scale': 1.0,
                        'min_aspect_ratio': 0.5,
                        'max_aspect_ratio': 2.0,
                        },
                'sample_constraint': {
                        'min_jaccard_overlap': 0.3,
                        },
                'max_trials': 50,
                'max_sample': 1,
        },
        {
                'sampler': {
                        'min_scale': 0.3,
                        'max_scale': 1.0,
                        'min_aspect_ratio': 0.5,
                        'max_aspect_ratio': 2.0,
                        },
                'sample_constraint': {
                        'min_jaccard_overlap': 0.5,
                        },
                'max_trials': 50,
                'max_sample': 1,
        },
        {
                'sampler': {
                        'min_scale': 0.3,
                        'max_scale': 1.0,
                        'min_aspect_ratio': 0.5,
                        'max_aspect_ratio': 2.0,
                        },
                'sample_constraint': {
                        'min_jaccard_overlap': 0.7,
                        },
                'max_trials': 50,
                'max_sample': 1,
        },
        {
                'sampler': {
                        'min_scale': 0.3,
                        'max_scale': 1.0,
                        'min_aspect_ratio': 0.5,
                        'max_aspect_ratio': 2.0,
                        },
                'sample_constraint': {
                        'min_jaccard_overlap': 0.9,
                        },
                'max_trials': 50,
                'max_sample': 1,
        },
        {
                'sampler': {
                        'min_scale': 0.3,
                        'max_scale': 1.0,
                        'min_aspect_ratio': 0.5,
                        'max_aspect_ratio': 2.0,
                        },
                'sample_constraint': {
                        'max_jaccard_overlap': 1.0,
                        },
                'max_trials': 50,
                'max_sample': 1,
        },
        ]
train_transform_param = {
        'mirror': True,
        'mean_value': [104, 117, 123],
        'resize_param': {
                'prob': 1,
                'resize_mode': P.Resize.WARP,
                'height': resize_height,
                'width': resize_width,
                'interp_mode': [
                        P.Resize.LINEAR,
                        P.Resize.AREA,
                        P.Resize.NEAREST,
                        P.Resize.CUBIC,
                        P.Resize.LANCZOS4,
                        ],
                },
        'emit_constraint': {
            'emit_type': caffe_pb2.EmitConstraint.CENTER,
            }
        }
test_transform_param = {
        'mean_value': [104, 117, 123],
        'resize_param': {
                'prob': 1,
                'resize_mode': P.Resize.WARP,
                'height': resize_height,
                'width': resize_width,
                'interp_mode': [P.Resize.LINEAR],
                },
        }

# If true, use batch norm for all newly added layers.
# Currently only the non batch norm version has been tested.
use_batchnorm = False
# Use different initial learning rate.
if use_batchnorm:
    base_lr = 0.0004
else:
    # A learning rate for batch_size = 1, num_gpus = 1.
    base_lr = 0.00004

# Modify the job name if you want.
job_name = "SSD_{}".format(resize)
# The name of the model. Modify it if you want.
model_name = "VGG_VOC0712_{}".format(job_name)

# Directory which stores the model .prototxt file.
save_dir = "models/VGGNet/VOC0712/{}".format(job_name)
# Directory which stores the snapshot of models.
snapshot_dir = "models/VGGNet/VOC0712/{}".format(job_name)
# Directory which stores the job script and log file.
job_dir = "jobs/VGGNet/VOC0712/{}".format(job_name)
# Directory which stores the detection results.
output_result_dir = "{}/data/VOCdevkit/results/VOC2007/{}/Main".format(os.environ['HOME'], job_name)

# model definition files.
train_net_file = "{}/train.prototxt".format(save_dir)
test_net_file = "{}/test.prototxt".format(save_dir)
deploy_net_file = "{}/deploy.prototxt".format(save_dir)
solver_file = "{}/solver.prototxt".format(save_dir)
# snapshot prefix.
snapshot_prefix = "{}/{}".format(snapshot_dir, model_name)
# job script path.
job_file = "{}/{}.sh".format(job_dir, model_name)

# Stores the test image names and sizes. Created by data/indoor/create_list_indoor.sh
name_size_file = "data/indoor/test_name_size.txt"
# The pretrained model. We use the Fully convolutional reduced (atrous) VGGNet.
pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"
# Stores LabelMapItem.
label_map_file = "data/indoor/labelmap_indoor.prototxt"

# MultiBoxLoss parameters.
num_classes = 2
share_location = True
background_label_id=0
train_on_diff_gt = True
normalization_mode = P.Loss.VALID
code_type = P.PriorBox.CENTER_SIZE
neg_pos_ratio = 3.
loc_weight = (neg_pos_ratio + 1.) / 4.
multibox_loss_param = {
    'loc_loss_type': P.MultiBoxLoss.SMOOTH_L1,
    'conf_loss_type': P.MultiBoxLoss.SOFTMAX,
    'loc_weight': loc_weight,
    'num_classes': num_classes,
    'share_location': share_location,
    'match_type': P.MultiBoxLoss.PER_PREDICTION,
    'overlap_threshold': 0.5,
    'use_prior_for_matching': True,
    'background_label_id': background_label_id,
    'use_difficult_gt': train_on_diff_gt,
    'do_neg_mining': True,
    'neg_pos_ratio': neg_pos_ratio,
    'neg_overlap': 0.5,
    'code_type': code_type,
    }
loss_param = {
    'normalization': normalization_mode,
    }

# parameters for generating priors.
# minimum dimension of input image
min_dim = 300
# conv4_3 ==> 38 x 38
# fc7 ==> 19 x 19
# conv6_2 ==> 10 x 10
# conv7_2 ==> 5 x 5
# conv8_2 ==> 3 x 3
# pool6 ==> 1 x 1
mbox_source_layers = ['conv4_3', 'fc7', 'conv6_2', 'conv7_2', 'conv8_2', 'pool6']
# in percent %
min_ratio = 20
max_ratio = 95
step = int(math.floor((max_ratio - min_ratio) / (len(mbox_source_layers) - 2)))
min_sizes = []
max_sizes = []
for ratio in range(min_ratio, max_ratio + 1, step):
  min_sizes.append(min_dim * ratio / 100.)
  max_sizes.append(min_dim * (ratio + step) / 100.)
min_sizes = [min_dim * 10 / 100.] + min_sizes
max_sizes = [[]] + max_sizes
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2, 3], [2, 3]]
# L2 normalize conv4_3.
normalizations = [20, -1, -1, -1, -1, -1]
# variance used to encode/decode prior bboxes.
if code_type == P.PriorBox.CENTER_SIZE:
  prior_variance = [0.1, 0.1, 0.2, 0.2]
else:
  prior_variance = [0.1]
flip = True
clip = True

# Solver parameters.
# Defining which GPUs to use.
gpus = "0"
gpulist = gpus.split(",")
num_gpus = len(gpulist)

# Divide the mini-batch to different GPUs.
batch_size = 4
accum_batch_size = 32
iter_size = accum_batch_size / batch_size
solver_mode = P.Solver.CPU
device_id = 0
batch_size_per_device = batch_size
if num_gpus > 0:
  batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))
  iter_size = int(math.ceil(float(accum_batch_size) / (batch_size_per_device * num_gpus)))
  solver_mode = P.Solver.GPU
  device_id = int(gpulist[0])

if normalization_mode == P.Loss.NONE:
  base_lr /= batch_size_per_device
elif normalization_mode == P.Loss.VALID:
  base_lr *= 25. / loc_weight
elif normalization_mode == P.Loss.FULL:
  # Roughly there are 2000 prior bboxes per image.
  # TODO(weiliu89): Estimate the exact # of priors.
  base_lr *= 2000.

# Which layers to freeze (no backward) during training.
freeze_layers = ['conv1_1', 'conv1_2', 'conv2_1', 'conv2_2']

# Evaluate on whole test set.
num_test_image = 800
test_batch_size = 1
# Integer division keeps test_iter an int under Python 3 as well.
test_iter = num_test_image // test_batch_size

solver_param = {
    # Train parameters
    'base_lr': base_lr,
    'weight_decay': 0.0005,
    'lr_policy': "step",
    'stepsize': 40000,
    'gamma': 0.1,
    'momentum': 0.9,
    'iter_size': iter_size,
    'max_iter': 60000,
    'snapshot': 40000,
    'display': 10,
    'average_loss': 10,
    'type': "SGD",
    'solver_mode': solver_mode,
    'device_id': device_id,
    'debug_info': False,
    'snapshot_after_train': True,
    # Test parameters
    'test_iter': [test_iter],
    'test_interval': 10000,
    'eval_type': "detection",
    'ap_version': "11point",
    'test_initialization': False,
    }

# parameters for generating detection output.
det_out_param = {
    'num_classes': num_classes,
    'share_location': share_location,
    'background_label_id': background_label_id,
    'nms_param': {'nms_threshold': 0.45, 'top_k': 400},
    'save_output_param': {
        'output_directory': output_result_dir,
        'output_name_prefix': "comp4_det_test_",
        'output_format': "VOC",
        'label_map_file': label_map_file,
        'name_size_file': name_size_file,
        'num_test_image': num_test_image,
        },
    'keep_top_k': 200,
    'confidence_threshold': 0.01,
    'code_type': code_type,
    }

# parameters for evaluating detection results.
det_eval_param = {
    'num_classes': num_classes,
    'background_label_id': background_label_id,
    'overlap_threshold': 0.5,
    'evaluate_difficult_gt': False,
    'name_size_file': name_size_file,
    }

### Hopefully you don't need to change the following ###
# Check file.
check_if_exist(train_data)
check_if_exist(test_data)
check_if_exist(label_map_file)
check_if_exist(pretrain_model)
make_if_not_exist(save_dir)
make_if_not_exist(job_dir)
make_if_not_exist(snapshot_dir)

# Create train net.
net = caffe.NetSpec()
net.data, net.label = CreateAnnotatedDataLayer(train_data, batch_size=batch_size_per_device,
        train=True, output_label=True, label_map_file=label_map_file,
        transform_param=train_transform_param, batch_sampler=batch_sampler)

VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,
    dropout=False, freeze_layers=freeze_layers)

AddExtraLayers(net, use_batchnorm)

mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,
        use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,
        aspect_ratios=aspect_ratios, normalizations=normalizations,
        num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,
        prior_variance=prior_variance, kernel_size=3, pad=1)

# Create the MultiBoxLossLayer.
name = "mbox_loss"
mbox_layers.append(net.label)
net[name] = L.MultiBoxLoss(*mbox_layers, multibox_loss_param=multibox_loss_param,
        loss_param=loss_param, include=dict(phase=caffe_pb2.Phase.Value('TRAIN')),
        propagate_down=[True, True, False, False])

with open(train_net_file, 'w') as f:
    print('name: "{}_train"'.format(model_name), file=f)
    print(net.to_proto(), file=f)
shutil.copy(train_net_file, job_dir)

# Create test net.
net = caffe.NetSpec()
net.data, net.label = CreateAnnotatedDataLayer(test_data, batch_size=test_batch_size,
        train=False, output_label=True, label_map_file=label_map_file,
        transform_param=test_transform_param)

VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,
    dropout=False, freeze_layers=freeze_layers)

AddExtraLayers(net, use_batchnorm)

mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,
        use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,
        aspect_ratios=aspect_ratios, normalizations=normalizations,
        num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,
        prior_variance=prior_variance, kernel_size=3, pad=1)

conf_name = "mbox_conf"
if multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.SOFTMAX:
  reshape_name = "{}_reshape".format(conf_name)
  net[reshape_name] = L.Reshape(net[conf_name], shape=dict(dim=[0, -1, num_classes]))
  softmax_name = "{}_softmax".format(conf_name)
  net[softmax_name] = L.Softmax(net[reshape_name], axis=2)
  flatten_name = "{}_flatten".format(conf_name)
  net[flatten_name] = L.Flatten(net[softmax_name], axis=1)
  mbox_layers[1] = net[flatten_name]
elif multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.LOGISTIC:
  sigmoid_name = "{}_sigmoid".format(conf_name)
  net[sigmoid_name] = L.Sigmoid(net[conf_name])
  mbox_layers[1] = net[sigmoid_name]

net.detection_out = L.DetectionOutput(*mbox_layers,
    detection_output_param=det_out_param,
    include=dict(phase=caffe_pb2.Phase.Value('TEST')))
net.detection_eval = L.DetectionEvaluate(net.detection_out, net.label,
    detection_evaluate_param=det_eval_param,
    include=dict(phase=caffe_pb2.Phase.Value('TEST')))

with open(test_net_file, 'w') as f:
    print('name: "{}_test"'.format(model_name), file=f)
    print(net.to_proto(), file=f)
shutil.copy(test_net_file, job_dir)

# Create deploy net.
# Remove the first and last layer from test net.
deploy_net = net
with open(deploy_net_file, 'w') as f:
    net_param = deploy_net.to_proto()
    # Remove the first (AnnotatedData) and last (DetectionEvaluate) layer from test net.
    del net_param.layer[0]
    del net_param.layer[-1]
    net_param.name = '{}_deploy'.format(model_name)
    net_param.input.extend(['data'])
    net_param.input_shape.extend([
        caffe_pb2.BlobShape(dim=[1, 3, resize_height, resize_width])])
    print(net_param, file=f)
shutil.copy(deploy_net_file, job_dir)

# Create solver.
solver = caffe_pb2.SolverParameter(
        train_net=train_net_file,
        test_net=[test_net_file],
        snapshot_prefix=snapshot_prefix,
        **solver_param)

with open(solver_file, 'w') as f:
    print(solver, file=f)
shutil.copy(solver_file, job_dir)

max_iter = 0
# Find most recent snapshot.
for file in os.listdir(snapshot_dir):
  if file.endswith(".solverstate"):
    basename = os.path.splitext(file)[0]
    iter = int(basename.split("{}_iter_".format(model_name))[1])
    if iter > max_iter:
      max_iter = iter

train_src_param = '--weights="{}" \\\n'.format(pretrain_model)
if resume_training:
  if max_iter > 0:
    train_src_param = '--snapshot="{}_iter_{}.solverstate" \\\n'.format(snapshot_prefix, max_iter)

if remove_old_models:
  # Remove any snapshots smaller than max_iter.
  for file in os.listdir(snapshot_dir):
    if file.endswith(".solverstate"):
      basename = os.path.splitext(file)[0]
      iter = int(basename.split("{}_iter_".format(model_name))[1])
      if max_iter > iter:
        os.remove("{}/{}".format(snapshot_dir, file))
    if file.endswith(".caffemodel"):
      basename = os.path.splitext(file)[0]
      iter = int(basename.split("{}_iter_".format(model_name))[1])
      if max_iter > iter:
        os.remove("{}/{}".format(snapshot_dir, file))

# Create job file.
with open(job_file, 'w') as f:
  f.write('cd {}\n'.format(caffe_root))
  f.write('./build/tools/caffe train \\\n')
  f.write('--solver="{}" \\\n'.format(solver_file))
  f.write(train_src_param)
  if solver_param['solver_mode'] == P.Solver.GPU:
    f.write('--gpu {} 2>&1 | tee {}/{}.log\n'.format(gpus, job_dir, model_name))
  else:
    f.write('2>&1 | tee {}/{}.log\n'.format(job_dir, model_name))

# Copy the python script to job_dir.
py_file = os.path.abspath(__file__)
shutil.copy(py_file, job_dir)

# Run the job.
os.chmod(job_file, stat.S_IRWXU)
if run_soon:
  subprocess.call(job_file, shell=True)

ssd_pascal_indoor.py
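Several values in the script are derived from others. The sketch below makes them explicit for the settings used here; prior_sizes and solver_numbers are hypothetical helpers that copy the script's formulas (with the P.Loss.VALID branch for the learning rate), not functions from SSD itself.

```python
import math


def prior_sizes(min_dim=300, min_ratio=20, max_ratio=95, num_layers=6):
    """Prior box min/max sizes (pixels), as computed in ssd_pascal_indoor.py."""
    step = int(math.floor((max_ratio - min_ratio) / (num_layers - 2)))
    min_sizes, max_sizes = [], []
    for ratio in range(min_ratio, max_ratio + 1, step):
        min_sizes.append(min_dim * ratio / 100.)
        max_sizes.append(min_dim * (ratio + step) / 100.)
    # conv4_3 gets a smaller extra size (10%) and no max size.
    return [min_dim * 10 / 100.] + min_sizes, [[]] + max_sizes


def solver_numbers(base_lr=0.00004, neg_pos_ratio=3., batch_size=4,
                   accum_batch_size=32, num_gpus=1, num_test_image=800,
                   test_batch_size=1):
    """Return (effective base_lr, batch per device, iter_size, test_iter)."""
    loc_weight = (neg_pos_ratio + 1.) / 4.
    lr = base_lr * (25. / loc_weight)  # P.Loss.VALID normalization branch
    per_device = int(math.ceil(float(batch_size) / num_gpus))
    iter_size = int(math.ceil(float(accum_batch_size) / (per_device * num_gpus)))
    test_iter = num_test_image // test_batch_size
    return lr, per_device, iter_size, test_iter
```

With these defaults, min_sizes is [30, 60, 114, 168, 222, 276] and max_sizes is [[], 114, 168, 222, 276, 330] in pixels of the 300x300 input; the effective base_lr is 0.001, and iter_size = 8 accumulates gradients over 8 iterations of 4 images to reach accum_batch_size = 32.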

Training command (run from $CAFFE_ROOT): python examples/ssd/ssd_pascal_indoor.py
 
To be continued...
Time: 2024-11-08 22:47:50
