DeepLearning to digit recongnizer in kaggle

近期在看deeplearning，于是就找了kaggle上字符识别进行练习。这里我主要用两种工具箱进行求解。并比对两者的结果。

两种工具箱各自是DeepLearningToolbox和caffe。

DeeplearningToolbox源代码解析见：http://blog.csdn.net/lu597203933/article/details/46576017

Caffe学习见：http://caffe.berkeleyvision.org/

一：DeeplearningToolbox

DeeplearningToolbox基于matlab，很的简单，读下源代码，对于了解卷积神经网络等过程很有帮助。

这里我主要是对digit recongnizer给出的数据集进行预处理以使其适用于我们的deeplearningToolbox工具箱。主要包括两个.m文件，各自是predeal.m和cnntest.m文件。

所须要做的就是改变addpath的路径，代码凝视很具体，大家自己看。

代码

predeal.m

% use the deeplearnToolbox to solve the digit recongnizer in kaggle!
clear;clc
trainFile = ‘train.csv‘;
testFile = ‘test.csv‘;
fidId = fopen(trainFile);

M = csvread(trainFile, 1);   % 读取csv文件除第一行以外的全部数据
train_x = M(:, 2:end);    %第2列開始为数据data
label = M(:,1)‘;  %第一列为标签
label(label == 0) = 10;   % 不变为10 以下一句无法处理
train_y = full(sparse(label, 1:size(train_x, 1), 1));   %将标签变成一个矩阵

train_x = double(reshape(train_x‘,28,28,size(train_x, 1)))/255;  

fidId = fopen(‘test.csv‘);     %% 处理预測的数据
M = csvread(testFile, 1);   % 读取csv文件除第一行以外的全部数据
test_x = double(reshape(M‘,28,28,size(M, 1)))/255;
clear fidId label testFile M testFile trainFile

addpath D:\DeepLearning\DeepLearnToolbox-master\data\      %路径须要改下
addpath D:\DeepLearning\DeepLearnToolbox-master\CNNaddpath D:\DeepLearning\DeepLearnToolbox-master\util
rand(‘state‘,0)
cnn.layers = {        %%% 设置各层feature maps个数及卷积模板大小等属性
    struct(‘type‘, ‘i‘) %input layer
    struct(‘type‘, ‘c‘, ‘outputmaps‘, 6, ‘kernelsize‘, 5) %convolution layer
    struct(‘type‘, ‘s‘, ‘scale‘, 2) %sub sampling layer
    struct(‘type‘, ‘c‘, ‘outputmaps‘, 12, ‘kernelsize‘, 5) %convolution layer
    struct(‘type‘, ‘s‘, ‘scale‘, 2) %subsampling layer
};

opts.alpha = 0.01;   %迭代下降的速率
opts.batchsize = 50;   %每次选择50个样本进行更新  随机梯度下降。每次仅仅选用50个样本进行更新
opts.numepochs = 25;   %迭代次数
cnn = cnnsetup(cnn, train_x, train_y);      %对各层參数进行初始化 包含权重和偏置
cnn = cnntrain(cnn, train_x, train_y, opts);  %训练的过程，包含bp算法及迭代过程

test_y = cnntest(cnn, test_x);      %对測试数据集进行測试
test_y(test_y == 10) = 0;      %标签10 须要反转为0
test_y = test_y‘;
M = [(1:length(test_y))‘ test_y(:)];
csvwrite(‘test_y.csv‘, M);
figure; plot(cnn.rL);

cnntest.m

  function [test_y] = cnntest(net, x)
    %  feedforward
    net = cnnff(net, x);
    [~, test_y] = max(net.o);
end

结果：用deeplearningToolbox得到的结果并非非常好，仅仅有0.94586

二：caffe to digit recongnizer

尽管caffe自带了mnist对样例对字符进行处理。可是官网给出的数据是二进制的文件，得到的结果也仅仅是一个简单的准确率，所以不能无限制的套用。

过程例如以下：

1：将给定csv数据转变成lmdb格式

这里我在mnist的目录下写了一个convert_data_to_lmdb.cpp的程序对数据进行处理：

代码例如以下：

#include <iostream>
#include <string>
#include <sstream>
#include <gflags/gflags.h>

#include "boost/scoped_ptr.hpp"
#include "gflags/gflags.h"
#include "glog/logging.h"

#include "caffe/proto/caffe.pb.h"
#include "caffe/util/db.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/rng.hpp"

using namespace caffe;
using namespace std;
using std::pair;
using boost::scoped_ptr;

/* edited by Zack
 * argv[1] the input file, argv[2] the output file*/

DEFINE_string(backend, "lmdb", "The backend for storing the result");  // get Flags_backend == lmdb

int main(int argc, char **argv){
	::google::InitGoogleLogging(argv[0]);

	#ifndef GFLAGS_GFLAGS_H_
	  namespace gflags = google;
	#endif

	if(argc < 3){
		LOG(ERROR)<< "please check the input arguments!";
		return 1;
	}
	ifstream infile(argv[1]);
	if(!infile){
		LOG(ERROR)<< "please check the input arguments!";
		return 1;
	}
	string str;
	int count = 0;
	int rows = 28;
	int cols = 28;
	unsigned char *buffer = new  unsigned char[rows*cols];
	stringstream ss;

	Datum datum;             // this data structure store the data and label
	datum.set_channels(1);    // the channels
	datum.set_height(rows);    // rows
	datum.set_width(cols);     // cols

	scoped_ptr<db::DB> db(db::GetDB(FLAGS_backend));         // new DB object
	db->Open(argv[2], db::NEW);                    // open the lmdb file to store the data
	scoped_ptr<db::Transaction> txn(db->NewTransaction());   // new Transaction object to put and commit the data

	const int kMaxKeyLength = 256;           // to save the key
	char key_cstr[kMaxKeyLength];

	bool flag= false;
	while(getline(infile, str)){
		if(flag == false){
			flag = true;
			continue;
		}
		int beg = 0;
		int end = 0;
		int str_index = 0;
		//test  need to add this----------1
		//datum.set_label(0);
		while((end = str.find_first_of(‘,‘, beg)) != string::npos){
			//cout << end << endl;
			string dig_str = str.substr(beg, end - beg);
			int pixes;
			ss.clear();
			ss << dig_str;
			ss >> pixes;
			// test need to delete this--------------2
			if(beg == 0){
				datum.set_label(pixes);
				beg = ++ end;
				continue;
			}
			buffer[str_index++] = (unsigned char)pixes;
			beg = ++end;
		}
		string dig_str = str.substr(beg);
		int pixes;
		ss.clear();
		ss << dig_str;
		ss >> pixes;
		buffer[str_index++] = (unsigned char)pixes;
		datum.set_data(buffer, rows*cols);

		int length = snprintf(key_cstr, kMaxKeyLength, "%08d", count);

		    // Put in db
		string out;
		CHECK(datum.SerializeToString(&out));              // serialize to string
		txn->Put(string(key_cstr, length), out);        // put it, both the key and value

		if (++count % 1000 == 0) {       // to commit every 1000 iteration
		  // Commit db
		  txn->Commit();
		  txn.reset(db->NewTransaction());
		  LOG(ERROR) << "Processed " << count << " files.";
		}

	}
	// write the last batch
	  if (count % 1000 != 0) {            // commit the last batch
		txn->Commit();
		LOG(ERROR) << "Processed " << count << " files.";
	  }

	return 0;
}

然后我们运行make all –j8对代码进行编译。

这样在build目录下就会生成对应的二进制文件了。

如图：

然后运行./build/examples/mnist/convert_data_to_lmdb.bin examples/mnist/kaggle/data/train.csvexamples/mnist/kaggle/mnist_train_lmdb --backend=lmdb

就能够得到得到训练文件的lmdb格式文件了。对于測试test.csv，因为test.csv没有标签，所以须要对代码进行细微调整，2处调整已在上述代码中标注了。

然后相同运行make all –j8，再运行./build/examples/mnist/convert_data_to_lmdb.bin examples/mnist/kaggle/data/test.csvexamples/mnist/kaggle/mnist_test_lmdb --backend=lmdb

就能够得到所相应的測试数据的lmdb格式文件了。

2：用训练数据进行训练得到model

Caffe在训练model的时候，代码须要在每隔test_iter时间就要对測试数据集进行測试，因此我们这里能够用train.csv的前1000条数据制作一个交叉验证的数据集lmdb, 过程和上面一样。

分别将mnist文件夹以下的lenet_solver.prototxt和lenet_train_test.prototxt复制到kaggle文件夹以下。并对相应的包括文件所在文件夹和相应的batch size进行改动。详细见：下载地址。

然后运行./build/tools/caffe train –solver=examples/mnist/kaggle/lenet_solver.prototxt，这样就能够得到我们的lenet_iter_10000.caffemodel了。

3：提取測试集prob层的特征。

这里我们使用tools文件下的extract_features.cpp的源文件。可是该源文件产生的结果是lmdb的格式。因此我对源代码进行了改动例如以下：

#include <stdio.h>  // for snprintf
#include <string>
#include <vector>
#include <fstream>

#include "boost/algorithm/string.hpp"
#include "google/protobuf/text_format.h"

#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/net.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/util/db.hpp"
#include "caffe/util/io.hpp"
#include "caffe/vision_layers.hpp"

using caffe::Blob;
using caffe::Caffe;
using caffe::Datum;
using caffe::Net;
using boost::shared_ptr;
using std::string;
namespace db = caffe::db;

template<typename Dtype>
int feature_extraction_pipeline(int argc, char** argv);

int main(int argc, char** argv) {
  return feature_extraction_pipeline<float>(argc, argv);
//  return feature_extraction_pipeline<double>(argc, argv);
}

template<typename Dtype>
int feature_extraction_pipeline(int argc, char** argv) {
  ::google::InitGoogleLogging(argv[0]);
  const int num_required_args = 7;     /// the parameters must be not less 7
  if (argc < num_required_args) {
    LOG(ERROR)<<
    "This program takes in a trained network and an input data layer, and then"
    " extract features of the input data produced by the net.\n"
    "Usage: extract_features  pretrained_net_param"
    "  feature_extraction_proto_file  extract_feature_blob_name1[,name2,...]"
    "  save_feature_dataset_name1[,name2,...]  num_mini_batches  db_type"
    "  [CPU/GPU] [DEVICE_ID=0]\n"
    "Note: you can extract multiple features in one pass by specifying"
    " multiple feature blob names and dataset names seperated by ‘,‘."
    " The names cannot contain white space characters and the number of blobs"
    " and datasets must be equal.";
    return 1;
  }
  int arg_pos = num_required_args;     //the necessary nums of parameters

  arg_pos = num_required_args;
  if (argc > arg_pos && strcmp(argv[arg_pos], "GPU") == 0) {          // whether use GPU------ -gpu 0
    LOG(ERROR)<< "Using GPU";
    uint device_id = 0;
    if (argc > arg_pos + 1) {
      device_id = atoi(argv[arg_pos + 1]);
      CHECK_GE(device_id, 0);
    }
    LOG(ERROR) << "Using Device_id=" << device_id;
    Caffe::SetDevice(device_id);
    Caffe::set_mode(Caffe::GPU);
  } else {
    LOG(ERROR) << "Using CPU";
    Caffe::set_mode(Caffe::CPU);
  }

  arg_pos = 0;  // the name of the executable
  std::string pretrained_binary_proto(argv[++arg_pos]);      // the mode had been trained

  // Expected prototxt contains at least one data layer such as
  //  the layer data_layer_name and one feature blob such as the
  //  fc7 top blob to extract features.
  /*
   layers {
     name: "data_layer_name"
     type: DATA
     data_param {
       source: "/path/to/your/images/to/extract/feature/images_leveldb"
       mean_file: "/path/to/your/image_mean.binaryproto"
       batch_size: 128
       crop_size: 227
       mirror: false
     }
     top: "data_blob_name"
     top: "label_blob_name"
   }
   layers {
     name: "drop7"
     type: DROPOUT
     dropout_param {
       dropout_ratio: 0.5
     }
     bottom: "fc7"
     top: "fc7"
   }
   */
  std::string feature_extraction_proto(argv[++arg_pos]);    // get the net structure
  shared_ptr<Net<Dtype> > feature_extraction_net(
      new Net<Dtype>(feature_extraction_proto, caffe::TEST));               //new net object  and set each layers------feature_extraction_net
  feature_extraction_net->CopyTrainedLayersFrom(pretrained_binary_proto);           // init the weights

  std::string extract_feature_blob_names(argv[++arg_pos]);          //exact which blob‘s feature
  std::vector<std::string> blob_names;
  boost::split(blob_names, extract_feature_blob_names, boost::is_any_of(","));   //you can exact many blobs‘ features and to store them in different dirname

  std::string save_feature_dataset_names(argv[++arg_pos]);   // to store the features
  std::vector<std::string> dataset_names;
  boost::split(dataset_names, save_feature_dataset_names,         // each dataset_names to store one blob‘s feature
               boost::is_any_of(","));
  CHECK_EQ(blob_names.size(), dataset_names.size()) <<
      " the number of blob names and dataset names must be equal";
  size_t num_features = blob_names.size();     // how many features you exact

  for (size_t i = 0; i < num_features; i++) {
    CHECK(feature_extraction_net->has_blob(blob_names[i]))
        << "Unknown feature blob name " << blob_names[i]
        << " in the network " << feature_extraction_proto;
  }

  int num_mini_batches = atoi(argv[++arg_pos]);            // each exact num_mini_batches of images

  // init the DB and Transaction for all blobs you want to extract features
  std::vector<shared_ptr<db::DB> > feature_dbs;               // new DB object, is a vector  maybe has many blogs‘ feature
  std::vector<shared_ptr<db::Transaction> > txns;            // new Transaction object, is a vectore maybe has many blob‘s feature

  // edit by Zack
   //std::string strfile = "/home/hadoop/caffe/textileImage/features/probTest";
  std::string strfile = argv[argc-1];
  std::vector<std::ofstream*> vec(num_features, 0);

  const char* db_type = argv[++arg_pos];                  //the data to store style == lmdb
  for (size_t i = 0; i < num_features; ++i) {
    LOG(INFO)<< "Opening dataset " << dataset_names[i];               // dataset_name[i] to store the feature which type is lmdb
    shared_ptr<db::DB> db(db::GetDB(db_type));             // the type of the db
    db->Open(dataset_names.at(i), db::NEW);          // open the dir to store the feature
    feature_dbs.push_back(db);             // put the db to the vector
    shared_ptr<db::Transaction> txn(db->NewTransaction());     // the transaction to the db
    txns.push_back(txn);                // put the transaction to the vector

// edit by Zack

    std::stringstream ss;
    ss.clear();
    string index;
    ss << i;
    ss >> index;
    std::string str = strfile + index + ".txt";
    vec[i] = new std::ofstream(str.c_str());
  }

  LOG(ERROR)<< "Extacting Features";

  Datum datum;
  const int kMaxKeyStrLength = 100;
  char key_str[kMaxKeyStrLength];      // to store the key
  std::vector<Blob<float>*> input_vec;
  std::vector<int> image_indices(num_features, 0);   /// how many blogs‘ feature you exact

  for (int batch_index = 0; batch_index < num_mini_batches; ++batch_index) {
    feature_extraction_net->Forward(input_vec);
    for (int i = 0; i < num_features; ++i) {    // to exact the blobs‘ name  maybe fc7 fc8
      const shared_ptr<Blob<Dtype> > feature_blob = feature_extraction_net
          ->blob_by_name(blob_names[i]);
      int batch_size = feature_blob->num();     // the nums of images-------batch size
      int dim_features = feature_blob->count() / batch_size;    // this dim of this feature of each image in this blob
      const Dtype* feature_blob_data;   // float is the features
      for (int n = 0; n < batch_size; ++n) {
        datum.set_height(feature_blob->height());     // set the height
        datum.set_width(feature_blob->width());     // set the width
        datum.set_channels(feature_blob->channels());    // set the channel
        datum.clear_data();               // clear data
        datum.clear_float_data();        // clear float_data
        feature_blob_data = feature_blob->cpu_data() +
            feature_blob->offset(n);    //the features of  which image
        for (int d = 0; d < dim_features; ++d) {
          datum.add_float_data(feature_blob_data[d]);
          (*vec[i]) << feature_blob_data[d] << " ";          // save the features
        }
        (*vec[i]) << std::endl;
        //LOG(ERROR)<< "dim" << dim_features;
        int length = snprintf(key_str, kMaxKeyStrLength, "%010d",
            image_indices[i]);       // key  di ji ge tupian
        string out;
        CHECK(datum.SerializeToString(&out));    // serialize to string
        txns.at(i)->Put(std::string(key_str, length), out);       // put to transaction
        ++image_indices[i];       // key++
        if (image_indices[i] % 1000 == 0) {    // when it reach to 1000 ,we commit it
          txns.at(i)->Commit();
          txns.at(i).reset(feature_dbs.at(i)->NewTransaction());
          LOG(ERROR)<< "Extracted features of " << image_indices[i] <<
              " query images for feature blob " << blob_names[i];
        }
      }  // for (int n = 0; n < batch_size; ++n)
    }  // for (int i = 0; i < num_features; ++i)
  }  // for (int batch_index = 0; batch_index < num_mini_batches; ++batch_index)
  // write the last batch
  for (int i = 0; i < num_features; ++i) {
    if (image_indices[i] % 1000 != 0) {     // commit the last path images
      txns.at(i)->Commit();
    }
    // edit by Zack
      vec[i]->close();
      delete vec[i];

    LOG(ERROR)<< "Extracted features of " << image_indices[i] <<
        " query images for feature blob " << blob_names[i];
    feature_dbs.at(i)->Close();
  }

  LOG(ERROR)<< "Successfully extracted the features!";
  return 0;
}

最后将得到的prob层(即最后得到的概率)存入到了txt中了。

此外对网络结构进行了调整，仅仅须要预測，网络中的參数都能够去掉不要了。,

deploy.prototxt代码例如以下：

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/kaggle/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"

  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1

  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"

  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1

  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"

  inner_product_param {
    num_output: 500

  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"

  inner_product_param {
    num_output: 10

  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "prob"
  bottom: "label"
  top: "accuracy"
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

然后运行

./build/tools/extract_features.bin examples/mnist/kaggle/lenet_iter_10000.caffemodel examples/mnist/kaggle/deploy.prototxt prob examples/mnist/kaggle/features 280 lmdb /home/hadoop/caffe/caffe-master/examples/mnist/kaggle/feature

当中280为迭代次数，由于在deploy.prototxt中batch_size设为了100。故就为总共的測试数据集的大小=28000. /home/hadoop/caffe/caffe-master/examples/mnist/kaggle/feature为终于的提取特征存放在txt保存的路径。examples/mnist/kaggle/lenet_iter_10000.caffemodel为训练的权重參数，examples/mnist/kaggle/deploy.prototxt为网络结构。

4：对得到的txt进行后处理

通过上面三个步骤，我们就能够得到feture0.txt。存放的数据位28000*10大小。相应每一个样本属于哪一类发生的概率。然后运行下面matlab代码就能够得到kaggle所须要的提交结果了。最后的准确率为0.98986。排名也提升了400+。great！！

% caffe toolbox, the postprocessing of the data
clear;clc;
feature = load(‘feature0.txt‘);
feature = feature‘;
[~,test_y] = max(feature);
[M,N] = size(test_y);
test_y = test_y - repmat([1], M, N);
test_y = test_y‘;
M = [(1:length(test_y))‘ test_y(:)];
csvwrite(‘test_y3.csv‘, M);

全部文件代码下载见：https://github.com/zack6514/zackcoding

时间： 2024-10-24 23:53:46

DeepLearning to digit recognizer in kaggle

DeepLearning to digit recongnizer in kaggle

一：DeeplearningToolbox

二：caffe to digit recongnizer

1：将给定csv数据转变成lmdb格式

2：用训练数据进行训练得到model

3：提取測试集prob层的特征。

4：对得到的txt进行后处理

DeepLearning to digit recognizer in kaggle的相关文章

DeepLearning to digit recongnizer in kaggle

kaggle实战记录 =>Digit Recognizer(7月份完全掌握细节及内容）

Kiggle:Digit Recognizer

Kaggle竞赛题目之——Digit Recognizer

Kaggle Digit Recognizer 练习

Kaggle

SMO序列最小最优化算法

[转]使用sklearn进行集成学习——实践

深度学习斯坦福cs231n 课程笔记