【PaddlePaddle系列】手写数字识别

最近百度为了推广自家编写对深度学习框架PaddlePaddle不断推出各种比赛。百度声称PaddlePaddle是一个“易学、易用”的开源深度学习框架，然而网上的资料少之又少。虽然百度很用心地提供了许多文档，而且还是中英双语具备，但是最关键的是报错了很难在网上找到相应的解决办法。为了明年备战百度的比赛，便开始学习以下PaddlePaddle。

1、安装

PaddlePaddle同样支持CUDA加速运算，但是如果没有NVIDIA的显卡，那就还是装CPU版本。

CPU版本安装：pip install paddlepaddle

GPU版本根据所安装的CUDA版本以及cuDNN版本有所不同：

CUDA9 + cuDNN7.0：pip install paddlepaddle-gpu

CUDA8 + cuDNN7.0 : pip install paddlepaddle-gpu==0.14.0.post87

CUDA8 + cuDNN5.0 : pip install paddlepaddle-gpu==0.14.0.post85

2、手写数字识别

其实，Paddle的GitHub提供了这个例程。但是，个人感觉这个例程部分直接调用PaddlePaddle内部类使得读者阅读起来十分困难。特别是数据输入（Feed）中的reader，如果直接看程序，它直接一个函数就完成了图像输入，完全搞不懂它是如何操作。这里也就重点将这里，个人感觉这是和Tensorflow较大的区别。

2.1、网络构建

程序中提供了三种网络模型，代码很明显，这里应该不用太多说，直接贴出来了。需要注意的是，PaddlePaddle将图像的通道数放在最前面，即为[C H W]，区别于[H W C]。

(1)、单层全连接层+softmax

#a full-connect-layer network using softmax as activation function
def softmax_regression():
    img = fluid.layers.data(name=‘img‘,shape=[1,28,28],dtype=‘float32‘)
    predict = fluid.layers.fc(input=img,size=10,act=‘softmax‘)
    return predict

(2)、多层全连接层+softmax

#3 full-connect-layers network using softmax as activation functiondef multilayer_perceptron():
    img = fluid.layers.data(name=‘img‘,shape=[1,28,28],dtype=‘float32‘)
    hidden = fluid.layers.fc(input = img,size=128,act=‘softmax‘)
    hidden = fluid.layers.fc(input = hidden,size=64,act=‘softmax‘)
    prediction = fluid.layers.fc(input = hidden,size=10,act=‘softmax‘)
    return prediction

(3)、卷积神经网络

#traditional converlutional neural networkdef cnn():
    img = fluid.layers.data(name=‘img‘,shape=[1, 28, 28], dtype =‘float32‘)
    # first conv pool
    conv_pool_1 = fluid.nets.simple_img_conv_pool(
        input = img,
        filter_size = 5,
        num_filters = 20,
        pool_size=2,
        pool_stride=2,
        act="relu")
    conv_pool_1 = fluid.layers.batch_norm(conv_pool_1)
    # second conv pool
    conv_pool_2 = fluid.nets.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        pool_size=2,
        pool_stride=2,
        act="relu")
    # output layer with softmax activation function. size = 10 since there are only 10 possible digits.
    prediction = fluid.layers.fc(input=conv_pool_2, size=10, act=‘softmax‘)
    return prediction

2.2、构建损失函数

PaddlePaddle的损失函数的构建基本上与tensorflow没有太大的区别。但是需要指出的是：（1）在tensorflow中交叉熵的求解函数是使用[0 0 0 ... 1 ...]等长向量求解。但是在PaddlePaddle中，交叉熵是直接与一个整数求解；（2）标签(lable)的输入数据类型使用的是int64，尽管reader生成器返回的是int类型。笔者尝试将其改为int32类型，但是会出错。另外在其他实践过程中使用int32也是有相应的错误。

def train_program():
    #if using dtype=‘int64‘, it reports errors!
    label = fluid.layers.data(name=‘label‘, shape=[1], dtype=‘int64‘)
    # Here we can build the prediction network in different ways. Please
    predict = cnn()
    #predict = softmax_regression()
    #predict = multilayer_perssion()
    # Calculate the cost from the prediction and label.
    cost = fluid.layers.cross_entropy(input=predict, label=label)
    avg_cost = fluid.layers.mean(cost)
    acc = fluid.layers.accuracy(input=predict, label=label)
    return [avg_cost, acc]

PaddlePaddle使用Trainer进行训练，只需构建训练函数train_program作为Trainer参数（这个下面个再详细讲解）。这里要说一下，函数返回一个向量[arg_cost, acc]，其中第一个元素作为损失函数，而后面几个元素则是可选的，用于在迭代过程中print出来。所以，返回arg_cost是必要的，其他是可选的。特别说明：不要作死将一个常量放在里面，也就是里面的元素必须是会随着训练而变化，如果作死“acc=1”，则在训练中会报错。

2.3、训练

PaddlePaddle使用fulid.Trainer来创建训练器。这里则需要配备好训练器的train_program(损失函数)、place(是否使用GPU)以及optimizer_program(优化器)。然后调用train函数来进行训练。详细可见下面程序：

def optimizer_program():
    return fluid.optimizer.Adam(learning_rate=0.001)
if __name__ == "__main__":
    print("run minst train\n")
    minst_prefix = ‘/home/dzqiu/DataSet/minst/‘
    train_image_path   = minst_prefix + ‘train-images-idx3-ubyte.gz‘
    train_label_path   = minst_prefix + ‘train-labels-idx1-ubyte.gz‘
    test_image_path    = minst_prefix + ‘t10k-images-idx3-ubyte.gz‘
    test_label_path    = minst_prefix + ‘t10k-labels-idx1-ubyte.gz‘
    #reader_creator在将在下面讲述
    train_reader = paddle.batch(paddle.reader.shuffle(#shuffle用于打乱buffer的循序
                    reader_creator(train_image_path,train_label_path,buffer_size=100),
                                        buf_size=500),
                                        batch_size=64)
    test_reader  = paddle.batch(                    reader_creator(test_image_path,test_label_path,buffer_size=100),
                    batch_size=64)              #测试集就不用打乱了

    #if use GPU, use ‘export FLAGS_fraction_of_gpu_memory_to_use=0‘ at first
    use_cuda = True
    place    = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

    trainer  = fluid.Trainer(train_func=train_program,                             place=place,                             optimizer_func=optimizer_program)

    params_dirname = "recognize_digits_network.inference.model"
    lists = []
    #
    def event_handler(event):
        if isinstance(event,fluid.EndStepEvent):#每步触发事件
            if event.step % 100 == 0:
                print("Pass %d, Epoch %d, Cost %f, Acc %f"                       %(event.step, event.epoch,
                       event.metrics[0],#train_program返回的第一个参数arg_cost
                       event.metrics[1]))#train_program返回的第二个参数acc
        if isinstance(event,fluid.EndEpochEvent):#每次迭代触发事件
            trainer.save_params(params_dirname)
            #使用test的时候，返回值就是train_program的返回，所以赋值需要对应
            avg_cost, acc = trainer.test(reader=test_reader,                                         feed_order=[‘img‘,‘label‘])
            print("Test with Epoch %d, avg_cost: %s, acc: %s"
                  %(event.epoch, avg_cost, acc))
            lists.append((event.epoch, avg_cost, acc))

    # Train the model now
    trainer.train(num_epochs=5,event_handler=event_handler,                  reader=train_reader,feed_order=[‘img‘, ‘label‘])

    # find the best pass
    best = sorted(lists, key=lambda list: float(list[1]))[0]
    print ‘Best pass is %s, testing Avgcost is %s‘ % (best[0], best[1])
    print ‘The classification accuracy is %.2f%%‘ % (float(best[2]) * 100)

2.4、训练数据的读取 Reader

PaddlePaddle的训练数据读取仅用一个paddle.dataset.mnist.train()解决，封装起来难以理解其操作，更不能看出如何读取自己的训练集。这里，我将这个段函数从源码中挖出来简化为reader_creator，实现对minst数据集的读取，首先让我们看看minst数据集的格式：

训练集中，标签集前8个字节是magic和数目，后面每个字节代表数字0-9的标签；图像集中前16字节是一些数据集信息，包括magic、图像数目、行数和列数，后面每个字节代表每个像素点，也就是说我们连续取出28*28个字节安顺序就可以组成28*28的图片。弄清楚文件如何读取，那么就可以编写reader：

def reader_creator(image_filename,label_filename,buffer_size):
    def reader():
    #调用命令读取文件，Linux下使用zcat
        if platform.system()==‘Linux‘:
            zcat_cmd = ‘zcat‘
        elif paltform.system()==‘Windows‘:
            zcat_cmd = ‘gzcat‘
        else:
            raise NotImplementedError("This program is suported on Windows or Linux,                                      but your platform is" + platform.system())

        #create a subprocess to read the images
        sub_img = subprocess.Popen([zcat_cmd, image_filename], stdout = subprocess.PIPE)
        sub_img.stdout.read(16) #skip some magic bytes 这里我们已经知道，所以我们不在需要前16字节
        #create a subprocess to read the labels
        sub_lab = subprocess.Popen([zcat_cmd, label_filename], stdout = subprocess.PIPE)
        sub_lab.stdout.read(8)  #skip some magic bytes 同理

    try:
            while True:         #前面使用try,故若再读取过程中遇到结束则会退出
        #label is a pixel repersented by a unsigned byte,so just read a byte
                labels = numpy.fromfile(
                            sub_lab.stdout,‘ubyte‘,count=buffer_size).astype("int")

                if labels.size != buffer_size:
                    break
        #read 28*28 byte as array,and then resize it
                images = numpy.fromfile(
                            sub_img.stdout,‘ubyte‘,                            count=buffer_size * 28 * 28)                            .reshape(buffer_size, 28, 28).astype("float32")
        #mapping each pixel into (-1,1)
                images = images / 255.0 * 2.0 - 1.0;
                for i in xrange(buffer_size):
                    yield images[i,:],int(labels[i]) #将图像与标签抛出，循序与feed_order对应！
        finally:
            try:
        #terminate the reader subprocess
                sub_img.terminate()
            except:
                pass
            try:
        #terminate the reader subprocess
                sub_lable.terminate()
            except:
                pass
    return reader

2.5、运行结果

训练集中有60000张图片，buffer_size为100，batch_size为64，所以应该Pass了900多次。

Pass 0, Batch 0, Cost 4.250958, Acc 0.062500
Pass 100, Batch 0, Cost 0.249865, Acc 0.953125
Pass 200, Batch 0, Cost 0.281933, Acc 0.906250
Pass 300, Batch 0, Cost 0.147851, Acc 0.953125
Pass 400, Batch 0, Cost 0.144059, Acc 0.968750
Pass 500, Batch 0, Cost 0.082035, Acc 0.953125
Pass 600, Batch 0, Cost 0.105593, Acc 0.984375
Pass 700, Batch 0, Cost 0.148170, Acc 0.968750
Pass 800, Batch 0, Cost 0.182150, Acc 0.937500
Pass 900, Batch 0, Cost 0.066323, Acc 0.968750
Test with Epoch 0, avg_cost: 0.07329441363440427, acc: 0.9762620192307693
Pass 0, Batch 1, Cost 0.157396, Acc 0.953125
Pass 100, Batch 1, Cost 0.050120, Acc 0.968750
Pass 200, Batch 1, Cost 0.086324, Acc 0.984375
Pass 300, Batch 1, Cost 0.002137, Acc 1.000000
Pass 400, Batch 1, Cost 0.173876, Acc 0.984375
Pass 500, Batch 1, Cost 0.059772, Acc 0.968750
Pass 600, Batch 1, Cost 0.035788, Acc 0.984375
Pass 700, Batch 1, Cost 0.008351, Acc 1.000000
Pass 800, Batch 1, Cost 0.022678, Acc 0.984375
Pass 900, Batch 1, Cost 0.021835, Acc 1.000000
Test with Epoch 1, avg_cost: 0.06836433922317389, acc: 0.9774639423076923
Pass 0, Batch 2, Cost 0.214221, Acc 0.937500
Pass 100, Batch 2, Cost 0.212448, Acc 0.953125
Pass 200, Batch 2, Cost 0.007266, Acc 1.000000
Pass 300, Batch 2, Cost 0.015241, Acc 1.000000
Pass 400, Batch 2, Cost 0.061948, Acc 0.984375
Pass 500, Batch 2, Cost 0.043950, Acc 0.984375
Pass 600, Batch 2, Cost 0.018946, Acc 0.984375
Pass 700, Batch 2, Cost 0.015527, Acc 0.984375
Pass 800, Batch 2, Cost 0.035185, Acc 0.984375
Pass 900, Batch 2, Cost 0.004890, Acc 1.000000
Test with Epoch 2, avg_cost: 0.05774364945361809, acc: 0.9822716346153846
Pass 0, Batch 3, Cost 0.031849, Acc 0.984375
Pass 100, Batch 3, Cost 0.059525, Acc 0.953125
Pass 200, Batch 3, Cost 0.022106, Acc 0.984375
Pass 300, Batch 3, Cost 0.006763, Acc 1.000000
Pass 400, Batch 3, Cost 0.056089, Acc 0.984375
Pass 500, Batch 3, Cost 0.018876, Acc 1.000000
Pass 600, Batch 3, Cost 0.010325, Acc 1.000000
Pass 700, Batch 3, Cost 0.010989, Acc 1.000000
Pass 800, Batch 3, Cost 0.026476, Acc 0.984375
Pass 900, Batch 3, Cost 0.007792, Acc 1.000000
Test with Epoch 3, avg_cost: 0.05476908334449968, acc: 0.9830729166666666
Pass 0, Batch 4, Cost 0.061547, Acc 0.984375
Pass 100, Batch 4, Cost 0.002315, Acc 1.000000
Pass 200, Batch 4, Cost 0.009715, Acc 1.000000
Pass 300, Batch 4, Cost 0.024202, Acc 0.984375
Pass 400, Batch 4, Cost 0.150663, Acc 0.968750
Pass 500, Batch 4, Cost 0.082586, Acc 0.984375
Pass 600, Batch 4, Cost 0.012232, Acc 1.000000
Pass 700, Batch 4, Cost 0.055258, Acc 0.984375
Pass 800, Batch 4, Cost 0.016068, Acc 1.000000
Pass 900, Batch 4, Cost 0.004945, Acc 1.000000
Test with Epoch 4, avg_cost: 0.041706092633705505, acc: 0.9865785256410257
Best pass is 4, testing Avgcost is 0.041706092633705505
The classification accuracy is 98.66%

2.6 测试接口

PaddlePaddle提供接口函数，调用接口即可。特别的是，图像需要转化为[N C H W]的张量，如果是一张图像，这里N当然是1，因为是灰度图C也便是1。具体看下面代码：

def load_image(file):
        im = Image.open(file).convert(‘L‘)
        im = im.resize((28, 28), Image.ANTIALIAS)
        im = numpy.array(im).reshape(1, 1, 28, 28).astype(np.float32) #[N C H W] 这里多了一个N
        im = im / 255.0 * 2.0 - 1.0
        return im
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    img = load_image(cur_dir + ‘/infer_3.png‘)
    inferencer = fluid.Inferencer(
        # infer_func=softmax_regression, # uncomment for softmax regression
        # infer_func=multilayer_perceptron, # uncomment for MLP
        infer_func=cnn,  # uncomment for LeNet5
        param_path=params_dirname,
        place=place)
    results = inferencer.infer({‘img‘: img})
    lab = numpy.argsort(results)  # probs and lab are the results of one batch data
    print "Label of infer_3.png is: %d" % lab[0][0][-1]

测试结果如下：

Label of infer_3.png is: 3

3、结语

PaddlePaddle与tensorflow还是有一定的区别，而且除了错误很难搜到解决方法，笔者会另外开一篇博客整理总结PaddlePaddle遇到的各种问题，这个对于例程的讲解也将会继续下去，坚持每周三更新，快开学了，还加把劲。

源码地址：Github

参考：Paddle/book /02.recognize_digits

原文地址：https://www.cnblogs.com/dzqiu/p/9514447.html

时间： 2024-10-28 14:07:50

【PaddlePaddle系列】手写数字识别

【PaddlePaddle系列】手写数字识别的相关文章

Tensorflow实战手写数字识别（Tensorboard可视化）

使用Caffe进行手写数字识别执行流程解析

C#中调用Matlab人工神经网络算法实现手写数字识别

C++使用matlab卷积神经网络库MatConvNet来进行手写数字识别

keras入门实战：手写数字识别

手把手教你搭建caffe及手写数字识别（全程命令提示、纯小白教程）

Python 手写数字识别-knn算法应用

在Kaggle手写数字数据集上使用Spark MLlib的朴素贝叶斯模型进行手写数字识别

Tensorflow实践 mnist手写数字识别