Project walkthrough: recognizing handwritten digits with machine learning

Repository: https://github.com/JosephPai/KaggleSolution/tree/master/DigitRec

Dataset: https://www.kaggle.com/c/digit-recognizer/data

import pandas as pd

import matplotlib.pyplot as plt, matplotlib.image as mpimg

from sklearn.model_selection import train_test_split

from sklearn import svm

Import the libraries. pandas is used here for data handling.

The classifier to train is taken from scikit-learn (sklearn).

labeled_images = pd.read_csv('C:/Users/75201/Desktop/train/input/train.csv')

images = labeled_images.iloc[0:5000,1:]

labels = labeled_images.iloc[0:5000,:1]

train_images,test_images,train_labels,test_labels=train_test_split(images,labels, train_size=0.8, random_state=0)

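First load the training set (only the first 5000 rows are used here) and separate the labels from the pixel data: labels holds the labels and images holds the pixel values. Then split both into a training set and a test set.

train_images: training-set pixel data

test_images: test-set pixel data

train_labels: training-set labels

test_labels: test-set labels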
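The split described above can be sketched on a small synthetic table. The `toy` DataFrame below is a hypothetical stand-in for train.csv (a `label` column followed by 784 pixel columns); everything else follows the calls used in this post.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for train.csv: a "label" column followed by pixel columns.
rng = np.random.default_rng(0)
n_rows, n_pixels = 100, 784
toy = pd.DataFrame(rng.integers(0, 256, size=(n_rows, n_pixels)),
                   columns=[f"pixel{k}" for k in range(n_pixels)])
toy.insert(0, "label", rng.integers(0, 10, size=n_rows))

images = toy.iloc[:, 1:]   # every column except the first: pixel values
labels = toy.iloc[:, :1]   # first column only: the digit label

# train_size=0.8 keeps 80% of the rows for training; random_state fixes the shuffle.
train_images, test_images, train_labels, test_labels = train_test_split(
    images, labels, train_size=0.8, random_state=0)

print(train_images.shape, test_images.shape)
print(train_labels.shape, test_labels.shape)
```

Because images and labels are passed to the same call, row k of train_images still corresponds to row k of train_labels after shuffling.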

At this point you can look at an image by reshaping one row of flat pixel data back into a 28×28 picture.

i=8

# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
img=train_images.iloc[i].to_numpy()

img=img.reshape((28,28))

plt.imshow(img,cmap='gray')

plt.title(train_labels.iloc[i,0])

The plot shows that the image is grayscale, not pure black and white.
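To make the reshape step concrete, here is a minimal sketch using a made-up flat row (`flat` is hypothetical, not data from the competition). It shows where a given flat index ends up in the 28×28 grid under NumPy's default row-major order.

```python
import numpy as np

# Hypothetical flattened image: 784 values, like one row of the pixel columns.
flat = np.arange(784)

img = flat.reshape((28, 28))   # row-major: the first 28 values become row 0

# Pixel k of the flat row lands at (k // 28, k % 28) in the grid.
k = 100
row, col = divmod(k, 28)
print(img.shape, row, col, img[row, col])
```

The same index arithmetic works in reverse: the pixel at (row, col) sits at position row * 28 + col in the flat row.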

Now train. Use the svm module provided by sklearn to build a classifier:

clf = svm.SVC()

clf.fit(train_images, train_labels.values.ravel())

clf.score(test_images,test_labels)

The score is 0.10000, which is very poor: about what random guessing over ten digit classes would give.

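This step converts the images to pure black and white (every nonzero pixel becomes 1), which simplifies the features and greatly improves accuracy.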

test_images[test_images>0]=1

train_images[train_images>0]=1

img=train_images.iloc[i].to_numpy().reshape((28,28))

plt.imshow(img,cmap='binary')

plt.title(train_labels.iloc[i,0])

Train again with the same classifier.

clf = svm.SVC()
clf.fit(train_images, train_labels.values.ravel())
clf.score(test_images,test_labels)

The score is now 0.887.
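One plausible reason for the jump from 0.10 to 0.887 is the scale of the inputs seen by the RBF kernel. The sketch below is an illustration, not a reconstruction of the original run. It assumes the SVC in use defaulted to gamma = 1 / n_features, as older scikit-learn releases did, and the images and the helper `random_digit_like_image` are hypothetical.

```python
import numpy as np

def rbf_kernel_value(x, y, gamma):
    """RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

rng = np.random.default_rng(0)
n_pixels = 784
gamma = 1.0 / n_pixels  # assumption: the older gamma='auto' default, 1 / n_features

def random_digit_like_image():
    """Hypothetical sparse image: about 20% 'ink' pixels with values 1-255."""
    ink = rng.random(n_pixels) < 0.2
    return ink * rng.integers(1, 256, size=n_pixels)

a, b = random_digit_like_image(), random_digit_like_image()

k_raw = rbf_kernel_value(a, b, gamma)          # raw 0-255 intensities
k_bin = rbf_kernel_value(a > 0, b > 0, gamma)  # binarized 0/1 pixels

print(f"raw pixels:       k = {k_raw:.3e}")
print(f"binarized pixels: k = {k_bin:.3f}")
```

With raw intensities the squared distance between two different images is so large that the kernel value is effectively zero, so no training image looks similar to any other. With 0/1 pixels the squared distance is at most 784, which keeps the kernel value at or above exp(-1). Scaling the pixels to the range 0 to 1 instead of binarizing would shrink the distances in a similar way.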

That is good enough for a first pass, so testing can begin: load the test set and write the predictions to results.csv.

test_data=pd.read_csv('C:/Users/75201/Desktop/train/input/test.csv')

test_data[test_data>0]=1

results=clf.predict(test_data[0:5000])

df = pd.DataFrame(results)

df.index.name='ImageId'

df.index+=1

df.columns=['Label']

df.to_csv('C:/Users/75201/Desktop/train/input/results.csv', header=True)
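The same ImageId,Label layout can also be built with explicit columns instead of adjusting the index afterwards. This is a minimal sketch with made-up predictions: `results` is a hypothetical array standing in for the output of clf.predict, and an in-memory buffer replaces the results.csv path.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical predictions standing in for clf.predict(...)
results = np.array([2, 0, 9, 4])

submission = pd.DataFrame({
    "ImageId": np.arange(1, len(results) + 1),  # ids start at 1, not 0
    "Label": results,
})

buffer = io.StringIO()                  # in-memory stand-in for a results.csv path
submission.to_csv(buffer, index=False)  # index=False: ImageId is already a column
csv_text = buffer.getvalue()
print(csv_text)
```

Note that the code in this post predicts only the first 5000 test rows, so a file meant for submission would need predictions for every row of test.csv.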

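After reading this project, I think the following points could be improved:

Use more training samples. The dataset holds far more than the 5000 rows used here, and a larger sample should improve accuracy.
Add an external interface that preprocesses an arbitrary image down to 28*28 pixels, so the model can be tested on outside images.
Try other classifiers.
Improve the features.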
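As a starting point for the "try other classifiers" suggestion, the sketch below compares two scikit-learn models on the same split. It uses scikit-learn's bundled 8×8 digits dataset (load_digits) as a small stand-in, not the Kaggle data, and the choice of models and parameters is only an assumption for illustration.

```python
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: scikit-learn's bundled 8x8 digit images, not the Kaggle set.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)

# Candidate models; the parameters here are illustrative, not tuned.
candidates = {
    "svc": svm.SVC(gamma="scale"),
    "knn": KNeighborsClassifier(n_neighbors=3),
}

# Fit each model on the same training split and score it on the same test split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Swapping in the Kaggle arrays for X and y would give a like-for-like comparison on the real task.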

Original post: https://www.cnblogs.com/dayoulaoshi/p/10466461.html

Time: 2024-10-14 14:17:53
