Reading MNIST with Python

This really comes down to how to read a binary file in Python.

The MNIST file layout is shown below, using the train-images file as the example.

TRAINING SET IMAGE FILE (train-images-idx3-ubyte):

[offset] [type]          [value]          [description]
0000     32 bit integer 0x00000803(2051) magic number
0004     32 bit integer 60000            number of images
0008     32 bit integer 28               number of rows
0012     32 bit integer 28               number of columns
0016     unsigned byte   ??               pixel
0017     unsigned byte   ??               pixel
........
xxxx     unsigned byte   ??               pixel

In other words, before reaching the pixel data we first need to read four 32-bit integers.

I tried a number of approaches, and the most convenient one, at least for me, is

struct.unpack_from()

import struct

filename = 'train-images.idx3-ubyte'
with open(filename, 'rb') as binfile:
    buf = binfile.read()

First, open the file in binary mode and read the whole thing into memory.

index = 0
magic, numImages, numRows, numColumns = struct.unpack_from('>IIII', buf, index)
index += struct.calcsize('>IIII')

Then use struct.unpack_from to parse the header.

'>IIII' means: read four unsigned 32-bit integers in big-endian byte order.
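To make the format string concrete, here is a minimal sketch that runs without the MNIST file. It builds a synthetic 16-byte header with struct.pack (the values are taken from the layout table above; the variable names are only illustrative) and then parses it the same way.

```python
import struct

# Build a synthetic 16-byte header with the same layout as train-images:
# magic number, image count, rows, columns, all big-endian unsigned 32-bit.
header = struct.pack('>IIII', 2051, 60000, 28, 28)

# With the '>' prefix, struct uses standard sizes and no padding,
# so four 'I' fields occupy exactly 16 bytes.
header_size = struct.calcsize('>IIII')

magic, num_images, num_rows, num_columns = struct.unpack_from('>IIII', header, 0)

# Reading the same four bytes as little-endian ('<') yields a different,
# wrong magic number, which is why the byte-order prefix matters.
wrong_magic = struct.unpack_from('<I', header, 0)[0]

print(header_size, magic, num_images, num_rows, num_columns)
print(hex(wrong_magic))
```

Checking that magic equals 2051 right after unpacking is a cheap way to catch a wrong byte order or a wrong file.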

Next, read a single image to check that the parsing works.

import numpy as np
import matplotlib.pyplot as plt

im = struct.unpack_from('>784B', buf, index)
index += struct.calcsize('>784B')

im = np.array(im, dtype=np.uint8)
im = im.reshape(28, 28)

fig = plt.figure()
plotwindow = fig.add_subplot(111)
plotwindow.imshow(im, cmap='gray')
plt.show()

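'>784B' means: read 784 unsigned bytes (28 x 28 pixels). The big-endian prefix has no effect on single bytes, but it keeps the format consistent with the header.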
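As a possible alternative to unpacking 784 values into a tuple, NumPy can view the same bytes directly with np.frombuffer. This is not the approach used in this post, just a sketch for comparison; the buffer below is synthetic (two fake images with repeating pixel values) so that it runs without the real file.

```python
import struct

import numpy as np

# Synthetic stand-in for the file contents: a 16-byte header followed by
# two 28x28 "images" whose pixel values simply cycle through 0..255.
num_images, rows, cols = 2, 28, 28
pixels = bytes(i % 256 for i in range(num_images * rows * cols))
buf = struct.pack('>IIII', 2051, num_images, rows, cols) + pixels

index = struct.calcsize('>IIII')

# Approach from this post: unpack 784 unsigned bytes, then convert to an array.
im_struct = np.array(struct.unpack_from('>784B', buf, index)).reshape(rows, cols)

# Alternative: interpret the same bytes as uint8 without building a tuple.
im_numpy = np.frombuffer(
    buf, dtype=np.uint8, count=rows * cols, offset=index
).reshape(rows, cols)

# Both approaches should yield the same pixel values.
print(np.array_equal(im_struct, im_numpy))
```

Note that an array returned by np.frombuffer on a bytes object is read-only; call .copy() on it if the pixels need to be modified.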

The complete code is as follows:

import struct

import numpy as np
import matplotlib.pyplot as plt

filename = 'train-images.idx3-ubyte'

# Read the whole file into memory in binary mode.
with open(filename, 'rb') as binfile:
    buf = binfile.read()

# Header: four big-endian unsigned 32-bit integers.
index = 0
magic, numImages, numRows, numColumns = struct.unpack_from('>IIII', buf, index)
index += struct.calcsize('>IIII')

# First image: 784 unsigned bytes (28 x 28 pixels).
im = struct.unpack_from('>784B', buf, index)
index += struct.calcsize('>784B')

im = np.array(im, dtype=np.uint8)
im = im.reshape(28, 28)

# Display the image in grayscale.
fig = plt.figure()
plotwindow = fig.add_subplot(111)
plotwindow.imshow(im, cmap='gray')
plt.show()

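Since this was only a test of whether the parsing works, it reads just one image.

Judging by the displayed image, the data appears to have been read correctly.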
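To go beyond the single test image, the same header logic can be wrapped in a small function that returns every image at once. The sketch below is one possible way to do it, not part of the original code: load_idx_images is a hypothetical helper name, it uses np.frombuffer instead of struct for the pixel data, and the demo buffer is synthetic so the example runs without the MNIST file.

```python
import struct

import numpy as np


def load_idx_images(buf):
    """Parse the bytes of an IDX3 image file into an array of shape (N, rows, cols).

    Hypothetical helper for illustration; assumes the layout shown in the
    table at the top of this post.
    """
    magic, num_images, num_rows, num_columns = struct.unpack_from('>IIII', buf, 0)
    if magic != 2051:
        raise ValueError('unexpected magic number: %d' % magic)
    offset = struct.calcsize('>IIII')
    data = np.frombuffer(
        buf,
        dtype=np.uint8,
        count=num_images * num_rows * num_columns,
        offset=offset,
    )
    return data.reshape(num_images, num_rows, num_columns)


# Synthetic buffer: three 28x28 images filled with the values 0, 1 and 2.
demo_buf = struct.pack('>IIII', 2051, 3, 28, 28) + b''.join(
    bytes([value]) * (28 * 28) for value in range(3)
)
images = load_idx_images(demo_buf)
print(images.shape)
```

With the real file, the bytes read from train-images.idx3-ubyte in binary mode (buf in the code above) could be passed in place of demo_buf.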

Time: 2024-10-06 05:49:52
