THE MNIST DATABASE of handwritten digits
It has a training set of 60,000 examples and a test set of 10,000 examples, and is a subset of a larger NIST dataset. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning pattern recognition techniques on real-world data while spending minimal effort on preprocessing and formatting. (I'm not sure why several of these databases all include this sentence; if there is some deeper point to it, I haven't grasped it.)
There are four files in total:
train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
If the sizes of the downloaded files don't match those shown above, it is probably because your browser decompressed them on the fly.
The files are not in any standard image format, so you have to write your own small program to read them. The file format is described below.
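Such a reader is only a few lines. Below is a minimal Python sketch (the helper name `load_idx` is mine), based on the IDX layout from the format description: a big-endian magic number whose low byte gives the number of dimensions, one big-endian 32-bit size per dimension, then the raw unsigned-byte data. Reading the gzipped files directly also sidesteps the browser-decompression issue mentioned above.

```python
import gzip
import struct

import numpy as np

def load_idx(path):
    """Read a gzipped IDX file into a numpy array."""
    with gzip.open(path, "rb") as f:
        magic = struct.unpack(">I", f.read(4))[0]
        ndim = magic & 0xFF  # 3 for image files, 1 for label files
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)

train_images = load_idx("train-images-idx3-ubyte.gz")  # shape (60000, 28, 28)
train_labels = load_idx("train-labels-idx1-ubyte.gz")  # shape (60000,)
```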
The original black-and-white (bilevel) images come from NIST. They were size-normalized to fit in a 20x20 pixel box while preserving the digit's aspect ratio. The resulting images contain grey levels because of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so that this point sits at the center of the 28x28 field. (The "center" here is the intensity-weighted mean of the pixel coordinates, i.e. the center of mass, not simply the middle of the digit's bounding box.)
With some classification methods (particularly template-based methods such as SVM and K-nearest neighbors), the error rate is lower when the digits are centered by bounding box rather than by center of mass. If you do this kind of preprocessing, you should report it in your publications.
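To make the two centering schemes concrete, here is a small sketch (the helper names are mine; it assumes the 28x28 uint8 arrays produced by the reader above):

```python
import numpy as np
from scipy import ndimage

def center_by_mass(img):
    """Shift the digit so its intensity center of mass lands on the
    center of the field; this is how MNIST itself was centered."""
    cy, cx = ndimage.center_of_mass(img)
    h, w = img.shape
    return ndimage.shift(img, ((h - 1) / 2 - cy, (w - 1) / 2 - cx))

def center_by_bbox(img):
    """Shift the digit so the center of its bounding box lands on the
    center of the field; the variant some template-based methods prefer."""
    ys, xs = np.nonzero(img)
    cy, cx = (ys.min() + ys.max()) / 2, (xs.min() + xs.max()) / 2
    h, w = img.shape
    return ndimage.shift(img, ((h - 1) / 2 - cy, (w - 1) / 2 - cx))
```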
MNIST was constructed from NIST's databases SD-3 and SD-1, which contain binary images of handwritten digits. NIST originally designated SD-3 as the training set and SD-1 as the test set. However, SD-3 is much cleaner and easier to recognize than SD-1, because SD-3 was collected from Census Bureau employees while SD-1 was collected from high-school students. Drawing sensible conclusions from learning experiments requires that the test set be independent of the training set and drawn from the complete set of samples. It was therefore necessary to build a new database by mixing NIST's datasets.
The MNIST training set is composed of 30,000 images from SD-3 and 30,000 images from SD-1. The test set is composed of 5,000 images from SD-3 and 5,000 images from SD-1. The 60,000-image training set contains examples from approximately 250 writers. We made sure that the sets of writers in the training set and the test set are disjoint.
SD-1 contains 58,527 digit images written by 500 different writers. In contrast to SD-3, where blocks of data from each writer appear in sequence, the data in SD-1 is scrambled. The writer identities for SD-1 are available, and we used this information to split the 500 writers into two halves: the first 250 writers went into the training set and the remaining 250 into the test set. This gave us two sets of nearly 30,000 images each. The training set was then topped up with SD-3 examples, starting at pattern #0, to make a full set of 60,000 training images. Similarly, the test set was completed with SD-3 examples starting at pattern #35,000 to make a full set of 60,000 test images. Only a subset of 10,000 test images can be downloaded from this site; the full 60,000-image training set is available.
Many methods have been tried on this dataset; some examples are listed below, with details in the linked papers. Some of the experiments used a version of the database in which the input images were deskewed: the principal axis of the shape that is closest to vertical is computed, and the rows are shifted so as to make that axis vertical. In other experiments, the training set was augmented with artificially distorted versions of the original training images; the distortions are random combinations of shifts, scaling, skewing, and compression.
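For illustration, here is a sketch of moment-based deskewing, corresponding to the "deskewing" entries in the table below (my own implementation of the recipe just described, not necessarily the exact code used in the papers): estimate the lean of the digit from second-order image moments, then shear the rows to cancel it.

```python
import numpy as np
from scipy import ndimage

def deskew(img):
    """Shear a digit image so its near-vertical principal axis
    becomes vertical, estimated from second-order image moments."""
    img = img.astype(float)
    total = img.sum()
    if total == 0:
        return img
    h, w = img.shape
    ys, xs = np.mgrid[:h, :w]
    # Center of mass and central moments of the intensity image.
    cy, cx = (ys * img).sum() / total, (xs * img).sum() / total
    var_y = ((ys - cy) ** 2 * img).sum() / total
    cov_yx = ((ys - cy) * (xs - cx) * img).sum() / total
    alpha = cov_yx / var_y  # horizontal lean per row
    # ndimage convention: input_coords = matrix @ output_coords + offset.
    matrix = np.array([[1.0, 0.0], [alpha, 1.0]])
    # This offset also re-centers the center of mass at the field center.
    offset = np.array([cy, cx]) - matrix @ (np.array([h, w]) / 2.0)
    return ndimage.affine_transform(img, matrix, offset=offset)
```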
CLASSIFIER | PREPROCESSING | TEST ERROR RATE (%) | REFERENCE
---|---|---|---
Linear Classifiers | | |
linear classifier (1-layer NN) | none | 12.0 | LeCun et al. 1998 |
linear classifier (1-layer NN) | deskewing | 8.4 | LeCun et al. 1998 |
pairwise linear classifier | deskewing | 7.6 | LeCun et al. 1998 |
K-Nearest Neighbors | | |
K-nearest-neighbors, Euclidean (L2) | none | 5.0 | LeCun et al. 1998 |
K-nearest-neighbors, Euclidean (L2) | none | 3.09 | Kenneth Wilder, U. Chicago |
K-nearest-neighbors, L3 | none | 2.83 | Kenneth Wilder, U. Chicago |
K-nearest-neighbors, Euclidean (L2) | deskewing | 2.4 | LeCun et al. 1998 |
K-nearest-neighbors, Euclidean (L2) | deskewing, noise removal, blurring | 1.80 | Kenneth Wilder, U. Chicago |
K-nearest-neighbors, L3 | deskewing, noise removal, blurring | 1.73 | Kenneth Wilder, U. Chicago |
K-nearest-neighbors, L3 | deskewing, noise removal, blurring, 1 pixel shift | 1.33 | Kenneth Wilder, U. Chicago |
K-nearest-neighbors, L3 | deskewing, noise removal, blurring, 2 pixel shift | 1.22 | Kenneth Wilder, U. Chicago |
K-NN with non-linear deformation (IDM) | shiftable edges | 0.54 | Keysers et al. IEEE PAMI 2007 |
K-NN with non-linear deformation (P2DHMDM) | shiftable edges | 0.52 | Keysers et al. IEEE PAMI 2007 |
K-NN, Tangent Distance | subsampling to 16x16 pixels | 1.1 | LeCun et al. 1998 |
K-NN, shape context matching | shape context feature extraction | 0.63 | Belongie et al. IEEE PAMI 2002 |
Boosted Stumps | | |
boosted stumps | none | 7.7 | Kegl et al., ICML 2009 |
products of boosted stumps (3 terms) | none | 1.26 | Kegl et al., ICML 2009 |
boosted trees (17 leaves) | none | 1.53 | Kegl et al., ICML 2009 |
stumps on Haar features | Haar features | 1.02 | Kegl et al., ICML 2009 |
product of stumps on Haar f. | Haar features | 0.87 | Kegl et al., ICML 2009 |
Non-Linear Classifiers | | |
40 PCA + quadratic classifier | none | 3.3 | LeCun et al. 1998 |
1000 RBF + linear classifier | none | 3.6 | LeCun et al. 1998 |
SVMs | | |
SVM, Gaussian Kernel | none | 1.4 | |
SVM deg 4 polynomial | deskewing | 1.1 | LeCun et al. 1998 |
Reduced Set SVM deg 5 polynomial | deskewing | 1.0 | LeCun et al. 1998 |
Virtual SVM deg-9 poly [distortions] | none | 0.8 | LeCun et al. 1998 |
Virtual SVM, deg-9 poly, 1-pixel jittered | none | 0.68 | DeCoste and Scholkopf, MLJ 2002 |
Virtual SVM, deg-9 poly, 1-pixel jittered | deskewing | 0.68 | DeCoste and Scholkopf, MLJ 2002 |
Virtual SVM, deg-9 poly, 2-pixel jittered | deskewing | 0.56 | DeCoste and Scholkopf, MLJ 2002 |
Neural Nets | | |
2-layer NN, 300 hidden units, mean square error | none | 4.7 | LeCun et al. 1998 |
2-layer NN, 300 HU, MSE, [distortions] | none | 3.6 | LeCun et al. 1998 |
2-layer NN, 300 HU | deskewing | 1.6 | LeCun et al. 1998 |
2-layer NN, 1000 hidden units | none | 4.5 | LeCun et al. 1998 |
2-layer NN, 1000 HU, [distortions] | none | 3.8 | LeCun et al. 1998 |
3-layer NN, 300+100 hidden units | none | 3.05 | LeCun et al. 1998 |
3-layer NN, 300+100 HU [distortions] | none | 2.5 | LeCun et al. 1998 |
3-layer NN, 500+150 hidden units | none | 2.95 | LeCun et al. 1998 |
3-layer NN, 500+150 HU [distortions] | none | 2.45 | LeCun et al. 1998 |
3-layer NN, 500+300 HU, softmax, cross entropy, weight decay | none | 1.53 | Hinton, unpublished, 2005 |
2-layer NN, 800 HU, Cross-Entropy Loss | none | 1.6 | Simard et al., ICDAR 2003 |
2-layer NN, 800 HU, cross-entropy [affine distortions] | none | 1.1 | Simard et al., ICDAR 2003 |
2-layer NN, 800 HU, MSE [elastic distortions] | none | 0.9 | Simard et al., ICDAR 2003 |
2-layer NN, 800 HU, cross-entropy [elastic distortions] | none | 0.7 | Simard et al., ICDAR 2003 |
NN, 784-500-500-2000-30 + nearest neighbor, RBM + NCA training [no distortions] | none | 1.0 | Salakhutdinov and Hinton, AI-Stats 2007 |
6-layer NN 784-2500-2000-1500-1000-500-10 (on GPU) [elastic distortions] | none | 0.35 | Ciresan et al. Neural Computation 10, 2010 and arXiv 1003.0358, 2010 |
committee of 25 NN 784-800-10 [elastic distortions] | width normalization, deslanting | 0.39 | Meier et al. ICDAR 2011 |
deep convex net, unsup pre-training [no distortions] | none | 0.83 | Deng et al. Interspeech 2010 |
Convolutional nets | | |
Convolutional net LeNet-1 | subsampling to 16x16 pixels | 1.7 | LeCun et al. 1998 |
Convolutional net LeNet-4 | none | 1.1 | LeCun et al. 1998 |
Convolutional net LeNet-4 with K-NN instead of last layer | none | 1.1 | LeCun et al. 1998 |
Convolutional net LeNet-4 with local learning instead of last layer | none | 1.1 | LeCun et al. 1998 |
Convolutional net LeNet-5, [no distortions] | none | 0.95 | LeCun et al. 1998 |
Convolutional net LeNet-5, [huge distortions] | none | 0.85 | LeCun et al. 1998 |
Convolutional net LeNet-5, [distortions] | none | 0.8 | LeCun et al. 1998 |
Convolutional net Boosted LeNet-4, [distortions] | none | 0.7 | LeCun et al. 1998 |
Trainable feature extractor + SVMs [no distortions] | none | 0.83 | Lauer et al., Pattern Recognition 40-6, 2007 |
Trainable feature extractor + SVMs [elastic distortions] | none | 0.56 | Lauer et al., Pattern Recognition 40-6, 2007 |
Trainable feature extractor + SVMs [affine distortions] | none | 0.54 | Lauer et al., Pattern Recognition 40-6, 2007 |
unsupervised sparse features + SVM, [no distortions] | none | 0.59 | Labusch et al., IEEE TNN 2008 |
Convolutional net, cross-entropy [affine distortions] | none | 0.6 | Simard et al., ICDAR 2003 |
Convolutional net, cross-entropy [elastic distortions] | none | 0.4 | Simard et al., ICDAR 2003 |
large conv. net, random features [no distortions] | none | 0.89 | Ranzato et al., CVPR 2007 |
large conv. net, unsup features [no distortions] | none | 0.62 | Ranzato et al., CVPR 2007 |
large conv. net, unsup pretraining [no distortions] | none | 0.60 | Ranzato et al., NIPS 2006 |
large conv. net, unsup pretraining [elastic distortions] | none | 0.39 | Ranzato et al., NIPS 2006 |
large conv. net, unsup pretraining [no distortions] | none | 0.53 | Jarrett et al., ICCV 2009 |
large/deep conv. net, 1-20-40-60-80-100-120-120-10 [elastic distortions] | none | 0.35 | Ciresan et al. IJCAI 2011 |
committee of 7 conv. net, 1-20-P-40-P-150-10 [elastic distortions] | width normalization | 0.27 ±0.02 | Ciresan et al. ICDAR 2011
committee of 35 conv. net, 1-20-P-40-P-150-10 [elastic distortions] | width normalization | 0.23 | Ciresan et al. CVPR 2012 |
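As a sanity check against the table, a plain K-nearest-neighbors baseline with Euclidean distance and no preprocessing takes only a few lines. The sketch below uses scikit-learn and the `load_idx` helper from earlier; the exact error depends on k and implementation details, but it should land in the same few-percent range as the corresponding K-NN rows above.

```python
from sklearn.neighbors import KNeighborsClassifier

# Flatten each 28x28 image into a 784-dimensional vector.
x_train = load_idx("train-images-idx3-ubyte.gz").reshape(60000, -1)
y_train = load_idx("train-labels-idx1-ubyte.gz")
x_test = load_idx("t10k-images-idx3-ubyte.gz").reshape(10000, -1)
y_test = load_idx("t10k-labels-idx1-ubyte.gz")

# Brute-force distances over 60,000 training points: slow but simple.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(x_train, y_train)
print(f"test error rate: {100 * (1 - knn.score(x_test, y_test)):.2f}%")
```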
References
- [LeCun et al., 1998a] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998. [on-line version]