The K-fold validation in this post uses the StratifiedKFold method
from the sklearn package in Python.
For the idea behind the method, see: http://scikit-learn.org/stable/modules/cross_validation.html
StratifiedKFold is
a variation of k-fold which returns stratified folds:
each set contains approximately the same percentage of samples of each target class as the complete set.
[Paraphrase]
StratifiedKFold splits the dataset so that every fold keeps roughly the same proportion of samples from each class as the full dataset.
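To make the stratification concrete, here is a minimal sketch on a toy label vector. It assumes a scikit-learn version that provides sklearn.model_selection (0.18 or later); the helper name fold_class_counts and the toy labels are made up for illustration and are not part of the original post.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def fold_class_counts(labels, n_splits):
    """Return, for each test fold, how many samples of each class it holds."""
    labels = np.asarray(labels)
    # StratifiedKFold only looks at the labels; the features are a placeholder.
    features = np.zeros((len(labels), 1))
    skf = StratifiedKFold(n_splits=n_splits)
    counts = []
    for _, test_idx in skf.split(features, labels):
        fold_counts = np.bincount(labels[test_idx], minlength=labels.max() + 1)
        counts.append(fold_counts.tolist())
    return counts


# Hypothetical imbalanced toy data: 8 samples of class 0, 4 samples of class 1.
toy_labels = [0] * 8 + [1] * 4
toy_counts = fold_class_counts(toy_labels, n_splits=4)
print(toy_counts)
```

With a 2:1 class ratio in the full set, each of the 4 test folds should hold 2 samples of class 0 and 1 sample of class 1, which is the "same percentage of each target class" property quoted above.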
For other splitting strategies, see: http://scikit-learn.org/stable/modules/cross_validation.html
Enough talk, straight to the code.
[Humble source code]
import numpy
import h5py
# sklearn.cross_validation was removed in scikit-learn 0.20;
# StratifiedKFold now lives in sklearn.model_selection.
from sklearn.model_selection import StratifiedKFold

## Generate a random matrix and save it
#arr = numpy.random.random([200,400])
#labvec = []
#for i in numpy.arange(0,200):
#    j = i%10
#    arr[i,j*20:j*20+20] = arr[i,j*20:j*20+20]+10
#    labvec.append(j)
#arr = arr.T
#file = h5py.File('arr.mat','w')
#file.create_dataset('arr', data = arr)
#file.close()
#file = h5py.File('labvec.mat','w')
#file.create_dataset('labvec', data = labvec)
#file.close()

# Open the files read-only and load the data
myfile = h5py.File('arr.mat','r')
arr = myfile['arr'][:]
myfile.close()
arr = arr.T  # back to shape (samples, features)
myfile = h5py.File('labvec.mat','r')
labvec = myfile['labvec'][:]
myfile.close()

# 4 stratified folds; split() yields (train indices, test indices) per fold
skf = StratifiedKFold(n_splits=4)
train_set = []
test_set = []
for train, test in skf.split(arr, labvec):
    train_set.append(train)
    test_set.append(test)
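The generation and splitting steps above can also be sketched entirely in memory, which makes it easy to check what the folds look like without the .mat files. This is only a sketch under assumptions: it uses the sklearn.model_selection API (scikit-learn 0.18 or later), and the fixed seed and the variable names data, labels and per_fold_counts are illustrative choices rather than part of the original code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Same layout as the commented-out generator: 200 samples, 400 features,
# 10 classes, and a 20-column block shifted by +10 for each class.
n_samples, n_features, n_classes, block = 200, 400, 10, 20
rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatability
data = rng.random((n_samples, n_features))
labels = np.arange(n_samples) % n_classes
for i, j in enumerate(labels):
    data[i, j * block:(j + 1) * block] += 10

# Collect the train/test index arrays of the 4 stratified folds.
skf = StratifiedKFold(n_splits=4)
train_set, test_set = [], []
for train_idx, test_idx in skf.split(data, labels):
    train_set.append(train_idx)
    test_set.append(test_idx)

# Class counts inside each test fold.
per_fold_counts = [np.bincount(labels[idx], minlength=n_classes) for idx in test_set]
for fold, counts in enumerate(per_fold_counts):
    print(fold, len(test_set[fold]), counts.tolist())
```

Since each class has 20 samples and there are 4 folds, every test fold should contain 50 samples, 5 from each of the 10 classes, and the four test folds together cover every sample exactly once.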
For details, see: http://scikit-learn.org/stable/modules/cross_validation.html
Copyright notice: this is an original article by the blogger and may not be reposted without the blogger's permission.
Date: 2024-11-16 17:35:29