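Positive samples can be divided into a training set, a test set, and a validation set. The problem to solve is how to generate each of these subsets in bulk, in the quantities required.
Put all positive samples in all.txt, then produce the other files from it according to a chosen sample ratio.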
Code:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''edited by zr 2017/6/3'''
import random

# Read every sample name from all.txt, one name per line.
flpath = '/home/zr/projects/all.txt'
with open(flpath) as fpath:
    dataset = [line.strip() for line in fpath]

# Shuffle, then cut at 75%: the first part goes to pos.txt, the rest to neg.txt.
random.shuffle(dataset)
posnm = int(len(dataset) * 0.75)
posset = dataset[:posnm]
negset = dataset[posnm:]

# The with-blocks close each file once its names are written.
with open('/home/zr/projects/pos.txt', 'w') as f1:
    for name in posset:
        f1.write(name + '\n')

with open('/home/zr/projects/neg.txt', 'w') as f2:
    for name in negset:
        f2.write(name + '\n')
Running the script automatically creates pos.txt and neg.txt in the corresponding folder.
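The script above only cuts the list in two. Since the goal mentions training, test, and validation sets, the same shuffle-and-slice idea can be extended to any number of parts. The following is only a minimal sketch: the function name `split_samples`, the `ratios` and `seed` parameters, and the 60/20/20 default are assumptions made for illustration, not part of the original script.

```python
import random


def split_samples(samples, ratios=(0.6, 0.2, 0.2), seed=None):
    """Shuffle samples and cut them into consecutive parts sized by ratios.

    Hypothetical helper: the name, parameters, and default ratios are
    illustrative assumptions, not taken from the original script.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    shuffled = list(samples)               # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split repeatable

    parts = []
    start = 0
    for ratio in ratios[:-1]:
        end = start + int(len(shuffled) * ratio)
        parts.append(shuffled[start:end])
        start = end
    parts.append(shuffled[start:])         # the last part takes whatever remains
    return parts
```

For example, `train, test, val = split_samples(dataset, (0.6, 0.2, 0.2), seed=0)` would give three lists that can each be written out in the same way as pos.txt and neg.txt. Because every size except the last is rounded down, the final part absorbs any leftover samples, so no sample is dropped or duplicated.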
Time: 2024-10-08 10:22:38