some = sample data 0.1;
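This scans the entire dataset and returns roughly the specified fraction of rows. The result is non-deterministic: which rows are selected varies from run to run, and the number of rows returned is only approximate.
Internally, it is rewritten as: filter data by random() <= 0.1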
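The filter rewrite above can be imitated outside the query engine to see why the row count is not exact. The following is a minimal Python sketch, not the engine's actual implementation; the function name `sample_rows` and the 10,000-row input are hypothetical, chosen only for illustration.

```python
import random

def sample_rows(rows, fraction, rng):
    """Keep each row independently when a uniform draw in [0, 1) is <= fraction."""
    return [row for row in rows if rng.random() <= fraction]

# Hypothetical stand-in for the relation `data`.
rows = list(range(10_000))

# Two runs with different seeds mimic two executions of the same statement.
first = sample_rows(rows, 0.1, random.Random(1))
second = sample_rows(rows, 0.1, random.Random(2))

# Both sizes land near 1000, but neither is guaranteed to be exactly 1000,
# and the two runs pick different rows.
print(len(first), len(second))
```

Because every row is tested independently, the whole input is still read even though only about a tenth of it is kept.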
To sample approximately 100 rows:
data = load 'data';
grpd = group data all;
sums = foreach grpd generate COUNT(data) as c;
some = sample data 100/(double)sums.c;
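Since the sampling fraction is computed as 100 divided by the total row count, the result averages 100 rows but is not fixed at 100. As a rough sketch of the arithmetic, assuming each row is kept independently (a binomial model), the expected size and its spread can be computed as below; the helper name `expected_sample_size` and the 10,000-row total are hypothetical.

```python
import math

def expected_sample_size(total_rows, target_rows):
    """Return (fraction, mean, std_dev) for independent per-row sampling."""
    fraction = target_rows / total_rows      # same ratio as 100/(double)sums.c
    mean = total_rows * fraction             # expected number of rows kept
    std_dev = math.sqrt(total_rows * fraction * (1 - fraction))
    return fraction, mean, std_dev

# Hypothetical input of 10,000 rows with a target of 100 rows.
fraction, mean, std_dev = expected_sample_size(10_000, 100)
print(fraction, mean, round(std_dev, 2))
```

Under these assumptions the standard deviation is a little under 10 rows, so a run returning anywhere from about 80 to 120 rows would be unsurprising.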
Time: 2024-12-14 05:36:13