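An Analysis of Mahout's BreimanExample

Mahout's BreimanExample runs the tests described in the paper

Leo Breiman: Random Forests. Machine Learning 45(1): 5-32 (2001).

I split the analysis into three parts:

- the Iteration part, which grows the forests

- the Run part, which drives the BreimanExample test

- running it from the command line

The Iteration part

The iteration function is shown below. Given the input data set data, it uses the random generator rng to split data randomly into a training set and a test set, then grows the random forests and measures their accuracy.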

/**
   * runs one iteration of the procedure.
   *
   * @param rng
   *          random numbers generator
   * @param data
   *          training data
   * @param m
   *          number of random variables to select at each tree-node
   *          take m to be the first integer less than log2(M) + 1, where M is the number of attributes
   * @param nbtrees
   *          number of trees to grow
   */
  private void runIteration(Random rng, Data data, int m, int nbtrees)

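1. Building the data sets

data is the input data set, but not all of it is used for training. It is split into two parts:

The first part, train, is the training set. It starts out as a clone of data. The clone looks redundant at first, since it is identical to data, but rsplit removes instances from the object it is called on, so working on a copy leaves data intact for the next iteration.

The second part, test, is the test set. It holds 10% of the instances, drawn at random from train and removed from it. This is done with the rsplit function of the Data class.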

Data train = data.clone();
Data test = train.rsplit(rng, (int) (data.size() * 0.1));

1.1 The rsplit function of the Data class

The Data class is org.apache.mahout.classifier.df.data.Data.

It has two member variables. Its Javadoc describes it as follows:

Holds a list of vectors and their corresponding Dataset

  private final List<Instance> instances;

  private final Dataset dataset;

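The rsplit function is shown below. It removes subsize randomly chosen instances from the instances list of the Data object and puts them into the newly built subset list.

Both parts come from the same source, so the returned Data object shares its dataset with the Data object on which rsplit was called.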

/**
   * Splits the data in two, returns one part, and this gets the rest of the data. <b>VERY SLOW!</b>
   */
  public Data rsplit(Random rng, int subsize) {
    List<Instance> subset = Lists.newArrayListWithCapacity(subsize);

    for (int i = 0; i < subsize; i++) {
      subset.add(instances.remove(rng.nextInt(instances.size())));
    }

    return new Data(dataset, subset);
  }
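
To make the clone-then-split step concrete, here is a minimal sketch that uses a plain list in place of Mahout's Data. The class name SplitSketch and the use of List<Integer> are assumptions made for illustration only; the removal loop mirrors the rsplit body shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Stand-in for the clone-then-rsplit step, using a plain list instead of Mahout's Data. */
final class SplitSketch {

  /** Removes subsize random elements from items (in place) and returns them. */
  static <T> List<T> rsplit(List<T> items, Random rng, int subsize) {
    List<T> subset = new ArrayList<>(subsize);
    for (int i = 0; i < subsize; i++) {
      // remove(int) shifts the tail of an ArrayList, so each draw costs O(n)
      subset.add(items.remove(rng.nextInt(items.size())));
    }
    return subset;
  }

  public static void main(String[] args) {
    List<Integer> data = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      data.add(i);
    }
    List<Integer> train = new ArrayList<>(data); // plays the role of data.clone()
    List<Integer> test = rsplit(train, new Random(1L), (int) (data.size() * 0.1));
    // data keeps all 20 elements; train and test partition the copy
    System.out.println("data=" + data.size() + " train=" + train.size() + " test=" + test.size());
  }
}
```

Because the split works on the copy, data still has all of its elements afterwards, which is what lets every iteration start from the full data set. The per-draw O(n) removal is probably what the "VERY SLOW" remark in the Javadoc refers to.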

1.2 TreeBuilder

Next, the decision tree builder and the forest builder are defined.

The two classes are

org.apache.mahout.classifier.df.builder.DefaultTreeBuilder;

org.apache.mahout.classifier.df.ref.SequentialBuilder;

/**
 * Builds a Decision Tree <br>
 * Based on the algorithm described in the "Decision Trees" tutorials by Andrew W. Moore, available at:<br>
 * <br>
 * http://www.cs.cmu.edu/~awm/tutorials
 * <br><br>
 * This class can be used when the criterion variable is the categorical attribute.
 */
    DefaultTreeBuilder treeBuilder = new DefaultTreeBuilder();
 /**
 * Builds a Random Decision Forest using a given TreeBuilder to grow the trees
 */
    SequentialBuilder forestBuilder = new SequentialBuilder(rng, treeBuilder, train);

Then forestBuilder is used to grow a random forest.

  /* grow a forest with m = log2(M)+1*/
    treeBuilder.setM(m);
    DecisionForest forestM = forestBuilder.build(nbtrees);

The build function of SequentialBuilder is shown below. It loops nbTrees times, growing one tree per pass with bagging, and collects the root node of each tree in trees.

public class SequentialBuilder {

private final Bagging bagging;

public DecisionForest build(int nbTrees) {
    List<Node> trees = Lists.newArrayList();

    for (int treeId = 0; treeId < nbTrees; treeId++) {
      trees.add(bagging.build(rng));
      logProgress(((float) treeId + 1) / nbTrees);
    }

    return new DecisionForest(trees);
  }
}

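1.3 Bagging

How does bagging build a tree?

As shown below, it first calls the bagging method of Data to sample a training set bag from the data, and then builds a decision tree from that bag in the usual way.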

  /**
   * Builds one tree
   */
  public Node build(Random rng) {
    log.debug("Bagging...");
    Arrays.fill(sampled, false);
    Data bag = data.bagging(rng, sampled);

    log.debug("Building...");
    return treeBuilder.build(rng, bag);
  }

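How is the bagging sample drawn?

As shown below, it samples N times with replacement from the data set of N instances, so the same instance can be drawn more than once. As for the sampled array, its Javadoc says it indicates which instances have been sampled: entries left false mark the instances this tree never saw (its out-of-bag instances).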

  /**
   * if data has N cases, sample N cases at random -but with replacement.
   *
   * @param sampled
   *          indicating which instance has been sampled
   *
   * @return sampled data
   */
  public Data bagging(Random rng, boolean[] sampled) {
    int datasize = size();
    List<Instance> bag = Lists.newArrayListWithCapacity(datasize);

    for (int i = 0; i < datasize; i++) {
      int index = rng.nextInt(datasize);
      bag.add(instances.get(index));
      sampled[index] = true;
    }

    return new Data(dataset, bag);
  }
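
The role of sampled can be seen in a small stand-alone sketch. It works on row indexes rather than Mahout Instance objects, and the names BaggingSketch and countOutOfBag are hypothetical helpers introduced here, not part of Mahout.

```java
import java.util.Random;

/** Bootstrap sampling over row indexes; a stand-in for Data.bagging, not Mahout code. */
final class BaggingSketch {

  /** Draws n indexes from [0, n) with replacement and marks each drawn index in sampled. */
  static int[] bagging(Random rng, boolean[] sampled) {
    int n = sampled.length;
    int[] bag = new int[n];
    for (int i = 0; i < n; i++) {
      int index = rng.nextInt(n);
      bag[i] = index;
      sampled[index] = true;
    }
    return bag;
  }

  /** Counts the rows that were never drawn (the out-of-bag rows). */
  static int countOutOfBag(boolean[] sampled) {
    int count = 0;
    for (boolean s : sampled) {
      if (!s) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    boolean[] sampled = new boolean[1000];
    int[] bag = bagging(new Random(42L), sampled);
    System.out.println("bag size = " + bag.length + ", out-of-bag rows = " + countOutOfBag(sampled));
  }
}
```

For large N, each row is missed with probability (1 - 1/N)^N, which is about e^-1, so roughly 37% of the rows are expected to stay out of the bag for any one tree.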

Next, a second random forest is grown with m=1.

m is the number of attributes to select randomly at each node.

// grow a forest with m=1
    treeBuilder.setM(1);

    time = System.currentTimeMillis();
    log.info("Growing a forest with m=1");
    DecisionForest forestOne = forestBuilder.build(nbtrees);
    sumTimeOne += System.currentTimeMillis() - time;
    numNodesOne += forestOne.nbNodes();

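1.4 Testing the accuracy of the forests

Both forests (the one grown with m = log2(M) + 1 and the one grown with m = 1) are tested for accuracy.

First, the labels of the test set are extracted into testLabels.

Next comes the prediction array predictions. It is two-dimensional: predictions[i][j] is the prediction of the j-th tree of the forest for the i-th test instance.

forestM is the forest grown with m = log2(M) + 1. It is used first to fill predictions.

Then an array sumPredictions is defined, with one entry per test instance. sumPredictions[i] holds the sum of the predictions of all trees of the forest for the i-th instance.

Note that no per-tree weight appears in the code shown here: it adds up the raw values returned by each tree. With more than one tree, such a sum generally differs from any single class label, so it is not a majority vote, and this may be why the run shown later reports a test error of 1.0.

The ErrorEstimate class is defined in org.apache.mahout.classifier.df.

Its job is to compute the error rate, that is, the fraction of entries in which sumPredictions differs from testLabels.

Its implementation has one special case: when the forest has no prediction for an instance, that instance is ignored.

if (predictions[index] == -1) {
  continue; // instance not classified
}

Finally, the error rate is added to sumTestErrM. Accuracy tests are usually repeated N times and averaged, and the same happens here: the run performs N iterations and reports the mean.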

// compute the test set error (Selection Error), and mean tree error (One Tree Error),
    double[] testLabels = test.extractLabels();
    double[][] predictions = new double[test.size()][];

    forestM.classify(test, predictions);
    double[] sumPredictions = new double[test.size()];
    Arrays.fill(sumPredictions, 0.0);
    for (int i = 0; i < predictions.length; i++) {
      for (int j = 0; j < predictions[i].length; j++) {
        sumPredictions[i] += predictions[i][j];
      }
    }
    sumTestErrM += ErrorEstimate.errorRate(testLabels, sumPredictions);

    forestOne.classify(test, predictions);
    Arrays.fill(sumPredictions, 0.0);
    for (int i = 0; i < predictions.length; i++) {
      for (int j = 0; j < predictions[i].length; j++) {
        sumPredictions[i] += predictions[i][j];
      }
    }
    sumTestErrOne += ErrorEstimate.errorRate(testLabels, sumPredictions);
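
The snippet above adds up the raw per-tree predictions. For contrast, one common way to combine per-tree class predictions is a majority vote. The sketch below is not Mahout code: VoteSketch, majority and errorRate are hypothetical names, and errorRate only imitates the "skip -1" behaviour described above. It uses the same predictions[i][j] layout and compares the two ways of combining.

```java
import java.util.HashMap;
import java.util.Map;

/** Majority vote and error rate over a predictions[instance][tree] matrix (illustrative only). */
final class VoteSketch {

  /** Returns the most frequent prediction in votes, or -1 if no tree classified the instance. */
  static double majority(double[] votes) {
    Map<Double, Integer> counts = new HashMap<>();
    double best = -1;
    int bestCount = 0;
    for (double v : votes) {
      if (v == -1) {
        continue; // this tree gave no prediction
      }
      int c = counts.merge(v, 1, Integer::sum);
      if (c > bestCount) {
        bestCount = c;
        best = v;
      }
    }
    return best;
  }

  /** Fraction of classified instances whose prediction differs from the label. */
  static double errorRate(double[] labels, double[] predictions) {
    int errors = 0;
    int classified = 0;
    for (int i = 0; i < labels.length; i++) {
      if (predictions[i] == -1) {
        continue; // instance not classified
      }
      if (predictions[i] != labels[i]) {
        errors++;
      }
      classified++;
    }
    return classified == 0 ? Double.NaN : (double) errors / classified;
  }

  public static void main(String[] args) {
    double[] labels = {1, 2, 0};
    double[][] predictions = {{1, 1, 2}, {2, 0, 2}, {0, 1, 1}};
    double[] voted = new double[labels.length];
    double[] summed = new double[labels.length];
    for (int i = 0; i < predictions.length; i++) {
      voted[i] = majority(predictions[i]);
      for (double p : predictions[i]) {
        summed[i] += p;
      }
    }
    System.out.println("vote error = " + errorRate(labels, voted));
    System.out.println("sum error  = " + errorRate(labels, summed));
  }
}
```

In this toy input the vote gets two of the three instances right, while the sums (4, 4 and 2) match none of the labels, so the sum-based error rate is 1.0.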

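The classify function of the DecisionForest class makes predictions for data.

Its implementation is as follows:

First it allocates predictions[index], whose size is the number of trees in the forest.

Then each tree writes its prediction into its own slot.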

  /**
   * Classifies the data and calls callback for each classification
   */
  public void classify(Data data, double[][] predictions) {
    Preconditions.checkArgument(data.size() == predictions.length, "predictions.length must be equal to data.size()");

    if (data.isEmpty()) {
      return; // nothing to classify
    }

    int treeId = 0;
    for (Node tree : trees) {
      for (int index = 0; index < data.size(); index++) {
        if (predictions[index] == null) {
          predictions[index] = new double[trees.size()];
        }
        predictions[index][treeId] = tree.classify(data.get(index));
      }
      treeId++;
    }
  }

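The Run part

The execution proceeds as follows.

Because this code is eventually packaged into a jar and run from the command line, it has to handle input arguments.

The arguments are:

  1. the data set, data
  2. the data set description, dataset
  3. the number of trees in the forest, nbTrees
  4. the number of repetitions, iterations. Although the name suggests iteration, nothing is refined from one pass to the next; I think it is simply a loop that repeats the test
  5. the help option, help, which prints the usage
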
Option dataOpt = obuilder.withLongName("data").withShortName("d").withRequired(true).withArgument(
      abuilder.withName("path").withMinimum(1).withMaximum(1).create()).withDescription("Data path").create();

    Option datasetOpt = obuilder.withLongName("dataset").withShortName("ds").withRequired(true).withArgument(
      abuilder.withName("dataset").withMinimum(1).withMaximum(1).create()).withDescription("Dataset path")
        .create();

    Option nbtreesOpt = obuilder.withLongName("nbtrees").withShortName("t").withRequired(true).withArgument(
      abuilder.withName("nbtrees").withMinimum(1).withMaximum(1).create()).withDescription(
      "Number of trees to grow, each iteration").create();

    Option nbItersOpt = obuilder.withLongName("iterations").withShortName("i").withRequired(true)
        .withArgument(abuilder.withName("numIterations").withMinimum(1).withMaximum(1).create())
        .withDescription("Number of times to repeat the test").create();

    Option helpOpt = obuilder.withLongName("help").withDescription("Print out help").withShortName("h")
        .create();

    Group group = gbuilder.withName("Options").withOption(dataOpt).withOption(datasetOpt).withOption(
      nbItersOpt).withOption(nbtreesOpt).withOption(helpOpt).create();

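Next the arguments are parsed to obtain the path of the data file, the path of the dataset descriptor file, the number of trees and the number of repetitions.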

Path dataPath;
    Path datasetPath;
    int nbTrees;
    int nbIterations;

    try {
      Parser parser = new Parser();
      parser.setGroup(group);
      CommandLine cmdLine = parser.parse(args);

      if (cmdLine.hasOption("help")) {
        CommandLineUtil.printHelp(group);
        return -1;
      }

      String dataName = cmdLine.getValue(dataOpt).toString();
      String datasetName = cmdLine.getValue(datasetOpt).toString();
      nbTrees = Integer.parseInt(cmdLine.getValue(nbtreesOpt).toString());
      nbIterations = Integer.parseInt(cmdLine.getValue(nbItersOpt).toString());

      dataPath = new Path(dataName);
      datasetPath = new Path(datasetName);
    } catch (OptionException e) {
      log.error("Error while parsing options", e);
      CommandLineUtil.printHelp(group);
      return -1;
    }

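Next the data is loaded. This step gave me some trouble: when I ran this code on its own, with the data and dataset files already generated, it failed with an error. The dataset file stores its description as JSON, so loading it requires a JSON parser, and the corresponding library jar has to be available.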

// load the data
    FileSystem fs = dataPath.getFileSystem(new Configuration());
    Dataset dataset = Dataset.load(getConf(), datasetPath);
    Data data = DataLoader.loadData(dataset, fs, dataPath);

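Once the data is loaded, the iterations are run.

The code comment says M is the number of inputs, which I find imprecise: the code computes M = data.getDataset().nbAttributes(), so M is clearly the number of attributes.

One question is whether this attribute count includes the label. From reading the code, it does.

The forests are then grown nbIterations times, the results of each pass are accumulated, and the following values are printed at the end:

Mean error rate (m = log2(M) + 1)

Mean error rate (m = 1)

Mean forest build time (m = log2(M) + 1)

Mean forest build time (m = 1)

Mean total number of nodes over all trees of the forest (m = log2(M) + 1)

Mean total number of nodes over all trees of the forest (m = 1)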

// take m to be the first integer less than log2(M) + 1, where M is the
    // number of inputs
    int m = (int) Math.floor(FastMath.log(2.0, data.getDataset().nbAttributes()) + 1);

    Random rng = RandomUtils.getRandom();
    for (int iteration = 0; iteration < nbIterations; iteration++) {
      log.info("Iteration {}", iteration);
      runIteration(rng, data, m, nbTrees);
    }

    log.info("********************************************");
    log.info("Random Input Test Error : {}", sumTestErrM / nbIterations);
    log.info("Single Input Test Error : {}", sumTestErrOne / nbIterations);
    log.info("Mean Random Input Time : {}", DFUtils.elapsedTime(sumTimeM / nbIterations));
    log.info("Mean Single Input Time : {}", DFUtils.elapsedTime(sumTimeOne / nbIterations));
    log.info("Mean Random Input Num Nodes : {}", numNodesM / nbIterations);
    log.info("Mean Single Input Num Nodes : {}", numNodesOne / nbIterations);
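
As a quick check of the m formula, the following sketch computes floor(log2(M) + 1) with the JDK only. The class and method names are hypothetical, and Math.log stands in for the FastMath call used in the example.

```java
/** Computes m = floor(log2(M) + 1) with the JDK only (the example itself uses FastMath). */
final class AttributeCountSketch {

  static int defaultM(int nbAttributes) {
    double log2 = Math.log(nbAttributes) / Math.log(2.0);
    return (int) Math.floor(log2 + 1);
  }

  public static void main(String[] args) {
    System.out.println(defaultM(10)); // log2(10) is about 3.32, so m = 4
    System.out.println(defaultM(9));  // log2(9) is about 3.17, so m = 4
  }
}
```

For the glass data used below (9 numerical attributes plus the label), M is 10 if the label is counted and 9 if it is not; both give m = 4, which matches the "Growing a forest with m=4" lines in the run log.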

Running it

Data

glass: http://archive.ics.uci.edu/ml/datasets/Glass+Identification

A sample of the data:

1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.00,1
2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00,0.00,1
3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00,0.00,1
4,1.51766,13.21,3.69,1.29,72.61,0.57,8.22,0.00,0.00,1
5,1.51742,13.27,3.62,1.24,73.08,0.55,8.07,0.00,0.00,1
6,1.51596,12.79,3.61,1.62,72.97,0.64,8.07,0.00,0.26,1

The command that generates the dataset descriptor is:

[email protected]:/home/user/mahout-distribution-0.9# $HADOOP_HOME/bin/hadoop jar mahout-core-0.9-job.jar org.apache.mahout.classifier.df.tools.Describe -p /user/glass.data -f /user/glass.info -d I 9 N L

15/08/24 06:34:02 INFO tools.Describe: Generating the descriptor…

15/08/24 06:34:03 INFO tools.Describe: generating the dataset…

15/08/24 06:34:03 INFO tools.Describe: storing the dataset description

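-p path of the input data

-f path of the output dataset descriptor

-d the data description, here I 9 N L

The data description works as follows:

- The first column is the sample id

- The next 9 columns are the sample's attributes, all of Numerical type

- The last column is the sample's class label

- So the description is written [I, 9, N, L]

- I means the column is ignored; it is short for Ignore

- N is short for Numerical, and L stands for Label

- Non-numeric attributes are also allowed; they are marked with C (short for Categorical)

- 9 means that the nine columns it covers are all N

- If the attributes are [Ignore,Numerical,Numerical,Categorical,Numerical,Categorical,Categorical,Label], then the --descriptor argument should be written as [I,2,N,C,N,2,C,L].

The explanation of the data description above is taken from http://running.iteye.com/blog/923483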
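
The expansion rule described above (a number repeats the type letter that follows it) can be sketched in a few lines. This is only an illustration of the notation, not the parser Mahout uses; DescriptorSketch and expand are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Expands a descriptor such as "I 9 N L" into one type letter per column (illustrative only). */
final class DescriptorSketch {

  static List<String> expand(String descriptor) {
    List<String> columns = new ArrayList<>();
    int repeat = 1;
    for (String token : descriptor.trim().split("[\\s,]+")) {
      if (token.matches("\\d+")) {
        repeat = Integer.parseInt(token); // applies to the next type letter
        continue;
      }
      for (int i = 0; i < repeat; i++) {
        columns.add(token);
      }
      repeat = 1;
    }
    return columns;
  }

  public static void main(String[] args) {
    System.out.println(expand("I 9 N L"));         // 11 columns: I, nine N, L
    System.out.println(expand("I,2,N,C,N,2,C,L")); // I, N, N, C, N, C, C, L
  }
}
```

The first call yields 11 entries, one per column of a glass.data row (id, nine measurements, label).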

The descriptor file itself looks like this; it is stored as JSON:

[email protected]:/home/user/mahout-distribution-0.9# hadoop dfs -cat /user/glass.info

[{"values":null,"label":false,"type":"ignored"},{"values":null,"label":false,"type":"numerical"},{"values":null,"label":false,"type":"numerical"},{"values":null,"label":false,"type":"numerical"},{"values":null,"label":false,"type":"numerical"},{"values":null,"label":false,"type":"numerical"},{"values":null,"label":false,"type":"numerical"},{"values":null,"label":false,"type":"numerical"},{"values":null,"label":false,"type":"numerical"},{"values":null,"label":false,"type":"numerical"},{"values":["1","2","3","5","6","7"],"label":true,"type":"categorical"}]

Note that the data file must be in HDFS; otherwise the following error occurs:

15/08/24 06:32:11 INFO tools.Describe: Generating the descriptor…

15/08/24 06:32:12 INFO tools.Describe: generating the dataset…

Exception in thread "main" java.io.FileNotFoundException: File does not exist: /home/user/data/glass.data

Also, the package of a class can differ between Mahout versions. For example, when I run

[email protected]:/home/user/mahout-distribution-0.9# $HADOOP_HOME/bin/hadoop jar mahout-core-0.9-job.jar org.apache.mahout.df.tools.Describe -p /home/user/data/glass.data -f /home/user/data/glass.info -d I 9 N L

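the following error appears:

Exception in thread "main" java.lang.ClassNotFoundException: org.apache.mahout.df.tools.Describe

So check where the Describe class lives in the core folder of your own Mahout version.

Finally, run the example. The log shows 10 iterations (0 to 9); each iteration grows one forest with m=4 and one with m=1, and the averaged results are printed at the end.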

[email protected]:/home/user/mahout-distribution-0.9# hadoop jar mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.BreimanExample -d /user/glass.data -ds /user/glass.info -i 10 -t 100

15/08/24 07:14:03 INFO df.BreimanExample: Iteration 0

15/08/24 07:14:03 INFO df.BreimanExample: Splitting the data

15/08/24 07:14:03 INFO df.BreimanExample: Growing a forest with m=4

15/08/24 07:14:03 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:03 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:03 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:03 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:03 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:03 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:04 INFO df.BreimanExample: Growing a forest with m=1

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:04 INFO df.BreimanExample: Iteration 1

15/08/24 07:14:04 INFO df.BreimanExample: Splitting the data

15/08/24 07:14:04 INFO df.BreimanExample: Growing a forest with m=4

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:04 INFO df.BreimanExample: Growing a forest with m=1

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:04 INFO df.BreimanExample: Iteration 2

15/08/24 07:14:04 INFO df.BreimanExample: Splitting the data

15/08/24 07:14:04 INFO df.BreimanExample: Growing a forest with m=4

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:04 INFO df.BreimanExample: Growing a forest with m=1

15/08/24 07:14:04 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:05 INFO df.BreimanExample: Iteration 3

15/08/24 07:14:05 INFO df.BreimanExample: Splitting the data

15/08/24 07:14:05 INFO df.BreimanExample: Growing a forest with m=4

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:05 INFO df.BreimanExample: Growing a forest with m=1

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:05 INFO df.BreimanExample: Iteration 4

15/08/24 07:14:05 INFO df.BreimanExample: Splitting the data

15/08/24 07:14:05 INFO df.BreimanExample: Growing a forest with m=4

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:05 INFO df.BreimanExample: Growing a forest with m=1

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:05 INFO df.BreimanExample: Iteration 5

15/08/24 07:14:05 INFO df.BreimanExample: Splitting the data

15/08/24 07:14:05 INFO df.BreimanExample: Growing a forest with m=4

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:05 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:06 INFO df.BreimanExample: Growing a forest with m=1

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:06 INFO df.BreimanExample: Iteration 6

15/08/24 07:14:06 INFO df.BreimanExample: Splitting the data

15/08/24 07:14:06 INFO df.BreimanExample: Growing a forest with m=4

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:06 INFO df.BreimanExample: Growing a forest with m=1

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:06 INFO df.BreimanExample: Iteration 7

15/08/24 07:14:06 INFO df.BreimanExample: Splitting the data

15/08/24 07:14:06 INFO df.BreimanExample: Growing a forest with m=4

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:06 INFO df.BreimanExample: Growing a forest with m=1

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:06 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:07 INFO df.BreimanExample: Iteration 8

15/08/24 07:14:07 INFO df.BreimanExample: Splitting the data

15/08/24 07:14:07 INFO df.BreimanExample: Growing a forest with m=4

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:07 INFO df.BreimanExample: Growing a forest with m=1

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:07 INFO df.BreimanExample: Iteration 9

15/08/24 07:14:07 INFO df.BreimanExample: Splitting the data

15/08/24 07:14:07 INFO df.BreimanExample: Growing a forest with m=4

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:07 INFO df.BreimanExample: Growing a forest with m=1

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 10%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 20%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 30%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 40%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 50%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 60%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 70%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 80%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 90%

15/08/24 07:14:07 INFO ref.SequentialBuilder: Building 100%

15/08/24 07:14:07 INFO df.BreimanExample: **********************************

15/08/24 07:14:07 INFO df.BreimanExample: Random Input Test Error : 1.0

15/08/24 07:14:07 INFO df.BreimanExample: Single Input Test Error : 1.0

15/08/24 07:14:07 INFO df.BreimanExample: Mean Random Input Time : 0h 0m 0s 288

15/08/24 07:14:07 INFO df.BreimanExample: Mean Single Input Time : 0h 0m 0s 107

15/08/24 07:14:07 INFO df.BreimanExample: Mean Random Input Num Nodes : 6761

15/08/24 07:14:07 INFO df.BreimanExample: Mean Single Input Num Nodes : 11326

Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.

Time: 2024-07-30 10:13:07
