数据集:seeds.tsv
15.26 14.84 0.871 5.763 3.312 2.221 5.22 Kama 14.88 14.57 0.8811 5.554 3.333 1.018 4.956 Kama 14.29 14.09 0.905 5.291 3.337 2.699 4.825 Kama 13.84 13.94 0.8955 5.324 3.379 2.259 4.805 Kama 16.14 14.99 0.9034 5.658 3.562 1.355 5.175 Kama 14.38 14.21 0.8951 5.386 3.312 2.462 4.956 Kama 14.69 14.49 0.8799 5.563 3.259 3.586 5.219 Kama 14.11 14.1 0.8911 5.42 3.302 2.7 5.0 Kama 16.63 15.46 0.8747 6.053 3.465 2.04 5.877 Kama 16.44 15.25 0.888 5.884 3.505 1.969 5.533 Kama 15.26 14.85 0.8696 5.714 3.242 4.543 5.314 Kama 14.03 14.16 0.8796 5.438 3.201 1.717 5.001 Kama 13.89 14.02 0.888 5.439 3.199 3.986 4.738 Kama 13.78 14.06 0.8759 5.479 3.156 3.136 4.872 Kama 13.74 14.05 0.8744 5.482 3.114 2.932 4.825 Kama 14.59 14.28 0.8993 5.351 3.333 4.185 4.781 Kama 13.99 13.83 0.9183 5.119 3.383 5.234 4.781 Kama 15.69 14.75 0.9058 5.527 3.514 1.599 5.046 Kama 14.7 14.21 0.9153 5.205 3.466 1.767 4.649 Kama 12.72 13.57 0.8686 5.226 3.049 4.102 4.914 Kama 14.16 14.4 0.8584 5.658 3.129 3.072 5.176 Kama 14.11 14.26 0.8722 5.52 3.168 2.688 5.219 Kama 15.88 14.9 0.8988 5.618 3.507 0.7651 5.091 Kama 12.08 13.23 0.8664 5.099 2.936 1.415 4.961 Kama 15.01 14.76 0.8657 5.789 3.245 1.791 5.001 Kama 16.19 15.16 0.8849 5.833 3.421 0.903 5.307 Kama 13.02 13.76 0.8641 5.395 3.026 3.373 4.825 Kama 12.74 13.67 0.8564 5.395 2.956 2.504 4.869 Kama 14.11 14.18 0.882 5.541 3.221 2.754 5.038 Kama 13.45 14.02 0.8604 5.516 3.065 3.531 5.097 Kama 13.16 13.82 0.8662 5.454 2.975 0.8551 5.056 Kama 15.49 14.94 0.8724 5.757 3.371 3.412 5.228 Kama 14.09 14.41 0.8529 5.717 3.186 3.92 5.299 Kama 13.94 14.17 0.8728 5.585 3.15 2.124 5.012 Kama 15.05 14.68 0.8779 5.712 3.328 2.129 5.36 Kama 16.12 15.0 0.9 5.709 3.485 2.27 5.443 Kama 16.2 15.27 0.8734 5.826 3.464 2.823 5.527 Kama 17.08 15.38 0.9079 5.832 3.683 2.956 5.484 Kama 14.8 14.52 0.8823 5.656 3.288 3.112 5.309 Kama 14.28 14.17 0.8944 5.397 3.298 6.685 5.001 Kama 13.54 13.85 0.8871 5.348 3.156 2.587 5.178 Kama 13.5 13.85 0.8852 5.351 3.158 2.249 5.176 Kama 13.16 13.55 0.9009 5.138 3.201 2.461 4.783 Kama 15.5 14.86 0.882 5.877 3.396 4.711 5.528 Kama 15.11 14.54 0.8986 5.579 3.462 3.128 5.18 Kama 13.8 14.04 0.8794 5.376 3.155 1.56 4.961 Kama 15.36 14.76 0.8861 5.701 3.393 1.367 5.132 Kama 14.99 14.56 0.8883 5.57 3.377 2.958 5.175 Kama 14.79 14.52 0.8819 5.545 3.291 2.704 5.111 Kama 14.86 14.67 0.8676 5.678 3.258 2.129 5.351 Kama 14.43 14.4 0.8751 5.585 3.272 3.975 5.144 Kama 15.78 14.91 0.8923 5.674 3.434 5.593 5.136 Kama 14.49 14.61 0.8538 5.715 3.113 4.116 5.396 Kama 14.33 14.28 0.8831 5.504 3.199 3.328 5.224 Kama 14.52 14.6 0.8557 5.741 3.113 1.481 5.487 Kama 15.03 14.77 0.8658 5.702 3.212 1.933 5.439 Kama 14.46 14.35 0.8818 5.388 3.377 2.802 5.044 Kama 14.92 14.43 0.9006 5.384 3.412 1.142 5.088 Kama 15.38 14.77 0.8857 5.662 3.419 1.999 5.222 Kama 12.11 13.47 0.8392 5.159 3.032 1.502 4.519 Kama 11.42 12.86 0.8683 5.008 2.85 2.7 4.607 Kama 11.23 12.63 0.884 4.902 2.879 2.269 4.703 Kama 12.36 13.19 0.8923 5.076 3.042 3.22 4.605 Kama 13.22 13.84 0.868 5.395 3.07 4.157 5.088 Kama 12.78 13.57 0.8716 5.262 3.026 1.176 4.782 Kama 12.88 13.5 0.8879 5.139 3.119 2.352 4.607 Kama 14.34 14.37 0.8726 5.63 3.19 1.313 5.15 Kama 14.01 14.29 0.8625 5.609 3.158 2.217 5.132 Kama 14.37 14.39 0.8726 5.569 3.153 1.464 5.3 Kama 12.73 13.75 0.8458 5.412 2.882 3.533 5.067 Kama 17.63 15.98 0.8673 6.191 3.561 4.076 6.06 Rosa 16.84 15.67 0.8623 5.998 3.484 4.675 5.877 Rosa 17.26 15.73 0.8763 5.978 3.594 4.539 5.791 Rosa 19.11 16.26 0.9081 6.154 3.93 2.936 6.079 Rosa 16.82 15.51 0.8786 6.017 3.486 4.004 5.841 Rosa 16.77 15.62 0.8638 5.927 3.438 4.92 5.795 Rosa 17.32 15.91 0.8599 6.064 3.403 3.824 5.922 Rosa 20.71 17.23 0.8763 6.579 3.814 4.451 6.451 Rosa 18.94 16.49 0.875 6.445 3.639 5.064 6.362 Rosa 17.12 15.55 0.8892 5.85 3.566 2.858 5.746 Rosa 16.53 15.34 0.8823 5.875 3.467 5.532 5.88 Rosa 18.72 16.19 0.8977 6.006 3.857 5.324 5.879 Rosa 20.2 16.89 0.8894 6.285 3.864 5.173 6.187 Rosa 19.57 16.74 0.8779 6.384 3.772 1.472 6.273 Rosa 19.51 16.71 0.878 6.366 3.801 2.962 6.185 Rosa 18.27 16.09 0.887 6.173 3.651 2.443 6.197 Rosa 18.88 16.26 0.8969 6.084 3.764 1.649 6.109 Rosa 18.98 16.66 0.859 6.549 3.67 3.691 6.498 Rosa 21.18 17.21 0.8989 6.573 4.033 5.78 6.231 Rosa 20.88 17.05 0.9031 6.45 4.032 5.016 6.321 Rosa 20.1 16.99 0.8746 6.581 3.785 1.955 6.449 Rosa 18.76 16.2 0.8984 6.172 3.796 3.12 6.053 Rosa 18.81 16.29 0.8906 6.272 3.693 3.237 6.053 Rosa 18.59 16.05 0.9066 6.037 3.86 6.001 5.877 Rosa 18.36 16.52 0.8452 6.666 3.485 4.933 6.448 Rosa 16.87 15.65 0.8648 6.139 3.463 3.696 5.967 Rosa 19.31 16.59 0.8815 6.341 3.81 3.477 6.238 Rosa 18.98 16.57 0.8687 6.449 3.552 2.144 6.453 Rosa 18.17 16.26 0.8637 6.271 3.512 2.853 6.273 Rosa 18.72 16.34 0.881 6.219 3.684 2.188 6.097 Rosa 16.41 15.25 0.8866 5.718 3.525 4.217 5.618 Rosa 17.99 15.86 0.8992 5.89 3.694 2.068 5.837 Rosa 19.46 16.5 0.8985 6.113 3.892 4.308 6.009 Rosa 19.18 16.63 0.8717 6.369 3.681 3.357 6.229 Rosa 18.95 16.42 0.8829 6.248 3.755 3.368 6.148 Rosa 18.83 16.29 0.8917 6.037 3.786 2.553 5.879 Rosa 18.85 16.17 0.9056 6.152 3.806 2.843 6.2 Rosa 17.63 15.86 0.88 6.033 3.573 3.747 5.929 Rosa 19.94 16.92 0.8752 6.675 3.763 3.252 6.55 Rosa 18.55 16.22 0.8865 6.153 3.674 1.738 5.894 Rosa 18.45 16.12 0.8921 6.107 3.769 2.235 5.794 Rosa 19.38 16.72 0.8716 6.303 3.791 3.678 5.965 Rosa 19.13 16.31 0.9035 6.183 3.902 2.109 5.924 Rosa 19.14 16.61 0.8722 6.259 3.737 6.682 6.053 Rosa 20.97 17.25 0.8859 6.563 3.991 4.677 6.316 Rosa 19.06 16.45 0.8854 6.416 3.719 2.248 6.163 Rosa 18.96 16.2 0.9077 6.051 3.897 4.334 5.75 Rosa 19.15 16.45 0.889 6.245 3.815 3.084 6.185 Rosa 18.89 16.23 0.9008 6.227 3.769 3.639 5.966 Rosa 20.03 16.9 0.8811 6.493 3.857 3.063 6.32 Rosa 20.24 16.91 0.8897 6.315 3.962 5.901 6.188 Rosa 18.14 16.12 0.8772 6.059 3.563 3.619 6.011 Rosa 16.17 15.38 0.8588 5.762 3.387 4.286 5.703 Rosa 18.43 15.97 0.9077 5.98 3.771 2.984 5.905 Rosa 15.99 14.89 0.9064 5.363 3.582 3.336 5.144 Rosa 18.75 16.18 0.8999 6.111 3.869 4.188 5.992 Rosa 18.65 16.41 0.8698 6.285 3.594 4.391 6.102 Rosa 17.98 15.85 0.8993 5.979 3.687 2.257 5.919 Rosa 20.16 17.03 0.8735 6.513 3.773 1.91 6.185 Rosa 17.55 15.66 0.8991 5.791 3.69 5.366 5.661 Rosa 18.3 15.89 0.9108 5.979 3.755 2.837 5.962 Rosa 18.94 16.32 0.8942 6.144 3.825 2.908 5.949 Rosa 15.38 14.9 0.8706 5.884 3.268 4.462 5.795 Rosa 16.16 15.33 0.8644 5.845 3.395 4.266 5.795 Rosa 15.56 14.89 0.8823 5.776 3.408 4.972 5.847 Rosa 15.38 14.66 0.899 5.477 3.465 3.6 5.439 Rosa 17.36 15.76 0.8785 6.145 3.574 3.526 5.971 Rosa 15.57 15.15 0.8527 5.92 3.231 2.64 5.879 Rosa 15.6 15.11 0.858 5.832 3.286 2.725 5.752 Rosa 16.23 15.18 0.885 5.872 3.472 3.769 5.922 Rosa 13.07 13.92 0.848 5.472 2.994 5.304 5.395 Canadian 13.32 13.94 0.8613 5.541 3.073 7.035 5.44 Canadian 13.34 13.95 0.862 5.389 3.074 5.995 5.307 Canadian 12.22 13.32 0.8652 5.224 2.967 5.469 5.221 Canadian 11.82 13.4 0.8274 5.314 2.777 4.471 5.178 Canadian 11.21 13.13 0.8167 5.279 2.687 6.169 5.275 Canadian 11.43 13.13 0.8335 5.176 2.719 2.221 5.132 Canadian 12.49 13.46 0.8658 5.267 2.967 4.421 5.002 Canadian 12.7 13.71 0.8491 5.386 2.911 3.26 5.316 Canadian 10.79 12.93 0.8107 5.317 2.648 5.462 5.194 Canadian 11.83 13.23 0.8496 5.263 2.84 5.195 5.307 Canadian 12.01 13.52 0.8249 5.405 2.776 6.992 5.27 Canadian 12.26 13.6 0.8333 5.408 2.833 4.756 5.36 Canadian 11.18 13.04 0.8266 5.22 2.693 3.332 5.001 Canadian 11.36 13.05 0.8382 5.175 2.755 4.048 5.263 Canadian 11.19 13.05 0.8253 5.25 2.675 5.813 5.219 Canadian 11.34 12.87 0.8596 5.053 2.849 3.347 5.003 Canadian 12.13 13.73 0.8081 5.394 2.745 4.825 5.22 Canadian 11.75 13.52 0.8082 5.444 2.678 4.378 5.31 Canadian 11.49 13.22 0.8263 5.304 2.695 5.388 5.31 Canadian 12.54 13.67 0.8425 5.451 2.879 3.082 5.491 Canadian 12.02 13.33 0.8503 5.35 2.81 4.271 5.308 Canadian 12.05 13.41 0.8416 5.267 2.847 4.988 5.046 Canadian 12.55 13.57 0.8558 5.333 2.968 4.419 5.176 Canadian 11.14 12.79 0.8558 5.011 2.794 6.388 5.049 Canadian 12.1 13.15 0.8793 5.105 2.941 2.201 5.056 Canadian 12.44 13.59 0.8462 5.319 2.897 4.924 5.27 Canadian 12.15 13.45 0.8443 5.417 2.837 3.638 5.338 Canadian 11.35 13.12 0.8291 5.176 2.668 4.337 5.132 Canadian 11.24 13.0 0.8359 5.09 2.715 3.521 5.088 Canadian 11.02 13.0 0.8189 5.325 2.701 6.735 5.163 Canadian 11.55 13.1 0.8455 5.167 2.845 6.715 4.956 Canadian 11.27 12.97 0.8419 5.088 2.763 4.309 5.0 Canadian 11.4 13.08 0.8375 5.136 2.763 5.588 5.089 Canadian 10.83 12.96 0.8099 5.278 2.641 5.182 5.185 Canadian 10.8 12.57 0.859 4.981 2.821 4.773 5.063 Canadian 11.26 13.01 0.8355 5.186 2.71 5.335 5.092 Canadian 10.74 12.73 0.8329 5.145 2.642 4.702 4.963 Canadian 11.48 13.05 0.8473 5.18 2.758 5.876 5.002 Canadian 12.21 13.47 0.8453 5.357 2.893 1.661 5.178 Canadian 11.41 12.95 0.856 5.09 2.775 4.957 4.825 Canadian 12.46 13.41 0.8706 5.236 3.017 4.987 5.147 Canadian 12.19 13.36 0.8579 5.24 2.909 4.857 5.158 Canadian 11.65 13.07 0.8575 5.108 2.85 5.209 5.135 Canadian 12.89 13.77 0.8541 5.495 3.026 6.185 5.316 Canadian 11.56 13.31 0.8198 5.363 2.683 4.062 5.182 Canadian 11.81 13.45 0.8198 5.413 2.716 4.898 5.352 Canadian 10.91 12.8 0.8372 5.088 2.675 4.179 4.956 Canadian 11.23 12.82 0.8594 5.089 2.821 7.524 4.957 Canadian 10.59 12.41 0.8648 4.899 2.787 4.975 4.794 Canadian 10.93 12.8 0.839 5.046 2.717 5.398 5.045 Canadian 11.27 12.86 0.8563 5.091 2.804 3.985 5.001 Canadian 11.87 13.02 0.8795 5.132 2.953 3.597 5.132 Canadian 10.82 12.83 0.8256 5.18 2.63 4.853 5.089 Canadian 12.11 13.27 0.8639 5.236 2.975 4.132 5.012 Canadian 12.8 13.47 0.886 5.16 3.126 4.873 4.914 Canadian 12.79 13.53 0.8786 5.224 3.054 5.483 4.958 Canadian 13.37 13.78 0.8849 5.32 3.128 4.67 5.091 Canadian 12.62 13.67 0.8481 5.41 2.911 3.306 5.231 Canadian 12.76 13.38 0.8964 5.073 3.155 2.828 4.83 Canadian 12.38 13.44 0.8609 5.219 2.989 5.472 5.045 Canadian 12.67 13.32 0.8977 4.984 3.135 2.3 4.745 Canadian 11.18 12.72 0.868 5.009 2.81 4.051 4.828 Canadian 12.7 13.41 0.8874 5.183 3.091 8.456 5.0 Canadian 12.37 13.47 0.8567 5.204 2.96 3.919 5.001 Canadian 12.19 13.2 0.8783 5.137 2.981 3.631 4.87 Canadian 11.23 12.88 0.8511 5.14 2.795 4.325 5.003 Canadian 13.2 13.66 0.8883 5.236 3.232 8.315 5.056 Canadian 11.84 13.21 0.8521 5.175 2.836 3.598 5.044 Canadian 12.3 13.34 0.8684 5.243 2.974 5.637 5.063 Canadian
第一步:加载数据
load.py
import numpy as np def load_dataset(dataset_name): data = [] label = [] with open(‘{0}.tsv‘.format(dataset_name),‘r‘) as f: lines = f.readlines() for line in lines: linedata = line.strip().split(‘\t‘) data.append([float(da) for da in linedata[:-1]]) label.append(linedata[-1]) data = np.array(data) label = np.array(label) return data,label
第二步:设计分类模型
阈值分类模型是在所有的训练数据中找最佳的阈值,这个阈值使得训练集的预测效果最好。
threshold.py
#coding:utf-8 import numpy as np def learn_model(features,labels): best_acc = -1.0 thresh = features.copy() for fi in range(features.shape[1]): # 逐列 thresh = features[:,fi].copy() thresh.sort() for t in thresh: # 列中每一个元素 pred = (features[:,fi]>t) acc = (pred == labels).mean() if acc > best_acc: best_acc = acc best_fi = fi best_t = t print ‘model->best_fi,t,acc:‘,best_fi,best_t,best_acc return best_t,best_fi def apply_model(features,model): t,fi = model return features[:,fi] > t def accurcy(features,labels,model): predictions = apply_model(features,model) return (predictions == labels).mean() #prediction == labels 同为真或同为假
第三步:测试模型的预测准确性
在这里采用十折交叉验证,即把样本数据分成10份,每次取其中一份作为测试数据,其余9份作为训练数据。这种方法的优点是充分利用了数据样本资源,缺点是计算量大。
seeds_threshold.py
#coding:utf-8 from load import load_dataset import numpy as np from threshold import learn_model,accurcy,apply_model features,labels = load_dataset(‘seeds‘) labels = (labels ==‘Canadian‘) #相等就为 True 不相等就为 False sumacc = 0.0 for flod in xrange(10): print ‘第 ‘,flod+1,‘ 次交叉验证‘ training = np.ones(len(features),bool) training[flod::10] = False testing = ~training model = learn_model(features[training],labels[training]) acc = accurcy(features[testing],labels[testing],model) print ‘测试集预测准确率{0:.1%}‘.format(acc) sumacc += acc sumacc /= 10 print ‘平均测试集预测准确率{0:.1%}‘.format(sumacc)
运行 seeds_threshold.py
这样下来一个简单的阈值分类模型就建好了。
第五次交叉验证的准确率都是 81% 分类的阈值分别是第 fi=5 列,t = 4.308.我们就可以用这个阈值预测给定种子是否是 Canadian.
现在我们在已有的基础上把三种 seed 的分类阈值都求出来:
seeds.py
#coding:utf-8 from load import load_dataset import numpy as np from threshold import learn_model,accurcy,apply_model features,rawlabels = load_dataset(‘seeds‘) labelset = set(rawlabels) print labelset #labels = (labels ==‘Canadian‘) #相等就为 True 不相等就为 False for label in labelset: print label labels = rawlabels.copy() labels = (labels == label) sumacc = 0.0 bestacc = 0.0 for flod in xrange(10): print ‘第 ‘,flod+1,‘ 次交叉验证‘ training = np.ones(len(features),bool) training[flod::10] = False testing = ~training model = learn_model(features[training],labels[training]) acc = accurcy(features[testing],labels[testing],model) print ‘测试集预测准确率{0:.1%}‘.format(acc) if acc > bestacc: bestacc = acc bestmodel = model sumacc += acc sumacc /= 10 print ‘平均测试集预测准确率{0:.1%}‘.format(sumacc) print ‘最佳模型:‘,model
分别求得各个类别的分类阈值。
根据阈值可以画出分类树。
用data表示一个待分类数据 data 有 7个元素,分别表示 7 个不同的特征。
分类树:
时间: 2024-10-22 02:16:25