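1. Supervised learning: classification
Criteria for evaluating classification and prediction algorithms in machine learning:
a. Accuracy
b. Speed
c. Robustness
d. Scalability
e. Interpretability
2. What is a decision tree?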
https://scikit-learn.org/stable/modules/tree.html
3. The concept of entropy:
The greater the uncertainty of a variable, the greater its entropy.
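As a minimal sketch of this idea, the snippet below computes the Shannon entropy, H = -sum(p * log2(p)), of a list of class labels. The function name entropy and the sample labels are illustrative assumptions, not part of the original notes.

```python
import math
from collections import Counter


def entropy(labels):
    """Return the Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p * log2(p) over the observed classes.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical label lists: the mixed one is more uncertain than the pure one.
mixed = ["yes", "no", "yes", "no"]
pure = ["yes", "yes", "yes", "yes"]
print(entropy(mixed), entropy(pure))
```

An evenly mixed list is maximally uncertain for two classes, while a list with a single class has no uncertainty at all, which is the relationship stated above.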
4. Decision tree induction algorithm (ID3)
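ID3 grows the tree greedily: at each node it splits on the attribute with the highest information gain, that is, the largest reduction in the entropy of the class labels. The sketch below shows only that selection step on a hypothetical four-row dataset. The helper names (information_gain, best_attribute) and the data are assumptions for illustration, and the recursive tree construction is omitted.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Return the Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attribute]].append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain (the ID3 choice)."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data, not taken from the original post.
rows = [
    {"age": "youth", "student": "no"},
    {"age": "youth", "student": "yes"},
    {"age": "senior", "student": "no"},
    {"age": "senior", "student": "yes"},
]
labels = ["no", "yes", "no", "yes"]

# "student" separates the two classes perfectly here, so it is chosen.
print(best_attribute(rows, labels, ["age", "student"]))
```

In a full ID3 implementation this step would be repeated on each subset produced by the split, until a subset is pure or no attributes remain.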
5. Other algorithms and their pros and cons
6. Applications of decision trees
The generated decision tree
Code:
This code is not particularly general (the file paths are hard-coded, for example), but the basic logic is correct. It assumes a recent scikit-learn release, where get_feature_names_out() replaces get_feature_names().
# -*- coding:utf-8 -*-
import csv

from sklearn import preprocessing
from sklearn import tree  # requires numeric input values
from sklearn.feature_extraction import DictVectorizer

"""
Note: a decision tree needs numeric values, not strings.
Values such as "no" and "yes" are not allowed and must be
converted into a matrix first.
====================================
age          income  student
youth        high    no
youth        high    no
middle_aged  high    no
senior       medium  no
senior       low     yes
====================================
Data like the above is encoded as:
youth middle_aged senior high medium low ......
1     0           0      1    0      0
1     0           0      1    0      0
.....
"""


def main():
    featureList = []
    labelList = []

    # NOTE: csv.reader needs a plain-text CSV file; an .xlsx workbook
    # has to be exported to CSV before it can be read this way.
    with open(r"C:\Users\Administrator\Desktop\data.xlsx", "r", newline="") as allElectronicsData:
        reader = csv.reader(allElectronicsData)
        headers = next(reader)
        print(headers)  # ["RID", 'age'.....]

        for row in reader:
            # The last column is the class label.
            labelList.append(row[len(row) - 1])
            # Skip column 0 (RID) and the label column.
            rowDict = {}
            for i in range(1, len(row) - 1):
                rowDict[headers[i]] = row[i]
            featureList.append(rowDict)

    print(featureList)
    # [
    #     {"credit_rating": "fair", "age": "youth"},
    #     ....
    # ]
    # Purpose: one dict per row makes the conversion to a matrix easy.

    vec = DictVectorizer()
    dummyX = vec.fit_transform(featureList).toarray()
    print("dummyX:" + str(dummyX))  # the data as a 2-D one-hot matrix
    featureNames = list(vec.get_feature_names_out())
    print(featureNames)
    print("labelList: " + str(labelList))

    lb = preprocessing.LabelBinarizer()
    dummyY = lb.fit_transform(labelList)
    print("dummyY: " + str(dummyY))

    clf = tree.DecisionTreeClassifier(criterion="entropy")
    clf = clf.fit(dummyX, dummyY)
    print("clf: ", str(clf))

    # Save the decision tree.
    # Graphviz command to convert the dot file to PDF:
    #     dot -T pdf iris.dot -o output.pdf
    # The PDF then shows the shape of the decision tree.
    with open(r"C:\Users\Administrator\Desktop\code\mechine_learning\allElectronicInformationGainOri.dot", "w") as f:
        tree.export_graphviz(clf, feature_names=featureNames, out_file=f)

    # Prediction: copy one encoded row and modify it.
    oneRowX = dummyX[0, :]
    print("oneRowX: " + str(oneRowX))

    newRowX = oneRowX.copy()  # copy so that dummyX itself is not changed
    newRowX[0] = 1
    newRowX[2] = 0
    print("newRowX: ", str(newRowX))

    # predict() expects a 2-D array with one row per sample.
    predictedY = clf.predict(newRowX.reshape(1, -1))  # predicted class_buys_labels value
    print("predictedY: " + str(predictedY))


if __name__ == '__main__':
    main()
Original post: https://www.cnblogs.com/renfanzi/p/11078489.html
Time: 2024-10-10 07:24:08