k-Nearest Neighbors in Python -- Machine Learning in Action

'''
Created on Nov 06, 2017
kNN: k Nearest Neighbors

Input:      inX: vector to compare to existing dataset (1xN)
            dataSet: size m data set of known vectors (mxN)
            labels: data set labels (1xm vector)
            k: number of neighbors to use for comparison (should be an odd number)

Output:     the most popular class label

@author: Liu Chuanfeng
'''
import operator
import numpy as np
import matplotlib.pyplot as plt

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every row of dataSet
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    sortedDistIndicies = distances.argsort()
    # Majority vote among the k nearest neighbors
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def file2matrix(filename):
    # Each line: three tab-separated features followed by an integer class label
    with open(filename) as fr:
        arrayLines = fr.readlines()
    numberOfLines = len(arrayLines)
    returnMat = np.zeros((numberOfLines, 3))
    classLabelVector = []
    index = 0
    for line in arrayLines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index, :] = [float(x) for x in listFromLine[0:3]]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat, classLabelVector

def autoNorm(dataSet):
    # Min-max scaling of each column to [0, 1]: (value - min) / (max - min)
    maxVals = dataSet.max(0)
    minVals = dataSet.min(0)
    ranges = maxVals - minVals
    m = dataSet.shape[0]
    normDataSet = (dataSet - np.tile(minVals, (m, 1))) / np.tile(ranges, (m, 1))
    return normDataSet, ranges, minVals

def datingClassTest():
    hoRatio = 0.10      # hold out the first 10% of rows as test vectors
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        classifyResult = classify0(normMat[i, :], normMat[numTestVecs:m, :], datingLabels[numTestVecs:m], 3)
        print('the classifier came back with: %d, the real answer is: %d' % (classifyResult, datingLabels[i]))
        if classifyResult != datingLabels[i]:
            errorCount += 1.0
    # Report once, after every test vector has been classified
    print('the total error rate is: %.1f%%' % (errorCount / float(numTestVecs) * 100))

def classifyPerson():
    resultList = ['not at all', 'in small doses', 'in large doses']
    percentTats = float(input("percentage of time spent playing video games?"))
    ffMiles = float(input("frequent flier miles earned per year?"))
    iceCream = float(input("liters of ice cream consumed per year?"))
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    # Feature order must match the column order of the data file
    inArr = np.array([ffMiles, percentTats, iceCream])
    # Normalize the new sample with the training set's minVals and ranges
    classifyResult = classify0((inArr - minVals) / ranges, normMat, datingLabels, 3)
    print("You will probably like this person:", resultList[classifyResult - 1])

# Unit test of func: file2matrix()
#datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
#print(datingDataMat)
#print(datingLabels)

# Scatter plot of column 1 vs. column 2, marker size and color scaled by class label
#fig = plt.figure()
#ax = fig.add_subplot(111)
#ax.scatter(datingDataMat[:,1], datingDataMat[:,2], 15.0*np.array(datingLabels), 15.0*np.array(datingLabels))
#plt.show()

# Unit test of func: autoNorm()
#normMat, ranges, minVals = autoNorm(datingDataMat)
#print(normMat)
#print(ranges)
#print(minVals)

if __name__ == '__main__':
    datingClassTest()
    classifyPerson()

Output:

the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1

...

the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 1
the total error rate is: 5.0%

percentage of time spent playing video games?10
frequent flier miles earned per year?10000
liters of ice cream consumed per year?0.5
You will probably like this person: in small doses
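For comparison, here is a minimal sketch of the same distance-and-vote logic as classify0, written with NumPy broadcasting instead of np.tile and collections.Counter instead of a hand-built dict. The function name knn_classify and the toy two-cluster dataset are illustrative assumptions, not part of the original code. As in classify0, a tie in the vote goes to the label met first among the nearest neighbors.

```python
from collections import Counter

import numpy as np


def knn_classify(in_x, data_set, labels, k):
    """Hypothetical vectorized variant of classify0 (same inputs, same result)."""
    # Broadcasting subtracts the 1-D query from every row; no np.tile needed.
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = distances.argsort()[:k]
    # Majority vote among the labels of the k nearest rows.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (an assumption for illustration): two small clusters 'A' and 'B'.
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
group_labels = ['A', 'A', 'B', 'B']
print(knn_classify(np.array([0.0, 0.0]), group, group_labels, 3))  # prints: B
```

The query [0, 0] has the two 'B' points and one 'A' point as its three nearest neighbors, so 'B' wins the vote 2 to 1.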
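autoNorm rescales each column with newValue = (oldValue - min) / (max - min), and classifyPerson must reuse the training set's minVals and ranges on the new sample. A minimal sketch of that fit-then-apply pairing with broadcasting follows; the names fit_min_max and apply_min_max, the made-up sample numbers, and the guard for constant columns are assumptions added here (autoNorm as written would produce NaN for a column whose max equals its min).

```python
import numpy as np


def fit_min_max(data):
    """Per-column minimum and range, computed as autoNorm does."""
    min_vals = data.min(axis=0)
    ranges = data.max(axis=0) - min_vals
    # Assumption: treat a constant column as range 1 to avoid 0/0 -> NaN.
    ranges = np.where(ranges == 0, 1.0, ranges)
    return min_vals, ranges


def apply_min_max(x, min_vals, ranges):
    """Scale rows (or one vector) with statistics taken from the training data."""
    return (x - min_vals) / ranges


# Made-up training rows; the third column is constant on purpose.
train = np.array([[10.0, 2.0, 0.5],
                  [30.0, 4.0, 0.5],
                  [20.0, 0.0, 0.5]])
min_vals, ranges = fit_min_max(train)
print(apply_min_max(train, min_vals, ranges))
# A new sample is scaled with the *training* min and range, not its own.
print(apply_min_max(np.array([20.0, 2.0, 0.5]), min_vals, ranges))
```

Keeping the fitted statistics separate from the scaling step is what lets the same transformation be applied later to user input, exactly the role minVals and ranges play in classifyPerson.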
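datingClassTest holds out the first 10% of the rows as test vectors and uses the remaining 90% as the neighbor pool. Below is a minimal, self-contained sketch of that hold-out loop that returns the error rate instead of printing it. The helpers knn_vote and hold_out_error_rate and the synthetic clusters are assumptions for illustration. Like the original, it assumes the rows are already in random order; if they were sorted by label, the first 10% would not be a representative test set.

```python
from collections import Counter

import numpy as np


def knn_vote(x, pool, pool_labels, k):
    # Euclidean distance to every pool row, then majority vote of the k nearest.
    dist = np.sqrt(((pool - x) ** 2).sum(axis=1))
    nearest = dist.argsort()[:k]
    return Counter(pool_labels[i] for i in nearest).most_common(1)[0][0]


def hold_out_error_rate(data, labels, k=3, ho_ratio=0.10):
    """Hypothetical helper mirroring datingClassTest: the first ho_ratio of
    the rows are test vectors, the rest form the neighbor pool."""
    m = data.shape[0]
    num_test = int(m * ho_ratio)
    if num_test == 0:
        raise ValueError("ho_ratio too small: no test vectors")
    pool, pool_labels = data[num_test:], labels[num_test:]
    errors = sum(knn_vote(data[i], pool, pool_labels, k) != labels[i]
                 for i in range(num_test))
    # Computed once, after the loop, so it really is the total error rate.
    return errors / num_test


# Synthetic, well-separated clusters (an assumption for illustration).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
                  rng.normal(5.0, 0.1, size=(50, 2))])
labels = [1] * 50 + [2] * 50
order = rng.permutation(100)          # shuffle so the hold-out slice is mixed
data = data[order]
labels = [labels[i] for i in order]
print(hold_out_error_rate(data, labels))  # prints 0.0
```

Returning the rate rather than printing it inside the loop makes the function easy to reuse, for example to compare several values of k on the same split.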

Reference:

Machine Learning in Action, Peter Harrington (Chinese edition: 《机器学习实战》)

Time: 2024-08-03 08:53:24
