Implementing Simple Classifiers in Python

This post collects some notes on the Stanford course Convolutional Neural Networks for Visual Recognition (CS231n), together with its first assignment. The main content is the implementation of simple (multi-class) classifiers: KNN, SVM, and softmax.

A note on the difference between softmax and SVM, summarized on one of the lecture slides: the SVM loss only asks that the correct-class score exceed every other score by a fixed margin and is indifferent beyond that, while the softmax classifier treats the scores as unnormalized log-probabilities and always penalizes the negative log-probability of the correct class.
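
To make the difference concrete, here is a small sketch (not part of the assignment; the scores and the correct-class index are made up) that evaluates both losses on the same score vector for a single example:

import numpy as np

# Made-up scores for one example over 3 classes; the correct class is 0.
scores = np.array([3.2, 5.1, -1.7])
correct = 0

# Multiclass SVM (hinge) loss with margin delta = 1: it only cares whether
# the correct-class score beats every other score by the margin.
margins = np.maximum(0, scores - scores[correct] + 1)
margins[correct] = 0
svm_loss = np.sum(margins)                     # 2.9

# Softmax (cross-entropy) loss: the scores are treated as unnormalized
# log-probabilities, so every score contributes to the penalty.
shifted = scores - np.max(scores)              # numerical stability
probs = np.exp(shifted) / np.sum(np.exp(shifted))
softmax_loss = -np.log(probs[correct])

print(svm_loss, softmax_loss)
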
The KNN skeleton is provided in full; only the L2-distance computations need to be filled in:

import numpy as np

class KNearestNeighbor:
  """ a kNN classifier with L2 distance """

  def __init__(self):
    pass

  def train(self, X, y):
    """
    Train the classifier. For k-nearest neighbors this is just
    memorizing the training data.

    Input:
    X - A num_train x dimension array where each row is a training point.
    y - A vector of length num_train, where y[i] is the label for X[i, :]
    """
    self.X_train = X
    self.y_train = y

  def predict(self, X, k=1, num_loops=0):
    """
    Predict labels for test data using this classifier.

    Input:
    X - A num_test x dimension array where each row is a test point.
    k - The number of nearest neighbors that vote for predicted label
    num_loops - Determines which method to use to compute distances
                between training points and test points.

    Output:
    y - A vector of length num_test, where y[i] is the predicted label for the
        test point X[i, :].
    """
    if num_loops == 0:
      dists = self.compute_distances_no_loops(X)
    elif num_loops == 1:
      dists = self.compute_distances_one_loop(X)
    elif num_loops == 2:
      dists = self.compute_distances_two_loops(X)
    else:
      raise ValueError('Invalid value %d for num_loops' % num_loops)

    return self.predict_labels(dists, k=k)

  def compute_distances_two_loops(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using a nested loop over both the training data and the
    test data.

    Input:
    X - An num_test x dimension array where each row is a test point.

    Output:
    dists - A num_test x num_train array where dists[i, j] is the distance
            between the ith test point and the jth training point.
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
      for j in range(num_train):
        #####################################################################
        # TODO:                                                             #
        # Compute the l2 distance between the ith test point and the jth    #
        # training point, and store the result in dists[i, j]               #
        #####################################################################
        dists[i, j] = np.sqrt(np.sum((X[i, :] - self.X_train[j, :]) ** 2))
        #####################################################################
        #                       END OF YOUR CODE                            #
        #####################################################################
    return dists

  def compute_distances_one_loop(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using a single loop over the test data.

    Input / Output: Same as compute_distances_two_loops
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
      #######################################################################
      # TODO:                                                               #
      # Compute the l2 distance between the ith test point and all training #
      # points, and store the result in dists[i, :].                        #
      #######################################################################
      dists[i, :] = np.sqrt(np.sum((self.X_train - X[i, :]) ** 2, axis=1))
      #######################################################################
      #                         END OF YOUR CODE                            #
      #######################################################################
    return dists

  def compute_distances_no_loops(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using no explicit loops.

    Input / Output: Same as compute_distances_two_loops
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    #########################################################################
    # TODO:                                                                 #
    # Compute the l2 distance between all test points and all training      #
    # points without using any explicit loops, and store the result in      #
    # dists.                                                                #
    # HINT: Try to formulate the l2 distance using matrix multiplication    #
    #       and two broadcast sums.                                         #
    #########################################################################
    # Expand ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b; the products with the
    # ones matrices spread the squared norms over the num_test x num_train grid.
    dists = np.sqrt(np.dot(X ** 2, np.ones(np.transpose(self.X_train).shape))
                    + np.dot(np.ones(X.shape), np.transpose(self.X_train ** 2))
                    - 2 * np.dot(X, np.transpose(self.X_train)))
    #########################################################################
    #                         END OF YOUR CODE                              #
    #########################################################################
    return dists

  def predict_labels(self, dists, k=1):
    """
    Given a matrix of distances between test points and training points,
    predict a label for each test point.

    Input:
    dists - A num_test x num_train array where dists[i, j] gives the distance
            between the ith test point and the jth training point.

    Output:
    y - A vector of length num_test where y[i] is the predicted label for the
        ith test point.
    """
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test)
    for i in range(num_test):
      # A list of length k storing the labels of the k nearest neighbors to
      # the ith test point.
      closest_y = []
      #########################################################################
      # TODO:                                                                 #
      # Use the distance matrix to find the k nearest neighbors of the ith    #
      # training point, and use self.y_train to find the labels of these      #
      # neighbors. Store these labels in closest_y.                           #
      # Hint: Look up the function numpy.argsort.                             #
      #########################################################################
      idx = np.argsort(dists[i, :])
      closest_y = list(self.y_train[idx[0:k]])
      #########################################################################
      # TODO:                                                                 #
      # Now that you have found the labels of the k nearest neighbors, you    #
      # need to find the most common label in the list closest_y of labels.   #
      # Store this label in y_pred[i]. Break ties by choosing the smaller     #
      # label.                                                                #
      #########################################################################
      # Count the votes for each label, then pick the most common label,
      # breaking ties in favor of the smaller label.
      labelCount = {}
      for label in closest_y:
        labelCount[label] = labelCount.get(label, 0) + 1
      sortedLabel = sorted(labelCount.items(), key=lambda kv: (-kv[1], kv[0]))
      y_pred[i] = sortedLabel[0][0]
      #########################################################################
      #                           END OF YOUR CODE                            #
      #########################################################################

    return y_pred
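
A minimal usage sketch for the class above (the toy data here is random and only meant to show the calling convention, and to confirm that the three distance implementations agree):

import numpy as np

np.random.seed(0)
X_train = np.random.randn(50, 10)              # 50 training points, 10-dimensional
y_train = np.random.randint(0, 3, size=50)     # 3 classes
X_test = np.random.randn(5, 10)

knn = KNearestNeighbor()
knn.train(X_train, y_train)

# All three distance routines should produce the same matrix.
d2 = knn.compute_distances_two_loops(X_test)
d1 = knn.compute_distances_one_loop(X_test)
d0 = knn.compute_distances_no_loops(X_test)
print(np.allclose(d2, d1), np.allclose(d2, d0))

# Predict labels for the test points with a 3-neighbor vote.
print(knn.predict(X_test, k=3))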

For the linear classifier, the train and predict methods need to be completed:

import numpy as np
from cs231n.classifiers.linear_svm import *
from cs231n.classifiers.softmax import *

class LinearClassifier:

  def __init__(self):
    self.W = None

  def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
            batch_size=200, verbose=False):
    """
    Train this linear classifier using stochastic gradient descent.

    Inputs:
    - X: D x N array of training data. Each training point is a D-dimensional
         column.
    - y: 1-dimensional array of length N with labels 0...K-1, for K classes.
    - learning_rate: (float) learning rate for optimization.
    - reg: (float) regularization strength.
    - num_iters: (integer) number of steps to take when optimizing
    - batch_size: (integer) number of training examples to use at each step.
    - verbose: (boolean) If true, print progress during optimization.

    Outputs:
    A list containing the value of the loss function at each training iteration.
    """
    dim, num_train = X.shape
    num_classes = np.max(y) + 1 # assume y takes values 0...K-1 where K is number of classes
    if self.W is None:
      # lazily initialize W
      self.W = np.random.randn(num_classes, dim) * 0.001

    # Run stochastic gradient descent to optimize W
    loss_history = []
    for it in range(num_iters):
      X_batch = None
      y_batch = None

      #########################################################################
      # TODO:                                                                 #
      # Sample batch_size elements from the training data and their           #
      # corresponding labels to use in this round of gradient descent.        #
      # Store the data in X_batch and their corresponding labels in           #
      # y_batch; after sampling X_batch should have shape (dim, batch_size)   #
      # and y_batch should have shape (batch_size,)                           #
      #                                                                       #
      # Hint: Use np.random.choice to generate indices. Sampling with         #
      # replacement is faster than sampling without replacement.              #
      #########################################################################
      sample_idx = np.random.choice(num_train, batch_size, replace=True)
      X_batch = X[:, sample_idx]
      y_batch = y[sample_idx]
      #########################################################################
      #                       END OF YOUR CODE                                #
      #########################################################################

      # evaluate loss and gradient
      loss, grad = self.loss(X_batch, y_batch, reg)
      loss_history.append(loss)

      # perform parameter update
      #########################################################################
      # TODO:                                                                 #
      # Update the weights using the gradient and the learning rate.          #
      #########################################################################
      self.W += -learning_rate * grad
      #########################################################################
      #                       END OF YOUR CODE                                #
      #########################################################################

      if verbose and it % 100 == 0:
        print('iteration %d / %d: loss %f' % (it, num_iters, loss))

    return loss_history

  def predict(self, X):
    """
    Use the trained weights of this linear classifier to predict labels for
    data points.

    Inputs:
    - X: D x N array of training data. Each column is a D-dimensional point.

    Returns:
    - y_pred: Predicted labels for the data in X. y_pred is a 1-dimensional
      array of length N, and each element is an integer giving the predicted
      class.
    """
    y_pred = np.zeros(X.shape[1])
    ###########################################################################
    # TODO:                                                                   #
    # Implement this method. Store the predicted labels in y_pred.            #
    ###########################################################################
    # The predicted class is the row of W with the highest score.
    y_pred = np.argmax(np.dot(self.W, X), axis=0)
    ###########################################################################
    #                           END OF YOUR CODE                              #
    ###########################################################################
    return y_pred

  def loss(self, X_batch, y_batch, reg):
    """
    Compute the loss function and its derivative.
    Subclasses will override this.

    Inputs:
    - X_batch: D x N array of data; each column is a data point.
    - y_batch: 1-dimensional array of length N with labels 0...K-1, for K classes.
    - reg: (float) regularization strength.

    Returns: A tuple containing:
    - loss as a single float
    - gradient with respect to self.W; an array of the same shape as W
    """
    pass


class LinearSVM(LinearClassifier):
  """ A subclass that uses the Multiclass SVM loss function """

  def loss(self, X_batch, y_batch, reg):
    return svm_loss_vectorized(self.W, X_batch, y_batch, reg)


class Softmax(LinearClassifier):
  """ A subclass that uses the Softmax + Cross-entropy loss function """

  def loss(self, X_batch, y_batch, reg):
    return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)
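
A hedged usage sketch for LinearSVM on synthetic data; it assumes svm_loss_vectorized from the section below is importable via the cs231n package, and note that X is laid out as D x N, one column per example:

import numpy as np

np.random.seed(0)
D, N, K = 32, 500, 10                          # dimensions, examples, classes
X = np.random.randn(D, N)
y = np.random.randint(0, K, size=N)

svm = LinearSVM()
loss_history = svm.train(X, y, learning_rate=1e-3, reg=1e-5,
                         num_iters=200, batch_size=64, verbose=True)
print(loss_history[0], loss_history[-1])       # the loss should trend downward

y_pred = svm.predict(X)
print('training accuracy: %f' % np.mean(y_pred == y))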

Next come the loss and gradient implementations for the SVM and softmax classifiers.

The SVM loss function and gradient:

loss function: $$L = \frac{1}{N} \sum_i \sum_{j \ne y_i} \max\big( 0,\ \mathrm{f}(\mathrm{x}_i, W)_j - \mathrm{f}(\mathrm{x}_i, W)_{y_i} + 1 \big) + \frac{\lambda}{2} \sum_k\sum_l W_{k,l}^2$$

gradient, for $j \ne y_i$: $$ \nabla_{\mathrm{w}_j} L = \frac{1}{N} \sum_i 1\{\mathrm{w}_j^T \mathrm{x}_i-\mathrm{w}_{y_i}^T\mathrm{x}_i+1>0\}\,\mathrm{x}_i + \lambda \mathrm{w}_j$$

and for the correct-class row: $$ \nabla_{\mathrm{w}_{y_i}} L = -\frac{1}{N} \sum_i \Big(\sum_{j \ne y_i} 1\{\mathrm{w}_j^T \mathrm{x}_i-\mathrm{w}_{y_i}^T\mathrm{x}_i+1>0\}\Big)\mathrm{x}_i + \lambda \mathrm{w}_{y_i}$$
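
Before vectorizing, it is worth checking the analytic gradient against a finite-difference estimate at a few random entries of W (the assignment ships a similar helper, grad_check_sparse). A minimal sketch of that check, assuming svm_loss_naive from the code below and some data W, X, y, reg already in scope:

import numpy as np

def numeric_grad_check(loss_fn, W, num_checks=5, h=1e-5):
  """Compare the analytic gradient returned by loss_fn against a centered
  finite-difference estimate at a few random coordinates of W."""
  _, grad = loss_fn(W)
  for _ in range(num_checks):
    ix = tuple(np.random.randint(n) for n in W.shape)
    old = W[ix]
    W[ix] = old + h
    loss_plus, _ = loss_fn(W)
    W[ix] = old - h
    loss_minus, _ = loss_fn(W)
    W[ix] = old                                # restore the entry
    grad_numeric = (loss_plus - loss_minus) / (2 * h)
    rel_err = abs(grad_numeric - grad[ix]) / (abs(grad_numeric) + abs(grad[ix]) + 1e-12)
    print('numeric %f analytic %f relative error %e' % (grad_numeric, grad[ix], rel_err))

# Usage (W, X, y, reg assumed to be defined):
# numeric_grad_check(lambda W: svm_loss_naive(W, X, y, reg), W)

The SVM loss and gradient implementation itself: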

import numpy as np
from random import shuffle

def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops)
  Inputs:
  - W: C x D array of weights
  - X: D x N array of data. Data are D-dimensional columns
  - y: 1-dimensional array of length N with labels 0...K-1, for K classes
  - reg: (float) regularization strength
  Returns:
  a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[0]
  num_train = X.shape[1]
  loss = 0.0
  for i in range(num_train):
    scores = W.dot(X[:, i])
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:
        loss += margin
        # Each violated margin pushes w_j towards x_i and w_{y_i} away from it.
        dW[j, :] += X[:, i]
        dW[y[i], :] -= X[:, i]

  # Right now the loss is a sum over all training examples, but we want it
  # to be an average instead so we divide by num_train.
  loss /= num_train
  dW /= num_train

  # Add regularization to the loss.
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg * W

  #############################################################################
  # TODO:                                                                     #
  # Compute the gradient of the loss function and store it dW.                #
  # Rather that first computing the loss and then computing the derivative,   #
  # it may be simpler to compute the derivative at the same time that the     #
  # loss is being computed. As a result you may need to modify some of the    #
  # code above to compute the gradient.                                       #
  #############################################################################


  return loss, dW


def svm_loss_vectorized(W, X, y, reg):
  """
  Structured SVM loss function, vectorized implementation.

  Inputs and outputs are the same as svm_loss_naive.
  """
  loss = 0.0
  dW = np.zeros(W.shape) # initialize the gradient as zero

  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the structured SVM loss, storing the    #
  # result in loss.                                                           #
  #############################################################################
  num_train = y.shape[0]
  Y_hat = W.dot(X)
  err_dist = Y_hat - Y_hat[y, np.arange(num_train)] + 1
  err_dist[err_dist <= 0] = 0.0
  err_dist[y, np.arange(num_train)] = 0.0
  loss += np.sum(err_dist) / num_train
  loss += 0.5 * reg * np.sum(W * W)

  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################


  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the gradient for the structured SVM     #
  # loss, storing the result in dW.                                           #
  #                                                                           #
  # Hint: Instead of computing the gradient from scratch, it may be easier    #
  # to reuse some of the intermediate values that you used to compute the     #
  # loss.                                                                     #
  #############################################################################
  # Binarize the margins: each violated margin contributes +x_i to row j and
  # -x_i to the correct-class row y_i.
  err_dist[err_dist > 0] = 1.0
  err_dist[y, np.arange(num_train)] = -np.sum(err_dist, axis=0)
  dW += err_dist.dot(X.transpose()) / num_train + reg * W

  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################

  return loss, dW
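
The naive and vectorized versions should agree up to floating-point error; a quick check on random data, assuming both functions above are in scope:

import numpy as np

np.random.seed(0)
C, D, N = 10, 32, 200
W = np.random.randn(C, D) * 0.001
X = np.random.randn(D, N)
y = np.random.randint(0, C, size=N)

loss_naive, grad_naive = svm_loss_naive(W, X, y, reg=1e-5)
loss_vec, grad_vec = svm_loss_vectorized(W, X, y, reg=1e-5)

print('loss difference: %e' % abs(loss_naive - loss_vec))
print('gradient difference: %e' % np.linalg.norm(grad_naive - grad_vec))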

The softmax loss function and gradient:

loss function: $$ L = -\frac {1}{N} \sum_i \sum_j \mathrm{1}\{y_i=j\}\cdot \log\Big(\frac{e^{\mathrm{f}_j}}{\sum_m e^{\mathrm{f}_m}}\Big) + \frac{\lambda}{2} \sum_k\sum_l W_{k,l}^2$$

gradient: $$\nabla_{\mathrm{w}_j} L = -\frac{1}{N} \sum_i \left[1\{y_i=j\} - p(y_i=j|\mathrm{x}_i;W)\right]\mathrm{x}_i + \lambda \mathrm{w}_j$$

where $p(y_i=j | \mathrm{x}_i;W) = \frac{e^{\mathrm{f}_j}}{\sum_m e^{\mathrm{f}_m}}$.
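
Because the probabilities involve exponentials of raw scores, both implementations below shift each column of scores by its maximum before exponentiating. A tiny sketch of why that is safe (the scores here are made up): the shift multiplies numerator and denominator by the same constant, so the probabilities are unchanged, but every exponent becomes non-positive and cannot overflow.

import numpy as np

f = np.array([123.0, 456.0, 789.0])            # made-up, very large scores

# Direct evaluation overflows: exp(789) is inf in float64, so the result is nan.
p_naive = np.exp(f) / np.sum(np.exp(f))

# Shifting by the maximum leaves the ratios unchanged but keeps every
# exponent <= 0, so nothing overflows.
f_shift = f - np.max(f)
p_stable = np.exp(f_shift) / np.sum(np.exp(f_shift))

print(p_naive)                                 # [ 0.  0. nan]
print(p_stable)                                # well-defined probabilities

The softmax loss and gradient implementation: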

import numpy as np
from random import shuffle

def softmax_loss_naive(W, X, y, reg):
  """
  Softmax loss function, naive implementation (with loops)
  Inputs:
  - W: C x D array of weights
  - X: D x N array of data. Data are D-dimensional columns
  - y: 1-dimensional array of length N with labels 0...K-1, for K classes
  - reg: (float) regularization strength
  Returns:
  a tuple of:
  - loss as single float
  - gradient with respect to weights W, an array of same size as W
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  #############################################################################
  # TODO: Compute the softmax loss and its gradient using explicit loops.     #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  dim, num_data = X.shape
  num_class = W.shape[0]
  # Shift each column of scores by its maximum before exponentiating; this
  # leaves the probabilities unchanged but avoids overflow.
  scores = np.dot(W, X)
  scores -= np.max(scores, axis=0)
  Y_hat = np.exp(scores)
  prob = Y_hat / np.sum(Y_hat, axis=0)

  # C x N array, element (i, j) = 1 if y[j] = i
  ground_truth = np.zeros_like(prob)
  ground_truth[y, np.arange(num_data)] = 1.0

  for i in range(num_data):
    for j in range(num_class):
      loss += -(ground_truth[j, i] * np.log(prob[j, i])) / num_data
      dW[j, :] += -(ground_truth[j, i] - prob[j, i]) * X[:, i] / num_data
  loss += 0.5 * reg * np.sum(W * W) # reg term
  dW += reg * W

  #############################################################################
  #                          END OF YOUR CODE                                 #
  #############################################################################

  return loss, dW


def softmax_loss_vectorized(W, X, y, reg):
  """
  Softmax loss function, vectorized version.

  Inputs and outputs are the same as softmax_loss_naive.
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  #############################################################################
  # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  dim, num_data = X.shape
  # Shift the scores for numerical stability before exponentiating.
  scores = np.dot(W, X)
  scores -= np.max(scores, axis=0)
  Y_hat = np.exp(scores)
  prob = Y_hat / np.sum(Y_hat, axis=0) # class probabilities, C x N

  # C x N array, element (i, j) = 1 if y[j] = i
  ground_truth = np.zeros_like(prob)
  ground_truth[y, np.arange(num_data)] = 1.0

  loss = -np.sum(ground_truth * np.log(prob)) / num_data + 0.5 * reg * np.sum(W * W)
  dW = -np.dot(ground_truth - prob, X.transpose()) / num_data + reg * W
  #############################################################################
  #                          END OF YOUR CODE                                 #
  #############################################################################

  return loss, dW
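
A common sanity check for the softmax loss: with a near-zero random W and no regularization, every class is roughly equally likely, so the loss should land near log(C). A hedged sketch on synthetic data (the shapes mimic CIFAR-10, but the data itself is random):

import numpy as np

np.random.seed(0)
C, D, N = 10, 3073, 500                        # shapes mimic CIFAR-10 with a bias dimension
W = np.random.randn(C, D) * 0.0001
X = np.random.randn(D, N)
y = np.random.randint(0, C, size=N)

loss, _ = softmax_loss_vectorized(W, X, y, reg=0.0)
# With a near-zero W every class is about equally likely, so the loss
# should be close to -log(1/C) = log(10), roughly 2.3.
print(loss, np.log(C))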

The test code is run in an IPython Notebook, so each part can be checked and executed cell by cell.

For example, the KNN results on the given dataset (K = 10):

SVM results:

Softmax results:

