The "Watermelon Book" (《西瓜书》), Chapter 3: Linear Regression

● Using linear regression to classify scatter points

● Code

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D                     # 3-D axes support
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
from matplotlib.patches import Rectangle

dataSize = 10000                                            # total number of samples
trainRatio = 0.3                                            # fraction of samples used for training
colors = [[0.5,0.25,0],[1,0,0],[0,0.5,0],[0,0,1],[1,0.5,0]] # brown, red, green, blue, orange
trans = 0.5                                                 # alpha of the boundary plane in the 3-D plot

def dataSplit(data, part):                                  # split the data set into a training set and a test set
    return data[0:part,:],data[part:,:]

def function(x,para):                                       # continuous regression function f(x) = w . x + b, para = (w, b)
    return np.sum(x * para[0]) + para[1]

def judge(x, para):                                         # classifier: step at 0.5
    return int(function(x, para) > 0.5)

def createData(dim, n):                                     # generate n labelled samples; features are uniform on [0, 2)
    np.random.seed(103)
    output = np.zeros([n, dim + 1])                         # last column holds the 0/1 label

    if dim == 1:                                            # class 1 when x > 1
        output[:,0] = 2 * np.random.rand(n)
        output[:,1] = (output[:,0] > 1).astype(int)
    elif dim == 2:                                          # class 1 when y > 0.5 * (x + 1), i.e. -x + 2y > 1
        output[:,0] = 2 * np.random.rand(n)
        output[:,1] = 2 * np.random.rand(n)
        output[:,2] = (output[:,1] > 0.5 * (output[:,0] + 1)).astype(int)
    elif dim == 3:
        output[:,0] = 2 * np.random.rand(n)
        output[:,1] = 2 * np.random.rand(n)
        output[:,2] = 2 * np.random.rand(n)
        # Note: the threshold here is 0, while test() draws the plane -3x + 2y + 2z = 1.
        # It is kept as in the original so that the reported error rate stays reproducible.
        output[:,3] = (-3 * output[:,0] + 2 * output[:,1] + 2 * output[:,2] > 0).astype(int)
    else:                                                   # class 1 when (3 - 2 dim) x_0 + 2 (x_1 + ... + x_{dim-1}) > 1
        for i in range(dim):
            output[:,i] = 2 * np.random.rand(n)
        output[:,dim] = ((3 - 2 * dim) * output[:,0] + 2 * np.sum(output[:,1:dim], 1) > 1).astype(int)
    #print(output, "\n", np.sum(output[:,-1])/n)
    return output

def linearRegression(data):                                 # least-squares linear regression, returns (w, b)
    n = np.shape(data)[0]
    dim = np.shape(data)[1] - 1
    if dim == 1:                                            # one variable: closed-form solution
        sumX = np.sum(data[:,0])
        sumY = np.sum(data[:,1])
        sumXY = np.sum(data[:,0] * data[:,1])
        sumXX = np.sum(data[:,0] ** 2)
        w = (sumXY * n - sumX * sumY) / (sumXX * n - sumX * sumX)
        b = (sumY - w * sumX) / n
        return (w , b)
    else:                                                   # two or more variables; rank deficiency is not handled for now
        dataE = np.concatenate((data[:, 0:-1], np.ones(n)[:,np.newaxis]), axis = 1)  # append a column of ones for the bias
        w = np.matmul(np.matmul(np.linalg.inv(np.matmul(dataE.T, dataE)),dataE.T),data[:,-1]) # w = (X^T X)^(-1) X^T y
        return (w[0:-1],w[-1])

def test(dim):                                              # test driver
    allData = createData(dim, dataSize)
    trainData, testData = dataSplit(allData, int(dataSize * trainRatio))

    para = linearRegression(trainData)

    myResult = [ judge(i[0:dim], para) for i in testData ]
    errorRatio = np.mean(np.array(myResult) != testData[:,-1].astype(int))  # fraction of misclassified test points
    print("dim = "+ str(dim) + ", errorRatio = " + str(round(errorRatio,4)))
    if dim >= 4:                                            # no plot for 4 or more dimensions; only print the test error rate
        return

    errorP = []                                             # plotting: split the test set into misclassified points, class 1 and class 0
    class1 = []
    class0 = []
    for i in range(np.shape(testData)[0]):
        if myResult[i] != testData[i,-1]:
            errorP.append(testData[i])
        elif myResult[i] == 1:
            class1.append(testData[i])
        else:
            class0.append(testData[i])
    errorP = np.array(errorP)
    class1 = np.array(class1)
    class0 = np.array(class0)

    fig = plt.figure(figsize=(10, 8))

    if dim == 1:
        plt.xlim(0.0,2.0)
        plt.ylim(-0.5,1.25)
        plt.plot([1, 1], [-0.5, 1.25], color = colors[0],label = "realBoundary")    # true boundary x = 1
        xx = np.linspace(0.0, 2.0, 11)
        plt.plot(xx, [function(i, para) for i in xx],color = colors[4], label = "myF")
        plt.scatter(class1[:,0], class1[:,1],color = colors[1], s = 2,label = "class1Data")
        plt.scatter(class0[:,0], class0[:,1],color = colors[2], s = 2,label = "class0Data")
        plt.scatter(errorP[:,0], errorP[:,1],color = colors[3], s = 16,label = "errorData")
        plt.text(0.4, 1.12, "realBoundary: x = 1\nmyF(x) = " + str(round(para[0],2)) + " x + " + str(round(para[1],2)) + "\n errorRatio = " + str(round(errorRatio,4)),
            size=15, ha="center", va="center", bbox=dict(boxstyle="round", ec=(1., 0.5, 0.5), fc=(1., 1., 1.)))
        R = [Rectangle((0,0),0,0, color = colors[k]) for k in range(5)]             # colour swatches for the legend
        plt.legend(R, ["realBoundary", "class1Data", "class0Data", "errorData", "myF"], loc=[0.81, 0.2], ncol=1, numpoints=1, framealpha = 1)

    if dim == 2:
        plt.xlim(0.0,2.0)
        plt.ylim(0.0,2.0)
        xx = np.linspace(0.0, 2.0, 11)
        plt.plot(xx, [function(i,(0.5,0.5)) for i in xx], color = colors[0],label = "realBoundary")  # true boundary y = 0.5 x + 0.5
        X,Y = np.meshgrid(xx, xx)
        Z = para[0][0] * X + para[0][1] * Y + para[1]                               # fitted function evaluated on the grid
        contour = plt.contour(X, Y, Z)
        plt.clabel(contour, fontsize = 10, colors='k')
        plt.scatter(class1[:,0], class1[:,1],color = colors[1], s = 2,label = "class1Data")
        plt.scatter(class0[:,0], class0[:,1],color = colors[2], s = 2,label = "class0Data")
        plt.scatter(errorP[:,0], errorP[:,1],color = colors[3], s = 8,label = "errorData")
        plt.text(1.48, 1.85, "realBoundary: -x + 2y = 1\nmyF(x,y) = " + str(round(para[0][0],2)) + " x + " + str(round(para[0][1],2)) + " y + " + str(round(para[1],2)) + "\n errorRatio = " + str(round(errorRatio,4)),
            size = 15, ha="center", va="center", bbox=dict(boxstyle="round", ec=(1., 0.5, 0.5), fc=(1., 1., 1.)))
        R = [Rectangle((0,0),0,0, color = colors[k]) for k in range(4)]
        plt.legend(R, ["realBoundary", "class1Data", "class0Data", "errorData"], loc=[0.81, 0.2], ncol=1, numpoints=1, framealpha = 1)

    if dim == 3:
        ax = fig.add_subplot(111, projection="3d")          # create the 3-D axes through the figure so they are attached to it
        ax.set_xlim3d(0.0, 2.0)
        ax.set_ylim3d(0.0, 2.0)
        ax.set_zlim3d(0.0, 2.0)
        ax.set_xlabel('X', fontdict={'size': 15, 'color': 'k'})
        ax.set_ylabel('Y', fontdict={'size': 15, 'color': 'k'})
        ax.set_zlabel('Z', fontdict={'size': 15, 'color': 'k'})
        v = [(0, 0, 0.5), (0, 0.5, 0), (1, 2, 0), (2, 2, 1.5), (2, 1.5, 2), (1, 0, 2)]  # where the plane -3x + 2y + 2z = 1 cuts the cube
        f = [[0,1,2,3,4,5]]
        poly3d = [[v[i] for i in j] for j in f]
        ax.add_collection3d(Poly3DCollection(poly3d, edgecolor = 'k', facecolors = colors[0]+[trans], linewidths=1))
        ax.scatter(class1[:,0], class1[:,1],class1[:,2], color = colors[1], s = 2, label = "class1")
        ax.scatter(class0[:,0], class0[:,1],class0[:,2], color = colors[2], s = 2, label = "class0")
        ax.scatter(errorP[:,0], errorP[:,1],errorP[:,2], color = colors[3], s = 8, label = "errorData")
        ax.text3D(1.62, 2, 2.35, "realBoundary: -3x + 2y +2z = 1\nmyF(x,y,z) = " + str(round(para[0][0],2)) + " x + " +
            str(round(para[0][1],2)) + " y + " + str(round(para[0][2],2)) + " z + " + str(round(para[1],2)) + "\n errorRatio = " + str(round(errorRatio,4)),
            size = 12, ha="center", va="center", bbox=dict(boxstyle="round", ec=(1, 0.5, 0.5), fc=(1, 1, 1)))
        R = [Rectangle((0,0),0,0, color = colors[k]) for k in range(4)]
        plt.legend(R, ["realBoundary", "class1Data", "class0Data", "errorData"], loc=[0.83, 0.1], ncol=1, numpoints=1, framealpha = 1)

    fig.savefig("R:\\dim" + str(dim) + ".png")              # output path from the original post; change it to a writable location
    plt.close()

if __name__ == '__main__':
    test(1)
    test(2)
    test(3)
    test(4)
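
The multivariate branch of linearRegression inverts X^T X directly and, as its comment says, does not handle rank deficiency. The following is a minimal sketch of one possible alternative, assuming np.linalg.lstsq is an acceptable solver. The helper name fitLstsq and the duplicated-column data are hypothetical and not part of the original post.

```python
import numpy as np

def fitLstsq(data):
    """Least-squares fit returning (w, b), like linearRegression.

    Hypothetical helper for illustration only. np.linalg.lstsq returns the
    minimum-norm solution, so it still works when X^T X is singular.
    """
    n = data.shape[0]
    X = np.concatenate((data[:, :-1], np.ones((n, 1))), axis=1)  # bias column
    sol, _, _, _ = np.linalg.lstsq(X, data[:, -1], rcond=None)
    return sol[:-1], sol[-1]

# Illustrative data: the same feature appears twice, so X^T X is singular
# and np.linalg.inv would be unreliable here.
rng = np.random.default_rng(0)
x = 2 * rng.random(200)
y = (x > 1).astype(float)
data = np.column_stack((x, x, y))

w, b = fitLstsq(data)
pred = (data[:, :2] @ w + b > 0.5).astype(int)   # same 0.5 step as judge()
print("w =", w, "b =", b, "errorRatio =", np.mean(pred != y))
```

The returned pair has the same layout as the output of linearRegression, so it could be passed to judge unchanged.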

● Output

dim = 1, errorRatio = 0.003
dim = 2, errorRatio = 0.0307
dim = 3, errorRatio = 0.0186
dim = 4, errorRatio = 0.0349
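
A short derivation, offered as a sketch rather than as part of the original post, suggests why the dim = 1 error is so small. It assumes the large-sample limit, with x uniform on [0, 2] and y = 1 exactly when x > 1, as in createData:

```latex
\begin{aligned}
\mathbb{E}[x] &= 1, \qquad \mathbb{E}[y] = \tfrac12, \qquad
\operatorname{Var}(x) = \tfrac{(2-0)^2}{12} = \tfrac13, \\
\operatorname{Cov}(x,y) &= \int_1^2 \tfrac{x}{2}\,dx - \mathbb{E}[x]\,\mathbb{E}[y]
  = \tfrac34 - \tfrac12 = \tfrac14, \\
w &= \frac{\operatorname{Cov}(x,y)}{\operatorname{Var}(x)} = \tfrac34, \qquad
b = \mathbb{E}[y] - w\,\mathbb{E}[x] = -\tfrac14, \\
f(x) &= \tfrac34 x - \tfrac14 = \tfrac12 \iff x = 1 .
\end{aligned}
```

In this limit the 0.5 step used by judge coincides with the true boundary x = 1, so the remaining test errors would come from the finite-sample deviation of the fitted w and b.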

Original article: https://www.cnblogs.com/cuancuancuanhao/p/11111014.html

Time: 2024-10-10 01:44:41
