Python Data Analysis and Data Mining for Beginners, Chapter 3: Common Machine Learning Algorithms, Section 2: Linear Regression (Part 2), Hands-On

Linear Regression

In [ ]:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

In [ ]:

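# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2
boston = datasets.load_boston()
X = boston.data[:, 5]  # column 5 is RM: average number of rooms per dwelling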
y = boston.target
print(X.shape)
print(y.shape)

In [ ]:

print(boston.DESCR)  # dataset description

In [ ]:

plt.scatter(X, y)  # scatter plot of the single feature RM against the price

Signature: plt.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, verts=None, edgecolors=None, hold=None, data=None, **kwargs)

Docstring: Make a scatter plot of x vs y.

Marker size is scaled by s and marker color is mapped to c.

Parameters

x, y : array_like, shape (n, ) Input data

s : scalar or array_like, shape (n, ), optional size in points^2. Default is rcParams['lines.markersize'] ** 2.

c : color, sequence, or sequence of color, optional, default: 'b' c can be a single color format string, or a sequence of color specifications of length N, or a sequence of N numbers to be mapped to colors using the cmap and norm specified via kwargs (see below). Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however, including the case of a single row to specify the same color for all points.

marker : ~matplotlib.markers.MarkerStyle, optional, default: 'o' See ~matplotlib.markers for more information on the different styles of markers scatter supports. marker can be either an instance of the class or the text shorthand for a particular marker.

cmap : ~matplotlib.colors.Colormap, optional, default: None A ~matplotlib.colors.Colormap instance or registered name. cmap is only used if c is an array of floats. If None, defaults to rc image.cmap.

norm : ~matplotlib.colors.Normalize, optional, default: None A ~matplotlib.colors.Normalize instance is used to scale luminance data to 0, 1. norm is only used if c is an array of floats. If None, use the default :func:normalize.

vmin, vmax : scalar, optional, default: None vmin and vmax are used in conjunction with norm to normalize luminance data. If either are None, the min and max of the color array is used. Note if you pass a norm instance, your settings for vmin and vmax will be ignored.

alpha : scalar, optional, default: None The alpha blending value, between 0 (transparent) and 1 (opaque)

linewidths : scalar or array_like, optional, default: None If None, defaults to (lines.linewidth,).

verts : sequence of (x, y), optional If marker is None, these vertices will be used to construct the marker. The center of the marker is located at (0,0) in normalized units. The overall marker is rescaled by s.

edgecolors : color or sequence of color, optional, default: None If None, defaults to 'face'

If 'face', the edge color will always be the same as
the face color.

If it is 'none', the patch boundary will not
be drawn.

For non-filled markers, the `edgecolors` kwarg
is ignored and forced to 'face' internally.

Returns

paths : ~matplotlib.collections.PathCollection

Other Parameters

**kwargs : ~matplotlib.collections.Collection properties

See Also

plot : to plot scatter plots when markers are identical in size and color

Notes

  • The plot function will be faster for scatterplots where markers don't vary in size or color.
  • Any or all of x, y, s, and c may be masked arrays, in which case all masks will be combined and only unmasked points will be plotted.

    Fundamentally, scatter works with 1-D arrays; x, y, s, and c may be input as 2-D arrays, but within scatter they will be flattened. The exception is c, which will be flattened only if its size matches the size of x and y.

.. note:: In addition to the above described arguments, this function can take a data keyword argument. If such a data argument is given, the following arguments are replaced by data[]:

* All arguments with the following names: 'c', 'color', 'edgecolors', 'facecolor', 'facecolors', 'linewidths', 's', 'x', 'y'.

In [ ]:

X

In [ ]:

y.max()

Docstring: a.max(axis=None, out=None, keepdims=False)

In [ ]:

X = X[y < 50]  # drop samples with y >= 50 (the capped maximum price); mask X before y is overwritten
y = y[y < 50]
print(X.shape)
print(y.shape)
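The order of the two filtering lines matters: X is masked while y still holds every sample, and only then is y itself filtered. Done the other way round, the mask built from the shortened y no longer matches the length of X. A small sketch with made-up toy arrays (separate names, so the notebook's X and y are not overwritten):

```python
import numpy as np

# Hypothetical toy data (not the Boston set): two samples sit at the 50.0 cap
X_toy = np.array([6.5, 6.4, 7.2, 5.0, 8.7])
y_toy = np.array([24.0, 21.6, 50.0, 13.8, 50.0])

# Correct: build the mask from the unfiltered y, then apply it to both arrays
mask = y_toy < 50
X_clean, y_clean = X_toy[mask], y_toy[mask]
print(X_clean.shape, y_clean.shape)  # both keep 3 samples

# Wrong order: after y is shortened, its mask no longer lines up with X
y_short = y_toy[y_toy < 50]
try:
    X_toy[y_short < 50]
    mismatch_raised = False
except IndexError:
    mismatch_raised = True
print("length mismatch raised IndexError:", mismatch_raised)
```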

In [ ]:

plt.scatter(X,y)
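With a single feature, the least-squares line y ≈ a·x + b has a closed form: a = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)**2) and b = y_mean - a * x_mean. A minimal NumPy sketch on synthetic data; the slope 9 and intercept -35 are made-up values, not results fitted on the Boston data, and fit_simple_linear is a hypothetical helper. The result is cross-checked against np.polyfit with degree 1.

```python
import numpy as np

def fit_simple_linear(x, y):
    """Hypothetical helper: least-squares slope a and intercept b for y ≈ a * x + b."""
    x_mean, y_mean = x.mean(), y.mean()
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - a * x_mean
    return a, b

# Synthetic data: the line y = 9x - 35 plus small Gaussian noise
rng = np.random.default_rng(0)
x_syn = rng.uniform(4, 9, size=200)
y_syn = 9 * x_syn - 35 + rng.normal(0, 0.5, size=200)

a, b = fit_simple_linear(x_syn, y_syn)
print(a, b)  # close to 9 and -35
```

On the notebook's data the same call would be fit_simple_linear(X, y) with the filtered RM column and prices.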

Multiple Linear Regression

In [ ]:

X = boston.data
y = boston.target
X = X[y < 50]
y = y[y < 50]

In [ ]:

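from sklearn.model_selection import train_test_split  # data-splitting utility
# Hold out 20% of the samples for testing; no random_state is set, so scores vary between runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)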
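Conceptually, a random train/test split shuffles the row indices and holds out a fraction of them, indexing X and y with the same index arrays so each feature row stays paired with its target. A minimal sketch of that idea, not scikit-learn's implementation; split_train_test is a hypothetical helper that rounds the test count up.

```python
import numpy as np

def split_train_test(X, y, test_ratio=0.2, seed=None):
    """Hypothetical helper: shuffle the sample indices and hold out test_ratio of them."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(X))              # random order of row indices
    test_size = int(np.ceil(len(X) * test_ratio))   # round the test count up
    test_idx, train_idx = shuffled[:test_size], shuffled[test_size:]
    # Index X and y with the same arrays so features and targets stay paired
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: row i of X_toy is [2i, 2i + 1] and y_toy[i] = i, so pairing is easy to check
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)
X_tr, X_te, y_tr, y_te = split_train_test(X_toy, y_toy, test_ratio=0.2, seed=42)
print(X_tr.shape, X_te.shape, y_tr.shape, y_te.shape)  # (8, 2) (2, 2) (8,) (2,)
```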

In [ ]:

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()

Init signature: LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)

Docstring: Ordinary least squares Linear Regression.

Parameters

fit_intercept : boolean, optional, default True whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (e.g. data is expected to be already centered).

normalize : boolean, optional, default False This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use :class:sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

copy_X : boolean, optional, default True If True, X will be copied; else, it may be overwritten.

n_jobs : int, optional, default 1 The number of jobs to use for the computation. If -1 all CPUs are used. This will only provide speedup for n_targets > 1 and sufficient large problems.

Attributes

coef_ : array, shape (n_features, ) or (n_targets, n_features) Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.

intercept_ : array Independent term in the linear model.

Notes

From the implementation point of view, this is just plain Ordinary Least Squares (scipy.linalg.lstsq) wrapped as a predictor object.

In [ ]:

lin_reg.fit(X_train,y_train)

Signature: lin_reg.fit(X, y, sample_weight=None)

Docstring: Fit linear model.

Parameters

X : numpy array or sparse matrix of shape [n_samples,n_features] Training data

y : numpy array of shape [n_samples, n_targets] Target values. Will be cast to X's dtype if necessary

sample_weight : numpy array of shape [n_samples] Individual weights for each sample

.. versionadded:: 0.17
   parameter *sample_weight* support to LinearRegression.

Returns

self : returns an instance of self.
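The Notes above describe LinearRegression as plain ordinary least squares around scipy.linalg.lstsq. A minimal sketch of the same idea with numpy.linalg.lstsq, assuming synthetic noise-free data: a column of ones is prepended to X so the intercept is solved together with the coefficients. fit_ols is a hypothetical helper and does not mirror how scikit-learn is organised internally.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical helper: return (coef, intercept) minimising ||y - X @ coef - intercept||^2."""
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])  # leading column of ones for the intercept
    theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
    return theta[1:], theta[0]

# Synthetic, noise-free data (separate names so the notebook's X and y stay intact)
rng = np.random.default_rng(1)
X_syn = rng.normal(size=(100, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_syn = X_syn @ true_coef + 4.0

coef, intercept = fit_ols(X_syn, y_syn)
y_pred = X_syn @ coef + intercept  # same form as a linear model's prediction: X @ coef_ + intercept_
print(coef, intercept)  # approximately [2, -1, 0.5] and 4
```

The fitted values play the same roles as lin_reg.coef_ and lin_reg.intercept_ in the cells that follow.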

In [ ]:

lin_reg.coef_  # coefficients, one per feature

In [ ]:

lin_reg.intercept_  # intercept

In [ ]:

lin_reg.score(X_test, y_test)  # R^2 (coefficient of determination) on the test set
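For regressors, score returns the coefficient of determination R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot the sum of squared deviations of the true targets from their mean. 1.0 is a perfect fit, 0.0 matches always predicting the mean, and the value can go negative for a model worse than that. A hand-computed sketch with made-up numbers; r2_manual is a hypothetical helper, not a library function.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Hypothetical helper: coefficient of determination 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
r2_perfect = r2_manual(y_true, [3.0, 5.0, 7.0, 9.0])  # perfect predictions -> 1.0
r2_mean = r2_manual(y_true, [6.0, 6.0, 6.0, 6.0])     # always the mean -> 0.0
r2_close = r2_manual(y_true, [2.5, 5.5, 6.5, 9.5])    # SS_res = 1, SS_tot = 20 -> 0.95
print(r2_perfect, r2_mean, r2_close)
```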

K-Nearest Neighbors Regression

In [ ]:

from sklearn.neighbors import KNeighborsRegressor  # import the KNN regressor

In [ ]:

knn_reg = KNeighborsRegressor()  # create the regressor with default parameters (n_neighbors=5)
knn_reg.fit(X_train,y_train)
knn_reg.score(X_test,y_test)
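KNeighborsRegressor predicts a target by averaging the targets of the k nearest training samples (k is n_neighbors, 5 by default). With weights='distance' each neighbour is weighted by the inverse of its distance, and p selects the Minkowski distance (p=1 Manhattan, p=2 Euclidean, the default); these are the parameters tuned in the grid search. A simplified NumPy sketch of that prediction rule for one query point; knn_regress is a hypothetical helper, and its exact-match handling is a simplifying choice of this sketch.

```python
import numpy as np

def knn_regress(X_train, y_train, x, k=5, weights="uniform", p=2):
    """Hypothetical helper: predict a target for one query point x."""
    # Minkowski distance of order p from x to every training row
    dist = np.sum(np.abs(X_train - x) ** p, axis=1) ** (1 / p)
    nearest = np.argsort(dist)[:k]          # indices of the k closest rows
    if weights == "uniform":
        return y_train[nearest].mean()      # plain average of the k targets
    # weights == "distance": closer neighbours get weight 1 / distance
    d = dist[nearest]
    if np.any(d == 0):                      # query coincides with training points
        return y_train[nearest][d == 0].mean()
    w = 1 / d
    return np.sum(w * y_train[nearest]) / np.sum(w)

# Toy 1-D data (separate names so the notebook's X_train / y_train are untouched)
X_pts = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y_pts = np.array([0.0, 10.0, 20.0, 30.0, 100.0])

pred_uniform = knn_regress(X_pts, y_pts, np.array([1.5]), k=2)
pred_distance = knn_regress(X_pts, y_pts, np.array([1.2]), k=2, weights="distance")
print(pred_uniform, pred_distance)  # 15.0 and roughly 12.0
```

In the distance-weighted call the neighbours at 1.0 and 2.0 are 0.2 and 0.8 away, so their weights are 5 and 1.25 and the prediction leans towards 10.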

In [ ]:

from sklearn.model_selection import GridSearchCV
para_grid = [
    {
        'weights': ['uniform'],
        'n_neighbors': list(range(1, 11))
    },
    {
        'weights': ['distance'],
        'n_neighbors': list(range(1, 11)),
        'p': list(range(1, 6))  # Minkowski power: 1 = Manhattan, 2 = Euclidean
    }
]
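GridSearchCV expands each dictionary in para_grid into the Cartesian product of its value lists and tries the union of those candidates: 10 for 'uniform' plus 10 × 5 = 50 for 'distance', 60 in total. Every candidate is refit once per cross-validation fold, so with 5 folds (the default cv in scikit-learn 0.22 and later; earlier versions used 3) the search performs 300 fits, the total that verbose=1 prints in recent versions. A plain-Python sketch of that count; grid_spec is a hypothetical copy of the grid's values.

```python
from itertools import product
from math import prod

# Hypothetical copy of para_grid's values
grid_spec = [
    {"weights": ["uniform"], "n_neighbors": list(range(1, 11))},
    {"weights": ["distance"], "n_neighbors": list(range(1, 11)), "p": list(range(1, 6))},
]

# Each dict contributes the product of its list lengths
n_candidates = sum(prod(len(v) for v in grid.values()) for grid in grid_spec)

# Enumerate the candidates explicitly to double-check the count
candidates = [
    dict(zip(grid.keys(), values))
    for grid in grid_spec
    for values in product(*grid.values())
]

n_folds = 5  # assumption: default cv in recent scikit-learn versions
print(n_candidates, len(candidates), n_candidates * n_folds)  # 60 60 300
```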

In [ ]:

knn_reg_grid = KNeighborsRegressor(n_jobs=-1)  # n_jobs=-1: use all CPU cores
grid_search = GridSearchCV(knn_reg_grid, para_grid, verbose=1)
grid_search.fit(X_train, y_train)

In [ ]:

grid_search.best_estimator_

In [ ]:

grid_search.best_score_  # mean cross-validated score of the best candidate (folds of X_train)
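best_score_ is not measured on X_test: it is the best candidate's validation score averaged over the cross-validation folds, all cut from X_train, which is why it can differ from best_estimator_.score(X_test, y_test). With an integer cv and a regressor the folds are consecutive, unshuffled K-fold blocks. A sketch of how such folds can be built; kfold_indices is a hypothetical helper, and n_folds=5 mirrors the default cv in recent scikit-learn versions.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5):
    """Hypothetical helper: yield (train_idx, val_idx) for unshuffled K-fold splits."""
    fold_sizes = np.full(n_folds, n_samples // n_folds)
    fold_sizes[: n_samples % n_folds] += 1  # spread the remainder over the first folds
    start = 0
    for size in fold_sizes:
        val_idx = np.arange(start, start + size)
        train_idx = np.concatenate([np.arange(0, start), np.arange(start + size, n_samples)])
        yield train_idx, val_idx
        start += size

folds = list(kfold_indices(12, n_folds=5))
print([len(val) for _, val in folds])  # [3, 3, 2, 2, 2]
```

Each sample lands in exactly one validation fold; a candidate's cross-validated score is the mean of its scores on those folds.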

In [ ]:

grid_search.best_estimator_.score(X_test, y_test)  # R^2 of the tuned KNN model on the test set

Sorting the Coefficients (Feature Weights)

In [ ]:

lin_reg.coef_  # coefficients

In [ ]:

np.argsort(lin_reg.coef_)

Signature: np.argsort(a, axis=-1, kind='quicksort', order=None)

Docstring: Returns the indices that would sort an array.

Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as a that index data along the given axis in sorted order.

Parameters

a : array_like Array to sort.

axis : int or None, optional Axis along which to sort. The default is -1 (the last axis). If None, the flattened array is used.

kind : {'quicksort', 'mergesort', 'heapsort'}, optional Sorting algorithm.

order : str or list of str, optional When a is an array with fields defined, this argument specifies which fields to compare first, second, etc. A single field can be specified as a string, and not all fields need be specified, but unspecified fields will still be used, in the order in which they come up in the dtype, to break ties.

Returns

index_array : ndarray, int Array of indices that sort a along the specified axis. If a is one-dimensional, a[index_array] yields a sorted a.

See Also

sort : Describes sorting algorithms used.

lexsort : Indirect stable sort with multiple keys.

ndarray.sort : Inplace sort.

argpartition : Indirect partial sort.

Notes

See sort for notes on the different sorting algorithms.

As of NumPy 1.4.0 argsort works with real/complex arrays containing nan values. The enhanced sort order is documented in sort.

Examples

One dimensional array:

>>> x = np.array([3, 1, 2])
>>> np.argsort(x)
array([1, 2, 0])

Two-dimensional array:

>>> x = np.array([[0, 3], [2, 2]])
>>> x
array([[0, 3],
       [2, 2]])

>>> np.argsort(x, axis=0)  # sorts along first axis (down)
array([[0, 1],
       [1, 0]])

>>> np.argsort(x, axis=1)  # sorts along last axis (across)
array([[0, 1],
       [0, 1]])

Indices of the sorted elements of a N-dimensional array:

>>> ind = np.unravel_index(np.argsort(x, axis=None), x.shape)
>>> ind
(array([0, 1, 1, 0]), array([0, 0, 1, 1]))
>>> x[ind]  # same as np.sort(x, axis=None)
array([0, 2, 2, 3])

Sorting with keys:

>>> x = np.array([(1, 0), (0, 1)], dtype=[('x', '<i4'), ('y', '<i4')])
>>> x
array([(1, 0), (0, 1)],
      dtype=[('x', '<i4'), ('y', '<i4')])

>>> np.argsort(x, order=('x','y'))
array([1, 0])

>>> np.argsort(x, order=('y','x'))
array([0, 1])

In [ ]:

lin_reg.coef_[np.argsort(lin_reg.coef_)]  # coefficients in ascending order

In [ ]:

boston.feature_names

In [ ]:

boston.feature_names[np.argsort(lin_reg.coef_)]  # feature names, from most negative to most positive coefficient
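np.argsort returns the positions that would sort coef_, so indexing the feature names with it lists features from the most negative to the most positive coefficient. A sketch with made-up coefficients and generic names x0 to x3 (not the Boston results); the second ranking uses absolute values instead. Raw coefficients carry the units of their features, so comparing magnitudes is only meaningful after the features have been standardized, for example with sklearn.preprocessing.StandardScaler.

```python
import numpy as np

# Hypothetical coefficients and feature names, for illustration only
coef = np.array([-0.9, 3.8, 0.02, -15.6])
names = np.array(["x0", "x1", "x2", "x3"])

order = np.argsort(coef)                          # most negative to most positive
by_sign = names[order]
by_magnitude = names[np.argsort(np.abs(coef))[::-1]]  # largest |coef| first
print(by_sign)
print(by_magnitude)
```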

Original article: https://www.cnblogs.com/romannista/p/10735270.html

Time: 2024-10-10 10:42:09
