Python Machine Learning

Chapter 3: A Tour of Machine Learning Classifiers Using Scikit-learn

3.1: Training a perceptron via scikit-learn

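from sklearn import datasets
import numpy as np
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]  # use only petal length and petal width (columns 2 and 3)
y = iris.target           # class labels 0, 1, 2 (setosa, versicolor, virginica)
np.unique(y)              # array([0, 1, 2])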

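# train_test_split lives in sklearn.model_selection (older releases: sklearn.cross_validation)
from sklearn.model_selection import train_test_split
# randomly hold out 30% of the 150 samples as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)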
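With test_size=0.3, 30% of the 150 samples (45) go to the test set and the remaining 105 to the training set. This is why the plotting call further down highlights indices 105 to 149 as test samples once the training and test sets are stacked. A minimal check, assuming a current scikit-learn where train_test_split lives in sklearn.model_selection:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]  # petal length, petal width
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 150 * 0.3 = 45 test samples, 150 - 45 = 105 training samples
print(X_train.shape, X_test.shape)  # (105, 2) (45, 2)

# After stacking train then test, the test rows occupy indices 105..149
X_combined = np.vstack((X_train, X_test))
test_idx = range(len(X_train), len(X_combined))
print(test_idx)  # range(105, 150)
```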

# Standardize the features
# StandardScaler estimates the parameters μ (sample mean) and σ (standard deviation)
# of each feature from the training data and applies (x - μ) / σ
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
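The formula in the comments can be checked directly: StandardScaler stores the per-feature training mean in mean_ and the standard deviation in scale_, and transform applies (x - μ)/σ column by column. The test set is scaled with the training-set statistics, so its columns are only approximately zero-mean. A minimal sketch comparing the manual computation with the scaler, assuming σ is the population standard deviation (NumPy's default ddof=0):

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

sc = StandardScaler().fit(X_train)

# Manual standardization with the training-set mean and standard deviation
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population std (ddof=0)
X_train_manual = (X_train - mu) / sigma
X_test_manual = (X_test - mu) / sigma  # the test data reuses the training statistics

print(np.allclose(X_train_manual, sc.transform(X_train)))  # True
print(np.allclose(X_test_manual, sc.transform(X_test)))    # True
```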

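# Perceptron classification
# eta0 is the learning rate; max_iter is the number of passes (epochs) over the training data
from sklearn.linear_model import Perceptron
# n_iter was replaced by max_iter in newer scikit-learn; tol=None disables early stopping so all 40 epochs run
ppn = Perceptron(max_iter=40, eta0=0.1, random_state=0, tol=None)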
ppn.fit(X_train_std, y_train)
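eta0 scales each weight update. To illustrate the rule it controls, here is a minimal from-scratch sketch of the classic Rosenblatt perceptron for two classes labeled -1/+1: for each sample the weights move by η·(y - ŷ)·x, so they change only when that sample is misclassified. This is a simplified illustration, not scikit-learn's code: its Perceptron shares the SGD implementation and handles the three Iris classes one-vs-rest, so it will not produce identical weights. The helper names train_perceptron and predict are hypothetical.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, n_epochs=40, seed=0):
    """Hypothetical helper: classic perceptron rule for labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])  # small random initial weights
    b = 0.0
    for _ in range(n_epochs):
        for xi, target in zip(X, y):
            y_hat = 1 if xi @ w + b >= 0.0 else -1
            update = eta * (target - y_hat)  # zero when the prediction is correct
            w += update * xi
            b += update
    return w, b

def predict(X, w, b):
    """Hypothetical helper: threshold the net input at zero."""
    return np.where(X @ w + b >= 0.0, 1, -1)

# Toy linearly separable data: class +1 lies above the line x1 + x2 = 0
X = np.array([[2.0, 1.0], [1.0, 2.0], [3.0, 3.0],
              [-1.0, -2.0], [-2.0, -1.0], [-3.0, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b = train_perceptron(X, y, eta=0.1, n_epochs=10)
print(predict(X, w, b))  # [ 1  1  1 -1 -1 -1]
```

Because the toy data are linearly separable, the perceptron convergence theorem guarantees the rule stops making mistakes after finitely many updates; on data that are not separable it never settles, which is one reason library implementations cap the number of epochs.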

y_pred = ppn.predict(X_test_std)
# y_test != y_pred
# array([False, False, False, False, False, False, False, False, False,
#        False,  True, False, False, False, False, False, False, False,
#        False, False, False, False, False, False, False, False, False,
#        False,  True, False, False, False, False, False, False,  True,
#        False,  True, False, False, False, False, False, False, False])
print('Misclassified samples: %d' % (y_test != y_pred).sum())
#Misclassified samples: 4
#Thus, the misclassification error on the test dataset is 0.089 or 8.9 percent (4/45)
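The error rate is the number of True entries in the y_test != y_pred mask divided by the number of test samples. A minimal sketch that rebuilds the printed mask (its True entries sit at positions 10, 28, 35 and 37) and reproduces the 4/45 figure:

```python
import numpy as np

# Rebuild the 45-element mask printed above (True = misclassified)
mask = np.zeros(45, dtype=bool)
mask[[10, 28, 35, 37]] = True

n_errors = mask.sum()
error_rate = n_errors / mask.size
print('Misclassified samples: %d' % n_errors)  # Misclassified samples: 4
print('Error rate: %.3f' % error_rate)          # Error rate: 0.089
```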

# the sklearn.metrics module provides performance metrics such as accuracy
from sklearn.metrics import accuracy_score
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))
# Accuracy: 0.91
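For classification, accuracy_score is the fraction of correctly predicted labels, i.e. 1 minus the misclassification error: 1 - 4/45 = 41/45 ≈ 0.911, which prints as 0.91. A small check on made-up labels (the arrays below are illustrative only, not the Iris predictions):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Illustrative labels only; 1 of the 5 predictions is wrong
y_true = np.array([0, 1, 2, 2, 1])
y_hat = np.array([0, 1, 2, 1, 1])

acc = accuracy_score(y_true, y_hat)
manual_acc = np.mean(y_true == y_hat)        # fraction of correct predictions
one_minus_err = 1 - np.mean(y_true != y_hat)  # 1 - misclassification error
print(acc, manual_acc, one_minus_err)

# The document's numbers: 41 of 45 test samples correct
print('Accuracy: %.2f' % (41 / 45))  # Accuracy: 0.91
```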

# plot_decision_regions: visualize how well the classifier separates the different flower samples
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt

def plot_decision_regions(X, y, classifier, test_idx=None, resolution=0.02):
    # set up marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'black', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # plot the decision surface
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # plot all samples, one scatter call per class
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1], alpha=0.8,
                    color=cmap(idx), marker=markers[idx], label=cl)

    # highlight test samples
    if test_idx is not None:
        X_test, y_test = X[test_idx, :], y[test_idx]

        # facecolors='none' leaves the circles hollow; edgecolors sets the outline color
        plt.scatter(X_test[:, 0], X_test[:, 1], facecolors='none', edgecolors='black',
                    alpha=1.0, linewidths=2, marker='o', s=150, label='test set')

X_combined_std = np.vstack((X_train_std, X_test_std))
y_combined = np.hstack((y_train, y_test))
plot_decision_regions(X=X_combined_std, y=y_combined, classifier=ppn, test_idx=range(105,150))
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')
plt.show()
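The decision surface in plot_decision_regions is drawn by evaluating the classifier on every point of a regular grid: np.meshgrid builds the grid, ravel flattens the two coordinate arrays, stacking and transposing them gives an (n_points, 2) feature matrix that predict accepts, and reshape folds the predictions back into grid shape for contourf. A minimal sketch of just that reshaping, using a hypothetical threshold rule (toy_predict) in place of a trained classifier and a coarse resolution so the arrays stay small:

```python
import numpy as np

def toy_predict(points):
    """Hypothetical stand-in for classifier.predict: class 1 where x1 + x2 > 0."""
    return (points[:, 0] + points[:, 1] > 0).astype(int)

resolution = 0.5
x1_range = np.arange(-1.0, 1.0, resolution)  # [-1.0, -0.5, 0.0, 0.5]
x2_range = np.arange(-1.0, 0.5, resolution)  # [-1.0, -0.5, 0.0]
xx1, xx2 = np.meshgrid(x1_range, x2_range)   # both have shape (3, 4)

# One row per grid point, columns = (x1, x2), as in plot_decision_regions
grid = np.array([xx1.ravel(), xx2.ravel()]).T
print(grid.shape)  # (12, 2)

Z = toy_predict(grid).reshape(xx1.shape)  # back to the (3, 4) grid layout
print(Z)
```

Note that meshgrid's default "xy" indexing puts x2 along the rows and x1 along the columns, which is exactly the layout contourf expects.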

Original post: https://www.cnblogs.com/always-fight/p/9134802.html

Date: 2024-11-07 01:18:18
