Python random classification

# encoding: utf-8
import numpy as np
from sklearn import linear_model
from sklearn.metrics import roc_curve, auc
import pylab as pl

def confusionMatrix(predicted, actual, threshold):
    """Return the counts [tp, fn, fp, tn] for predictions cut at threshold."""
    if len(predicted) != len(actual):
        return -1
    tp = 0.0
    fp = 0.0
    tn = 0.0
    fn = 0.0
    for i in range(len(actual)):
        if actual[i] > 0.5:  # actual label is positive (mine)
            if predicted[i] > threshold:
                tp += 1.0
            else:
                fn += 1.0
        else:  # actual label is negative (rock)
            if predicted[i] < threshold:
                tn += 1.0
            else:
                fp += 1.0
    rtn = [tp, fn, fp, tn]
    return rtn

# Load the data
xList = []
labels = []
with open('sonar.all-data') as rockdata:
    # Convert the labels to numbers: M becomes 1.0, R becomes 0.0
    for line in rockdata:
        row = line.strip().split(",")
        if row[-1] == 'M':
            labels.append(1.0)
        else:
            labels.append(0.0)
        row.pop()
        floatRow = [float(num) for num in row]
        xList.append(floatRow)
print(labels)

# Take the row indices and split the data by index modulo 3 into two subsets:
# 1/3 as the test set, 2/3 as the training set
indices = range(len(xList))
xListTest = [xList[i] for i in indices if i % 3 == 0]
xListTrain = [xList[i] for i in indices if i % 3 != 0]
labelsTest = [labels[i] for i in indices if i % 3 == 0]
labelsTrain = [labels[i] for i in indices if i % 3 != 0]

# Convert the lists to arrays
xTrain = np.array(xListTrain)
yTrain = np.array(labelsTrain)
xTest = np.array(xListTest)
yTest = np.array(labelsTest)

# Prediction model
rocksVMinesModel = linear_model.LinearRegression()
# Fit on the training data
rocksVMinesModel.fit(xTrain, yTrain)
# Predict on the training data
trainingPredictions = rocksVMinesModel.predict(xTrain)
print("---------", trainingPredictions[0:5], trainingPredictions[-6:-1])

# Confusion matrix for the training data
confusionMatTrain = confusionMatrix(trainingPredictions, yTrain, 0.5)
print(confusionMatTrain)

# Predict on the test data
testPredictions = rocksVMinesModel.predict(xTest)
# Confusion matrix for the test data
confusionTest = confusionMatrix(testPredictions, yTest, 0.5)
print(confusionTest)

# Use roc_curve to compute fpr and tpr, then compute roc_auc; a higher AUC is better
fpr, tpr, thresholds = roc_curve(yTrain, trainingPredictions)
roc_auc = auc(fpr, tpr)
print(roc_auc)

# Plot the ROC curve on the training set
pl.clf()  # clear the figure; needed when initializing the plot
pl.plot(fpr, tpr, label='ROC curve (area=%0.2f)' % roc_auc)  # draw the ROC curve
pl.plot([0, 1], [0, 1], 'k-')  # draw the diagonal
pl.xlim([0.0, 1.0])  # x-axis range
pl.ylim([0.0, 1.0])  # y-axis range
pl.xlabel('False Positive Rate')  # x-axis label
pl.ylabel('True Positive Rate')  # y-axis label
pl.title('In sample ROC rocks versus mines')  # title
pl.legend(loc="lower left")  # legend position
pl.show()

# Plot the ROC curve on the test set
fpr, tpr, thresholds = roc_curve(yTest, testPredictions)
roc_auc = auc(fpr, tpr)
print(roc_auc)

pl.clf()
pl.plot(fpr, tpr, label='ROC curve (area=%0.2f)' % roc_auc)
pl.plot([0, 1], [0, 1], 'k-')
pl.xlim([0.0, 1.0])
pl.ylim([0.0, 1.0])
pl.xlabel('False Positive Rate')
pl.ylabel('True Positive Rate')
pl.title('Out of sample ROC rocks versus mines')
pl.legend(loc="lower right")
pl.show()
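
The four counts produced by confusionMatrix can be turned into the rates that the ROC plot uses on its axes. The following is a minimal sketch, assuming the counts are ordered as tp, fn, fp, tn; the helper name rates_from_counts and the example counts are hypothetical and are not results from the sonar data.

```python
def rates_from_counts(tp, fn, fp, tn):
    """Return (tpr, fpr, accuracy) computed from confusion-matrix counts."""
    positives = tp + fn          # all rows whose true label is positive
    negatives = fp + tn          # all rows whose true label is negative
    total = positives + negatives
    tpr = tp / positives if positives else 0.0   # true positive rate
    fpr = fp / negatives if negatives else 0.0   # false positive rate
    accuracy = (tp + tn) / total if total else 0.0
    return tpr, fpr, accuracy


# Made-up counts, for illustration only
tpr, fpr, acc = rates_from_counts(tp=40.0, fn=10.0, fp=5.0, tn=45.0)
print(tpr, fpr, acc)
```

Each threshold passed to confusionMatrix gives one (fpr, tpr) pair, which is one point on the ROC curve.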

ROC curve on the training set

ROC curve on the test set
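
To see where the points on these curves come from, the threshold sweep can be written out by hand. This is a rough pure-Python sketch of the idea behind roc_curve and auc, not their actual implementation; the function names roc_points and trapezoid_auc and the toy scores are assumptions made for illustration, and the sketch assumes the labels contain at least one positive and one negative.

```python
def roc_points(scores, labels):
    """Sweep the threshold over each distinct score, highest first.

    Returns a list of (fpr, tpr) points starting at (0.0, 0.0).
    A row is predicted positive when its score is >= the threshold.
    """
    pos = sum(1 for y in labels if y > 0.5)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y > 0.5)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y <= 0.5)
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data, for illustration only
toy_scores = [0.9, 0.8, 0.35, 0.1]
toy_labels = [1.0, 0.0, 1.0, 0.0]
pts = roc_points(toy_scores, toy_labels)
toy_auc = trapezoid_auc(pts)
print(pts)
print(toy_auc)  # 0.75
```

An AUC of 0.75 here means that three of the four (positive, negative) pairs are ranked correctly by the scores; a curve that hugs the diagonal would give about 0.5.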

Time: 2024-11-05 06:25:00
