Learning resources for the scikit-learn package

http://scikit-learn.org/stable/modules/clustering.html#k-means

http://my.oschina.net/u/175377/blog/84420

K-Means clustering parameter reference:

http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans

class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1)

n_clusters : int, optional, default: 8

The number of clusters to form as well as the number of centroids to generate.

max_iter : int, default: 300

Maximum number of iterations of the k-means algorithm for a single run.

n_init : int, default: 10

Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.
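A minimal sketch of what n_init buys you, assuming synthetic blob data invented here for illustration. It fits the same data once with a single random initialization and once with ten, then compares the inertia of the two results. Since the best of the runs is kept, more restarts should not end up with a worse inertia.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): three 2-D blobs of 50 points each.
rng = np.random.RandomState(0)
blob_centers = ((0.0, 0.0), (5.0, 5.0), (0.0, 5.0))
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in blob_centers])

# One run from a single random start versus the best of ten random starts.
single = KMeans(n_clusters=3, init="random", n_init=1, random_state=0).fit(X)
multi = KMeans(n_clusters=3, init="random", n_init=10, random_state=0).fit(X)

print("inertia with n_init=1: ", single.inertia_)
print("inertia with n_init=10:", multi.inertia_)
```

A larger n_init costs proportionally more fitting time, so it is a trade-off between runtime and the risk of keeping a poor local optimum.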

init : {‘k-means++’, ‘random’ or an ndarray}

Method for initialization, defaults to ‘k-means++’:

‘k-means++’ : selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details.

‘random’: choose k observations (rows) at random from data for the initial centroids.

If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
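As a small illustration of the ndarray option, the sketch below passes explicit starting centers of shape (n_clusters, n_features). The toy data and the starting points are made up for this example. n_init is set to 1 because a fixed starting point makes repeated runs identical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (illustrative): two groups of three points, near x=1 and x=10.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# Explicit initial centers with shape (n_clusters, n_features) == (2, 2).
initial_centers = np.array([[0.0, 0.0], [12.0, 3.0]])

km = KMeans(n_clusters=2, init=initial_centers, n_init=1).fit(X)

# Each final center is the mean of the points assigned to it.
print(km.cluster_centers_)
```

The row order of the final centers follows the row order of the initial centers, which is handy when cluster indices need to stay stable between runs.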

precompute_distances : {‘auto’, True, False}

Precompute distances (faster but takes more memory).

‘auto’ : do not precompute distances if n_samples * n_clusters > 12 million. This corresponds to about 100MB overhead per job using double precision.

True : always precompute distances

False : never precompute distances
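The "about 100MB" figure quoted for the ‘auto’ threshold can be checked with simple arithmetic, assuming one double-precision (8-byte) distance is stored per (sample, cluster) pair. The variable names below are illustrative only.

```python
import numpy as np

# Threshold on n_samples * n_clusters quoted in the parameter description.
threshold_entries = 12_000_000

# Size in bytes of one double-precision value.
bytes_per_entry = np.dtype(np.float64).itemsize

# Memory needed for a distance matrix exactly at the threshold, in megabytes.
overhead_mb = threshold_entries * bytes_per_entry / 1e6
print(overhead_mb, "MB")
```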

tol : float, default: 1e-4

Relative tolerance with regard to inertia to declare convergence.
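To see how max_iter and tol interact, one option is to inspect the fitted estimator's n_iter_ attribute, which reports how many iterations the kept run actually used. This is a sketch on synthetic data invented for the example. On easy data the run usually stops on the tol criterion well before max_iter is reached.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): three 2-D blobs.
rng = np.random.RandomState(1)
blob_centers = ((0.0, 0.0), (6.0, 0.0), (3.0, 5.0))
X = np.vstack([rng.normal(loc=c, scale=0.7, size=(100, 2)) for c in blob_centers])

km = KMeans(n_clusters=3, max_iter=300, tol=1e-4, n_init=1, random_state=0).fit(X)

# n_iter_ is bounded by max_iter; convergence within tol normally ends the run earlier.
print("iterations used:", km.n_iter_)
```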

n_jobs : int

The number of jobs to use for the computation. This works by computing each of the n_init runs in parallel.

If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
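The n_jobs rule above is easy to misread, so here it is as a tiny function. effective_workers is a hypothetical helper written for this note, not part of scikit-learn. The clamp to at least one worker is an assumption about how very negative values should behave.

```python
def effective_workers(n_jobs, n_cpus):
    """Hypothetical helper: number of workers implied by n_jobs on a machine with n_cpus."""
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # Rule from the description: n_cpus + 1 + n_jobs (clamped to 1, an assumption).
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


# On an 8-CPU machine: -1 means all CPUs, -2 means all but one, 1 means no parallelism.
for n_jobs in (-1, -2, 1):
    print(n_jobs, "->", effective_workers(n_jobs, n_cpus=8))
```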

random_state : integer or numpy.RandomState, optional

The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
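A quick way to see the effect of an integer random_state is to fit twice with the same seed and compare the results. The data below is random and generated only for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative data: 200 uniformly random 2-D points.
X = np.random.RandomState(42).rand(200, 2)

# Two independent fits with the same integer seed.
a = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
b = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

same_labels = np.array_equal(a.labels_, b.labels_)
print("identical labels:", same_labels)
```

Leaving random_state at None draws the initialization from the global numpy generator, so repeated fits can produce different labelings.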

verbose : int, default 0

Verbosity mode.


copy_x : boolean, default True

When pre-computing distances it is more numerically accurate to center the data first. If copy_x is True, then the original data is not modified. If False, the original data is modified, and put back before the function returns, but small numerical differences may be introduced by subtracting and then adding the data mean.

cluster_centers_ : array, [n_clusters, n_features]

Coordinates of cluster centers

labels_ : array, [n_samples]

Labels of each point

inertia_ : float

Sum of squared distances of samples to their closest cluster center.
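The three fitted attributes can be inspected directly after calling fit. The sketch below uses a toy dataset made up for this example and recomputes inertia_ by hand, which shows that scikit-learn sums the squared distance from each sample to its assigned center.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (illustrative): two groups of three points, near x=1 and x=10.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_.shape)  # (n_clusters, n_features)
print(km.labels_.shape)           # (n_samples,)

# Look up the center assigned to each sample, then sum the squared distances.
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = ((X - assigned_centers) ** 2).sum()
print(manual_inertia, km.inertia_)
```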

Time: 2024-08-07 23:10:06
