Cluster analysis in data mining

https://en.wikipedia.org/wiki/K-means_clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
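The nearest-mean rule and the quantity it minimizes can be sketched in a few lines of Python. This is only an illustration: it assumes Euclidean distance, and the names `nearest_mean` and `within_cluster_ss` as well as the sample data are invented for this example.

```python
import math

def nearest_mean(point, means):
    """Return the index of the mean closest to `point` (Euclidean distance)."""
    return min(range(len(means)), key=lambda i: math.dist(point, means[i]))

def within_cluster_ss(points, means):
    """Sum of squared distances from each point to its nearest mean."""
    return sum(math.dist(p, means[nearest_mean(p, means)]) ** 2 for p in points)

# Hypothetical data: four observations and two candidate cluster means.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
means = [(0.0, 1.0), (10.0, 1.0)]

# Each observation is labelled with the index of its nearest mean.
labels = [nearest_mean(p, means) for p in points]
cost = within_cluster_ss(points, means)
```

Assigning every observation to its nearest mean is what carves the data space into Voronoi cells: the cell of a mean is the set of all positions that `nearest_mean` would map to it.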

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These heuristics resemble the expectation-maximization (EM) algorithm for mixtures of Gaussian distributions in that both proceed by iterative refinement. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.
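One widely used heuristic of this kind is Lloyd's algorithm, which alternates an assignment step and an update step until the means stop moving. The sketch below is a minimal version under stated assumptions: Euclidean distance, caller-supplied starting means, and a hypothetical function name `lloyd_kmeans`. It is not a reference implementation, and the result it reaches depends on the starting means.

```python
import math

def lloyd_kmeans(points, initial_means, max_iter=100):
    """Iteratively refine `initial_means`; return (means, labels)."""
    means = [tuple(m) for m in initial_means]
    k = len(means)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster of its nearest mean.
        labels = [min(range(k), key=lambda i: math.dist(p, means[i])) for p in points]
        # Update step: each mean moves to the centroid of its assigned observations.
        new_means = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_means.append(means[i])  # an empty cluster keeps its old mean
        if new_means == means:  # no mean moved, so a local optimum was reached
            break
        means = new_means
    return means, labels

# Hypothetical data: two well-separated groups of three observations each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, labels = lloyd_kmeans(points, initial_means=[(0.0, 0.0), (10.0, 10.0)])
```

Because the procedure only guarantees a local optimum, it is common to run it from several different starting means and keep the result with the lowest within-cluster sum of squares.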

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
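Classifying new data this way amounts to a 1-nearest-neighbor lookup over the cluster centers. The following sketch assumes Euclidean distance; the function name `classify` and the center coordinates are hypothetical and stand in for the output of an earlier k-means run.

```python
import math

def classify(point, centers):
    """1-nearest neighbor over cluster centers: return the index of the closest center."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

# Hypothetical centers, as if obtained from a previous k-means run.
centers = [(0.0, 0.0), (10.0, 10.0)]

# New observations are assigned to the existing cluster with the nearest center.
new_points = [(1.0, 2.0), (9.0, 8.0)]
predicted = [classify(p, centers) for p in new_points]
```

Unlike a general k-nearest neighbor classifier, this lookup compares a new point against only k centers rather than against every training observation.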

Time: 2024-10-13 06:03:18

