BK: Data mining

data ------> knowledge

Are all patterns interesting?

No. Only a small fraction of the patterns a system could potentially generate would actually be of interest to a given user.

What makes a pattern interesting?

  • easily understood by humans
  • valid on new or test data with some degree of certainty
  • potentially useful
  • novel

An interesting pattern represents knowledge.
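To make the "only a small fraction is interesting" point concrete, here is a minimal Python sketch. It is not taken from these notes: it assumes support and confidence (two common objective measures for association rules) as stand-ins for "valid" and "potentially useful", and the transactions and both thresholds are hypothetical values chosen only for illustration.

```python
from itertools import permutations

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    return support(antecedent | consequent) / support(antecedent)

# Every item that occurs in at least one transaction.
items = sorted(set().union(*transactions))

# Candidate patterns: every single-item rule a -> b.
all_rules = list(permutations(items, 2))

# Assumed thresholds; in practice the user would choose these.
MIN_SUPPORT = 0.5
MIN_CONFIDENCE = 0.9

# Keep only the rules that clear both thresholds.
interesting = [
    (a, b)
    for a, b in all_rules
    if support({a, b}) >= MIN_SUPPORT and confidence({a}, {b}) >= MIN_CONFIDENCE
]

print(f"{len(interesting)} of {len(all_rules)} candidate rules kept: {interesting}")
```

On this toy data only one of the 30 candidate rules, beer -> diapers, clears both thresholds. Measures like these say nothing about whether a pattern is novel or easily understood; those criteria depend on the user.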

Can a data mining system generate all of the interesting patterns?

It is often unrealistic and inefficient for data mining systems to generate all possible patterns.

1.7 Major issues in data mining

Major issues:

  1. mining methodology
  2. user interaction
  3. efficiency and scalability
  4. diversity of data types
  5. data mining and society

Original article: https://www.cnblogs.com/dulun/p/12254532.html

Time: 2024-11-07 20:42:13
