Constraint-Based Pattern Mining

This post discusses how to mine patterns under constraints in data mining, and how to constrain and filter the data to be mined.

Why use constraint-based pattern mining? Because constraints let us apply different pruning methods to narrow the pattern mining process.

The main reasons are:

  • Finding all the patterns in a dataset autonomously is unrealistic
    • There are too many patterns, and most are not necessarily of interest to the user
  • Pattern mining should be an interactive process
    • The user directs what is to be mined using a data mining query language (or a graphical user interface)
  • Constraint-based mining
    • User flexibility: the user provides constraints on what is to be mined
    • Optimization: the system exploits such constraints for efficient mining
      • Constraint pushing is similar to pushing selections first in database query processing

Constraints in General Data Mining

A data mining query can take the form of a meta-rule or use the following language primitives:

* Knowledge type constraint
  * Ex.: classification, association, clustering, outlier finding, ...
* Data constraint, using SQL-like queries
  * Ex.: find products sold together in NY stores this year
* Dimension/level constraint
  * Ex.: in relevance to region, price, brand, customer category
* Rule (or pattern) constraint
  * Ex.: small sales (price < $10) triggers big sales (sum > $200)
* Interestingness constraint
  * Ex.: strong rules: min_sup ≥ 0.02, min_conf ≥ 0.6, min_correlation ≥ 0.7

Different Kinds of Constraints: Different Pruning Methods

  • Constraints can be categorized as
    • Pattern space pruning constraints vs. data space pruning constraints
  • Pattern space pruning constraints
    • Anti-monotonic: if an itemset S violates constraint c, so does every superset of S, so further mining from S can be terminated
    • Monotonic: if an itemset S satisfies c, so does every superset of S, so c need not be checked again
    • Succinct: c can be enforced by directly manipulating the data
    • Convertible: c can be converted to monotonic or anti-monotonic if items are properly ordered in processing
  • Data space pruning constraints
    • Data succinct: the data space can be pruned at the start of the pattern mining process
    • Data anti-monotonic: if a transaction t does not satisfy c, then t can be pruned to reduce data processing effort

Pattern Anti-monotonicity

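Here range(S.profit) means max(S.profit) − min(S.profit).

Because the support of an itemset S can only decrease (or stay the same) as more items are added, the answer to Ex. 4 is yes.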
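
A minimal sketch of how an anti-monotone constraint prunes the pattern space, assuming the constraint range(S.profit) ≤ v. The profit table, the function names, and the threshold are hypothetical, and support counting is left out to keep the focus on the constraint. Adding an item can only widen the range, so an itemset that violates the constraint is never extended.

```python
# Hypothetical item profits, invented for illustration only.
PROFIT = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30}


def profit_range(itemset):
    """range(S.profit) = max profit minus min profit over the items of S."""
    values = [PROFIT[item] for item in itemset]
    return max(values) - min(values)


def mine_with_range_limit(items, limit):
    """Level-wise enumeration of itemsets with range(S.profit) <= limit.

    Only itemsets that satisfy the constraint are extended, because every
    superset of a violating itemset violates it as well (anti-monotonicity).
    """
    level = {frozenset([item]) for item in items}
    kept = []
    while level:
        survivors = [s for s in level if profit_range(s) <= limit]
        kept.extend(survivors)
        # Build the next level from survivors only; violators are dropped here.
        level = {s | {item} for s in survivors for item in items if item not in s}
    return kept


patterns = mine_with_range_limit(list(PROFIT), limit=15)
```

With these made-up numbers the search stops after the second level, since no surviving pair can be extended without exceeding the limit.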

Pattern Monotonicity
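
The monotonic case from the categorization above (once c is satisfied, it need not be checked again) can be sketched as follows, assuming the constraint sum(S.price) ≥ v with non-negative prices. The price table and function names are hypothetical, and support counting is omitted. The `satisfied` flag is passed down the search so that supersets of a satisfying itemset skip the check.

```python
# Hypothetical non-negative item prices, invented for illustration only.
PRICE = {"a": 5, "b": 8, "c": 12, "d": 20}


def enumerate_with_min_sum(items, threshold):
    """Depth-first enumeration of itemsets with sum(S.price) >= threshold.

    Returns the satisfying itemsets and the number of constraint checks
    actually performed.
    """
    results = []
    checks = 0

    def grow(prefix, start, total, satisfied):
        nonlocal checks
        for idx in range(start, len(items)):
            item = items[idx]
            new_total = total + PRICE[item]
            new_satisfied = satisfied
            if not new_satisfied:
                # Evaluate the constraint only while it is still unsatisfied.
                checks += 1
                new_satisfied = new_total >= threshold
            itemset = prefix + [item]
            if new_satisfied:
                results.append(frozenset(itemset))
            grow(itemset, idx + 1, new_total, new_satisfied)

    grow([], 0, 0, False)
    return results, checks


patterns, checks = enumerate_with_min_sum(list(PRICE), threshold=20)
```

Note that a monotone constraint does not cut branches by itself: an itemset that fails now may still succeed after more items are added. The saving is in skipped checks.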

Data Anti-monotonicity
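
A minimal sketch of data space pruning, assuming the constraint sum(S.profit) ≥ v. If even the most favorable combination of items in a transaction t cannot reach v, then t cannot support any satisfying pattern and can be dropped. The profit table, the transactions, and the function names are hypothetical.

```python
# Hypothetical item profits, invented for illustration only.
PROFIT = {"a": 40, "b": 0, "c": -20, "d": -15, "e": -30, "f": -10, "g": 20, "h": 5}


def best_possible_profit(transaction):
    """Upper bound on sum(S.profit) over all itemsets S contained in the transaction.

    Only items with positive profit can raise the sum, so adding them all up
    gives a value that no subset of the transaction can exceed.
    """
    return sum(max(PROFIT[item], 0) for item in transaction)


def prune_transactions(transactions, min_profit):
    """Keep only transactions that could still support a satisfying pattern."""
    return [t for t in transactions if best_possible_profit(t) >= min_profit]


transactions = [
    {"a", "b", "c", "d", "f"},
    {"b", "c", "d", "f", "g"},
    {"a", "c", "d", "e", "f"},
    {"c", "e", "f", "g"},
]
remaining = prune_transactions(transactions, min_profit=25)
```

In an actual miner this test would typically be repeated as the search proceeds, because removing infrequent items from a transaction can lower its bound further.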

Succinct Constraints
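
A minimal sketch of enforcing a succinct constraint by manipulating the data directly, assuming the constraint max(S.price) ≤ v. Every itemset built only from items priced at most v satisfies the constraint, so the other items can be removed from the transactions before mining starts. The price table, the transactions, and the function names are hypothetical.

```python
# Hypothetical item prices, invented for illustration only.
PRICE = {"a": 5, "b": 8, "c": 12, "d": 20}


def allowed_items(max_price):
    """Items that may appear in a pattern satisfying max(S.price) <= max_price."""
    return frozenset(item for item, price in PRICE.items() if price <= max_price)


def project_transactions(transactions, allowed):
    """Remove disallowed items from every transaction before mining.

    Transactions left empty are dropped; they contain no allowed item, so
    absolute support counts of allowed itemsets are unchanged.
    """
    projected = []
    for transaction in transactions:
        kept = frozenset(transaction) & allowed
        if kept:
            projected.append(kept)
    return projected


transactions = [{"a", "c"}, {"b", "d"}, {"c", "d"}, {"a", "b", "c"}]
projected = project_transactions(transactions, allowed_items(max_price=10))
```

Any frequent-itemset algorithm run on `projected` then produces only patterns that satisfy the constraint, with no per-pattern check.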

Convertible Constraints

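Here we sort the items in each transaction in descending or ascending order; the constraint can then be converted into a monotone or anti-monotone one.

Following the explanation above, the conclusion follows: we select one or more items from a transaction T, and at that point the items are ordered.

Note that pattern generation always proceeds in the right order.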
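
A minimal sketch of a convertible constraint, assuming the constraint avg(S.profit) ≥ v. On its own this constraint is neither monotone nor anti-monotone, but if items are added in descending profit order, each new item can only lower (or keep) the average, so a violating prefix can be pruned like an anti-monotone constraint. The profit table and function name are hypothetical, and support counting is omitted.

```python
# Hypothetical item profits, invented for illustration only.
PROFIT = {"a": 40, "b": 0, "c": -20, "d": 10, "g": 20}


def mine_avg_at_least(profit, threshold):
    """Enumerate itemsets with avg(S.profit) >= threshold.

    Patterns are grown in descending profit order, which makes the
    constraint behave anti-monotonically along each prefix.
    """
    order = sorted(profit, key=profit.get, reverse=True)
    results = []

    def grow(prefix, start, total):
        for idx in range(start, len(order)):
            item = order[idx]
            new_total = total + profit[item]
            if new_total / (len(prefix) + 1) < threshold:
                # Later items have profit <= this one, so they fail here too,
                # and no extension of a failing prefix can recover.
                break
            itemset = prefix + [item]
            results.append(frozenset(itemset))
            grow(itemset, idx + 1, new_total)

    grow([], 0, 0)
    return results


patterns = mine_avg_at_least(PROFIT, threshold=15)
```

The pruning is only valid because generation follows the chosen order; with an arbitrary item order, a low-average prefix could still be rescued by a later high-profit item.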

Time: 2024-10-03 23:15:56
