This post discusses constraint-based mining: how to mine data under constraints, and how to constrain and filter the data to be mined.
Why do we use constraint-based pattern mining? Because constraints let us apply different pruning methods that restrict the pattern mining process. The reasons are:
- Finding all the patterns in a dataset autonomously is unrealistic!
- There are too many patterns, and they are not necessarily of interest to the user!
- Pattern mining should be an interactive process
  - The user directs what is to be mined using a data mining query language (or a graphical user interface)
- Constraint-based mining
  - User flexibility: the user provides constraints on what is to be mined
  - Optimization: the system exploits such constraints for efficient mining
- Constraint-based mining: constraint pushing, similar to pushing selections first in DB query processing
Constraints in General Data Mining
A data mining query can be expressed as a meta-rule or with the following language primitives:
* Knowledge type constraint:
  * Ex.: classification, association, clustering, outlier finding, …
* Data constraint: using SQL-like queries
  * Ex.: find products sold together in NY stores this year
* Dimension/level constraint:
  * Ex.: in relevance to region, price, brand, customer category
* Rule (or pattern) constraint:
  * Ex.: small sales (price < $10) trigger big sales (sum > $200)
* Interestingness constraint:
  * Ex.: strong rules: min_sup ≥ 0.02, min_conf ≥ 0.6, min_correlation ≥ 0.7
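The interestingness constraint above can be checked mechanically. The sketch below is a minimal illustration, not a reference implementation: the toy `TRANSACTIONS`, the helper names `support` and `strong_rules`, and the thresholds are all hypothetical. It only enumerates single-item rules X -> Y and leaves out the correlation threshold.

```python
from itertools import combinations

# Hypothetical toy data, not taken from the source.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def strong_rules(transactions, min_sup, min_conf):
    """Return (lhs, rhs, support, confidence) for every single-item rule
    lhs -> rhs that meets both interestingness thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in combinations(items, 2):
        for lhs, rhs in ((x, y), (y, x)):
            sup = support({lhs, rhs}, transactions)
            if sup < min_sup:
                continue  # fails the support threshold
            conf = sup / support({lhs}, transactions)
            if conf >= min_conf:
                rules.append((lhs, rhs, sup, conf))
    return rules


for lhs, rhs, sup, conf in strong_rules(TRANSACTIONS, min_sup=0.4, min_conf=0.6):
    print(f"{lhs} -> {rhs}: sup={sup:.2f}, conf={conf:.2f}")
```

With these thresholds, `bread -> butter` is dropped (confidence 0.4 / 0.8 = 0.5), while `butter -> bread` is kept (confidence 1.0). The two directions of a rule must be checked separately.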
Different Kinds of Constraints: Different Pruning Methods
- Constraints can be categorized as pattern space pruning constraints vs. data space pruning constraints
- Pattern space pruning constraints
  - Anti-monotonic: if an itemset violates constraint c, so does every superset, so further mining from it can be terminated
  - Monotonic: if an itemset satisfies c, so does every superset, so c need not be checked again
  - Succinct: c can be enforced by directly manipulating the data
  - Convertible: c can be converted to monotonic or anti-monotonic if items are properly ordered during processing
- Data space pruning constraints
  - Data succinct: the data space can be pruned at the start of the pattern mining process
  - Data anti-monotonic: if the current pattern cannot satisfy c within a transaction t, neither can any of its supersets, so t can be pruned to reduce data processing effort
Pattern Anti-monotonicity
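Here, range(S.profit) denotes max(S.profit) - min(S.profit).
Because the support of an itemset S can only decrease (or stay the same) as items are added to it, the answer to Ex. 4 is yes: a minimum support constraint is anti-monotone.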
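The following sketch shows anti-monotone constraints pushed into level-wise (Apriori-style) mining. It rests on stated assumptions: the toy `TRANSACTIONS` and `PROFIT` values and all function names are hypothetical. Both `support(S) >= min_count` and `range(S.profit) <= max_range` are anti-monotone, so an itemset that violates either one is discarded and none of its supersets is ever generated.

```python
from itertools import combinations

# Hypothetical toy data, not taken from the source.
TRANSACTIONS = [
    {"a", "b", "c", "d"},
    {"b", "c", "d"},
    {"a", "c", "d"},
    {"b", "c", "d", "e"},
]
PROFIT = {"a": 40, "b": 0, "c": -20, "d": -15, "e": -30}


def profit_range(itemset):
    """range(S.profit) = max(S.profit) - min(S.profit)."""
    values = [PROFIT[i] for i in itemset]
    return max(values) - min(values)


def satisfies(itemset, transactions, min_count, max_range):
    # Check the cheap constraint first: it needs no scan of the data.
    if profit_range(itemset) > max_range:
        return False
    count = sum(1 for t in transactions if itemset <= t)
    return count >= min_count


def apriori_with_constraints(transactions, min_count, max_range):
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if satisfies(frozenset([i]), transactions, min_count, max_range)]
    result = list(level)
    k = 2
    while level:
        survivors = set(level)
        # Join: unions of two surviving (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: a discarded (k-1)-subset violated an anti-monotone
        # constraint, so the candidate must violate it as well.
        candidates = {c for c in candidates
                      if all(frozenset(s) in survivors for s in combinations(c, k - 1))}
        level = [c for c in candidates
                 if satisfies(c, transactions, min_count, max_range)]
        result.extend(level)
        k += 1
    return result


print(sorted("".join(sorted(s)) for s in apriori_with_constraints(TRANSACTIONS, 2, 15)))
```

`ab` is discarded at level 2 because range(ab.profit) = 40 > 15, so no superset of `ab` is ever counted. The candidate `bcd` is pruned without any data scan because its subset `bc` (range 20) was already discarded.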
Pattern Monotonicity
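Pattern monotonicity saves constraint checks rather than pruning candidates: once an itemset satisfies c, every superset does too. The minimal sketch below uses hypothetical `PRICE` values and a hypothetical `mine_with_monotone` helper. It enumerates itemsets depth-first and stops evaluating c = sum(S.price) >= v below any node that already satisfies it. Support counting is omitted to keep the focus on the constraint.

```python
# Hypothetical item prices, not taken from the source.
PRICE = {"a": 100, "b": 40, "c": 150, "d": 35}


def mine_with_monotone(items, min_total):
    """Return (itemsets satisfying sum(S.price) >= min_total, number of checks)."""
    results = []
    checks = 0

    def grow(prefix, start, satisfied):
        nonlocal checks
        for idx in range(start, len(items)):
            itemset = prefix + (items[idx],)
            if satisfied:
                ok = True  # a subset already satisfies c, so this superset does too
            else:
                checks += 1
                ok = sum(PRICE[i] for i in itemset) >= min_total
            if ok:
                results.append(frozenset(itemset))
            # A monotone constraint never prunes: a failing itemset may still
            # have supersets that satisfy it, so keep extending either way.
            grow(itemset, idx + 1, ok)

    grow((), 0, False)
    return results, checks


found, checks = mine_with_monotone(sorted(PRICE), 150)
print(len(found), "itemsets satisfy c;", checks, "of", 2 ** len(PRICE) - 1, "itemsets were checked")
```

Four supersets (`abcd`, `acd`, `bcd`, `cd`) are accepted without re-evaluating the sum, because a subset of each already satisfied c.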
Data Anti-monotonicity
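Data anti-monotonicity prunes transactions rather than patterns. Take c = sum(S.price) >= v with positive prices. Suppose even the current pattern plus every remaining item of transaction t stays below v. Then no superset of the pattern can satisfy c in t, so t is dropped from the pattern's projected database. The sketch below is a minimal pattern-growth illustration under that assumption. The data, the alphabetical item order, and the `mine` helper are hypothetical.

```python
# Hypothetical toy data with positive prices, not taken from the source.
PRICE = {"a": 100, "b": 40, "d": 35, "e": 55}
TRANSACTIONS = [
    ["a", "b", "d"],
    ["b", "d", "e"],
    ["a", "b", "d", "e"],
    ["b", "d"],
]


def mine(transactions, min_count, min_total):
    """Frequent itemsets with sum(S.price) >= min_total, mined by pattern growth
    over projected databases (items taken in alphabetical order)."""
    results = []

    def grow(pattern, pattern_total, projected):
        # Data anti-monotone pruning: drop transactions in which even the whole
        # remaining suffix cannot lift the price sum to min_total.
        projected = [t for t in projected
                     if pattern_total + sum(PRICE[i] for i in t) >= min_total]
        counts = {}
        for t in projected:
            for item in t:
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] < min_count:
                continue
            new_pattern = pattern + (item,)
            new_total = pattern_total + PRICE[item]
            if new_total >= min_total:
                results.append(frozenset(new_pattern))
            # Project: keep only the items after `item` in transactions containing it.
            next_db = [t[t.index(item) + 1:] for t in projected if item in t]
            grow(new_pattern, new_total, next_db)

    grow((), 0, [sorted(t) for t in transactions])
    return results


print(mine(TRANSACTIONS, min_count=2, min_total=150))
```

At the root, `["b", "d", "e"]` (total 130) and `["b", "d"]` (total 75) are pruned immediately because they can never reach 150. Only `{a, b, d}` survives. With `min_total=0` nothing is pruned, and the same function returns all 11 frequent itemsets.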
Succinct Constraints
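Succinctness means the satisfying itemsets can be generated directly, without testing c on each candidate. Take c = min(S.price) <= v. Every satisfying itemset is a non-empty subset of the "cheap" items (price <= v) plus any subset of the other items, and a transaction containing no cheap item can be removed up front. This minimal sketch uses hypothetical data and helper names. It enumerates candidates exhaustively, so it only illustrates the idea; in practice it would be combined with support-based pruning.

```python
from itertools import chain, combinations

# Hypothetical toy data, not taken from the source.
PRICE = {"a": 100, "b": 40, "c": 150, "d": 35, "e": 55}
TRANSACTIONS = [{"a", "c"}, {"b", "c", "d"}, {"a", "c", "e"}, {"b", "e"}]


def subsets(items, min_size=0):
    """All subsets of `items` with at least `min_size` elements."""
    return chain.from_iterable(
        combinations(items, k) for k in range(min_size, len(items) + 1))


def succinct_candidates(v):
    """Generate exactly the itemsets with min(S.price) <= v, no test needed."""
    cheap = sorted(i for i, p in PRICE.items() if p <= v)
    other = sorted(i for i, p in PRICE.items() if p > v)
    for c in subsets(cheap, min_size=1):  # at least one cheap item
        for o in subsets(other):
            yield frozenset(c) | frozenset(o)


def mine_min_price(transactions, v, min_count):
    # Data-space pruning: a transaction with no cheap item cannot contain
    # any itemset that satisfies the constraint.
    kept = [t for t in transactions if any(PRICE[i] <= v for i in t)]
    return [s for s in succinct_candidates(v)
            if sum(1 for t in kept if s <= t) >= min_count]


print(len(list(succinct_candidates(50))), mine_min_price(TRANSACTIONS, 50, 2))
```

With v = 50, only `b` and `d` are cheap, so 24 of the 31 non-empty itemsets are generated. The transactions `{a, c}` and `{a, c, e}` are discarded before any counting.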
Convertible Constraints
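Here, we sort the items within each transaction in value-descending (or ascending) order; under that order, the constraint becomes monotone or anti-monotone.
Following the explanation above, the conclusion is that we pick one or more items from a transaction T, and the items are always taken in this fixed order.
Note that pattern generation always proceeds in the right order (the chosen item order).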
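The convertible case can be sketched for c = avg(S.profit) >= v. Order items by profit in descending order, and let patterns grow only by appending later items. Every appended item then has profit no greater than the current minimum, and hence no greater than the current average, so the average can only fall. Along that order, c becomes anti-monotone, and a violating pattern's whole subtree can be pruned. The profits and helper names below are hypothetical.

```python
# Hypothetical item profits, not taken from the source.
PROFIT = {"a": 40, "g": 20, "f": 10, "b": 0, "h": -5}


def avg_profit(itemset):
    return sum(PROFIT[i] for i in itemset) / len(itemset)


def mine_avg_at_least(v):
    """Itemsets with avg(S.profit) >= v, grown in profit-descending order."""
    order = sorted(PROFIT, key=PROFIT.get, reverse=True)  # the "right order"
    results = []

    def grow(prefix, start):
        for idx in range(start, len(order)):
            itemset = prefix + (order[idx],)
            if avg_profit(itemset) < v:
                # Later items have profit <= every item already in `itemset`,
                # so no extension of it can raise the average back to v.
                continue
            results.append(frozenset(itemset))
            grow(itemset, idx + 1)

    grow((), 0)
    return results


print(len(mine_avg_at_least(20)))
print(avg_profit(("g", "f")), avg_profit(("a", "g", "f")))
```

The ordering does not help level-wise Apriori. `("a", "g", "f")` averages about 23.3 >= 20, while its subset `("g", "f")` averages 15 < 20, so Apriori's subset check would wrongly discard `agf`. Pattern growth along the fixed order never needs that check, because `agf` is reached through `a` and `ag`, both of which satisfy c.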