Semester 3 - Data Mining (Week 8-10)
What is Frequent Patterns?
- Patterns that appear frequently in a dataset.
- Sets of items that commonly occur together in transactions.
- Example: {milk, bread} is a frequent itemset if customers often buy them together.
Define Frequent Itemset Mining
- Identifies and lists itemsets that meet a minimum support threshold.
- Important for tasks like association rule mining and correlation mining.
- Aims to find relationships between items in large datasets.
Association Rule
- Association rules find interesting relations between variables in large databases.
- They identify strong rules based on measures of interestingness.
- Example: “If a customer buys a coffee, they are 80% likely to also purchase a muffin.”
Market Basket Analysis
- Modelling technique that explores item associations in shopping baskets.
- Used by retailers to uncover item relationships.
- Analyzes frequent co-occurrences of items in transactions.
- Example: Customers buying computers often purchase antivirus software, suggesting strategic product placement.
Name Applications of Frequent Pattern Mining
- Data Preprocessing and Noise Filtering:
- Helps filter out noise and clean data.
- Distinguishes frequent item co-occurrences from random noise.
- Discovery of Inherent Structures and Clusters:
- Reveals hidden structures and clusters in data.
- Identifies coauthor and conference clusters in DBLP data set.
- Pattern-based Classification:
- Constructs reliable classification models using frequent patterns.
- Provides more information gain compared to infrequent patterns.
- Subspace Clustering in High-Dimensional Space:
- Supports effective clustering in high-dimensional space.
- Overcomes challenges of measuring distances in such spaces.
- Microarray Data Analysis:
- Analyzes noisy microarray data with thousands of dimensions.
- Helps differentiate noise from meaningful patterns.
- Semantic Annotation of Frequent Patterns:
- Enhances understanding of frequent patterns with annotations.
- Provides additional context and meaning to the patterns.
Applications of Market Basket Analysis
- Marketing and Advertising Strategies:
- Tailoring marketing efforts based on frequently bought together products.
- Optimizing advertising strategies using market basket analysis insights.
- Store Layout Design:
- Placing frequently purchased items together in store layout.
- Utilizing strategic placement to maximize sales and customer experience.
- Sales Planning:
- Identifying items to put on sale based on frequent associations.
- Promoting complementary items to boost sales.
- Catalog Design:
- Grouping frequently bought together items in catalogs.
- Enhancing catalog layout for increased sales potential.
- Cross-Marketing:
- Utilizing correlations to make effective cross-marketing decisions.
- Understanding product associations to drive sales across different categories.
- Customer Shopping Behavior Analysis:
- Analyzing buying habits and preferences of customers.
- Improving shopping experience and increasing sales based on insights.
Frequent Itemset Mining Methods
- Apriori Algorithm:
- Mines frequent itemsets for Boolean association rules.
- Uses the Apriori property and iterative approach.
- Forms candidate itemsets based on frequent (k-1)-itemsets.
- Scans the database to find complete sets of frequent itemsets.
- Variations include hashing, transaction reduction, partitioning, and sampling for efficiency improvement.
- Frequent Pattern Growth (FP-growth):
- Mines frequent itemsets without candidate generation.
- Constructs a compact FP-tree data structure.
- Compresses the transaction database.
- Focuses on frequent pattern growth rather than generate-and-test.
- Offers improved efficiency compared to Apriori-like methods.
Mining Association Rules
Mining association rules involves several steps:
- Finding Frequent Itemsets:
- Identify itemsets (e.g., A and B) that meet a minimum support threshold.
- Support threshold ensures sufficient occurrence in the data.
- Generating Strong Association Rules:
- Generate association rules (A -> B) from frequent itemsets.
- Rules meet a minimum confidence threshold.
- Confidence indicates the probability of B given A.
- Uncovering Correlation Rules:
- Analyze associations to uncover correlation rules.
- Reveal statistical correlations between itemsets A and B.
- Classifying Algorithms:
- Efficient and scalable algorithms for mining frequent itemsets.
- Three categories: Apriori-like, FP-growth, and vertical data format.
- Evaluating Patterns:
- Augment support-confidence framework with pattern evaluation.
- Evaluate interestingness of association rules.
- Consider null-invariant measures (unaffected by null-transactions).
- Common measures: lift, χ2, all confidence, max confidence, Kulczynski, cosine.