Semester 3 - Data Mining (Week 8-10)

What are Frequent Patterns?

  • Patterns that appear frequently in a dataset.
  • Sets of items that commonly occur together in transactions.
  • Example: {milk, bread} is a frequent itemset if customers often buy them together.

Define Frequent Itemset Mining

  • Identifies and lists itemsets that meet a minimum support threshold.
  • Important for tasks like association rule mining and correlation mining.
  • Aims to find relationships between items in large datasets.
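A minimal sketch of the idea above, assuming support is measured as the fraction of transactions that contain an itemset. The transactions, the function names `support` and `frequent_itemsets`, and the 0.6 threshold are made up for illustration. The search is brute force, so it is only practical for tiny examples.

```python
from itertools import combinations

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "eggs"},
    {"eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Enumerate every possible itemset and keep those meeting `min_support`.

    Brute force: the number of candidates grows exponentially with the
    number of distinct items.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


frequent = frequent_itemsets(transactions, min_support=0.6)
for itemset, s in frequent.items():
    print(sorted(itemset), s)
```

With this toy data, only {bread}, {milk}, and {bread, milk} reach the 0.6 threshold.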

Association Rule

  • Association rules find interesting relations between variables in large databases.
  • They identify strong rules based on measures of interestingness.
  • Example: “If a customer buys a coffee, they are 80% likely to also purchase a muffin.”
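The percentage in a rule like this can be read as the rule's confidence: among transactions containing the antecedent, the share that also contain the consequent. The sketch below computes it on invented transactions chosen so that the result is 80%; the data and the function name `confidence` are assumptions for illustration.

```python
# Toy transactions (hypothetical): 5 contain coffee, 4 of those also contain a muffin.
transactions = [
    {"coffee", "muffin"},
    {"coffee", "muffin"},
    {"coffee", "muffin", "juice"},
    {"coffee", "muffin"},
    {"coffee"},
    {"juice"},
    {"muffin", "juice"},
]


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from transaction counts."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if both <= t)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return n_both / n_antecedent


conf = confidence({"coffee"}, {"muffin"}, transactions)
print(f"coffee -> muffin: {conf:.0%}")
```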

Market Basket Analysis

  • Modelling technique that explores item associations in shopping baskets.
  • Used by retailers to uncover item relationships.
  • Analyzes frequent co-occurrences of items in transactions.
  • Example: Customers buying computers often purchase antivirus software, suggesting strategic product placement.
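As a rough illustration of counting co-occurrences, the sketch below tallies how often each pair of items appears in the same basket. The baskets are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Toy shopping baskets (hypothetical data).
baskets = [
    ["computer", "antivirus", "mouse"],
    ["computer", "antivirus"],
    ["computer", "printer"],
    ["mouse", "printer"],
]

pair_counts = Counter()
for basket in baskets:
    # Sort and de-duplicate so each pair has one canonical form per basket.
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

Here the pair (antivirus, computer) co-occurs most often, which is the kind of signal a retailer might use for product placement.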

Name Applications of Frequent Pattern Mining

  • Data Preprocessing and Noise Filtering:
    • Helps filter out noise and clean data.
    • Distinguishes frequent item co-occurrences from random noise.
  • Discovery of Inherent Structures and Clusters:
    • Reveals hidden structures and clusters in data.
    • Identifies coauthor and conference clusters in the DBLP data set.
  • Pattern-based Classification:
    • Constructs reliable classification models using frequent patterns.
    • Provides more information gain compared to infrequent patterns.
  • Subspace Clustering in High-Dimensional Space:
    • Supports effective clustering in high-dimensional space.
    • Overcomes challenges of measuring distances in such spaces.
  • Microarray Data Analysis:
    • Analyzes noisy microarray data with thousands of dimensions.
    • Helps differentiate noise from meaningful patterns.
  • Semantic Annotation of Frequent Patterns:
    • Enhances understanding of frequent patterns with annotations.
    • Provides additional context and meaning to the patterns.

Applications of Market Basket Analysis

  • Marketing and Advertising Strategies:
    • Tailoring marketing efforts to products that are frequently bought together.
    • Optimizing advertising strategies using market basket analysis insights.
  • Store Layout Design:
    • Placing frequently purchased items together in store layout.
    • Utilizing strategic placement to maximize sales and customer experience.
  • Sales Planning:
    • Identifying items to put on sale based on frequent associations.
    • Promoting complementary items to boost sales.
  • Catalog Design:
    • Grouping items that are frequently bought together in catalogs.
    • Enhancing catalog layout for increased sales potential.
  • Cross-Marketing:
    • Utilizing correlations to make effective cross-marketing decisions.
    • Understanding product associations to drive sales across different categories.
  • Customer Shopping Behavior Analysis:
    • Analyzing buying habits and preferences of customers.
    • Improving shopping experience and increasing sales based on insights.

Frequent Itemset Mining Methods

  • Apriori Algorithm:
    • Mines frequent itemsets for Boolean association rules.
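    • Uses the Apriori property (every nonempty subset of a frequent itemset must also be frequent) and an iterative, level-wise approach.
    • Forms candidate k-itemsets from the frequent (k-1)-itemsets.
    • Scans the database at each level to count candidate support, until the complete set of frequent itemsets is found.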
    • Variations include hashing, transaction reduction, partitioning, and sampling for efficiency improvement.
  • Frequent Pattern Growth (FP-growth):
    • Mines frequent itemsets without candidate generation.
    • Constructs a compact FP-tree data structure.
    • Compresses the transaction database.
    • Focuses on frequent pattern growth rather than generate-and-test.
    • Offers improved efficiency compared to Apriori-like methods.
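The level-wise Apriori procedure described above can be sketched as follows. This is a simplified illustration, not an optimized implementation: it uses absolute support counts, a naive join over all pairs of frequent (k-1)-itemsets, and none of the efficiency variations listed. The function name `apriori`, the parameter `min_count`, and the transactions are assumptions for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items with one scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step (Apriori property): every (k-1)-subset must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # One database scan per level to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Toy transactions (hypothetical data).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"milk"},
]

result = apriori(transactions, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With `min_count=3`, the pair {milk, butter} occurs only twice, so the candidate {milk, bread, butter} is pruned without being counted: one of its subsets is not frequent.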

Mining Association Rules

Mining association rules involves the following steps and considerations:

  • Finding Frequent Itemsets:
    • Identify itemsets (e.g., A and B) that meet a minimum support threshold.
    • The support threshold ensures that an itemset occurs often enough in the data.
  • Generating Strong Association Rules:
    • Generate association rules (A -> B) from frequent itemsets.
    • Keep only rules that meet a minimum confidence threshold.
    • Confidence estimates the probability of B given A: confidence(A -> B) = support(A ∪ B) / support(A).
  • Uncovering Correlation Rules:
    • Analyze associations to uncover correlation rules.
    • Reveal statistical correlations between itemsets A and B.
  • Classifying Algorithms:
    • Efficient and scalable algorithms exist for mining frequent itemsets.
    • They fall into three categories: Apriori-like, FP-growth-based, and vertical data format.
  • Evaluating Patterns:
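    • Augment the support-confidence framework with pattern evaluation measures.
    • Evaluate the interestingness of association rules.
    • Consider null-invariant measures (unaffected by null-transactions, i.e., transactions that contain none of the itemsets under study).
    • Common measures: lift, χ², all confidence, max confidence, Kulczynski, cosine. Of these, the last four are null-invariant; lift and χ² are not.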
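To make the rule-generation and evaluation steps concrete, the sketch below derives rules from a single itemset and reports confidence, lift, and the Kulczynski measure for each. It assumes the standard definitions lift(A -> B) = confidence(A -> B) / P(B) and Kulc(A, B) = 0.5 * (P(A|B) + P(B|A)). The function names, transactions, and thresholds are invented for illustration.

```python
from itertools import combinations


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rules_from_itemset(itemset, transactions, min_confidence):
    """Return (antecedent, consequent, confidence, lift, kulczynski) tuples."""
    n = len(transactions)
    itemset = frozenset(itemset)
    n_ab = count(itemset, transactions)
    if n_ab == 0:
        return []  # the itemset never occurs, so no rule can be derived
    results = []
    # Try every nonempty proper subset as the antecedent.
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            a = frozenset(antecedent)
            b = itemset - a
            n_a = count(a, transactions)
            n_b = count(b, transactions)
            conf = n_ab / n_a
            if conf >= min_confidence:
                lift = conf / (n_b / n)
                kulc = 0.5 * (n_ab / n_a + n_ab / n_b)
                results.append((a, b, conf, lift, kulc))
    return results


# Toy transactions (hypothetical data). The {"tea"} rows are null-transactions
# with respect to coffee and muffin.
transactions = [
    {"coffee", "muffin"},
    {"coffee", "muffin"},
    {"coffee", "muffin"},
    {"coffee", "muffin"},
    {"coffee"},
    {"muffin"},
    {"tea"},
    {"tea"},
]

rules = rules_from_itemset({"coffee", "muffin"}, transactions, min_confidence=0.7)

# Adding more null-transactions changes lift but leaves Kulczynski unchanged.
padded = transactions + [{"tea"}] * 8
rules_padded = rules_from_itemset({"coffee", "muffin"}, padded, min_confidence=0.7)

for a, b, conf, lift, kulc in rules + rules_padded:
    print(sorted(a), "->", sorted(b), round(conf, 2), round(lift, 2), round(kulc, 2))
```

In this toy data, doubling the number of transactions by adding only null-transactions doubles the lift of coffee -> muffin, while confidence and Kulczynski stay the same. That sensitivity is why null-invariant measures are often preferred.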