Semester 3 - Data Mining (Week 8-10)

Frequent patterns are patterns that appear frequently in a dataset.
Frequent itemsets are sets of items that commonly occur together in transactions.
Example: {milk, bread} is a frequent itemset if customers often buy them together.
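
To make "frequent" concrete, the sketch below computes the support of {milk, bread}, meaning the fraction of transactions that contain both items. The transactions and the 0.5 threshold are made-up values for illustration, and `support` is a hypothetical helper name.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


min_support = 0.5  # assumed threshold, not taken from the notes
s = support({"milk", "bread"}, transactions)
is_frequent = s >= min_support
print(s, is_frequent)
```

Here {milk, bread} appears in 3 of the 5 transactions, so it counts as frequent under the assumed threshold.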

Association rules find interesting relations between variables in large databases.
They identify strong rules based on measures of interestingness.
Example: “If a customer buys a coffee, they are 80% likely to also purchase a muffin.”
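
The 80% in a rule like this is its confidence: among the transactions that contain coffee, the share that also contain a muffin. A minimal sketch, assuming made-up orders chosen so that the result matches the example (`confidence` and `orders` are illustrative names):

```python
def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from transaction counts."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    if n_antecedent == 0:
        return 0.0  # rule never applies, so report zero confidence
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return n_both / n_antecedent


# Hypothetical orders: coffee appears in 5, coffee with a muffin in 4.
orders = [
    {"coffee", "muffin"},
    {"coffee", "muffin"},
    {"coffee", "muffin"},
    {"coffee", "muffin", "juice"},
    {"coffee"},
    {"tea", "muffin"},
    {"tea"},
]

conf = confidence({"coffee"}, {"muffin"}, orders)
print(conf)
```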

Market basket analysis is a modelling technique that explores item associations in shopping baskets.
Retailers use it to uncover relationships between items.
It analyzes frequent co-occurrences of items in transactions.
Example: Customers buying computers often purchase antivirus software, suggesting strategic product placement.
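
One simple way to look for such co-occurrences is to count how often each pair of items lands in the same basket. The sketch below does this with the standard library; the baskets are invented, and `pair_counts` is a hypothetical helper, not part of any particular retail tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, for illustration only.
baskets = [
    ["computer", "antivirus", "mouse"],
    ["computer", "antivirus"],
    ["computer", "printer"],
    ["mouse", "printer"],
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

In this toy data the most frequent pair is computer and antivirus, which is the kind of signal that might motivate placing the two products near each other.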

Data Preprocessing and Noise Filtering:
- Helps filter out noise and clean data.
- Distinguishes frequent item co-occurrences from random noise.
Discovery of Inherent Structures and Clusters:
- Reveals hidden structures and clusters in data.
- For example, identifies coauthor and conference clusters in the DBLP data set.
Pattern-based Classification:
- Constructs reliable classification models using frequent patterns.
- Frequent patterns tend to provide more information gain than infrequent ones when used as features.
Subspace Clustering in High-Dimensional Space:
- Supports effective clustering in high-dimensional space.
- Overcomes challenges of measuring distances in such spaces.
Microarray Data Analysis:
- Analyzes noisy microarray data with thousands of dimensions.
- Helps differentiate noise from meaningful patterns.
Semantic Annotation of Frequent Patterns:
- Enhances understanding of frequent patterns with annotations.
- Provides additional context and meaning to the patterns.

Marketing and Advertising Strategies:
- Tailoring marketing efforts to products that are frequently bought together.
- Optimizing advertising strategies using market basket analysis insights.
Store Layout Design:
- Placing frequently purchased items together in store layout.
- Utilizing strategic placement to maximize sales and customer experience.
Sales Planning:
- Identifying items to put on sale based on frequent associations.
- Promoting complementary items to boost sales.
Catalog Design:
- Grouping items that are frequently bought together in catalogs.
- Enhancing catalog layout for increased sales potential.
Cross-Marketing:
- Utilizing correlations to make effective cross-marketing decisions.
- Understanding product associations to drive sales across different categories.
Customer Shopping Behavior Analysis:
- Analyzing buying habits and preferences of customers.
- Improving shopping experience and increasing sales based on insights.

Apriori Algorithm:
- Mines frequent itemsets for Boolean association rules.
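- Uses the Apriori property (every nonempty subset of a frequent itemset must also be frequent) and an iterative, level-wise approach.
- Forms candidate k-itemsets by joining frequent (k-1)-itemsets.
- Scans the database at each level to count the candidates, until no more frequent itemsets are found.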
- Variations include hashing, transaction reduction, partitioning, and sampling for efficiency improvement.
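
The Apriori bullets above can be sketched in a few lines. This is a minimal, unoptimized version that assumes an absolute support count as the threshold; the function name `apriori`, the `db` data, and `min_count=3` are illustrative choices, and none of the listed efficiency variations are included.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Scan the database to count each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical transaction database.
db = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(db, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this data the prune step discards {bread, diapers, beer} and {milk, diapers, beer} without counting them, because {bread, beer} and {milk, beer} are already known to be infrequent.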
Frequent Pattern Growth (FP-growth):
- Mines frequent itemsets without candidate generation.
- Constructs a compact FP-tree data structure.
- Compresses the transaction database.
- Focuses on frequent pattern growth rather than generate-and-test.
- Is generally more efficient than Apriori-like methods, since building the FP-tree needs only two scans of the database.
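
To illustrate how an FP-tree compresses the transaction database, the sketch below covers only the tree construction step, not the recursive mining that FP-growth performs afterwards. `FPNode`, `build_fp_tree`, `count_nodes`, the tie-breaking rule, and the sample rows are all assumptions made for this example.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item plus a shared count."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # First scan: count items and keep only the frequent ones.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in item_counts.items() if c >= min_count}
    root = FPNode(None)
    header = {item: [] for item in frequent}  # item -> nodes holding it
    # Second scan: insert each transaction with its frequent items
    # ordered by descending frequency (ties broken by name).
    for t in transactions:
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # shared prefixes increment the same node
            node = child
    return root, header


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is excluded)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Hypothetical transactions.
rows = [
    ["a", "b"],
    ["b", "c", "d"],
    ["a", "c", "d", "e"],
    ["a", "d", "e"],
    ["a", "b", "c"],
]
root, header = build_fp_tree(rows, min_count=2)
print(count_nodes(root), sum(len(r) for r in rows))
```

Transactions that share a prefix share nodes, so the 15 item occurrences in this toy data are stored in 11 tree nodes. The header table keeps, for each item, the nodes that hold it; a full FP-growth implementation would follow those nodes to build conditional pattern bases.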

Mining association rules involves several steps and considerations:

Finding Frequent Itemsets:
- Identify itemsets (e.g., A and B) that meet a minimum support threshold.
- The support threshold ensures that an itemset occurs often enough in the data.
Generating Strong Association Rules:
- Generate association rules (A -> B) from frequent itemsets.
- Keep only the rules that meet a minimum confidence threshold.
- Confidence is the conditional probability of B given A, P(B|A).
Uncovering Correlation Rules:
- Analyze associations to uncover correlation rules.
- Reveal statistical correlations between itemsets A and B.
Classifying Algorithms:
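- Many efficient and scalable algorithms exist for mining frequent itemsets.
- They fall into three categories: Apriori-like algorithms, frequent pattern growth algorithms such as FP-growth, and algorithms that use the vertical data format.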
Evaluating Patterns:
- Augment support-confidence framework with pattern evaluation.
- Evaluate interestingness of association rules.
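- Consider null-invariant measures, whose values are unaffected by null-transactions (transactions that contain none of the itemsets under study).
- Common measures: lift and χ² (not null-invariant); all confidence, max confidence, Kulczynski, and cosine (null-invariant).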
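
To see what null-invariance means in practice, the sketch below computes several of these measures from raw counts and then repeats the calculation after adding many transactions that contain neither A nor B. The counts are invented, `measures` is a hypothetical helper, and χ² is left out to keep the example short.

```python
from math import sqrt


def measures(n_ab, n_a, n_b, n_total):
    """Interestingness measures for itemsets A and B from support counts."""
    p_ab, p_a, p_b = n_ab / n_total, n_a / n_total, n_b / n_total
    conf_a_b = n_ab / n_a  # P(B|A)
    conf_b_a = n_ab / n_b  # P(A|B)
    return {
        "lift": p_ab / (p_a * p_b),
        "all_confidence": min(conf_a_b, conf_b_a),
        "max_confidence": max(conf_a_b, conf_b_a),
        "kulczynski": 0.5 * (conf_a_b + conf_b_a),
        "cosine": n_ab / sqrt(n_a * n_b),
    }


# Same counts for A, B, and A with B; only the number of
# null-transactions differs (assumed values).
small = measures(n_ab=100, n_a=200, n_b=200, n_total=1_000)
large = measures(n_ab=100, n_a=200, n_b=200, n_total=100_000)

for name in small:
    print(name, small[name], large[name])
```

Lift depends on the total number of transactions, so it changes sharply between the two runs, while the other four measures use only the counts of A, B, and A with B and stay the same.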
