Data mining
Lecture handouts
Introduction
Predictions and classifications
- Decision trees.
- What is a decision tree?
PDF.
- Supervised learning. Algorithm for learning decision trees. PRIMER: node impurity, Gini, entropy, information gain.
PDF.
- ID3 algorithm in action: step-by-step examples. PDF.
- Special cases. Multi-valued attributes, gain ratio. Numeric attributes. PRIMER: variance.
Regression trees. Missing values. Tree overfitting.
PDF.
- Applications and limitations. PDF.
- Rule-Based Classifiers. Coverage and Accuracy. Full step-by-step example. Rule induction algorithms. Decision Trees vs. rules.
PDF.
- Probabilistic classifiers.
- PRIMER: Uncertain knowledge. Belief and Probability. Conditional probability.
PDF.
- Bayes' Rule. Conditional Independence. Naive Bayes Classifier. Application: Text Categorization.
PDF.
- Bayesian Belief Networks: semantics, inference, classification, construction, applications.
PDF.
- Instance-Based Learning. Nearest neighbour reasoning. PRIMER: similarity and distance. Application: Recommender Systems.
PDF.
- Evaluating classifiers.
- Credibility: Evaluating what's been learned.
Holdout estimation. Cross-validation. The bootstrap. Predicting performance. PRIMER: Confidence intervals.
PDF.
- Comparing two classifiers. PRIMER: t-test.
PDF.
- Cost-based evaluation.
PDF.
- Specificity-sensitivity trade-off. ROC curves.
PDF.
Associations and correlations
- Association analysis. Basic concepts. Support and confidence.
PDF.
- Frequent itemsets. Apriori algorithm. Compact representations of frequent itemsets.
PDF.
- Generating association rules from frequent itemsets.
PDF. Full step-by-step example. PDF.
- Alternative Methods for Frequent Itemset Generation. FP-Growth Algorithm. FP-Tree.
PDF.
Full step-by-step example. PDF.
- Evaluation of Association Patterns. Top-support patterns.
Top-confidence patterns. Statistical evaluation. Null-invariant measures. PRIMER: Correlation measures.
PDF.
- Associations: special cases. Attribute types, concept hierarchies and negative associations.
PDF.
Clustering
- What is cluster analysis? More on similarity and distance.
PDF.
- K-means clustering. Problems with Selecting Initial Points. Bisecting K-means. Limitations of K-means.
PDF.
- Agglomerative Hierarchical Clustering.
PDF.
- Density-based clustering. DBSCAN.
PDF.
Bonus topics
- Genetic algorithms. Optimization problems. Finding the best rules.
PDF.
- Artificial Neural Networks.
PDF.
- Mining networks. PageRank.
PDF.