Week |
Subject |
Related Preparation |
1) |
• What Motivated Data Mining? Why Is It Important?
• So, What Is Data Mining?
• Data Mining—On What Kind of Data?
• Data Mining Functionalities—What Kinds of Patterns Can Be Mined?
|
Reading chapter 1. |
2) |
• What is data?
• Attributes.
• Types of attributes.
• Discrete and continuous variables.
• Types of data sets.
• Record data.
• Data matrix.
• Document data.
• Transaction data.
• Graph data.
• Chemical data.
• Ordered data.
• Why data preprocessing?
• Why is data dirty?
• Why is data preprocessing important?
• Multi-dimensional measure of data quality.
• Major tasks in data preprocessing.
• Data quality.
• Noise.
• Outliers.
|
Reading chapter 2. |
3) |
• Missing values.
• Duplicate data.
• Mining data descriptive characteristics.
• Measuring the central tendency.
• Symmetric vs. skewed data.
• Properties of the normal distribution curve.
• Histogram analysis.
• Positively and negatively correlated data.
• Uncorrelated data.
• Data cleaning.
• How to handle missing data?
• How to handle noisy data?
• Simple discretization methods: Binning.
• Regression.
• Cluster analysis.
• Data cleaning as a process.
• Aggregation.
• Sampling.
• Types of sampling.
• Sample size.
|
Reading chapter 2. |
4) |
• Classification.
• Illustrating classification task.
• Examples of classification tasks.
• Classification techniques.
• Example of a decision tree.
• Another example of a decision tree.
• Apply model to test data.
• Decision tree induction.
• Issues: data preparation.
• Issues: evaluating classification methods.
• Algorithm for decision tree induction (ID3/C4.5).
• Attribute selection measure: Information gain.
• Decision tree example. |
Reading chapter 8. |
5) |
• Numeric variables and missing values.
• Overfitting and tree pruning.
• Enhancements to basic decision tree induction.
• Model evaluation.
• Metrics for performance evaluation.
• Limitation of accuracy.
• Cost matrix.
• Calculation of accuracy.
• Cost-sensitive measures.
• Methods for performance evaluation.
• Methods of estimation.
• ROC (Receiver Operating Characteristic).
• Instance-based classification.
• Nearest neighbor classification.
• k-Nearest neighbor algorithm example. |
Reading chapters 8-9. |
6) |
• What is cluster analysis?
• Applications of cluster analysis.
• What is not cluster analysis?
• Notion of a cluster can be ambiguous.
• Types of clustering.
• Characteristics of the input data are important.
• Clustering algorithms.
• Hierarchical clustering.
• Agglomerative clustering algorithm.
• Cluster distance measures.
• Single link (min) hierarchical clustering.
• Single link (min) hierarchical clustering example.
|
Reading chapter 10. |
7) |
• Complete link (max) hierarchical clustering example.
• K-means clustering.
• Importance of choosing initial centroids.
• Limitations of k-means.
• Overcoming k-means limitations.
• K-means clustering example.
|
Reading chapter 10. |
8) |
Midterm 1 |
|
9) |
• Association rule mining.
• Frequent itemset.
• Association rule.
• Association rule mining task.
• Apriori algorithm.
• Apriori algorithm example.
|
Reading chapter 6. |
10) |
• Statistical classification models.
• Bayes theorem and classifier.
• Bayes classifier example.
• Continuous variables.
|
Reading chapter 6. |
11) |
• Text and web mining.
• Natural language processing.
• Part-of-speech tagging.
• Word sense disambiguation.
• Text databases and IR.
• Indexing techniques.
• Types of text data mining.
• Text classification.
• Document clustering.
• Text categorization.
• Categorization methods.
• Vector space model.
|
|
12) |
Midterm 2 |
|
13) |
Project presentations |
|
14) |
Project presentations |
|
15) |
Final exam |
|