AIE509 Data Mining and Big Data
Istanbul Okan University
Power Electronics and Clean Energy Systems (English) with thesis
Master; TR-NQF-HE: Level 7; QF-EHEA: Second Cycle; EQF-LLL: Level 7

General course introduction information

Course Code: AIE509
Course Name: Data Mining and Big Data
Course Semester: Fall
Course Credits: Theoretical 3, Practical 0, Credit 3, ECTS 10
Language of instruction: EN
Course Requisites:
Does the Course Require Work Experience?: No
Type of course: Department Elective
Course Level: Master; TR-NQF-HE: 7. Master's Degree; QF-EHEA: Second Cycle; EQF-LLL: 7. Master's Degree
Mode of Delivery: Face to face
Course Coordinator: Prof. Dr. PINAR YILDIRIM
Course Lecturer(s): Asst. Prof. Dr. SİNA ALP
Course Assistants:

Course Objective and Content

Course Objectives: The course aims to teach students the main concepts and methods of data mining. It covers classification, clustering, association algorithms, and data mining studies in different areas.

Course Content: Classification, clustering, association algorithms, and data mining studies in different areas.

Learning Outcomes

Students who successfully complete this course will have demonstrated the following:
1 - Knowledge
Theoretical - Conceptual
1) Ability to explain the concepts of data mining
2) Ability to apply data preprocessing techniques
3) Ability to use data mining algorithms
4) Ability to use data mining tools
5) Ability to produce solutions to data mining problems
2 - Skills
Cognitive - Practical
3 - Competences
Communication and Social Competence
Learning Competence
Field Specific Competence
Competence to Work Independently and Take Responsibility

Lesson Plan

Week / Subject / Related Preparation
1) • What Motivated Data Mining? Why Is It Important? • So, What Is Data Mining? • Data Mining: On What Kind of Data? • Data Mining Functionalities: What Kinds of Patterns Can Be Mined?
   Related preparation: Reading chapter 1.
2) • What is data? • Attributes. • Types of attributes. • Discrete and continuous variables. • Types of data sets. • Record data. • Data matrix. • Document data. • Transaction data. • Graph data. • Chemical data. • Ordered data. • Why data preprocessing? • Why is data dirty? • Why is data preprocessing important? • Multi-dimensional measure of data quality. • Major tasks in data preprocessing. • Data quality. • Noise. • Outliers.
   Related preparation: Reading chapter 2.
3) • Missing values. • Duplicate data. • Mining data descriptive characteristics. • Measuring the central tendency. • Symmetric vs. skewed data. • Properties of the normal distribution curve. • Histogram analysis. • Positively and negatively correlated data. • Uncorrelated data. • Data cleaning. • How to handle missing data? • How to handle noisy data? • Simple discretization methods: Binning. • Regression. • Cluster analysis. • Data cleaning as a process. • Aggregation. • Sampling. • Types of sampling. • Sample size.
   Related preparation: Reading chapter 2.
4) • Classification. • Illustrating the classification task. • Examples of classification tasks. • Classification techniques. • Example of a decision tree. • Another example of a decision tree. • Applying the model to test data. • Decision tree induction. • Issues: data preparation. • Issues: evaluating classification methods. • Algorithm for decision tree induction (ID3/C4.5). • Attribute selection measure: Information gain. • Decision tree example.
   Related preparation: Reading chapter 8.
5) • Numeric variables and missing values. • Overfitting and tree pruning. • Enhancements to basic decision tree induction. • Model evaluation. • Metrics for performance evaluation. • Limitation of accuracy. • Cost matrix. • Calculation of accuracy. • Cost-sensitive measures. • Model evaluation. • Methods for performance evaluation. • Methods of estimation. • ROC (Receiver Operating Characteristic). • Instance-based classification. • Nearest neighbor classification. • k-Nearest neighbor algorithm example.
   Related preparation: Reading chapters 8-9.
6) • What is cluster analysis? • Applications of cluster analysis. • What is not cluster analysis? • The notion of a cluster can be ambiguous. • Types of clustering. • Characteristics of the input data are important. • Clustering algorithms. • Hierarchical clustering. • Agglomerative clustering algorithm. • Cluster distance measures. • Single link (min) hierarchical clustering. • Single link (min) hierarchical clustering example.
   Related preparation: Reading chapter 10.
7) • Complete link (max) hierarchical clustering example. • K-means clustering. • Importance of choosing initial centroids. • Limitations of k-means. • Overcoming k-means limitations. • K-means clustering example.
   Related preparation: Reading chapter 10.
8) Midterm 1
9) • Association rule mining. • Frequent itemset. • Association rule. • Association rule mining task. • Apriori algorithm. • Apriori algorithm example.
   Related preparation: Reading chapter 6.
10) • Statistical classification models. • Bayes theorem and classifier. • Bayes classifier example. • Continuous variables.
   Related preparation: Reading chapter 6.
11) • Text and web mining. • Natural language processing. • Part-of-speech tagging. • Word sense disambiguation. • Text databases and IR. • Indexing techniques. • Types of text data mining. • Text classification. • Document clustering. • Text categorization. • Categorization methods. • Vector space model.
12) Midterm 2
13) Project presentations
14) Project presentations
15) Final exam
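
Week 4 names information gain as the attribute selection measure for decision tree induction. As an illustration only, the sketch below computes it in Python on a small made-up dataset. The function names, attribute names, and data are assumptions for this example and are not taken from the course material.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    # Group the target labels by the value of the splitting attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(part) / len(rows) * entropy(part) for part in partitions.values())
    return base - remainder


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

gain_outlook = information_gain(rows, "outlook", "play")
gain_windy = information_gain(rows, "windy", "play")
print(gain_outlook, gain_windy)
```

An ID3-style tree builder would pick the attribute with the larger gain as the split at the current node, which in this toy data is "outlook".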

Sources

Course Notes / Textbooks: J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2012. ISBN 978-0-12-381479-1
References: Internet resources

Course-Program Learning Outcome Relationship

Learning Outcomes: 1 2 3 4 5

Program Outcomes
1) Accesses in-depth information in the field of power electronics and clean energy systems through scientific research; evaluates, interprets, and applies that knowledge.
2) Has extensive knowledge of current techniques and their constraints in the field of Power Electronics.
3) Using limited or missing data, completes the information through scientific methods and applies it; integrates information from different disciplines.
4) Is aware of new and emerging applications in his/her profession; learns and examines them as needed.
5) Formulates Power Electronics problems, develops methods to solve them, and applies innovative approaches to their solution.
6) Develops new and/or original ideas and methods; develops innovative solutions for the design of a process, system, or component.
7) Designs and carries out analytical, modeling, and experimental research; resolves and interprets the complex situations encountered in this process.
8) Leads multidisciplinary teams, develops solution approaches to complex situations, and takes responsibility.
9) Uses at least one foreign language at the European Language Portfolio B2 general level and communicates effectively orally and in writing.
10) Presents the process and results of his/her work systematically and clearly, in writing or orally, in national and international settings.
11) Describes the social and environmental dimensions of Power Electronics Engineering applications.
12) Considers social, scientific, and ethical values in the stages of data collection, interpretation, and publication, as well as in all professional activities.

Course - Learning Outcome Relationship

Scale: No Effect; 1 Lowest; 2 Low; 3 Average; 4 High; 5 Highest
Program Outcomes / Level of Contribution
1) Accesses in-depth information in the field of power electronics and clean energy systems through scientific research; evaluates, interprets, and applies that knowledge.
2) Has extensive knowledge of current techniques and their constraints in the field of Power Electronics.
3) Using limited or missing data, completes the information through scientific methods and applies it; integrates information from different disciplines. Level of Contribution: 2
4) Is aware of new and emerging applications in his/her profession; learns and examines them as needed.
5) Formulates Power Electronics problems, develops methods to solve them, and applies innovative approaches to their solution.
6) Develops new and/or original ideas and methods; develops innovative solutions for the design of a process, system, or component.
7) Designs and carries out analytical, modeling, and experimental research; resolves and interprets the complex situations encountered in this process.
8) Leads multidisciplinary teams, develops solution approaches to complex situations, and takes responsibility.
9) Uses at least one foreign language at the European Language Portfolio B2 general level and communicates effectively orally and in writing.
10) Presents the process and results of his/her work systematically and clearly, in writing or orally, in national and international settings.
11) Describes the social and environmental dimensions of Power Electronics Engineering applications.
12) Considers social, scientific, and ethical values in the stages of data collection, interpretation, and publication, as well as in all professional activities.

Learning Activity and Teaching Methods

Lesson
Project preparation

Assessment & Grading Methods and Criteria

Written Exam (Open-ended questions, multiple choice, true-false, matching, fill in the blanks, sequencing)
Individual Project

Assessment & Grading

Semester Requirements          Number of Activities   Level of Contribution
Project                        1                      10%
Midterms                       2                      50%
Final                          1                      40%
Total                                                 100%
Percentage of semester work                           60%
Percentage of final work                              40%
Total                                                 100%
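
As a worked example of the weights above, the sketch below combines component scores into a single course grade. It assumes every component is scored on a 0-100 scale and that the two midterms share the 50% weight equally; the source states neither, and the function name and sample scores are hypothetical.

```python
# Weights taken from the Assessment & Grading table (percent of the course grade).
WEIGHTS = {"project": 10, "midterms": 50, "final": 40}


def course_grade(project, midterm1, midterm2, final):
    """Weighted course grade on a 0-100 scale.

    Assumption: the two midterms split the 50% midterm weight equally.
    """
    midterm_avg = (midterm1 + midterm2) / 2
    weighted_sum = (
        WEIGHTS["project"] * project
        + WEIGHTS["midterms"] * midterm_avg
        + WEIGHTS["final"] * final
    )
    return weighted_sum / 100


# Hypothetical scores, for illustration only.
grade = course_grade(project=80, midterm1=70, midterm2=90, final=60)
print(grade)
```

Under these assumptions the semester work (project and midterms) accounts for 60 of the 100 weight points and the final for the remaining 40, matching the percentages in the table.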

Workload and ECTS Credit Grading

Activities       Number of Activities   Workload
Course Hours     16                     48
Project          2                      70
Midterms         12                     130
Final            4                      62
Total Workload                          310