Area of Mathematics: | Computational and Applied Mathematics (ECAM) |
Semester: | 8th |
Course ID: | 82405 |
Course Type: | Elective |
Teaching hours per week: | Theory: 3 | Practice: 0 | Laboratory: 0 |
ECTS: | 5 |
Eclass: |
Instructors: |
Description
The course provides an introduction to data mining and knowledge discovery from data. It illustrates the key data mining methods of clustering, classification and prediction, together with practical tools for applying them. Next, we focus on specific characteristics of Big Data, such as high volume, high dimensionality and high frequency, and incorporate tools built to handle such data (dimensionality reduction, incremental clustering) into data mining methodologies. Finally, we discuss the key methods for Big Data sensing and acquisition, together with the basics of applications in social media mining, text mining and biomedicine. We conclude with an introduction to Big Data visualization.
- Data mining and the knowledge discovery process. Overview of data mining and machine learning techniques.
- Normal distribution, linear transformations of random variables. Nonlinear transformations and kernel methods.
- Clustering. Taxonomy of clustering concepts: distance-based (separation, centroids, contiguity), density-based, partitional vs. hierarchical. Methods for centroid-based clustering (k-means), hierarchical clustering (agglomerative and divisive), density-based clustering (DBSCAN).
- Classification and prediction models. Model learning and model validation. Explanation vs. prediction. Naïve Bayes classifiers. Basic machine learning models (linear discriminant analysis, support vector machines, ensemble methods).
- Dimensionality reduction in Big Data (PCA, random projection, parallelized methods).
- Pattern mining and association rules. The Apriori principle. Mining high-frequency patterns and high-confidence rules.
- Big data and social sensing. Big data acquisition. Web scraping, crawling, crowdsourcing, crowdsensing. Big data technologies and platforms.
- Social media mining and text mining. Monitoring social trends. Basics of opinion mining and sentiment analysis. Recommender systems.
- Applications in biomedicine. Population genomics, DNA sequence data mining.
- Data visualization and visual analytics.
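As an illustration of the Apriori principle in the pattern-mining topic (every subset of a frequent itemset must itself be frequent, so candidates with an infrequent subset can be discarded before counting), the following is a minimal Python sketch, not part of the official course material. The function names `apriori` and `association_rules`, the toy basket data and the use of an absolute support count are illustrative assumptions. Candidate generation uses a simple pairwise join rather than the optimized prefix-based join found in textbook presentations.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset whose support count
    (number of transactions containing it) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)

        # Prune step (Apriori principle): a k-itemset can only be frequent if
        # all of its (k-1)-subsets are frequent, so drop any that are not.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }

        # Support counting: one pass over the data for the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for rules lhs -> rhs built from frequent
    itemsets, where confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so lhs is a key.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical toy market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent = apriori(transactions, min_support=3)
strong_rules = association_rules(frequent, min_confidence=0.8)
for lhs, rhs, conf in strong_rules:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

On this toy data with a minimum support count of 3, the sketch finds four frequent items and four frequent pairs. The candidate {bread, milk, diaper} survives pruning but occurs in only two transactions, and only the rule {beer} → {diaper} reaches a confidence of 0.8.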
Bibliography
- Mohammed J. Zaki and Wagner Meira, Jr., Data Mining and Machine Learning: Fundamental Concepts and Algorithms, 2nd ed., Cambridge University Press, March 2020. ISBN-13: 978-1108473989
- Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining. ISBN-13: 978-9332571402