This course begins with data manipulation and cleaning techniques and the fundamental abstractions and data structures used in data analysis, followed by advanced forms of data visualization. Applied machine learning is then introduced through its techniques and methods, with an explanation of how it differs from descriptive statistics. The course also covers data dimensionality, clustering, and cluster evaluation. Examples of predictive modeling methods are used to illustrate issues of generalization to new data (e.g. overfitting and cross-validation), followed by advanced techniques such as ensemble construction and the practical limitations of predictive models. The fundamentals of text mining are covered through exercises and examples: regular expression manipulation, text cleaning, preparing text for use in machine learning processes, natural language processing methods, and text classification. Finally, the course presents network analysis techniques, including the concepts of connectivity versus robustness, centrality, and betweenness.

Reference:
VANDERPLAS, J. – Python Data Science Handbook: Essential Tools for Working with Data (1st ed.). O’Reilly Media, Inc., 2016.

Note: This course is offered as a master’s course. For doctoral students, it has additional requirements.

* Basic syllabus. The instructor may modify it at their discretion.