Exploratory data analysis

About the course

The course reviews basic concepts in data science and introduces some advanced methods for exploring multivariate datasets. We will work with data organization and planning for multivariate data analyses. Participants will receive training in the underlying concepts and, with the aid of online learning platforms, gain experience in applying some popular unsupervised data mining techniques for exploring datasets. Questions to be answered include "How closely are my samples related to each other?" and "Are any of my measured variables correlated with others?" The focus will be on exploring continuous variables (e.g. density, age, length, weight) while using discrete or categorical variables (e.g. batch, model, gender) as visualization aids. The main methods to be practiced are Principal Component Analysis and Cluster Analysis, which will be placed in the general context of machine learning techniques. Students are encouraged to bring their own multivariate datasets or can use one of the datasets provided. The examples will mainly use Matlab and R, although other software (e.g. Python) will be accommodated when possible.

Topics to be covered

  • Data organization and planning
  • Data types and distributions
  • Principal Component Analysis
  • Classification / Clustering
  • Support vector machines
  • Visualization

Obtaining course credit

To pass the course, you will need to complete homework assignments, give two in-class presentations, and hand in a final report.

Location

This is an online course with live discussions on Zoom. The class typically meets on Thursdays.

More information

Enquiries: Kate Murphy, murphyk@chalmers.se

Course literature

We will mainly use online resources, with details to be announced.
Additionally, this book is highly recommended:
Tufte, E.R. The Visual Display of Quantitative Information. ISBN 0961392142

Lecturer

Associate Professor Kathleen Murphy