Introduction to Principal Component Analysis (PCA)

Learn PCA with its interpretation and its implementation in R

Yassine EL KHAL
6 min read · Oct 8, 2020

Principal Component Analysis (PCA) is a dimensionality reduction method that can be used for feature extraction or representation learning. It transforms the data from a d-dimensional space into a new coordinate system of p dimensions (p ≤ d), from which we keep only the q most important new variables (q << d).
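To make the d, p and q notation concrete, here is a minimal sketch using base R's prcomp() on the built-in USArrests dataset. The dataset and the choice q = 2 are assumptions made only for illustration; they do not come from the article.

```r
# Sketch: PCA on the built-in USArrests data (d = 4 numeric variables).
X <- USArrests
d <- ncol(X)

# Center and scale so that no variable dominates because of its units.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# pca$x holds the data expressed in the new coordinate system
# (p = 4 components here, since p <= d).
p <- ncol(pca$x)

# Keep only the first q components (q = 2 is an arbitrary choice).
q <- 2
X_reduced <- pca$x[, 1:q]

dim(X)          # 50 rows, d = 4 columns
dim(X_reduced)  # 50 rows, q = 2 columns
```

With only four original variables, q = 2 is not really "much smaller" than d; on wider datasets the reduction is more striking.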

When should I use it?

First of all, we need to know that PCA works only with continuous variables. So if you have a mixture of categorical and continuous variables, you have to select only the continuous ones.
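One possible way to do this selection in base R is sketched below, assuming the built-in iris data frame as a stand-in for a dataset that mixes both kinds of variables.

```r
# iris mixes four continuous measurements with one categorical column (Species).
str(iris)

# Flag the numeric columns, then keep only those before running PCA.
is_num <- sapply(iris, is.numeric)
iris_num <- iris[, is_num]

names(iris_num)
```

Note that is.numeric() also returns TRUE for integer columns that merely encode categories (for example an ID or a 0/1 flag), so the selected columns are still worth checking by hand.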

We can use PCA:

  • Just to visualize data in a space of two or three dimensions
  • If the interpretation of your model features isn’t very important to you
  • If you want to make your features uncorrelated
  • If you can tolerate losing part of the information in your data
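Two of the points above can be checked numerically. The sketch below, again assuming the numeric columns of iris as example data, shows that the component scores are pairwise uncorrelated and measures how much variance is given up when only the first two components are kept.

```r
# PCA on the four numeric columns of iris (illustrative data only).
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Principal component scores are pairwise uncorrelated:
# the off-diagonal entries are zero up to rounding error.
round(cor(pca$x), 3)

# Share of the total variance carried by each component, and its running sum.
var_share <- pca$sdev^2 / sum(pca$sdev^2)
cum_share <- cumsum(var_share)
round(cum_share, 3)

# Fraction of the variance lost when only the first two components are kept.
lost <- 1 - cum_share[2]
lost
```

The first two columns of pca$x are also what you would pass to plot() for a two-dimensional view of the data.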

The core of PCA

The main idea in PCA is that the information the data gives us corresponds to the variance of its features: the more a feature varies across observations, the more it helps tell them apart.

Say we have a population of people with different ages, jobs and weights, but they all have the same height. Since the height feature is the same for all observations, it doesn’t give any information about any individual and doesn’t make any individual…
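The height example can be reproduced with a few lines of R. The data frame below is entirely hypothetical (made-up values, and the job column is left out because it is categorical); it only illustrates that a constant feature has zero variance.

```r
# Hypothetical population: ages and weights differ, height is identical.
people <- data.frame(
  age    = c(23, 35, 47, 52, 61),
  weight = c(58, 72, 80, 66, 90),
  height = c(170, 170, 170, 170, 170)
)

# Variance of each feature; height has variance 0.
sapply(people, var)
```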
