The complete guide to understand variance and standard deviation

Image by blackjackmath

Descriptive statistics are used to describe the basic information about the data. They help us understand some features of the data by giving short summaries about the sample. It’s like the first impression of what the data shows us. In a nutshell, descriptive statistics are metrics and quantitative analysis to briefly describe our sample. And they’re broken down into measures of central tendency and measures of variability. In this article, we’ll discuss the latter and we’ll clear the fog on famous questions about variance and standard deviation.

Variance

Suppose that we have a random variable, by definition, it can take different…


Understanding the confusion matrix, AUC and ROC curve with their implementations

Machine learning classification metrics are not that hard to think about if the data are quite clean, neat and balanced. We can just compute the accuracy with the division of the true predicted observations by the total observation. This is not the case in general. In fact, a lot of problems in machine learning have imbalanced data (spam detection, fraud detection, detection of rare diseases …).

Confusion matrix and ROC curve by Hosein Kazazi

Say we want to create a model to detect spams and our dataset has 1000 emails where 10 are spams and 990 are not. So, we have chosen Logistic Regression to do this task…


Learn PCA with its interpretation and its implementation in R

Principal Component Analysis (PCA) is a method of dimensionality reduction, it can be used for feature extraction or representation learning. It transforms the data from a d-dimensional space into a new coordinate system of p dimensions(p≤d), and extracting the most important q variables(q << d)

When should I use it ?

First of all we need to know that PCA works only with continuous variables. So if you have a mixture of categorical and continuous variables you have to select only the non-discrete ones.

We can use PCA:

  • Just to visualize data in a space of two or three dimensions
  • If the interpretation of your model…


Learn how logistic regression is implemented from scratch

Logistic regression is a classification algorithm, which is pretty popular in some communities especially in the field of biostatistics, bioinformatics and credit scoring. It’s used to assign observations a discrete set of classes(target).

Why not logistic classification?

Logistic regression is strictly not a classification algorithm on its own. Because its output is a probability(a continuous number between 0 and 1), it will be only a classification algorithm in combination with a decision boundary which is generally fixed at 0.5

For example, if we have a logistic regression that has to predict whether an email is a spam or not, the output of the function…

Yassine EL KHAL

A software and machine learning engineer

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store