Unraveling the Mysteries of Random Variables and Probability in Machine Learning
Hey there, fellow data enthusiast!
Let’s dive into a topic that’s not just foundational for machine learning but also quite fascinating: random variables, probability mass functions (PMF), probability density functions (PDF), and cumulative distribution functions (CDF).
Understanding these concepts will give you a solid grip on the probabilistic underpinnings of many machine learning algorithms.
Ready? Let’s go!
What is a Random Variable?
Imagine you’re playing a game where you roll a die. The outcome of this roll is uncertain, right? This outcome can be represented by a random variable. In simple terms, a random variable is a numerical description of the outcome of a random process. There are two main types of random variables:
- Discrete Random Variables: These can take on a finite or countable number of values. For instance, the result of rolling a die (1, 2, 3, 4, 5, or 6).
- Continuous Random Variables: These can take on an infinite number of values within a given range. For example, the exact height of people in a population.
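To make the distinction concrete, here’s a tiny NumPy sketch that simulates one of each (the height mean and standard deviation are made-up, illustrative numbers):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Discrete random variable: the outcome of rolling a fair six-sided die.
die_rolls = rng.integers(low=1, high=7, size=10)  # values in {1, ..., 6}

# Continuous random variable: heights drawn from a normal distribution
# (a mean of 170 cm and standard deviation of 10 cm are illustrative values).
heights = rng.normal(loc=170.0, scale=10.0, size=10)

print("Die rolls (discrete):", die_rolls)
print("Heights (continuous):", np.round(heights, 1))
```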
Probability Mass Function (PMF)
For discrete random variables, we use a PMF to describe the probabilities of different outcomes. The PMF 𝑃(𝑋=𝑥) assigns a probability to each possible value of the random variable X.
Example: If you roll a fair die, each of the six outcomes (1, 2, 3, 4, 5, and 6) has an equal probability of 1/6. The PMF for this scenario would look like a bar graph with six bars, each of equal height representing a probability of 1/6.
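Here’s a minimal sketch of that PMF in Python; the key point is that all six probabilities are equal and sum to 1:

```python
from fractions import Fraction

# PMF of a fair six-sided die: P(X = x) = 1/6 for every x in {1, ..., 6}.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

print(pmf[3])             # 1/6 -- the probability of rolling a 3
print(sum(pmf.values()))  # 1 -- a valid PMF always sums to 1
```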
Probability Density Function (PDF)
When dealing with continuous random variables, we use a PDF. The PDF 𝑓(𝑥) describes the relative likelihood of the random variable taking on values near a particular point (for a continuous variable, the probability of hitting any single exact value is zero). The probability of X falling within a range [a, b] is given by the area under the curve of 𝑓(𝑥) from a to b.
Example: Consider the heights of people. The PDF might show that most people fall within a certain height range, with fewer people being extremely short or extremely tall. This would look like a smooth curve, often resembling a bell curve (if the distribution is normal).
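Here’s a small sketch of that idea with SciPy, assuming heights are roughly normal with an illustrative mean of 170 cm and standard deviation of 10 cm. The probability of landing in [a, b] is the area under the PDF, computed here by numerical integration:

```python
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 170.0, 10.0  # illustrative parameters for height in cm

# The density at a single point is a relative likelihood, not a probability.
print(norm.pdf(170.0, loc=mu, scale=sigma))

# P(160 <= X <= 180): the area under the PDF between a = 160 and b = 180.
area, _ = quad(lambda t: norm.pdf(t, loc=mu, scale=sigma), 160.0, 180.0)
print(area)  # roughly 0.68 -- about one standard deviation either side of the mean
```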
Cumulative Distribution Function (CDF)
The CDF 𝐹(𝑥) of a random variable 𝑋 is a function that gives the probability that 𝑋 is less than or equal to 𝑥.
For discrete random variables, it’s simply the sum of the PMF over all values up to x: F(x) = P(X ≤ x) = Σ P(X = xᵢ) for all xᵢ ≤ x. For continuous random variables, it’s the area under the PDF up to x: F(x) = ∫ f(t) dt, with the integral taken from −∞ to x.
The CDF is incredibly useful because it provides a complete picture of the distribution.
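Here’s a minimal sketch of both forms of the CDF, reusing the fair die and the illustrative height distribution from above:

```python
import numpy as np
from scipy.stats import norm

# Discrete CDF: the running sum of the die's PMF.
pmf = np.full(6, 1 / 6)    # P(X = 1), ..., P(X = 6)
cdf = np.cumsum(pmf)
print(cdf[3])              # P(X <= 4) = 4/6, roughly 0.667

# Continuous CDF: P(X <= x) for the normal height model.
print(norm.cdf(180.0, loc=170.0, scale=10.0))  # roughly 0.841

# The same interval probability as before, now via the CDF instead of integration.
print(norm.cdf(180.0, loc=170.0, scale=10.0) - norm.cdf(160.0, loc=170.0, scale=10.0))
```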
Tying It All to Machine Learning
So, how do these concepts relate to machine learning? Well, many machine learning algorithms rely on understanding and modeling the underlying probability distributions of the data.
- Naive Bayes Classifier: This algorithm uses the concept of conditional probability. It assumes that the presence of a particular feature in a class is independent of the presence of any other feature (hence “naive”). By applying Bayes’ theorem, it uses the PDF or PMF to make predictions.
- Expectation-Maximization (EM) Algorithm: Often used in clustering (like Gaussian Mixture Models), the EM algorithm iteratively estimates the parameters of the underlying distributions (often modeled with PDFs) and updates the cluster assignments. A short sketch of this idea follows the list.
- Hidden Markov Models (HMMs): Widely used in time series and sequential data analysis (like speech recognition), HMMs use states represented by random variables with associated probabilities. The transition between states and the observed data are modeled using PMFs or PDFs.
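As a taste of the EM idea mentioned above, here’s a minimal sketch that fits a two-component Gaussian Mixture Model with scikit-learn, which runs EM under the hood; the data is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(seed=0)

# Synthetic 1-D data drawn from two overlapping Gaussians (illustrative only).
X = np.concatenate([rng.normal(0.0, 1.0, 200),
                    rng.normal(5.0, 1.5, 200)]).reshape(-1, 1)

# GaussianMixture runs EM: it alternates between soft cluster assignments (E-step)
# and re-estimating each component's mean and variance (M-step).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(gmm.means_.ravel())        # estimated component means
print(gmm.covariances_.ravel())  # estimated component variances
print(gmm.predict_proba(X[:3]))  # soft cluster assignments for the first few points
```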
Code Example: Naive Bayes Classifier
Follow along with this blog to understand how Naive Bayes uses the posterior probability, prior probability, evidence, and likelihood to arrive at the final result.
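Below is a minimal sketch using scikit-learn’s GaussianNB on the bundled Iris dataset; the dataset choice and the train/test split are illustrative, not essential to the idea:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Illustrative dataset: four continuous features per flower, three classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# GaussianNB fits a normal PDF (a mean and a variance) to each feature, per class,
# and combines those likelihoods with the class priors via Bayes' theorem.
model = GaussianNB()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Per-class posterior probabilities for the first few test samples.
print(model.predict_proba(X_test[:3]))

# The fitted per-class means and variances (mu and sigma^2 for each feature).
print(model.theta_)  # shape (n_classes, n_features)
print(model.var_)    # shape (n_classes, n_features); called sigma_ in older scikit-learn
```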
In this example, the Gaussian Naive Bayes classifier assumes that the continuous values associated with each feature follow a Gaussian (normal) distribution. The PDF for the normal distribution is given by f(x) = (1 / √(2πσ²)) · exp(−(x − μ)² / (2σ²)).
The classifier estimates the parameters μ (mean) and σ² (variance) for each feature within each class from the training data and uses these estimates to compute the likelihood of a sample belonging to each class.
Wrapping Up
By understanding these probabilistic concepts, you can better interpret the behavior of your machine learning models, tweak them for better performance, and even innovate new approaches.
Whether you’re working on a spam email classifier, a recommendation system, or a predictive maintenance model, these foundational ideas will always be relevant.
So next time you roll a die or ponder the heights of people around you, remember: you’re also touching on the same principles that power some of the most advanced machine learning algorithms out there.
Keep exploring, keep questioning, and happy learning!