t-SNE: A Journey into High-Dimensional Data Visualization
Introduction
Imagine trying to understand the structure of a complex dataset with hundreds or thousands of features. It’s like looking at a tangled ball of yarn — difficult to make sense of without unraveling it. This is where t-Distributed Stochastic Neighbor Embedding (t-SNE) comes in. Developed by Laurens van der Maaten and Geoffrey Hinton, t-SNE is a powerful technique designed to visualize high-dimensional data by projecting it into low-dimensional space, usually 2D or 3D. This blog will explore the fundamentals of t-SNE, break down its mathematics into digestible chunks, and guide you through a practical example using Python.
Understanding t-SNE with an Example
Let’s start with a simple analogy. Imagine you have a large group of friends, and you want to arrange them in a way that those with similar interests stand close to each other. However, each friend has a long list of interests, making it hard to see the similarities at a glance. t-SNE helps you by reducing the list of interests into a 2D map where friends with similar interests are placed near each other.
The Mechanics of t-SNE
High-Dimensional Space
In high-dimensional space, t-SNE calculates the similarity between each pair of data points. This similarity is based on the probability that a data point would pick another data point as its neighbor, using a Gaussian distribution centered at the first point.
Mathematically, for data points $x_i$ and $x_j$, the conditional similarity $p_{j|i}$ is:

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$

Here, $\sigma_i$ is the bandwidth of the Gaussian centered at $x_i$; t-SNE tunes each $\sigma_i$ so that the resulting distribution matches a user-specified perplexity.
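To make this concrete, here is a minimal NumPy sketch of the conditional similarities. One simplification to note: real t-SNE chooses each $\sigma_i$ separately by binary search against the target perplexity, whereas this sketch uses a single fixed sigma for every point.

import numpy as np

def conditional_similarities(X, sigma=1.0):
    """Compute p_{j|i} for all pairs of rows of X.

    Simplification: one fixed sigma for all points; real t-SNE finds each
    sigma_i by binary search to match a target perplexity.
    """
    # Squared Euclidean distances ||x_i - x_j||^2, shape (n, n)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel centered at each x_i
    kernel = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(kernel, 0.0)  # a point never picks itself as a neighbor
    # Normalize each row so that sum_j p_{j|i} = 1
    return kernel / kernel.sum(axis=1, keepdims=True)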
Symmetric Similarities
To make the similarities symmetric, we define the joint probability:

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$$

where $n$ is the number of data points. Dividing by $2n$ guarantees that all $p_{ij}$ sum to 1, so $P$ is a valid probability distribution.
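Continuing the sketch, the symmetrization step is a one-liner; cond_P is assumed to be the row-normalized matrix produced by conditional_similarities above.

import numpy as np

def joint_similarities(cond_P):
    """Symmetrize: p_ij = (p_{j|i} + p_{i|j}) / (2n).

    Dividing by 2n makes all p_ij sum to 1, so P is a joint distribution.
    """
    n = cond_P.shape[0]
    return (cond_P + cond_P.T) / (2.0 * n)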
Low-Dimensional Space
In the low-dimensional map, t-SNE must place the friends so that their new 2D positions reflect the similarities computed from their full lists of interests. To do this, it measures distances in a special way that keeps similar friends close while letting different groups spread apart.
Student-t Distribution with One Degree of Freedom
Imagine each pair of friends as being connected by a stretchy rubber band. The strength of this rubber band is stronger when the friends have more similar interests. In the new 2D world, these rubber bands still try to keep similar friends close while allowing more flexibility to spread out different groups.
Mathematically, this stretchy rubber band analogy is captured by a Student-t distribution with one degree of freedom (also known as the Cauchy distribution). This distribution has heavy tails, which give dissimilar points extra room to stretch apart and prevent crowding in the map.
Here’s the formula for the similarity between two friends $i$ and $j$ in the 2D map:

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$

Where:
- $q_{ij}$ is the similarity between friends $i$ and $j$ in the 2D space.
- $y_i$ and $y_j$ are the positions of friends $i$ and $j$ in the 2D space.
- The numerator $\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}$ is larger when $y_i$ and $y_j$ are close, meaning they are similar.
- The denominator ensures that all similarities sum to 1, making $Q$ a probability distribution.
By using this distribution, t-SNE ensures that similar friends stay close together in the 2D space, while allowing different groups to spread out, just like stretching rubber bands.
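Under the same assumptions as the earlier sketches, the matrix of low-dimensional similarities can be computed directly from the 2D positions:

import numpy as np

def low_dim_similarities(Y):
    """Compute q_ij from 2D positions Y with a Student-t
    (one degree of freedom) kernel."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq_dists)  # heavy-tailed (1 + ||y_i - y_j||^2)^{-1}
    np.fill_diagonal(inv, 0.0)    # exclude q_ii
    return inv / inv.sum()        # normalize so all q_ij sum to 1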
Minimizing the Difference
t-SNE aims to minimize the difference between the high-dimensional similarities $p_{ij}$ and the low-dimensional similarities $q_{ij}$. This is achieved by minimizing the Kullback-Leibler (KL) divergence of $Q$ from $P$:

$$C = \mathrm{KL}(P \parallel Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

The map positions $y_i$ are then updated by gradient descent on $C$.
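As a sketch, the KL divergence between the two similarity matrices can be evaluated directly; the small eps guard against log(0) is an implementation detail rather than part of the formula.

import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    """C = KL(P || Q) = sum_ij p_ij * log(p_ij / q_ij)."""
    P = np.maximum(P, eps)  # avoid log(0); diagonal terms contribute ~0
    Q = np.maximum(Q, eps)
    return np.sum(P * np.log(P / Q))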
Algorithm: A Simple Version of t-Distributed Stochastic Neighbor Embedding (t-SNE)
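Below is a minimal, self-contained sketch of the simple algorithm, combining the pieces above with plain gradient descent. It deliberately omits refinements from the original paper, such as per-point perplexity calibration, momentum, and early exaggeration, so treat it as illustrative rather than a production implementation.

import numpy as np

def simple_tsne(X, n_iter=500, learning_rate=100.0, sigma=1.0, seed=42):
    """Minimal t-SNE: plain gradient descent on KL(P || Q).

    Simplifications: fixed sigma (no perplexity search), no momentum,
    no early exaggeration.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # 1. High-dimensional joint similarities P
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    cond_P = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(cond_P, 0.0)
    cond_P /= cond_P.sum(axis=1, keepdims=True)
    P = (cond_P + cond_P.T) / (2.0 * n)

    # 2. Random initial 2D map
    Y = rng.normal(scale=1e-2, size=(n, 2))

    for _ in range(n_iter):
        # 3. Low-dimensional similarities Q (Student-t, one degree of freedom)
        diff = Y[:, None, :] - Y[None, :, :]            # y_i - y_j, shape (n, n, 2)
        inv = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))  # (1 + ||y_i - y_j||^2)^{-1}
        np.fill_diagonal(inv, 0.0)
        Q = inv / inv.sum()

        # 4. Gradient: dC/dy_i = 4 * sum_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^{-1}
        grad = 4.0 * np.sum(((P - Q) * inv)[:, :, None] * diff, axis=1)

        # 5. Gradient-descent step
        Y -= learning_rate * grad

    return Y

On a few hundred points this converges in seconds; large-scale implementations such as scikit-learn's TSNE rely on the Barnes-Hut approximation to scale much further.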
Applying t-SNE to Handwritten Digits
Let's see t-SNE in action on handwritten digits. The classic benchmark is the MNIST dataset, a collection of 70,000 images of 28x28-pixel digits, each of which can be viewed as a 784-dimensional vector. To keep the example fast and self-contained, the code below uses scikit-learn's built-in digits dataset, a smaller cousin of MNIST with 1,797 images of 8x8 pixels, i.e., 64-dimensional vectors.
Step-by-Step Code Implementation
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Load scikit-learn's built-in digits dataset: 1,797 images of 8x8 pixels,
# flattened to 64-dimensional vectors
digits = datasets.load_digits()
X, y = digits.data, digits.target

# Standardize each feature to zero mean and unit variance
X = StandardScaler().fit_transform(X)

# Project the 64-dimensional data down to 2 dimensions
tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X)

# Scatter plot of the 2D embedding, colored by digit label
plt.figure(figsize=(10, 8))
scatter = plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, cmap='tab10', alpha=0.7)
plt.colorbar(scatter, label='Digit')
plt.title('t-SNE visualization of the digits dataset')
plt.xlabel('t-SNE Component 1')
plt.ylabel('t-SNE Component 2')
plt.show()
Visualizing the Results
After running the code above, you should see a scatter plot in which each point represents one digit image and its color indicates the digit's label. Images of the same digit form tight clusters while different digits are pushed apart, showing how t-SNE makes the structure of high-dimensional data visible at a glance.