Implementing Spectral Clustering from Scratch: A Step-by-Step Guide

3 min readMay 23, 2024

Introduction

Clustering is a fundamental task in data science, where we group similar data points together. While k-means is a popular clustering method, it struggles with non-linearly separable data. Spectral clustering, on the other hand, excels in these situations. In this blog, we’ll implement spectral clustering from scratch, making each step easy to understand.

What is Spectral Clustering?

Spectral clustering leverages the properties of the data’s similarity graph. It clusters data by using the eigenvalues (spectrum) of a matrix derived from the data. This method is powerful for clustering non-linear structures, such as concentric circles or moons.

Step-by-Step Implementation

Let’s dive into the implementation. We’ll use Python and its numerical libraries, such as NumPy and SciPy.

Step 1: Importing Libraries

First, we need to import the necessary libraries. We’ll use NumPy for numerical operations and SciPy for linear algebra functions.

import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

Step 2: Create the Dataset

For demonstration, we’ll use the “moons” dataset from scikit-learn, which is perfect for showcasing the strengths of spectral clustering.

# Create the dataset
X, y = make_moons(n_samples=300, noise=0.1, random_state=42)

Step 3: Construct the Similarity Matrix

The similarity matrix (or affinity matrix) captures how similar each pair of data points is. We’ll use the RBF (Radial Basis Function) kernel for this purpose.

def rbf_kernel(X, gamma=1.0):
    pairwise_sq_dists = np.sum((X[:, np.newaxis] - X[np.newaxis, :]) ** 2, axis=2)
    return np.exp(-gamma * pairwise_sq_dists)

# Compute the similarity matrix
similarity_matrix = rbf_kernel(X, gamma=15.0)

Step 4: Construct the Laplacian Matrix

The Laplacian matrix is key to spectral clustering. It’s computed as 𝐿=𝐷−𝐴, where 𝐷 is the degree matrix and 𝐴is the similarity matrix.

# Degree matrix
degree_matrix = np.diag(similarity_matrix.sum(axis=1))

# Laplacian matrix
laplacian_matrix = degree_matrix - similarity_matrix

Step 5: Compute the Eigenvalues and Eigenvectors

We now compute the eigenvalues and eigenvectors of the Laplacian matrix. The eigenvectors corresponding to the smallest eigenvalues form the new feature space for clustering.

# Compute the eigenvalues and eigenvectors
eigenvalues, eigenvectors = eigh(laplacian_matrix)

# Select the eigenvectors corresponding to the smallest k eigenvalues
k = 2
selected_eigenvectors = eigenvectors[:, :k]

Step 6: Clustering in the New Feature Space

Using k-means, we cluster the data points in the new feature space formed by the selected eigenvectors.

from sklearn.cluster import KMeans

# Perform k-means clustering
kmeans = KMeans(n_clusters=k)
kmeans.fit(selected_eigenvectors)
labels = kmeans.labels_

Step 7: Visualize the Clusters

Finally, let’s visualize the resulting clusters to see how well spectral clustering performed.

# Plot the results
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', lw=0.5, edgecolor='k')
plt.title("Spectral Clustering Results")
plt.show()

Conclusion

Spectral clustering is a powerful technique, especially for data that isn’t linearly separable. By following the steps above, you can implement spectral clustering from scratch and apply it to your own datasets.

Summary of Steps

Import Libraries: Set up your Python environment with necessary libraries.
Create the Dataset: Generate or load your dataset.
Construct the Similarity Matrix: Measure pairwise similarities using a kernel function.
Construct the Laplacian Matrix: Build the Laplacian matrix from the similarity matrix.
Compute Eigenvalues and Eigenvectors: Extract the smallest eigenvalues and their corresponding eigenvectors.
Cluster in the New Feature Space: Use k-means to cluster data in the new feature space.
Visualize the Clusters: Plot the clusters to evaluate the results.

By implementing spectral clustering from scratch, you gain a deeper understanding of the algorithm’s mechanics. Happy clustering!