Mean Shift clustering explained visually

Posted on 13th Sep, 2015

Last year I wrote a blog post on finding meaningful locations in GPS trails.

My GPS trail, taken from my Android phone

Mean Shift clustering successfully identified several locations of interest, and in return I promised to show how the algorithm works.

The paper that introduced this algorithm is called "The estimation of the gradient of a density function, with applications in pattern recognition" by K. Fukunaga and L.D. Hostetler, published in IEEE Transactions on Information Theory, Volume 21, in 1975. Unfortunately this paper is not freely available online. There is, however, a freely available paper explaining the technique and giving a proof of its convergence: "Mean Shift, Mode Seeking, and Clustering" by Yizong Cheng, published in IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 17, Number 8, August 1995.

The explanation

Similar to the popular k-means clustering algorithm, mean shift "moves" cluster centers towards areas of high density. The difference between the two algorithms lies in which parts of the data set influence the cluster prototypes.

The algorithm in pseudo code (B is the bandwidth parameter):

1: Initialize a cluster prototype at each data point in the set
   (or at a random sub-selection of the set)
2: For every cluster prototype do:
   - Find the points in the data set that are within distance B of it
   - Set the new cluster prototype location to the mean of those points
   - Iterate until the set of points within distance B no longer changes
3: Merge all cluster prototypes that end up close together

It is possible to use a different kernel in step 2, to give a higher weight to points closer to the cluster prototype center. In this blog post I only consider the "flat kernel", i.e. exactly what the pseudocode above describes.
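To make the pseudocode concrete, here is a minimal NumPy sketch of the flat-kernel procedure. The function name mean_shift, the convergence test on the prototype position, and the merge tolerance of half the bandwidth are my own choices for illustration, not taken from the original paper or from my animation code.

import numpy as np

def mean_shift(data, bandwidth, max_iter=100):
    """Flat-kernel mean shift on an (n, d) array of points."""
    prototypes = data.copy()                       # step 1: one prototype per data point
    for i in range(len(prototypes)):
        p = prototypes[i]
        for _ in range(max_iter):                  # step 2: shift until it stops moving
            in_window = data[np.linalg.norm(data - p, axis=1) <= bandwidth]
            new_p = in_window.mean(axis=0)         # mean of the points within distance B
            if np.allclose(new_p, p):              # no change: this prototype has converged
                break
            p = new_p
        prototypes[i] = p
    merged = []                                    # step 3: merge prototypes that are close by
    for p in prototypes:
        if not any(np.linalg.norm(p - m) < bandwidth / 2 for m in merged):
            merged.append(p)
    return np.array(merged)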

A single cluster prototype moving in the dataset

What you see in the above animation are three Gaussian blobs of points, and a single cluster prototype moving from a low density area to one of the dense areas over the iterations. The green star is the actual prototype, and the green ring around it shows the bandwidth parameter.

Now an animation with a cluster prototype starting at every point in the data set. You can see that with the correct bandwidth parameter the most dense areas are found within a couple of iterations.

A large number of cluster prototypes moving in the dataset
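As a rough illustration of what this animation does, the sketch above can be run on three Gaussian blobs like the ones shown; the centers, spread, and bandwidth below are made-up values, not the ones used for the animation.

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [5.0, 0.0]])
data = np.vstack([rng.normal(c, 0.8, size=(100, 2)) for c in centers])

prototypes = mean_shift(data, bandwidth=2.0)
print(prototypes)   # should end up with one prototype near each blob center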

The bandwidth parameter

The bandwidth is the only parameter of the mean shift algorithm. The animation below shows what effect it has on the end result. Choose a bandwidth that is too low, and the cluster prototypes may get stuck in a local optimum.

Cluster result for different bandwidth settings

But a bandwidth that is too high will result in cluster prototypes that span multiple clusters, with the result that a prototype can end up at a location where there is no density in the data at all.

Bandwidth too high, resulting in two clusters

Bandwidth too high, resulting in one cluster
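To reproduce this effect without the sketch above, scikit-learn also ships a flat-kernel MeanShift estimator. Below is a hedged example of trying a too-low, a reasonable, and a too-high bandwidth on the same made-up blob data, plus scikit-learn's built-in bandwidth estimate; the specific values are guesses, not tuned settings.

from sklearn.cluster import MeanShift, estimate_bandwidth

for bw in (0.5, 2.0, 8.0):                 # too low, about right, too high
    ms = MeanShift(bandwidth=bw).fit(data)
    print(bw, "->", len(ms.cluster_centers_), "clusters")

# scikit-learn can also estimate a bandwidth from the data itself
print(estimate_bandwidth(data, quantile=0.2))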

Source code

The code to create these animations is available here.