## Thompson sampling explained visually

In an earlier post I introduced the multi armed bandit problem, and showed 4 ways of solving it. In that post, and in academic literature it is shown that Thompson sampling is the most efficient algorithm to converge to the correct solution. In this post I will show visually how Thompson sampling works.

Thompson sampling fits a Beta distribution to data available for each version to determine the belief it has in the conversion of both versions of a website. To chose what version of the website to show next, it draws a random value from the two distributions, the version that drew the highest value of the two will be the one shown.

In this experiment I simulated 2 versions of a website:

- High converting version, that has a conversion of 50%
- Low converting version, that has a conversion of 40%.

I will show the belief Thompson sampling has in the conversion of both versions before any visits, after 1 visit, and then in steps to 1000 visits. The following figures show the distributions of belief in conversion ratio, with the distribution of certainty between 0 and 100% for two versions of a website.

We start with the estimated conversion ratio when there is no data. Unsurprisingly the distribution for both versions is not only equal, but also flat. The flat distribution signals that according to the distribution any conversion ratio between 0 and 100% is possible, and equally probable.

To determine what version the first visitor gets we draw a random number from both distributions, and serve the version with the highest number. The first visitor was served the high converting version of the site, and bought something. So the distribution for that version is updated. Based on the data we have now it is impossible that this version does not convert visitors (we just observed the opposite!), we even consider it likely that this version will convert all visitors into customers. Let's see what will happen next.

The second visitor got the low converting version of the site, and left without buying something. This is the exact opposite of what we saw with the first customer. It is impossible that it will convert all visitors, and quite likely that it will convert no one.

The third visitor was served the high converting, but left without buying. So for the high converting version we now have one buying visitor, and one leaving without a sale. We can conclude from this information:

- The high converting version will not convert all visitors into buyers.
- The high converting version will convert at least some visitors in to buyers.

You can see the dotted line in the figure above is 0 at the extremes, with a very broad spread around 50% conversion ratio.

Visitor four received the low converting version of the site, and also left without buying. The estimated conversion ratio for this version is pushed more towards 0%.

Fast forward to 10 visitors. You can see two peaks forming. The peaks are not yet centered around 40% and 50%, but it is clear that one version of the site performs better.

After 1000 visitors there is a noticeable difference between the two distributions, one is high and thin the other is low and broad. The two distributions are roughly centered around their real conversion ratios. The reason that the distribution for the high converting version is much higher is because it got more visitors. More visitors means more certainty in a conversion ratio. It got more visitors because every time a random number was drawn it had a higher probability of getting a higher number. This loop causes Thompson sampling to converge to the best version.

After 1000 visitors the low converting version got only 42 visitors, of which 15 bought something. The high converting version got 958 visitors, of which 498 bought something.

Source code is available at github