How Google knows where you live

Posted on 3rd Apr, 2014

OK, I do not actually know how Google knows where you live, but I can make an educated guess. Using the location data from my Android phone, I am going to show you how easily this can be done.

The data I'm going to use is the location history as provided by Google. Location history is enabled by default on new Android phones. You can toggle this setting (on Android 4.4) in the settings app under Location > Google Location Reporting > Location history. I'm not sure what the phone sends "home" when it is disabled, but when it is enabled you can access the data online.

To find your location data go to http://maps.google.com/locationhistory and log in. Here you can play with your data and most importantly for our purposes: download it.

Partial screenshot of maps.google.com/locationhistory

The data is provided as KML, an XML-based file format that Google Earth can read. I will be using the Python programming language to manipulate and visualize the data, with the PyKML package to parse the incoming data.
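To give an idea of what the parsing step can look like, here is a minimal sketch. It uses Python's built-in xml.etree.ElementTree instead of PyKML so that it runs without extra packages. It assumes the export stores the track as alternating `when` and `gx:coord` elements inside a `gx:Track`; the helper name `parse_track` is hypothetical, and the sample document and its coordinates are made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespaces of KML 2.2 and Google's "gx" extension.
NS = {
    "kml": "http://www.opengis.net/kml/2.2",
    "gx": "http://www.google.com/kml/ext/2.2",
}

# Made-up sample; a real export is assumed to have the same layout.
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2"
     xmlns:gx="http://www.google.com/kml/ext/2.2">
  <Document>
    <Placemark>
      <gx:Track>
        <when>2014-03-01T00:00:10.000+01:00</when>
        <gx:coord>6.5665 53.2194 0</gx:coord>
        <when>2014-03-01T00:01:12.000+01:00</when>
        <gx:coord>6.5667 53.2195 0</gx:coord>
      </gx:Track>
    </Placemark>
  </Document>
</kml>"""


def parse_track(kml_text):
    """Return a list of (timestamp, latitude, longitude) tuples."""
    root = ET.fromstring(kml_text)
    points = []
    for track in root.iter("{%s}Track" % NS["gx"]):
        whens = track.findall("kml:when", NS)
        coords = track.findall("gx:coord", NS)
        for when, coord in zip(whens, coords):
            # gx:coord is "longitude latitude altitude", separated by spaces.
            lon, lat = coord.text.split()[:2]
            # Older Python versions do not accept a trailing "Z" for UTC.
            stamp = datetime.fromisoformat(when.text.replace("Z", "+00:00"))
            points.append((stamp, float(lat), float(lon)))
    return points


points = parse_track(SAMPLE)
for stamp, lat, lon in points:
    print(stamp.isoformat(), lat, lon)
```

Note that longitude comes before latitude in `gx:coord`, which is easy to get the wrong way around when plotting.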

There were three things that surprised me when I looked at the downloaded data. Firstly, on the Google site an average day had about 40 points, but the KML file had around 900 data points per day. Secondly, the error margin displayed online is not included in the file. Thirdly, the timezone specified in the file seems to be off by an hour.

For the month of March I investigated this difference in the number of data points. The figure below shows that in the downloaded file there is about 1 minute between consecutive points. It seems that online the number of points shown is reduced when there is little difference between consecutive points.

Figure showing that the average time between points in the dataset is around 1 minute
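The time between points can be computed along these lines. This is only a sketch: the function name `intervals_seconds` is hypothetical and the timestamps are made up, standing in for the ones parsed from the KML file.

```python
from datetime import datetime, timedelta
from statistics import mean, median


def intervals_seconds(timestamps):
    """Return the gaps in seconds between consecutive timestamps."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


# Made-up timestamps, roughly one minute apart.
start = datetime(2014, 3, 1)
timestamps = [start + timedelta(seconds=s) for s in (0, 58, 121, 180, 243)]

gaps = intervals_seconds(timestamps)
print("mean gap:", mean(gaps), "seconds")
print("median gap:", median(gaps), "seconds")
```

The median is printed next to the mean because a few long gaps (for example when the phone is switched off) can pull the mean up considerably.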

Now for the big question: how do they know where I live? If we assume that I am most often asleep in my own bed at home between 0:00 and 6:00, this is an easy question to answer: it is the location where I most often am during those hours. Let's plot my location at night on a map:

Satellite photo of Groningen with my location at night

What you see is a satellite image of the city of Groningen, The Netherlands, with a heat map overlay. The heat map shows how often I was in a certain location at night last month. There is a huge number of points near my home, quite a few points where my girlfriend lives, and a few points in the inner city (hey, I'm a student).

There are a few ways to determine the location of the highest density of points. In a later post I will go into more technical detail. For now you'll just have to believe me that the large concentration of points is indeed the location of my home: the mean of those points is off by one house. Not bad.
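One simple way to find the densest spot, not necessarily the method used for the result above, is to keep only the night-time points, bin them into a coarse grid, and take the mean of the points in the fullest grid cell. The sketch below assumes timestamps in local time and a cell size of 0.001 degrees (roughly 100 m in latitude); the function names and the sample points are made up for illustration.

```python
import math
from collections import defaultdict
from datetime import datetime


def is_night(stamp, start_hour=0, end_hour=6):
    """True if the timestamp falls between start_hour and end_hour."""
    return start_hour <= stamp.hour < end_hour


def densest_cell_mean(points, cell_size=0.001):
    """Bin (lat, lon) points into a grid; return the mean of the fullest cell."""
    cells = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        cells[key].append((lat, lon))
    best = max(cells.values(), key=len)
    mean_lat = sum(p[0] for p in best) / len(best)
    mean_lon = sum(p[1] for p in best) / len(best)
    return mean_lat, mean_lon


# Made-up records: (local timestamp, latitude, longitude).
records = [
    (datetime(2014, 3, 1, 1, 0), 53.2194, 6.5665),
    (datetime(2014, 3, 1, 3, 0), 53.2195, 6.5667),
    (datetime(2014, 3, 2, 2, 0), 53.2196, 6.5666),
    (datetime(2014, 3, 2, 4, 30), 53.2304, 6.5503),
    (datetime(2014, 3, 2, 14, 0), 53.2404, 6.5363),  # daytime, filtered out
]

night_points = [(lat, lon) for stamp, lat, lon in records if is_night(stamp)]
home_guess = densest_cell_mean(night_points)
print("estimated home location:", home_guess)
```

A fixed grid is crude: a cluster that straddles a cell boundary gets split in two. It is enough to illustrate the idea, though.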

Google Now also tries to figure out where you work. This way it can give you traffic information for your commute. Let's make the assumption that the place I am most likely to be between 9:00 and 17:00 on weekdays is my work location. This is how the map looks now:

Satellite photo of Groningen with my location during the day

Now a big concentration of points is visible at the university complex in the north-east corner of the city. You can also see that I work from home sometimes, and sometimes have lunch in the inner city. So the assumption that I am at the university most often during work hours is true!
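Selecting the working-hours points only requires a check on the weekday and the hour. A minimal sketch, assuming timestamps in local time; `is_work_time` is a hypothetical helper and the sample timestamps are made up.

```python
from datetime import datetime


def is_work_time(stamp, start_hour=9, end_hour=17):
    """True on Monday to Friday between start_hour and end_hour."""
    # weekday() returns 0 for Monday through 6 for Sunday.
    return stamp.weekday() < 5 and start_hour <= stamp.hour < end_hour


# Made-up timestamps; 1 March 2014 was a Saturday.
samples = [
    datetime(2014, 3, 3, 10, 0),   # Monday morning
    datetime(2014, 3, 3, 18, 0),   # Monday evening
    datetime(2014, 3, 1, 10, 0),   # Saturday morning
    datetime(2014, 3, 4, 16, 59),  # Tuesday, just before 17:00
]

flags = [is_work_time(stamp) for stamp in samples]
print(flags)
```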

With this data it is possible to extract many more locations that I frequently visit by using clustering techniques. In a later post I will look into this and explain the technical parts in more detail.