Centering Map Viewports Near Locations while Preserving Privacy

Last week, I was at a #pdxgeekout session at the Ace Hotel in downtown Portland. @caseorganic, @aaronpk, @brennannovak, and @maxogden were there and Aaron was working on geoloqi and asked everyone how to best randomize the location data used to center the map.

Why try and mask someone’s location? I could set up geoloqi to let people I don’t know leave message that I get only when I pass withing a certain distance from that message. How do they leave a message in an area that I’m likely to get the message? They have to be able to see where I’m likely to find a message. This means the map they see needs to be centered near my location. Since geoloqi is open source, everyone has access to the source. If the interface for people to leave messages is public — which it is, then anyone can query it. If it just returned my location offset by a random amount, given a certain number of queries the centroid of the returned locations would be, quite accurately, my location.

What geoloqi needs is a method of offsetting the current location which statistically cannot be tricked or munged into reproducing one’s current location. This is the major constraint: How does one ensure that a dedicated (stalker/zombie/attacker/real life spamer/salesman) cannot reproduce one’s location, given the algorithm and a set of data requested from the public API for the map center?

Instead of returning a location that is only a random offset from my current location, I want it to take my history into account. I would like it to take all of my history in a certain region — let us call it \mathcal{R} — and assign a certain probability of me being there based on a sampling of the points. How do I go about doing this?

Let’s take a basis function and look at it:

\displaystyle B(\mathbf{x}) = e^{-{\frac{x_{lat}^2+x_{lang}^2}{2\sigma}}}

This basis function is highest at the origin and does an exponential decay as the radius from the origin is increased and \sigma defines how slowly the function decreases as a function of the radius (it is similar to the standard deviation in the normal distribution, but I’m not using that here). Using this function we can simulate the uncertainty of finding someone at a specific location. Here’s what it looks like graphed out:

2-d Gaussian as Radial Basis Function

By taking a sampling of points from the history of places they’ve been, we can create a crude probability map of where someone could likely be given their current location \mathbf{x} . All we need to do is center this function around all of the sampled points. Let us call the sampling of the history points \mathcal{H} . The probability function over that region is:

\displaystyle P_{\mathcal{H}}(\mathbf{x}) = \sum_{\mathbf{p}\in\mathcal{H}}{ e^{ - \left( \frac{(x_{lat}-p_{lat})^2 + (x_{lng}-p_{lng})^2}{2\sigma} \right)}}

Where p_{lat} and p_{lng} are the latitude and longitude of each point \mathbf{p} in the location history \mathcal{H} .

Let’s say someone’s been undulating through town and their path looks something like this:

Sinusoid composed of radial basis functions

As we can see, because of the rate of change in the \sin function the sampled points get closer as the function’s derivative approaches zero. Our next question is, “How do we create a well distributed sampling of points?” We need to look at our basis function and look at how it decays with distance. If we choose a radius where the height is 0.5, the height of the probability function should stay around unity (unlike the graph above). Here’s the equation for that:

\displaystyle r = \sqrt{-2\sigma\ln(\frac{1}{2})}

As long as the points are spaced r apart, the probability function P should remain nearly normalized. If we wanted to make this better behaved, we would weight each basis according to how long it has been since they’ve been there and how many adjacent sample points there are.

Now that we have our main function P , we need to calculate the directional derivative so we can compute tangent lines and and their intersections with the xy plane at z = 0 :

\displaystyle \partial_{\mathbf{u}}P_{\mathcal{H}}(\mathbf{x}) = \sum_{\mathbf{p}\in\mathcal{H}}{ \left( \frac{x_{lat}u_{lat}-p_{lat}u_{lat} + x_{lng}u_{lng}-p_{lng}u_{lng}}{\sigma} \right) e^{ - \left( \frac{(x_{lat}-p_{lat})^2 + (x_{lng}-p_{lng})^2}{2\sigma} \right)}}

Now we can choose a random point \mathbf{p}_r in the neighborhood of \mathbf{p}_{curr} — our current location, a semi-random unit vector \mathbf{u} , and then calculate a tangent line and it’s intersection with the ground plane.

Let \mathbf{u} = (\cos\theta_{\mathbf{u}}, \sin\theta_{\mathbf{u}}) . Let \theta_{\mathbf{u}} be within a range that is nearly perpendicular to a line through two historical points (say, \mathbf{p}_1 and \mathbf{p}_2 ) adjacent to the current point, so that \left|\mathbf{u} \cdot \frac{\mathbf{p}_{1}\mathbf{p}_{2}}{\|\mathbf{p}_{1}\mathbf{p}_{2}\|}\right| \leq \frac{1}{2}

Keep in mind that \mathbf{u} is defined with a frame of reference of our coordinate system, while the above dot product is independent of the frame of reference. If we wanted to calculate the possible angles that \theta_{\mathbf{u}} could take, we need to first calculate the angle between (p_1, p_2) and the \mathbf{x} axis:

\displaystyle \theta_{\mathbf{e}_1} = \arccos \left(\mathbf{e}_{1}\cdot\frac{\mathbf{p}_{1}\mathbf{p}_{2}}{\| \mathbf{p}_{1}\mathbf{p}_{2} \|} \right)

Where \mathbf{e}_1 is the basis vector for the x axis, or (1, 0) .

Then we know that our random angle can be anywhere from \left[ -\frac{\pi}{3}, \frac{\pi}{3}\right] (because \arccos\frac{1}{2} = \frac{\pi}{3} ). Then to find the angle of \mathbf{u} we find \theta_{\mathbf{u}} = random\left[ -\frac{\pi}{3}, \frac{\pi}{3}\right] + \theta_{\mathbf{e}_1} .

Now, we evaluate P_{\mathcal{H}}(\mathbf{p}_{r}) , \partial_{\mathbf{u}}P_{\mathcal{H}}(\mathbf{p}_{r}) , and find \delta_{\mathbf{u}} :

\displaystyle \delta_{\mathbf{u}} = - \frac{P_{\mathcal{H}}(\mathbf{p}_{r})}{\partial_{\mathbf{u}}P_{\mathcal{H}}(\mathbf{p}_{r})}

Then we find the xy point \mathbf{p}_{new} = \mathbf{p}_{r} + \delta_{\mathbf{u}}\mathbf{u} . If P_{\mathcal{H}}(\mathbf{p}_{new}) is below a certain threshold then we can safely return \mathbf{p}_{new} . If not we set \mathbf{p}_{r} = \mathbf{p}_{new} and start the iteration again. Lather. Rinse. Repeat as necessary.

Once this point is computed for one person, it should be cached until their location is updated again. If you want to return random points around this \mathbf{p}_{new} , ensure the region of return points doesn’t intersect the current region within which the person is located.

Since \mathbf{p}_{new} is dependant on \partial_{\mathbf{u}}P_{\mathcal{H}}(\mathbf{p}_r) ,  care should be taken to ensure it does not fall outside of the desired zoom range for the map. The possible range of values for \mathbf{p}_{new} can be calculated, but it is a very messy problem in the general case. It is left as an exercise to the reader. 😉

Advertisements