Saturday, July 3, 2010

Efficient distances


Efficient distances


Square root

Every programmer knows that sqrt (square root) is an expensive function. There are sometimes rumours that it is nowadays efficient enough. But that is not always the case. See e.g. this link.

It can easily be tested and my personal conclusions are that sqrt is still slow. See e.g. this test-program I created. You can run it even remotely in codepad, showing sqrt is about 3 slower than addition. On my computer the factor is more than 8.  (Normally I use Boost.Timer for timings, but for codepad the clock() function will do).

So sqrt is relatively slow. If you calculate many (thousands) of them, and you can avoid them, avoid them, it will increase performance. In distances sqrt is often easy to avoid.

Cartesian distance

A distance in a Cartesian system is normally calculated using the Pythagorean theorem, by the well-known function:  d = √(Δx2 + Δy2). So that function contains an expensive square root.

If you compare distances the square root can easily be avoided, because √(Δx2+Δy2) < d is equivalent to Δx2+Δy2 < d2 [Paul Hsieh's Square Root page].
Stated differently, the function sqrt is strictly increasing. Therefore, if (and only if) √a < √b then a2< b2 (a and b are positive).

Avoiding the square root is a well-known performance trick in geometry. For example, in calculating the distance from a point to a linestring or a polygon, many distances are compared, sometimes 10000 or 100000 times. The smallest distance is selected. There is only one square root calculation necessary: the final result. The closest distance of a point to all polygon segments is returned as the distance, and there a (only) square root is applied.

Comparable distance

The Boost.Geometry library has a comparable_distance function. It returns a distance-measure, so not a real distance, but something related to distance. In the case of a Cartesian coordinate system, it returns the square distance.

So library users can use the distance function (returning the real distance) and the comparable_distance function (returning the distance-measure which is comparable to other distance measures). The comparable_distance is (of course) more efficient. Internally Boost.Geometry uses comparable distance measures where possible. For example in the distance of a point to a polygon (see above), but also in simplification (removing less important points from a geometry).

Great circle distance

Besides the distance between two points in metric space, often the distance on a sphere is necessary. This cannot be calculated using the Pythagorean theorem. Distances on a sphere are called great circle distances, because the shortest path between two points on a sphere is called the great circle. For a perfect sphere, the following formula is applicable:
 d = 2 * asin(sqrt((sin((lat1-lat2)/2))^2 
+ cos(lat1) * cos(lat2)
* (sin((lon1-lon2)/2))^2))
Substituting sin(a/2)with the haversine function hav makes it somewhat simpler:
 d = 2 * asin(sqrt(hav(lat1-lat2) 
+ cos(lat1) * cos(lat2)
* hav(lon1-lon2) ))
This distance is usually multiplied by the radius of the earth (R ≈ 6378.137 km), to get the distance in kilometers (use other units to get it in e.g. miles).

So let's try, for fun:

Amsterdam: 52° 22′ 23″ N, 4° 53′ 32″ E => 52.373056, 4.892222 (degrees)
Barcelona: 41° 23′ 0″ N, 2° 11′ 0″ E => 41.383333, 2.183333 (degrees)
 d = R * 2 * asin(sqrt(hav(52.373056 - 41.383333) 
+ cos(52.373056) * cos(41.383333)
* hav(2.183333 - 4.892222) ))
= R * 2 * asin(sqrt(0.0091693
+ 0.61051768 * 0.7503034
* 5.58722e-4))
= R * 2 * asin(sqrt(0.00942524))
= R * 2 * 0.194473661
= 1240.38 kilometers
Which is about right. Note that we still assume a perfect sphere, there are more precise calculations for the ellipsoidal Earth.

Comparable great circle distance

What is less widely known, but becomes clear now, is that this function also has a much faster comparable equivalent. Of course we can save multiplying by R * 2, because if a > b, then R * 2 * a > R * 2 * b. But the asin (arc sinus) function is also strictly monotone increasing. So if a > b then asin(a) > asin(b). Therefore, we can save that calculation too, and we then of course can also save the square root again for this case. What's left are four trigoniometric functions. Which are still expensive, but cannot be avoided here.

So the comparable variant of this function is just
 cd = hav(lat1-lat2) + cos(lat1) * cos(lat2) * hav(lon1-lon2)
Comparing the performance of the great dircle distance and its comparable variant gives, on my computer, a factor 37. So to compare great circle distances (still assuming a perfect sphere) it is enough to compare the distance using the formula above, and it is 37 times faster. This will save a lot when calculating the distance of a GPS-point to a polygon (city, community, country) specified in lat/long coordinates.

Boost.Geometry implements this comparable haversine now.