Partition with Boost.Geometry
Speeding up with divide and conquer
Boost.Geometry contains many algorithms in its official interface. Most of them follow the OGC interface specification, such as
- distance
- area
- intersection
- within
But there are also some other algorithms, which are mainly, or exclusively, used internally. One of them is partition and this blog post is about spatial partitioning.
Suppose you have 1000 points and 1000 polygons, and you want to calculate how many points are within a polygon.
With Boost.Geometry that is easy: iterate over all points, for each of them iterate over all polygons, call the within algorithm, and if it returns true then increase the count. If polygons overlap, then one point might be in several polygons, but we ignore this for now (it’s trivial to fix).It is easy but it can perform badly. Because it is a so-called “quadratic loop”. If there are, for example, just 10 elements, it is all fine. But if there are one million points and one million polygons, the within operation will be called one trillion times. Which is way too much.
The Boost.Geometry implementation, internally, often needs to compare such sets. For example, the intersection operation of two polygons starts with intersecting all segments of both polygons with each other. Walking over all segments of big polygons would be too slow.
To speed up this process, partition was implemented. It is similar to a kd-tree or to a quad-tree, but not identical. There is no data structure built, or kept. The space is recursively divided if necessary.
In more detail, it works as following:
- the bounding box of the two sets are calculated
- this box is divided into two halves, left and right
- it takes the left half and tries to execute the operation there
It then continues:
- the left box is divided into two halves, top and bottom
- it takes the bottom half and tries to execute the operation there
This process goes on and on, dividing each box into two halves, alternating horizontally and vertically. Until there are a small number (16 by default) of points or polygons in the box. It then calls the specified operation, in this case within.
After the left half is completely handled, we get:
And after handling the right side, the very end of the process, we can see:
Some boxes are smaller than other boxes, because there are more points there (on purpose, the points are not distributed equally for this example), and/or more polygons.
The "operation" can be specified at compile time, like the compare algorithm in sorting. So in this case we can to calculate within for point-in-polygon, but in other cases we want to calculate intersections.
- 2000 points and 2000 polygons:
- partition takes about 5 ms
- loops version takes about 115 ms
- 20000 points and 20000 polygons:
- partition takes about 205 ms
- loops version takes about 11150 ms
The partition algorithm, for this example, is called as following. I'm showing the experimental version here, using lambda's (Boost.Geometry is written before lambda's existed - the "normal" function is still using a policy with an 'apply' method).