Monday, May 8, 2023

Partition with Boost.Geometry

Partition with Boost.Geometry

 

Speeding up with divide and conquer

Boost.Geometry contains many algorithms in its official interface. Most of them follow the OGC interface specification, such as

  • distance
  • area
  • intersection
  • within

But there are also some other algorithms, which are mainly, or exclusively, used internally. One of them is partition and this blog post is about spatial partitioning. 

Suppose you have 1000 points and 1000 polygons, and you want to calculate how many points are within a polygon.

With Boost.Geometry that is easy: iterate over all points, for each of them iterate over all polygons, call the within algorithm, and if it returns true then increase the count. If polygons overlap, then one point might be in several polygons, but we ignore this for now (it’s trivial to fix).

It is easy but it can perform badly. Because it is a so-called “quadratic loop”. If there are, for example, just 10 elements, it is all fine. But if there are one million points and one million polygons, the within operation will be called one trillion times. Which is way too much.

The Boost.Geometry implementation, internally, often needs to compare such sets. For example, the intersection operation of two polygons starts with intersecting all segments of both polygons with each other. Walking over all segments of big polygons would be too slow.

To speed up this process, partition was implemented. It is similar to a kd-tree or to a quad-tree, but not identical. There is no data structure built, or kept. The space is recursively divided if necessary.

In more detail, it works as following:

  • the bounding box of the two sets are calculated
  • this box is divided into two halves, left and right
  • it takes the left half and tries to execute the operation there
But it is often, of course, still too much. In the figure below, the left half could be handled, maybe, but the right half contains for sure too many points.

It then continues:

  • the left box is divided into two halves, top and bottom
  • it takes the bottom half and tries to execute the operation there

This process goes on and on, dividing each box into two halves, alternating horizontally and vertically. Until there are a small number (16 by default) of points or polygons in the box. It then calls the specified operation, in this case within.

After the left half is completely handled, we get:

And after handling the right side, the very end of the process, we can see:

Some boxes are smaller than other boxes, because there are more points there (on purpose, the points are not distributed equally for this example), and/or more polygons.

The "operation" can be specified at compile time, like the compare algorithm in sorting. So in this case we can to calculate within for point-in-polygon, but in other cases we want to calculate intersections.

So, in summary, partition divides the space into halves, alternating left/right and top/bottom until there are just a few geometry objects, and then these geometries are fed into the specified operation.

Of course: if a box is divided into two halves, there can be polygons on the dividing line. They are in both the left and the right half. But this is recognized, and these polygons are handled separately, calling partition again but then for their bounding box.

If we compare calling the within operation with partition and with the trivial outer/inner loop implementation, we can see that it helps a lot ((all measurements on my Linux laptop, in release mode).
  • 2000 points and 2000 polygons:
    • partition takes about 5 ms 
    • loops version takes about 115 ms
  • 20000 points and 20000 polygons:
    • partition takes about 205 ms 
    • loops version takes about 11150 ms

The partition algorithm, for this example, is called as following. I'm showing the experimental version here, using lambda's (Boost.Geometry is written before lambda's existed - the "normal" function is still using a policy with an 'apply' method).

std::size_t count = 0;
boost::geometry::experimental::partition<box_type>(points, rings,
[](auto& box, auto const& point) { bg::expand(box, point); },
[](auto const& box, auto const& point) { return ! bg::disjoint(point, box); },
[](auto& box, auto const& item) { bg::expand(box, item.box); },
overlaps_ring,
[&count](auto const& p, auto const& ring_item)
{
if (bg::within(p, ring_item.ring))
{
count++;
}
return true;
});

The last lambda is the main operation and is only called for a points and polygons which are in the same partitioned box. bg is a shorthand for Boost.Geometry.
 
There are four other lambda's specified. That is because partition needs to calculate bounding boxes from its input (of which it is unaware, it can be anything, not even geometries but structures containing geometries, for example). Therefore, for the two input sets (points and rings) we need two additional lambda's (expanding a box with a point or ring, and returning true if a box intersects with a point or ring).