Friday, January 28, 2011

SQL Server, PostGIS, STDifference and Precision



Today I ran into a problem with the OGC difference function, which I describe in this post. I reduced it to a more or less minimal, self-contained sample (no database tables needed) and present that here.

The problem

The problem is precision: neither SQL Server (where I encountered it) nor PostGIS (where I verified it) handles the situation described here adequately. The result is a polygon with two spikes that should not be there. The cause is most likely the limited precision of floating-point types. The message is: be warned!

The situation is as follows: there is a parcel. The parcel is cut into two pieces (I explain how below). One piece is thrown away, the other is kept. The kept piece is then subtracted from the original parcel. You would expect to get back the piece that was thrown away. What actually happens is that you get that piece plus one or two extra spikes...

The parcel and the cutline are shown here:
[Figure: the parcel and the cutline]

Cutting into two pieces

For this experiment we cut the polygon into two pieces by buffering the cutline and subtracting that buffer from the parcel with STDifference. The cutline is shown in the figure above as the dotted red line. We then keep one of the two resulting polygons using the STGeometryN function. By trial and error, the left piece, the one we want to keep, turns out to have index 2.

For SQL Server the SQL is:

select geometry::STGeomFromWKB(0x
.STDifference(geometry::STGeomFromText('LINESTRING(180955 313700,180920 313740)', 28992).STBuffer(0.1)).STGeometryN(2)


And for PostGIS it is:

select ST_GeometryN(ST_Difference(
ST_Set
::geometry, 28992)
, (ST_Buffer(ST_GeomFromText('LINESTRING(180955 313700,180920 313740)',28992),0.1))), 2)

Note that 28992 is just the SRID of a coordinate system (the Dutch one); the problem probably occurs with any coordinate system. The result is this:

[Figure: the two pieces]

Subtracting one of the pieces from the original

After cutting, we subtract the piece we kept from the original parcel, which should give the other piece. Strictly speaking we also get the area covered by the small buffer, but that is hardly visible and not important for this story.

This is what it should look like:
[Figure: the expected result]


And this is the complete query in SQL Server:
select
geometry::STGeomFromWKB(0x
.STDifference
(
  (
select
geometry::STGeomFromWKB(0x
.STDifference(geometry::STGeomFromText('LINESTRING(180955 313700,180920 313740)', 28992).STBuffer(0.1)).STGeometryN(2)
)
)

And in PostGIS:

select ST_Perimeter(ST_Difference(
ST_Setgeometry,28992)
,
(select ST_GeometryN(ST_Difference(
ST_Setgeometry,28992)
,(ST_Buffer(ST_GeomFromText('LINESTRING(180955 313700,180920 313740)', 28992),0.1))),2)
)))

Results

To my surprise I got two spikes in SQL Server. To verify this, I tried PostGIS and got two spikes as well. At first I thought the cause was a loss of precision in the round trip through WKT. But with WKB, or with the geometry taken directly from the underlying database (not shown here), the result is the same. So the imprecision is in the processing itself. When I vary the cutline a bit (changing 180955 to 180960), I get one spike in SQL Server and still two in PostGIS.

The spikes are shown here, bordering the blue result:
[Figure: the spikes bordering the blue result]

The perimeter of the result is interesting. Without spikes it should be about 92.5 meters. Including the spikes it is 151.792871568541 in SQL Server (using .STLength()) and 151.792880859545 in PostGIS (using ST_Perimeter(...)).

The area is about 295.4 square meters in all cases, with or without spikes, which indicates that the spikes have an area of about zero (so they are real spikes).
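
The combination of an unchanged area and a much larger perimeter is what gives a spike away. Here is a minimal sketch of that check, using a hypothetical 10 x 10 square instead of the real parcel. The names Ring, ring_area, ring_perimeter, square and square_with_spike are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// A ring is a closed list of (x, y) vertices: the first vertex is repeated at the end.
using Ring = std::vector<std::pair<double, double>>;

// Shoelace formula: absolute area enclosed by the ring.
double ring_area(const Ring& r)
{
    assert(r.size() >= 4); // a closed ring needs at least a triangle plus the closing vertex
    double twice_area = 0.0;
    for (std::size_t i = 0; i + 1 < r.size(); ++i)
    {
        twice_area += r[i].first * r[i + 1].second - r[i + 1].first * r[i].second;
    }
    return std::fabs(twice_area) / 2.0;
}

// Sum of the lengths of all segments of the ring.
double ring_perimeter(const Ring& r)
{
    assert(r.size() >= 4);
    double length = 0.0;
    for (std::size_t i = 0; i + 1 < r.size(); ++i)
    {
        length += std::hypot(r[i + 1].first - r[i].first, r[i + 1].second - r[i].second);
    }
    return length;
}

// Hypothetical test data: a plain 10 x 10 square.
Ring square()
{
    return {{0.0, 0.0}, {10.0, 0.0}, {10.0, 10.0}, {0.0, 10.0}, {0.0, 0.0}};
}

// The same square with a zero-width spike: out from (10, 5) to (30, 5) and straight back.
Ring square_with_spike()
{
    return {{0.0, 0.0}, {10.0, 0.0}, {10.0, 5.0}, {30.0, 5.0}, {10.0, 5.0},
            {10.0, 10.0}, {0.0, 10.0}, {0.0, 0.0}};
}
```

Here ring_area gives 100 for both rings, while ring_perimeter gives 40 for the plain square and 80 for the spiked one: the spike adds twice its length to the perimeter and nothing to the area.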


Why would you do such a thing... Creating a parcel, cutting, throwing away one, subtracting and getting the other one...

Well, I was just testing our application. It has WebGIS functionality to create parcels, to cut parcels into two pieces, to throw away pieces, and to fill up the space left over between a parcel and a piece. So exactly this scenario. And to my annoyance, you often get spikes! This project uses SQL Server, and we get spikes. But this little investigation shows that, had we used PostGIS, we would have had spikes as well.

It is like a simple arithmetic exercise. We cut 5 into two pieces, 3 and 2. We throw away the 2 and keep the 3. We subtract the 3 from our 5 and get the 2 back, of course. In the geometry case we take the difference twice, but that is only because cutting is not an OGC operation, so we emulate it with a difference.
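
Where the geometry differs from the arithmetic is that cutting creates new vertices at the points where the cutline crosses the parcel boundary, and those coordinates usually cannot be stored exactly in a double. The following is a minimal sketch of that effect, not the algorithm any of the products use. It assumes a hypothetical parcel edge (the real parcel coordinates are not reproduced here); only the cutline coordinates come from the queries above. Point, side, intersect and example_cut_vertex are illustrative names.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x, y; };

// Twice the signed area of triangle (a, b, p). It evaluates to exactly zero
// only if p is computed to lie on the line through a and b.
double side(Point a, Point b, Point p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Intersection of the line through a1, a2 with the line through b1, b2.
// Precondition: the lines are not parallel.
Point intersect(Point a1, Point a2, Point b1, Point b2)
{
    const double dax = a2.x - a1.x, day = a2.y - a1.y;
    const double dbx = b2.x - b1.x, dby = b2.y - b1.y;
    const double denom = dax * dby - day * dbx;
    assert(denom != 0.0);
    const double t = ((b1.x - a1.x) * dby - (b1.y - a1.y) * dbx) / denom;
    return { a1.x + t * dax, a1.y + t * day }; // both coordinates are rounded here
}

// Cutline from the queries in this post.
const Point cut_from{180955.0, 313700.0};
const Point cut_to{180920.0, 313740.0};

// Hypothetical parcel edge, chosen only so that it crosses the cutline.
const Point edge_from{180930.0, 313710.0};
const Point edge_to{180950.0, 313725.0};

// The new vertex that cutting would introduce on this edge.
Point example_cut_vertex()
{
    return intersect(cut_from, cut_to, edge_from, edge_to);
}
```

For this edge the exact intersection has x = 180955 - 805/53, which has no finite binary representation, so the stored vertex can only be a rounded neighbor of it. A call such as side(edge_from, edge_to, example_cut_vertex()) is therefore tiny but not guaranteed to be zero, and a later difference has to decide whether such a vertex lies on, left of, or right of the original boundary.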

Note that I didn't get this just once; I got it several times, for different geometries. Note also that it can easily be worked around by applying an STBuffer of -0.00001 to the resulting polygon, which eliminates the spikes and has no visual impact (it does change the geometry, though...).

With Boost.Geometry

Of course I was curious what Boost.Geometry would do with this.

The buffer of Boost.Geometry is not completely implemented yet, so I create the buffer in SQL Server and save the WKB:
select geometry::STGeomFromText('LINESTRING(180955 313700,180920 313740)', 28992).STBuffer(0.1).STAsBinary()

and draft some C++ code:

template <typename T> 
void test_difference_parcel_precision()
{
    namespace bg = boost::geometry; 
    typedef bg::model::d2::point_xy<T> point_type;
    typedef bg::model::polygon<point_type> polygon_type;
    typedef std::vector<boost::uint8_t> byte_vector;

    polygon_type parcel, buffer;

    {
        byte_vector wkb;
        bg::hex2wkbstd::back_inserter(wkb));
        bg::read_wkb(wkb.begin(), wkb.end(), parcel);
    }
    {
        byte_vector wkb;
        bg::hex2wkb(
std::back_inserter(wkb));
        bg::read_wkb(wkb.begin(), wkb.end(), buffer);
    }
    bg::correct(parcel);
    bg::correct(buffer);

    std::vector<polygon_type> pieces;
    bg::difference(parcel, buffer, pieces);

    std::vector<polygon_type> filled_out;
    bg::difference(parcel, pieces.back(), filled_out);

    std::cout << std::setprecision(16) << bg::wkt(filled_out.front()) << std::endl;
    std::cout << bg::area(filled_out.front()) << std::endl;
    std::cout << bg::perimeter(filled_out.front()) << std::endl;
}


We use WKB here so as not to lose precision. I ran this with three different numeric types (float, double, and the ttmath high-precision type). And that is the essence of Boost.Geometry! Not one type! Many types! All types!

int main(int, char* [])
{
    test_difference_parcel_precision<float>();
    test_difference_parcel_precision<double>();
    test_difference_parcel_precision<ttmath_big>();
    return 0;
}

Using these three types we get:
  • float: one spike, perimeter 136.28
  • double: two spikes, perimeter 151.793
  • ttmath: zero spikes, correct result, perimeter 92.50548050416428971188559805054187252
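
To put these types in perspective, it helps to look at how finely each built-in type can resolve coordinates of this magnitude. A minimal sketch follows; spacing_at is a name made up here, and ttmath is left out because it is a separate library.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from x to the next larger representable value of type T,
// i.e. the coordinate resolution of T around x.
template <typename T>
T spacing_at(T x)
{
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<T>::infinity()) - x;
}
```

At x = 180955 this gives 0.015625 for float and 2^-35 (about 2.9e-11) for double; at y = 313700 the float spacing is 0.03125. In float, the 0.1 buffer around the cutline is therefore only a handful of representable steps wide.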


Conclusions

  • SQL Server is not perfect, it can produce spiky features
  • PostGIS is not perfect, it can produce spiky features
  • The cause is not WKT, nor is it WKB
  • Boost.Geometry, using double, is not perfect because it produces the same spiky features
  • Boost.Geometry, using ttmath, gives the correct result
  • Boost.Geometry, using float, gives one spiky feature
  • Changing the cutline a bit causes only one spiky feature in SQL Server, still two in PostGIS, and zero in Boost.Geometry (with all numeric types).
  • Be aware of floating-point precision

5 comments:

  1. Very interesting! The next thing to do would be to test this with an esri geodatabase and a geoprocessing tool.

  2. This comment has been removed by the author.

    Replies
    1. On any SQL Server spatial operation it is possible (depending on whether the new points fall precisely on the grid or not) that slight shifting occurs, caused by alignment to the integer grid. It's like hell on earth...

    2. Thanks for your comments. Yes, handling points as integers is a common procedure to avoid FP errors, so that is how they do it. Boost.Geometry uses rescaling (to integer) too, but only for the internal graph model; it leaves the original coordinates as they are. And most probably we will solve this differently (without rescaling) in a future version.

      p.s. WKB is currently in Boost.Geometry as an extension and will hopefully be released in 1.64
