[postgis-devel] Introducing wagyu to validate MVT polygons

Raúl Marín Rodríguez rmrodriguez at carto.com
Fri Dec 21 07:59:36 PST 2018


Hi all,

When CARTO decided to switch to ST_AsMVT as the default library to generate
MVTs from the database we made a lot of benchmarks and performance
improvements (most of them are available in 2.5 but some won't be available
until 3.0) [1]. The short summary when you compare it to Mapnik is:

# Good parts
- Postgis can do work in parallel. When this is triggered, it is faster
than Mapnik.
- Postgis is faster encoding lines (0.5x to 1.5x faster).
- It is also faster encoding properties (2x faster in tiles with a large
number of properties, 40 or more).
- It is faster discarding small polygons (up to 20x faster in extreme cases).

# Average
- It has a similar performance encoding points.

# Not so good
- It is slower encoding polygons (2x for small [~10 point] polygons, 20x for
big ones [1M points]).

I also commented this in that blogpost [1]: "It would be interesting to analyze
why Postgis validation (based on GEOS) is way slower than Mapnik’s (based on
Boost), and addressing this would benefit multiple SQL functions that use it,
like St_IsValid."

After working on this to try to speed up the process (validation takes
~95-98% of the time in ST_AsMVTGeom) I've learned several things:
- ST_MakeValid is buggy. There are several tickets in GEOS around this and even
Postgis has some commented out tests showing this buggy behaviour.
- ST_MakeValid goes beyond what's necessary for MVTs, even being
counterproductive sometimes (like collapsing polygons into lines, or lines
into points).
- ST_MakeValid can create new points that don't respect the MVT integer
coordinates. Thus it can produce invalid MVT geometries.
- I was wrong when I said that Mapnik's validation was based on Boost, it is
actually based on Wagyu [2].
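
The integer-coordinate problem can be illustrated with a small sketch. This is
Python written only for illustration, not code from Postgis, GEOS or Wagyu, and
the function name `segment_intersection` is hypothetical. A self-intersecting
"bowtie" ring whose vertices all lie on the integer grid crosses itself at a
point that is off the grid, so a repair step that adds a node at the crossing
introduces a non-integer coordinate:

```python
from fractions import Fraction


def segment_intersection(p1, p2, p3, p4):
    """Return the crossing point of segments p1-p2 and p3-p4 as exact
    fractions, or None if they are parallel or do not cross."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    # Cross product of the two direction vectors.
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None  # parallel or collinear segments
    # Parameters along each segment (0..1 means "inside the segment").
    t = Fraction((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3), denom)
    u = Fraction((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1), denom)
    if not (0 <= t <= 1 and 0 <= u <= 1):
        return None
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


# Bowtie ring on the integer grid: edge 0-1 crosses edge 2-3.
bowtie = [(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
crossing = segment_intersection(bowtie[0], bowtie[1], bowtie[2], bowtie[3])
on_grid = all(c.denominator == 1 for c in crossing)
print(crossing, on_grid)
```

Here the crossing is at (1/2, 1/2), so `on_grid` is False: the new node cannot
be stored as an MVT integer coordinate without moving it.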

I started with the idea of improving ST_MakeValid but it soon proved extremely
hard and some of the shortcuts valid for MVT weren't ok for the general case.
With this in mind, a week ago I decided to have a look at Wagyu and see how
hard it would be to integrate it in Postgis. Both the code [3] and the
performance comparison [4] can be seen on GitHub; it is 1x faster for small
polygons, 20x faster for large ones.

You might be wondering why I decided to do this in Postgis instead of GEOS,
which is a C++ library that we already depend on. The main reason is that
I have little to no experience with it, and doing it the right way would
require moving the validation code from Postgis to GEOS, adding the ability to
work with integers (it only allows doubles right now) and then exposing all of
this through its C API. So, what has taken me a week (and a good chunk was
autotools) would require multiple months of work.

I see 2 ways to include the library and I'm not sure which one is best for this
case:
- Use system libraries. Packages for `wagyu` and `geometry.hpp` are only
available in Debian and Fedora, but not in other Linux distributions, OSX or
Windows. If they were widely available I think this would be the best option,
as it has been done with other libraries like geos, protobuf, etc.
- Bring the library code into the project. This is what I've done in my PR.

A C++11 compiler is required and it should be trivial to switch between the 2
(a couple of configure flags). I think that the first option would be best if
we could have packagers making wagyu more widely available, but even though
both I and my company only use Linux, I don't want this improvement to
be Linux only.

Some comments about the PR:
- I've created a minimal C API to do the operations we need (clipping with
validation) for the MVT use case (integers and opposite winding order). Wagyu
has other functionalities like Union or XOR but those aren't exposed, and it
can also work with doubles but for this use case it was 10% slower.
- Using wagyu is optional; it is only used if you pass `--use-wagyu` to
configure. It will use CXX and CXXFLAGS, not CC and CFLAGS.
- The library only supports polygons, so any other geometries are still passed
to the make_valid based on GEOS.
- The MVT process now transforms into MVT coordinates before clipping. This is
to make the code more similar for the 2 methods (GEOS and Wagyu) and to make
the clipping process consistent (we had hacks to account for half units and so
on which are now gone).
- Some outputs change when using this library, most notably dropping some
geometries that have extreme self intersections (on input).

Things that aren't done:
- Add tests directly to libwagyu instead of relying on MVT tests.
- Adapt MVT tests to pass with both methods.
- Adapt CI to test both methods.
- Update documentation for ST_AsMVTGeom.
- Move uthash to the new `deps` folder.


Although it's almost Christmas I'm expecting some conflict, especially around
the fact that it's bringing a new library and code inside the project and that
it requires new configure flags and a C++11 compiler if you decide to use it.
What are your thoughts?

[1] - https://carto.com/blog/inside/An-update-on-MVT-encoders/
[2] - https://github.com/mapbox/wagyu
[3] - https://github.com/postgis/postgis/pull/356
[4] - https://github.com/postgis/postgis/files/2703289/20181221_mvt_postgis_trunk_vs_20181221_mvt_postgis_wagyu.pdf

Regards

-- 
Raúl Marín Rodríguez
carto.com

