<div dir="ltr">Interesting!<div><br></div><div>When I initially implemented ST_AsMVTGeom I looked at wagyu as it was already pretty clear the main encoding performance bottleneck was with MakeValid and that wagyu was likely faster. But I was/is too inexperienced with C and C++ to do what you just have done. Nice work!</div><div><br></div><div>/Björn</div></div><br><div class="gmail_quote"><div dir="ltr">Den fre 21 dec. 2018 kl 17:00 skrev Raúl Marín Rodríguez <<a href="mailto:rmrodriguez@carto.com">rmrodriguez@carto.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br>
<br>
When CARTO decided to switch to ST_AsMVT as the default library to generate<br>
MVTs from the database we made a lot of benchmarks and performance<br>
improvements (most of them are available in 2.5 but some won't be available<br>
until 3.0) [1]. The sort summary when you compare it to Mapnik is:<br>
<br>
# Good parts<br>
- Postgis can do work in parallel. When this is triggered, performance<br>
is faster than Mapnik.<br>
- Postgis is faster encoding lines (0.5x to 1.5x faster).<br>
- It is also faster encoding properties (2x faster in tiles with high amount<br>
of properties (40+)).<br>
- It is faster discarding small polygons (up to 20x faster in extreme cases).<br>
<br>
# Average<br>
- It has a similar performance encoding points.<br>
<br>
# Not so good<br>
- It is slower encoding polygons (2x for small [~10 point] polygons, 20x for<br>
big ones [1M points]).<br>
<br>
I also comented this in that blogpost [1]: "It would be interesting to analyze<br>
why Postgis validation (based on GEOS) is way slower than Mapnik’s (based on<br>
Boost), and addressing this would benefit multiple SQL functions that use it,<br>
like St_IsValid."<br>
<br>
After working around this to try to speed up this process (validation takes<br>
~95-98% of the time in ST_AsMVTGeom) I've learned several things:<br>
- St_MakeValid is buggy. There are several tickets in GEOS around this and even<br>
Postgis has some commented out tests showing this buggy behaviour.<br>
- St_MakeValid goes beyond what's necessary for MVTs, even being<br>
counterproductive sometimes (like collapsing polygons into lines, or lines<br>
into points).<br>
- St_MakeValid can create new points that don't respect the MVT integer<br>
coordinates. Thus it is producing invalid MVT geometries.<br>
- I was wrong when I said that Mapnik's validation was based on Boost, it is<br>
actually based on Wagyu [2].<br>
<br>
I started with the idea of improving ST_MakeValid but it soon proved extremely<br>
hard and some of the shortcuts valid for MVT weren't ok for the general case.<br>
With this in mind, a week ago I decided to have a look at Wagyu and see how<br>
hard it would be to integrate it in Postgis. Both the code [3] and the<br>
performance comparison [4] can be seen at Github; it is 1x faster for small<br>
polygons, 20x faster for large ones.<br>
<br>
You might be wondering why I decided to do this in Postgis instead of GEOS,<br>
which is a C++ library that we already depend on. The main reasons is because<br>
I have little to no experience with it, and doing it in the right way would<br>
require to move the validation code from Postgis to GEOS, add the ability to<br>
work with integers (it only allows doubles right now) and then expose this all<br>
through its C-API. So, what has taken me a week (and a good chunk was autotools)<br>
would require multiple months of work.<br>
<br>
I see 2 ways to include the library and I'm not sure which one is best for this<br>
case:<br>
- Use system libraries. Packages for `wagyu` and `geometry.hpp` are only<br>
available in Debian and Fedora, but not in other Linux distributions, OSX or<br>
Windows. If it was widely available I think this would be the best option, as<br>
it has been done with other libraries like geos, protobuf, etc.<br>
- Bring the library code into the project. This is what I've done in my PR.<br>
<br>
A C++11 compiler is required and it should be trivial to switch between the 2<br>
(a couple of configure flags). I think that the first option would be best if<br>
we could have packagers making wagyu more widely available, but even though<br>
both I and my company only use Linux, but I don't want this improvement to<br>
be Linux only.<br>
<br>
Some comments about the PR:<br>
- I've created a minimal C api to do the operations we need (clipping with<br>
validation) for the MVT use case (integers and opposite winding order). Wagyu<br>
has other functionalities like Union or XOR but those aren't exposed, and it<br>
can also work with doubles but for this use case it was 10% slower.<br>
- Using wagyu it's optional and only used if you pass `--use-wagyu` to<br>
configure. It will use CXX and CXXFLAGS, not CC and CFLAGS.<br>
- The library only supports polygons, so any other geometries are still passed<br>
to the make_valid based on GEOS.<br>
- The MVT process now transforms into MVT coordinates before clipping. This is<br>
to make the code more similar for the 2 methods (GEOS and Wagyu) and to make<br>
the clipping process consistent (we had hacks to account for half units and so<br>
on which are now gone).<br>
- Some outputs change when using this library, most notably dropping some<br>
geometries that have extreme self intersections (on input).<br>
<br>
Things that aren't done:<br>
- Add tests directly to libwagyu instead of relying on MVT tests.<br>
- Adapt MVT tests to pass with both methods.<br>
- Adapt CI to test both methods.<br>
- Update documentation for ST_AsMVTGeom.<br>
- Move uthash to the new `deps` folder.<br>
<br>
<br>
Although it's almost Christmas I'm expecting some conflict, specially around<br>
the fact that it's bringing a new library and code inside the project and that<br>
it requires new configure flags and a C++11 compiler if you decided to use it.<br>
What are your thoughts?<br>
<br>
[1] - <a href="https://carto.com/blog/inside/An-update-on-MVT-encoders/" rel="noreferrer" target="_blank">https://carto.com/blog/inside/An-update-on-MVT-encoders/</a><br>
[2] - <a href="https://github.com/mapbox/wagyu" rel="noreferrer" target="_blank">https://github.com/mapbox/wagyu</a><br>
[3] - <a href="https://github.com/postgis/postgis/pull/356" rel="noreferrer" target="_blank">https://github.com/postgis/postgis/pull/356</a><br>
[4] - <a href="https://github.com/postgis/postgis/files/2703289/20181221_mvt_postgis_trunk_vs_20181221_mvt_postgis_wagyu.pdf" rel="noreferrer" target="_blank">https://github.com/postgis/postgis/files/2703289/20181221_mvt_postgis_trunk_vs_20181221_mvt_postgis_wagyu.pdf</a><br>
<br>
Regards<br>
<br>
-- <br>
Raúl Marín Rodríguez<br>
<a href="http://carto.com" rel="noreferrer" target="_blank">carto.com</a><br>
_______________________________________________<br>
postgis-devel mailing list<br>
<a href="mailto:postgis-devel@lists.osgeo.org" target="_blank">postgis-devel@lists.osgeo.org</a><br>
<a href="https://lists.osgeo.org/mailman/listinfo/postgis-devel" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-devel</a></blockquote></div>