[Qgis-developer] twkb (Tiny WKB) in QGIS?
Nicklas Avén
nicklas.aven at jordogskog.no
Wed Mar 2 04:48:57 PST 2016
Hi
Answers inline.
On Wed, 2016-03-02 at 12:55 +0100, Hugo Mercier wrote:
> Hi,
>
> This is interesting.
>
> But I am pretty sure the possible gain in speed greatly depends on
> multiple other factors. In some scenarios, computing deltas in geometry
> coordinates may take time compared to a direct access to numbers
> represented by raw doubles.
From working with twkb, I have a feeling that there are several
bottlenecks in the data flow, even inside a single machine.
But I agree with you that the real gain will be when data is transported
over the net.
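To make the delta step concrete, here is a minimal Python sketch of the idea behind the coordinate encoding: scale by the precision, round to integers, store the difference from the previous point, zigzag the result so small negative numbers stay small, and write it as a variable-length integer. It is only an illustration under stated assumptions. All function names are made up for this mail, and the header bytes of the real format (type, precision, optional sizes and bounding box) are left out.

```python
def zigzag(n):
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z):
    """Inverse of zigzag()."""
    return (z >> 1) if z % 2 == 0 else -((z + 1) >> 1)


def write_varint(value, out):
    """Append a non-negative integer, 7 bits per byte; high bit means 'more follows'."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def read_varint(buf, pos):
    """Read one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def to_int(value, precision):
    """Scale and round; precision 2 keeps cm, 0 keeps m, -1 keeps 10 m steps."""
    if precision >= 0:
        return round(value * 10 ** precision)
    return round(value / 10 ** (-precision))


def from_int(i, precision):
    """Undo the scaling done by to_int() (the rounding itself is not reversible)."""
    if precision >= 0:
        return i / 10 ** precision
    return i * 10 ** (-precision)


def encode_coords(points, precision=0):
    """Encode (x, y) pairs as zigzag varint deltas from the previous point."""
    out = bytearray()
    prev_x = prev_y = 0
    for x, y in points:
        ix, iy = to_int(x, precision), to_int(y, precision)
        write_varint(zigzag(ix - prev_x), out)
        write_varint(zigzag(iy - prev_y), out)
        prev_x, prev_y = ix, iy
    return bytes(out)


def decode_coords(buf, precision=0):
    """Decode by keeping a running sum of the deltas."""
    points = []
    pos = 0
    x = y = 0
    while pos < len(buf):
        dx, pos = read_varint(buf, pos)
        dy, pos = read_varint(buf, pos)
        x += unzigzag(dx)
        y += unzigzag(dy)
        points.append((from_int(x, precision), from_int(y, precision)))
    return points
```

For three points a few meters apart in a meter-based projection, for example (500000.4, 6600000.2), (500012.7, 6600003.9) and (500020.1, 6599998.3), encode_coords at precision 0 gives 11 bytes, against 48 bytes for the same coordinates as raw doubles. Decoding is a running sum over small integers, so the extra work per coordinate is modest.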
But I also think that, as QGIS becomes a more and more respected player in
the GIS world, its usage is also moving beyond the single machine. I
myself now have a scenario where we are designing a system running in the
cloud. We will use QGIS Server for rendering a WMS service. Then I have to
administer the data and the QGIS project on the server from my office.
I also know of local authorities around here that are starting to use QGIS
against central PostGIS databases.
So I think the need will increase as the usage gets wider.
>
> That would be an interesting addition to the postgres provider, but it
> has to be an option for advanced users and should probably make the
> layer read-only. It could probably be started by adding a geomFromTwkb
> in GeometryFactory and an option to the postgres provider.
I don't know anything about the internals of QGIS, but I think it is
important to reduce the handling of uncompressed data. That is one of the
key features that makes encoding so fast on the PostGIS side, I think: the
data is encoded more or less directly when read from disk. The number of
memory copies of the uncompressed dataset is low.
>
> Just out of curiosity: I am not a DBA, but I would be surprised if there
> is no way to optimize the bandwidth usage of a PostgreSQL server. I am
> pretty sure PostgreSQL network exchanges can be tunneled over ssh. So
> there might be a way to tunnel them with gzip, for example. It would be
> interesting to compare the overhead of tinywkb with the overhead
> of gzip. Did you run any similar benchmarks?
I have not benchmarked gzip in that way. But I have compared these three
scenarios:
1) Inside PostGIS, encode to twkb and write the result to a new table.
2) Encode to wkb, write it to a file outside PostGIS and compress to
zip or tar.gz.
3) Encode as geojson without decimals, export to a text file and
compress to zip or tar.gz.
I compare the whole of point 1 with only the compression time in points 2
and 3. The wkb/geojson encoding and export from the db is of course
irrelevant.
Geojson with 0 decimals takes approximately the same size as wkb with full
precision. (The precision makes no difference to the size of wkb, though
with trailing zeroes it would probably compress better.)
I haven't done this for a while, but the typical numbers for a table
containing 300 MB of geometries as wkb are that the twkb option above,
encoding and writing a new table in PostGIS, takes about 3 seconds.
Numbers 2 and 3 take at least 20 seconds to compress and write the result
to disk.
These tests are on the same machine, and the fact that number 1 writes to
a database table shouldn't give any benefit as far as I know.
The result is also smaller for twkb than the compressed files produced
in 2 and 3.
So, for rendering a dataset with 300 MB of geometries as wkb, QGIS would
in total have to wait less than 3 seconds for the encoding (on one
thread, since PostGIS uses only one), and then only maybe 60 MB has to be
transferred.
And since the database starts spitting out the first row more or less at
once, the whole dataset would have to be rendered on the client in under
3 seconds before PostGIS becomes the bottleneck.
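As a back-of-the-envelope illustration of those numbers, the small Python calculation below compares the time to move the data over a link. The 300 MB, 60 MB and 3 second figures are the rough ones from my tests above. The 100 Mbit/s link speed is purely an assumed example value, and the model ignores latency, protocol overhead and the time PostGIS needs to produce wkb.

```python
def transfer_seconds(size_mb, link_mbit_per_s):
    """Seconds to send size_mb megabytes over a link of link_mbit_per_s megabits/s."""
    return size_mb * 8 / link_mbit_per_s


LINK_MBIT = 100.0      # assumed example link speed, not a measured value
WKB_MB = 300.0         # approximate size of the geometries as wkb
TWKB_MB = 60.0         # approximate size of the same geometries as twkb
TWKB_ENCODE_S = 3.0    # approximate single-threaded encoding time in PostGIS

wkb_time = transfer_seconds(WKB_MB, LINK_MBIT)
# Pessimistic for twkb: encoding and transfer are simply added here,
# although in practice rows are streamed while encoding is still running.
twkb_time = TWKB_ENCODE_S + transfer_seconds(TWKB_MB, LINK_MBIT)

print(f"wkb:  {wkb_time:.1f} s")
print(f"twkb: {twkb_time:.1f} s")
```

With these assumptions the wkb transfer takes 24 seconds and the twkb path about 7.8 seconds. On a slower link the difference grows, and on a fast local connection it shrinks.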
As for decoding, it should be slightly faster than encoding if done right,
I think. And since QGIS can use many threads...
I believe it can be a great boost in many situations.
ATB
Nicklas
>
> On 02/03/2016 11:46, kimaidou wrote:
> > Hi Nicklas,
> >
> > This is indeed a great idea, and it would go a step further than what
> > was already done here [1] with the PostGIS 2.2 backend. QGIS 2.14 now uses
> > ST_RemoveRepeatedPoints, which is one of the optimisations we can
> > use with PostGIS.
> >
> > As you said, we could only use this format for rendering purposes, and
> > not for spatial analyses or writing back to the PostgreSQL server.
> >
> >
> > I do not personally know how much work is needed to achieve this, but
> > this would be a great step forward in lowering bandwidth usage and improving
> > performance.
> >
> > Cheers
> > Michaël
> >
> > [1] https://github.com/qgis/QGIS/pull/2410
> >
> > 2016-03-02 9:24 GMT+01:00 Nicklas Avén <nicklas.aven at jordogskog.no>:
> >
> > Hello all
> >
> > I have seen this question raised before some time ago on QGIS list.
> > That made me glad :-)
> >
> > But I think maybe the time is more right now, since twkb is supported in a
> > released PostGIS version.
> >
> > For anyone not knowing what twkb is:
> > TWKB is a geometry format with similarities to wkb, but it uses compression
> > in the same way as MapBox Vector Tiles.
> >
> > I started to work with this format maybe 4 years ago.
> > I added encoding support in PostGIS trunk some time after PostGIS 2.1
> > release.
> >
> > I will not tell the whole history here, but last spring, in 2015, the
> > format went through quite a big facelift. Since CartoDB was interested
> > in using the format between their database servers and rendering
> > servers, Paul Ramsey looked into the format and made some great
> > improvements. He wrote a blog post about the result for CartoDB:
> > http://blog.cartodb.com/smaller-faster/
> >
> > Also Javier Santana at CartoDB should be mentioned here since he was
> > part of the discussions.
> >
> > Then with PostGIS 2.2 we now have encoding and decoding of twkb in
> > PostGIS.
> >
> > I have always thought that twkb should be a good fit for the
> > communication between PostGIS and QGIS.
> >
> > There are some things to consider, though.
> > TWKB doesn't preserve the full precision of the coordinates. Because of that
> > it is not suitable for writing back to the db.
> >
> > But the decreased size (often about 7 times smaller) compared to wkb would give
> > a real boost to QGIS. Encoding and decoding are also fast. For
> > example, writing a table inside PostGIS to a new table like:
> > CREATE TABLE foo AS
> > SELECT ST_AsBinary(geom) FROM table1;
> >
> > is slower than encoding and writing as twkb:
> > CREATE TABLE foo AS
> > SELECT ST_AsTWKB(geom) FROM table1;
> >
> > But, as mentioned, decimals are then removed.
> > The encoder in PostGIS takes a second argument that defines how many
> > decimals should be included. The default is 0 for meter-based
> > projections and 6 or 5 (I don't remember) for 4326. The precision value
> > can also be negative, which results in rounded integer values; for example, -1
> > means 10 meters precision (on meter-based projections).
> >
> > So here is my suggestion how to handle things.
> >
> > As an option for people using PostGIS >= 2.2 as backend, they can switch
> > to twkb as the transport format. When zoomed out, QGIS asks PostGIS for
> > 0 or -1 decimals of precision. When zooming in, QGIS requests more
> > precision. But meter precision is enough for quite deep zooming.
> >
> > So, the editing issue. When the user opens a layer for editing, QGIS will
> > have to switch to wkb format. If not, the whole geometry that is edited
> > will lose precision when sent back.
> >
> > It could be possible of course to edit and accept less precision.
> > GIS professionals who know how accurate their data is know that
> > storing 10 decimals is no gain. They could decide that their data set
> > only has cm precision and work both ways with twkb with 2 decimals.
> >
> > The compression is worse with 2 decimals than 0 decimals of course, but
> > still smaller than wkb. So one approach could be that there is an option
> > to decide with what precision to edit and use twkb as return format too.
> >
> >
> > But as a first step I think decoding at QGIS as a client is a big step
> > forward.
> >
> > I don't have the skills to write this in C++ myself, but everything
> > needed exists in C, both in PostGIS and in a small standalone library
> > that I have been playing with https://github.com/nicklasaven/twkbC.
> >
> > But to make things as fast as possible the decoding should of course be
> > done directly from twkb into QGIS internal format, with no other
> > representations in between.
> >
> > So, if anyone is interested, I will do all that I can to help. But I
> > think it would be much faster to get someone on board who knows
> > the QGIS code base and C++ than for me to try to do things myself.
> >
> > So, what do you think?
> > Could this be interesting?
> > If so, what next? This shouldn't be a very big thing if we find a way to
> > work.
> >
> > I guess if it is interesting it is a candidate for QGIS 3.0
> >
> >
> > Best Regards
> >
> > Nicklas Avén
> >
> >
> >
> > _______________________________________________
> > Qgis-developer mailing list
> > Qgis-developer at lists.osgeo.org
> > List info: http://lists.osgeo.org/mailman/listinfo/qgis-developer
> > Unsubscribe: http://lists.osgeo.org/mailman/listinfo/qgis-developer