[postgis-tickets] [PostGIS] #2878: winnie coughing up blood when trying to build cunit for twkb
PostGIS
trac at osgeo.org
Sun Aug 10 11:29:48 PDT 2014
#2878: winnie coughing up blood when trying to build cunit for twkb
----------------------+-----------------------------------------------------
Reporter: robe | Owner: nicklas
Type: defect | Status: reopened
Priority: blocker | Milestone: PostGIS 2.2.0
Component: postgis | Version: trunk
Resolution: | Keywords:
----------------------+-----------------------------------------------------
Comment(by nicklas):
This was very strange.
When I tried to track down the performance issue mentioned above, I found
that it shows up only in some cases.
The problem is not directly related to the number of calls to the varint
functions, nor to the number of rows making those calls. But it shows up
when I ask the query to write the resulting rows to a new table.
So it seems that when the database writes to disk and then returns to the
next row to call the varint functions, there is some overhead, but not
when it just iterates over the rows with explain analyze.
This is the most obvious case I have found:
Create a table with a lot of points:
{{{
create table a as
select 'point(1 1)'::geometry as geom from
generate_series(1,5000000);
}}}
Then, comparing r12835 (varint functions in lwout_twkb.c) with r12836
(varint functions in varint.c), I get interesting results.
For the query:
{{{
create table c as
select st_astwkb(geom, 0) from a;
}}}
r12835 always takes under 4000 ms and
r12836 takes between 4400 and 4500 ms.
That is a difference of more than 10%.
But when running
{{{
explain analyze
select st_astwkb(geom, 0) from a;
}}}
I see more or less no difference.
Further evidence for this comes from using the TWKB aggregate function:
{{{
create table c as
select st_astwkbagg(geom, 0) from a;
}}}
Then I also see no difference. I guess that is because the database doesn't
go back and forth between the I/O and the encoding functions for every row.
It is also interesting to note that this last query takes only about
2000 ms. So 2 seconds of overhead are removed by writing (almost) the same
data as 1 row instead of 5 million rows.
You can see the same effect when comparing:
{{{
create table d as
select st_asbinary(geom) from a;
}}}
which takes about 4000 ms, with:
{{{
create table d as
select st_asbinary(st_collect(geom)) from a
}}}
which takes about 3100 ms.
I also found that this effect doesn't show up between
{{{
create table d as
select geom from a
}}}
and
{{{
create table d as
select st_collect(geom) from a
}}}
which both take about 2400 ms.
In summary:
There is something that takes time to initialize when the database gets
back from writing to disk. This overhead is smaller in r12835, where there
are fewer calls across source files, than in r12836.
This overhead also shows up with other functions, such as ST_AsBinary.
It doesn't show up when no PostGIS function is called, as when just copying
the geometry as is.
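To illustrate what "calls across source files" can mean for the compiler,
here is a minimal sketch of a base-128 varint encoder. It is not the code
from varint.c or lwout_twkb.c; the name sketch_varint_u64_encode is
hypothetical. The assumption is that a helper defined as static inline in
the file that uses it can be inlined, while the same helper in a separate
.c file is normally reached through a real function call unless link-time
optimization is used. Whether that accounts for the timings above is not
established here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only, not the PostGIS API.
 * Encodes val as an unsigned base-128 varint: 7 payload bits per byte,
 * least significant group first, high bit set on every byte but the last.
 * buf must have room for 10 bytes (the maximum for a 64-bit value).
 * Returns the number of bytes written.
 *
 * Declared static inline so the compiler may inline it into callers in
 * the same source file. Moved to its own .c file, callers in other files
 * would usually pay for a regular call on every invocation.
 */
static inline size_t
sketch_varint_u64_encode(uint64_t val, uint8_t *buf)
{
    size_t n = 0;

    assert(buf != NULL);

    /* Emit 7 bits at a time while more than 7 bits remain. */
    while (val >= 0x80)
    {
        buf[n++] = (uint8_t)((val & 0x7F) | 0x80);
        val >>= 7;
    }

    /* Last byte: high bit clear marks the end of the varint. */
    buf[n++] = (uint8_t)val;
    return n;
}
```

Comparing a build with the helper in the calling file against a build with
it in a separate file, on the same create table query, would be one way to
check whether the call boundary matters in this case.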
The GCC version I am testing with is:
gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
--
Ticket URL: <http://trac.osgeo.org/postgis/ticket/2878#comment:17>
PostGIS <http://trac.osgeo.org/postgis/>