[postgis-tickets] [PostGIS] #2878: winnie coughing up blood when trying to build cunit for twkb

PostGIS trac at osgeo.org
Sun Aug 10 11:29:48 PDT 2014


#2878: winnie coughing up blood when trying to build cunit for twkb
----------------------+-----------------------------------------------------
  Reporter:  robe     |       Owner:  nicklas      
      Type:  defect   |      Status:  reopened     
  Priority:  blocker  |   Milestone:  PostGIS 2.2.0
 Component:  postgis  |     Version:  trunk        
Resolution:           |    Keywords:               
----------------------+-----------------------------------------------------

Comment(by nicklas):

 This was very strange.

 While trying to pin down the performance issue mentioned above, I found
 that it shows up only in some cases.
 The problem is not directly related to the number of calls to the varint
 functions, nor directly to the number of rows making those calls. But it
 does show up when the query writes the resulting rows to a new table.

 So it seems that when the database writes to disk and then returns to the
 next row, calling the varint functions carries some overhead, but not when
 it just iterates over the rows with EXPLAIN ANALYZE.

 This is the most obvious case I have found:

 Create a table with a lot of points:

 {{{
 create table a as
 select 'point(1 1)'::geometry as geom from
 generate_series(1,5000000);
 }}}

 Comparing r12835 (varint functions in lwout_twkb.c) with r12836 (varint
 functions in varint.c) gives interesting results:

 On the query:

 {{{
 create table c as
 select st_astwkb(geom, 0) from a;
 }}}

 r12835 always takes under 4000 ms, while
 r12836 takes between 4400 and 4500 ms.
 That is a difference of more than 10%.
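 For context on how small each varint call is, here is a minimal, hypothetical sketch of a typical varint encoder. It assumes the Protocol Buffers style scheme: signed values are zigzag-mapped, then written 7 bits per byte with a continuation bit. The names zigzag64, varint_u64_encode and varint_s64_encode are illustrative and are not the PostGIS functions. Each call does only a few shifts and masks. That could plausibly make a fixed per-call cost noticeable in timings like the ones above, for example a call into another source file that the compiler cannot inline.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zigzag-map a signed value so that small magnitudes
   (positive or negative) become small unsigned values: 0->0, -1->1, 1->2. */
static uint64_t zigzag64(int64_t v)
{
    /* Shift as unsigned to avoid undefined behaviour on negative values;
       XOR with all ones when v is negative. */
    return ((uint64_t)v << 1) ^ (v < 0 ? UINT64_MAX : 0);
}

/* Hypothetical helper: write v as an unsigned varint, 7 bits per byte,
   least significant group first, with the high bit set on every byte
   except the last. buf must hold at least 10 bytes. Returns bytes written. */
static size_t varint_u64_encode(uint64_t v, uint8_t *buf)
{
    size_t n = 0;
    while (v >= 0x80) {
        buf[n++] = (uint8_t)((v & 0x7F) | 0x80); /* more bytes follow */
        v >>= 7;
    }
    buf[n++] = (uint8_t)v; /* final byte, high bit clear */
    return n;
}

/* Hypothetical helper: signed varint = zigzag mapping + unsigned varint. */
static size_t varint_s64_encode(int64_t v, uint8_t *buf)
{
    return varint_u64_encode(zigzag64(v), buf);
}
```

 For example, encoding 300 as an unsigned varint produces the two bytes 0xAC 0x02, and encoding -1 as a signed varint produces the single byte 0x01.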

 But when running

 {{{
 explain analyze
 select st_astwkb(geom, 0) from a;
 }}}
 I see more or less no difference.

 Further evidence: if I use the TWKB aggregate function, like:
 {{{
 create table c as
 select st_astwkbagg(geom, 0) from a;
 }}}
 then I also see no difference. I guess that is because the database does
 not keep switching between handling rows and calling the encoding
 functions. It is also interesting to note that this last query takes only
 about 2000 ms. So about 2 seconds of overhead disappears by writing
 (almost) the same data as 1 row instead of 5 million rows.


 You can see the same effect when comparing:


 {{{
 create table d as
 select st_asbinary(geom) from a;
 }}}
 which takes about 4000 ms, with:


 {{{
 create table d as
 select st_asbinary(st_collect(geom)) from a;
 }}}
 which takes about 3100 ms.

 I also found that this effect does not show up between


 {{{
 create table d as
 select geom from a;
 }}}
 and
 {{{
 create table d as
 select st_collect(geom) from a;
 }}}
 which both take about 2400 ms.

 In summary:
 Something takes time to initialize when the database returns from writing
 to disk. This overhead is smaller in r12835, where there are fewer calls
 across source files, than in r12836.

 This overhead also shows with other functions like ST_AsBinary.

 It does not show when no PostGIS function is called, for example when just
 copying the geometry as is.

 The GCC I am testing with is:
 gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)

-- 
Ticket URL: <http://trac.osgeo.org/postgis/ticket/2878#comment:17>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.