[postgis-devel] What's the difference betweengeom_accum andbuiltin postgresql array_append

Obe, Regina robe.dnd at cityofboston.gov
Tue Sep 2 21:12:25 PDT 2008


Mark,

This is interesting.  I've been doing all my tests on Union and since the bottleneck has always been ST_Union I wasn't seeing much of a difference in timings, but to just test the raw speed of array_append vs. st_geom_accum,  I decided doing a collect compare would be better since collect pretty much just appends and does a simple collect.  

st_geom_accum actually performs worse most of the time for largish geometries.  Note I also put in the ST_NumGeometries test to limit the latency of network transfer.

My ST_Collect2 looks like this
CREATE AGGREGATE st_collect2(geometry) (
  SFUNC=array_append,
  STYPE=geometry[],
  FINALFUNC=st_collect_garray
);

On my standard usstatebounds sample set - 
POSTGIS="1.3.3" GEOS="3.0.0-CAPI-1.4.1" PROJ="Rel. 4.6.0, 21 Dec 2007" (Windows 2003)
"PostgreSQL 8.2.5 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special)"

--55,781 ms (result: 2895 )
SELECT ST_Numgeometries(ST_Collect(the_geom))
FROM usstatebounds

-- 36,359 ms (result: 2895)
SELECT ST_Numgeometries(ST_Collect2(the_geom))
FROM usstatebounds

-- Max number of points of any single geometry is 70,238 points, avg - 140.5
SELECT Max(ST_NPoints(the_geom)), Avg(ST_NPoints(the_geom))
FROM
usstatebounds

(unfortunately I don't have a similar data set on my Linux box) - testing with a smaller number of records with largish geoms which I can do a comparable test).  
Tests on linux  ("PostgreSQL 8.3.3 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14)")
This test on linux with a 8.3 gives same timings POSTGIS="1.3.3" GEOS="3.0.0-CAPI-1.4.1" PROJ="Rel. 4.5.0, 22 Oct 2006" USE_STATS"
--812 ms (result - 56)
SELECT ST_Numgeometries(ST_Collect(the_geom))
FROM  fe_2007_us_state

-- 812 ms (result - 56)
SELECT ST_Numgeometries(ST_Collect2(the_geom))
FROM fe_2007_us_state

-- max 45,056, avg 12,130
SELECT Max(ST_NPoints(the_geom)), Avg(ST_NPoints(the_geom))
FROM
fe_2007_us_state


--437 ms (result - 209)
SELECT ST_Numgeometries(ST_Collect(the_geom))
FROM country_boundaries

-- 312 ms (result - 209)
SELECT ST_Numgeometries(ST_Collect2(the_geom))
FROM country_boundaries

-- max 12,125, avg 455
SELECT Max(ST_NPoints(the_geom)), Avg(ST_NPoints(the_geom))
FROM
country_boundaries

Thanks,
Regina
-----Original Message-----
From: postgis-devel-bounces at postgis.refractions.net on behalf of Mark Cave-Ayland
Sent: Tue 9/2/2008 6:23 PM
To: PostGIS Development Discussion
Subject: Re: [postgis-devel] What's the difference betweengeom_accum	andbuiltin postgresql array_append
 
Obe, Regina wrote:
> 
> Getting back to this topic.  Recall I said there was a situation where 
> LWGEOM_accum worked where array_append didn't.  I can't replicate it so 
> I suspect it was an earlier error in my indexing efforts that caused 
> this issue.  I am also not seeing this LWGEOM_accum function used 
> directly in any of the postgis code base.
> 
> So here is my half-assed suggestion since no one seems to know why we 
> should maintain the LWGEOM_accum function instead of just using built-in 
> PostgreSQL array_append function.
> 
> Simply get rid of LWGEOM_accum from our code base (or deprecate it) -- 
> change
> 
> /**CREATE OR REPLACE FUNCTION st_geom_accum(geometry[], geometry)
>   RETURNS geometry[] AS
> '$libdir/liblwgeom.dll', 'LWGEOM_accum'
>   LANGUAGE 'c' IMMUTABLE;
> ***/
> 
> To:
> CREATE OR REPLACE FUNCTION st_geom_accum(geometry[], geometry)
>   RETURNS geometry[] AS
> $$ SELECT array_append($1,$2);  $$
>   LANGUAGE 'sql' IMMUTABLE;
> 
> 
> And that's it.  In tests I've done doing above, I don't see a definitive 
> speed difference and all aggs we have seem to have this function for 
> collecting geometries. 
> 
> I would have suggested replacing our use in aggregates directly with 
> array_append, but thought it would be better to keep this buffer for 3 
> reasons
> 1) We don't have a CREATE OR REPLACE AGGREGATE which makes changing aggs 
> without a drop reload tricky. Changing this function will auto-magically 
> correct the aggs. Granted 1.4 people should be doing a full reload anyway.
> 
> 2) Don't know if anyone is relying on the st_geom_accum function so we 
> probably should keep it around anyway.
> 
> 3) In case we are wrong.  It won't be a major undertaking to correct the 
> mistake.
> 
> Any thoughts, concerns?
> 
> Thanks,
> Regina


Hi Regina,

At a quick glance, I'd say that the LWGEOM_accum function forcibly 
de-TOASTs the input geometries when building the array so it may be that 
you will only see a time difference working very large geometries 
(rather than large numbers of smaller geometries). Does this make any 
difference in your tests?


ATB,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
_______________________________________________
postgis-devel mailing list
postgis-devel at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-devel






-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20080903/5004de23/attachment.html>


More information about the postgis-devel mailing list