[postgis-devel] Split By function

Obe, Regina robe.dnd at cityofboston.gov
Mon Jun 23 14:13:50 PDT 2008


Instinct.  I guess I just never put two and two together that it can be used with  dump until Dave asked that question and I remembered Kevin using dump in an example.  Strange how if you are asked the same question over and over again you come up with a different answer.

________________________________

From: postgis-devel-bounces at postgis.refractions.net on behalf of Kevin Neufeld
Sent: Mon 6/23/2008 4:44 PM
To: PostGIS Development Discussion
Subject: Re: [postgis-devel] Split By function



Where did you see this syntax?  This is great. I had no idea you could
query like that.  Very nice.
SELECT (ST_Dump(the_geom)).geom
FROM ...

So, here's a test sequence of commands to answer your question.
CREATE SCHEMA tmp;
SET search_path to tmp, public;
-- Create a point table with 5000 points
CREATE TABLE _5000 AS SELECT makepoint(random(), random()) AS the_geom
FROM generate_series(1, 5000);
-- Create a point table with 10000 points
CREATE TABLE _10000 AS SELECT makepoint(random(), random()) AS the_geom
FROM generate_series(1, 10000);
CREATE TABLE _20000 AS SELECT makepoint(random(), random()) AS the_geom
FROM generate_series(1, 20000);
-- Create a multipoint table with the 5000 points
CREATE TABLE _5000_m AS SELECT collect(the_geom) AS the_geom FROM _5000 ;
CREATE TABLE _10000_m AS SELECT collect(the_geom) AS the_geom FROM _10000 ;
CREATE TABLE _20000_m AS SELECT collect(the_geom) AS the_geom FROM _20000 ;


As I recall, we determined that generate_series() *does* scale well, but
the ST_GeometryN function does not.
EXPLAIN ANALYZE
SELECT count(*)
FROM (
   SELECT ST_GeometryN(the_geom, generate_series(1,
ST_Numgeometries(the_geom)))
   FROM _1000_m
) a;
-- Total runtime: 398.582 ms

EXPLAIN ANALYZE
SELECT count(*)
FROM (
   SELECT ST_GeometryN(the_geom, generate_series(1,
ST_Numgeometries(the_geom)))
   FROM _5000_m
) a;
-- Total runtime: 11793.898 ms

EXPLAIN ANALYZE
SELECT count(*)
FROM (
   SELECT ST_GeometryN(the_geom, generate_series(1,
ST_Numgeometries(the_geom)))
   FROM _10000_m
) a;
-- ERROR:  out of memory
-- DETAIL:  Failed on request of size 12.


Ok, so now using the Regina's notation,
EXPLAIN ANALYZE
SELECT count(*)
FROM (
   SELECT (ST_Dump(the_geom)).geom
   FROM _5000_m
) a;
-- Total runtime: 79.239 ms

EXPLAIN ANALYZE
SELECT count(*)
FROM (
   SELECT (ST_Dump(the_geom)).geom
   FROM _10000_m
) a;
-- Total runtime: 158.046 ms

EXPLAIN ANALYZE
SELECT count(*)
FROM (
   SELECT (ST_Dump(the_geom)).geom
   FROM _20000_m
) a;
-- Total runtime: 319.109 ms


So, not only does ST_GeometryN not scale well (what is that, quadratic
scaling?), but it consumes all memory on my 4GB server.

SELECT (ST_Dump(geom)).geom  on the other hand is linear and has no
problems.

Unfortunately, this doesn't solve the ST_PointN equivalent problem.

I suggest we add a ST_DumpPoints() method to the TODO list!

Cheers,
Kevin


Obe, Regina wrote:
> I recall we were discussing a need for a split by function and that
> ST_GeometryN etc. is slow for large geometries. Unfortunately I don't
> have any humongous geometries to test with.
> 
> I just looked at ST_Dump (not sure why I never used it before, I guess
> because generate_series is more versatile.).  Anyrate it seems to me it
> comes very close to a split by and with range ability.
> 
> I presume ST_Dump is much more efficient than ST_GeometryN since it
> would make only one call out.  Kevin can you test out that old
> generate_series one you had with dump?
> 
> E.g. I can do this
> 
>
> SELECT a.somefield1, a.somefield2, a.geom
> FROM
> (SELECT somefield1, somefield2, (ST_Dump(the_geom)).*
> FROM sometablewithmulties) as a
> WHERE
> a.path[1] between 2 and 5;
> 
> To grab all geometries in all records in a MULTI between 2 and 5 slot.
> Granted I can't think of a use for such a thing.
> 
> Which allows me to wrap this in an sql function wrapper to generate a
> pgis_GeometryRangeN(start int, end int)
> 
> 
> or
> pgis_GeometryAll() returns geometry[]
> 
> similarly the below denatures a MULTIPOLYGON/LINESTRING to single
> POLYGON/LINESTRINGS
> SELECT a.somefield, (ST_Dump(the_geom)).geom
> FROM sometable as a;
> 
> Which allows me to wrap this in a simple SQL function that does all
> sorts of split by stuff.  E.g. for geometry collections I suppose it
> could get interesting with those multidimensional arrays.  Unfortunately
> I guess ST_Dump is not going to deal with single geometries and breaking
> into points etc.  But seems if its efficient for the other case - we can
> create a similar one to deal with the other cases by cutting and pasting
> some of the logic.
> 
> Thanks,
> Regina
> 
>
> 
> 
> 
>
> ------------------------------------------------------------------------
>
> * The substance of this message, including any attachments, may be
> confidential, legally privileged and/or exempt from disclosure pursuant
> to Massachusetts law. It is intended solely for the addressee. If you
> received this in error, please contact the sender and delete the
> material from any computer. *
>
> ------------------------------------------------------------------------
>
> * Help make the earth a greener place. If at all possible resist
> printing this email and join us in saving paper. *
>
> * *
>
> * *
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
_______________________________________________
postgis-devel mailing list
postgis-devel at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-devel


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20080623/e70551c7/attachment.html>


More information about the postgis-devel mailing list