[postgis-devel] Split By function

Kevin Neufeld kneufeld at refractions.net
Mon Jun 23 13:44:45 PDT 2008


Where did you see this syntax?  This is great. I had no idea you could 
query like that.  Very nice.
SELECT (ST_Dump(the_geom)).geom
FROM ...

So, here's a test sequence of commands to answer your question.
CREATE SCHEMA tmp;
SET search_path to tmp, public;
-- Create a point table with 5000 points
CREATE TABLE _5000 AS SELECT makepoint(random(), random()) AS the_geom 
FROM generate_series(1, 5000);
-- Create a point table with 10000 points
CREATE TABLE _10000 AS SELECT makepoint(random(), random()) AS the_geom 
FROM generate_series(1, 10000);
CREATE TABLE _20000 AS SELECT makepoint(random(), random()) AS the_geom 
FROM generate_series(1, 20000);
-- Create a multipoint table with the 5000 points
CREATE TABLE _5000_m AS SELECT collect(the_geom) AS the_geom FROM _5000 ;
CREATE TABLE _10000_m AS SELECT collect(the_geom) AS the_geom FROM _10000 ;
CREATE TABLE _20000_m AS SELECT collect(the_geom) AS the_geom FROM _20000 ;


As I recall, we determined that generate_series() *does* scale well, but 
the ST_GeometryN function does not.
EXPLAIN ANALYZE
SELECT count(*)
FROM (
   SELECT ST_GeometryN(the_geom, generate_series(1, 
ST_Numgeometries(the_geom)))
   FROM _1000_m
) a;
-- Total runtime: 398.582 ms

EXPLAIN ANALYZE
SELECT count(*)
FROM (
   SELECT ST_GeometryN(the_geom, generate_series(1, 
ST_Numgeometries(the_geom)))
   FROM _5000_m
) a;
-- Total runtime: 11793.898 ms

EXPLAIN ANALYZE
SELECT count(*)
FROM (
   SELECT ST_GeometryN(the_geom, generate_series(1, 
ST_Numgeometries(the_geom)))
   FROM _10000_m
) a;
-- ERROR:  out of memory
-- DETAIL:  Failed on request of size 12.


Ok, so now using the Regina's notation,
EXPLAIN ANALYZE
SELECT count(*)
FROM (
   SELECT (ST_Dump(the_geom)).geom
   FROM _5000_m
) a;
-- Total runtime: 79.239 ms

EXPLAIN ANALYZE
SELECT count(*)
FROM (
   SELECT (ST_Dump(the_geom)).geom
   FROM _10000_m
) a;
-- Total runtime: 158.046 ms

EXPLAIN ANALYZE
SELECT count(*)
FROM (
   SELECT (ST_Dump(the_geom)).geom
   FROM _20000_m
) a;
-- Total runtime: 319.109 ms


So, not only does ST_GeometryN not scale well (what is that, quadratic 
scaling?), but it consumes all memory on my 4GB server.

SELECT (ST_Dump(geom)).geom  on the other hand is linear and has no 
problems.

Unfortunately, this doesn't solve the ST_PointN equivalent problem.

I suggest we add a ST_DumpPoints() method to the TODO list!

Cheers,
Kevin


Obe, Regina wrote:
> I recall we were discussing a need for a split by function and that 
> ST_GeometryN etc. is slow for large geometries. Unfortunately I don't 
> have any humongous geometries to test with.
>  
> I just looked at ST_Dump (not sure why I never used it before, I guess 
> because generate_series is more versatile.).  Anyrate it seems to me it 
> comes very close to a split by and with range ability. 
>  
> I presume ST_Dump is much more efficient than ST_GeometryN since it 
> would make only one call out.  Kevin can you test out that old 
> generate_series one you had with dump?
>  
> E.g. I can do this
>  
> 
> SELECT a.somefield1, a.somefield2, a.geom
> FROM
> (SELECT somefield1, somefield2, (ST_Dump(the_geom)).*
> FROM sometablewithmulties) as a
> WHERE
> a.path[1] between 2 and 5;
>  
> To grab all geometries in all records in a MULTI between 2 and 5 slot. 
> Granted I can't think of a use for such a thing.
>  
> Which allows me to wrap this in an sql function wrapper to generate a
> pgis_GeometryRangeN(start int, end int)
>  
>  
> or
> pgis_GeometryAll() returns geometry[]
>  
> similarly the below denatures a MULTIPOLYGON/LINESTRING to single 
> POLYGON/LINESTRINGS
> SELECT a.somefield, (ST_Dump(the_geom)).geom
> FROM sometable as a;
>  
> Which allows me to wrap this in a simple SQL function that does all 
> sorts of split by stuff.  E.g. for geometry collections I suppose it 
> could get interesting with those multidimensional arrays.  Unfortunately 
> I guess ST_Dump is not going to deal with single geometries and breaking 
> into points etc.  But seems if its efficient for the other case - we can 
> create a similar one to deal with the other cases by cutting and pasting 
> some of the logic.
>  
> Thanks,
> Regina
>  
> 
>  
>  
>  
> 
> ------------------------------------------------------------------------
> 
> * The substance of this message, including any attachments, may be 
> confidential, legally privileged and/or exempt from disclosure pursuant 
> to Massachusetts law. It is intended solely for the addressee. If you 
> received this in error, please contact the sender and delete the 
> material from any computer. *
> 
> ------------------------------------------------------------------------
> 
> * Help make the earth a greener place. If at all possible resist 
> printing this email and join us in saving paper. *
> 
> * *
> 
> * *
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel



More information about the postgis-devel mailing list