[postgis-users] Large geometry issue

Gregory Williamson Gregory.Williamson at digitalglobe.com
Thu Jul 5 18:11:27 PDT 2007


Having has a little time to look at this in detail I can see why this is ugly ... a very long sinuous shape vs a ton of small rectangles, so the bounding box does get a lot of candidates which then have to be winnowed.

I do find myself wondering at this result:

 POSTGIS="1.2.1" GEOS="3.0.0rc4-CAPI-1.3.3" USE_STATS
select count(*) from wsc_candidates w, oli_req x WHERE w.geometry && x.bbox AND contains(x.oli_req_geom,w.geometry) AND x.oli_req_id = 114672;
 count
-------
     0
(1 row)

Time: 4347.776 ms

versus
POSTGIS="1.2.1" GEOS="2.2.3-CAPI-1.1.1" PROJ="Rel. 4.4.9, 29 Oct 2004" USE_STATS
# select count(*) from wsc_candidates w, oli_req x WHERE w.geometry && x.bbox AND contains(x.oli_req_geom,w.geometry) AND x.oli_req_id = 114672;
 count
-------
   241
(1 row)

Time: 15956.425 ms

Not the time so much -- the box is a bit older -- but the disparity in rows. 241 seems more likely from eyeballing the data in JUMP ... or have I overlooked something obvious ?

Thanks,

Greg W.


-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net on behalf of Martin Davis
Sent: Thu 7/5/2007 5:32 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Large geometry issue
 
No problem.  Actually it's me being blind - I just saw the attachment 
with the test data.

2530 vertices isn't a very big polygon.  So this isn't a problem with 
lots of holes or shells, it simply reveals that we're not extracting the 
maximum possible performance from a querying involving comparing many 
small geoms against a single large one.  Stay tuned - hopefully we'll 
have a better story for this shortly.

Martin

Gregory Williamson wrote:
>
> Martin,
>
> Sorry for the lack of coherence -- 2530 vertices in this test polygon 
> ... it should be in the original posting I sent, an attached zip file 
> with two tables; this is the single entry in the oli_req table. The 
> other table has some 13000 rows of polygons which are either 
> completely within or overlapping with the large poly.
>
> If you can't get at that let me know and I'll send you a copy ...
>
> Greg W.
>
>
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net on behalf of 
> Martin Davis
> Sent: Thu 7/5/2007 5:09 PM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Large geometry issue
>
> Greg,
>
> Can you elaborate on your statement "2530 in a sinlge polygon"?  Do you
> mean holes or subPolygons in a MultiPolygon?
>
> Either way, currently JTS is not optimized for working with very large
> numbers of holes and sub-polygons.  As Paul says, we are currently
> working on improving this.  Hopefully we can at least match the
> performance of the Other DMBS(es) for this situation!
>
> Would it be possible for me to obtain this geometry for testing?
>
> Martin
>
>
>
> Gregory Williamson wrote:
> >
> > Dear peoples,
> >
> > I have a problem with a query that uses an absurdly large geometry
> > (2530 in a single polygon). This is srid -1 (part of a large test of
> > postgres vs some other database product). Everything has been vacuumed
> > and analyzed.
> >
> > The initial search to find candidates in a target table is quite fast:
> > catest=# select count(*) from wtm_sub_cell w, order_line_item x WHERE
> > x.bbox && w.geometry AND x.id_as_int = 114672;
> >  count
> > -------
> >  13168
> > (1 row)
> >
> > Time: 9.472 ms
> >
> > Trying to get the list narrowed to geometries that are completely
> > contained by the requested shape is slow:
> > catest=# select count(*) from wtm_sub_cell w, order_line_item x WHERE
> > x.bbox && w.geometry AND distance(x.geometry,w.geometry) = 0 and
> > x.id_as_int = 114672;
> >  count
> > -------
> >   1112
> > (1 row)
> >
> > Time: 69277.780 ms
> >
> > So I have two questions:
> >   a) anything better to use than "distance(x,y) = 0) ? I tried
> > st_within -- it is about the same speed but returns no polys, which is
> > strange to me, but I also haven't looked at these in detail yet. For
> > example:
> > catest=# select count(*) from wtm_sub_cell w, order_line_item x WHERE
> > x.bbox && w.geometry AND st_within(x.geometry,w.geometry) and
> > x.id_as_int = 114672;
> >  count
> > -------
> >      0
> > (1 row)
> >
> > Time: 1173.185 ms
> > (same results with st_within(w.geometry,x.geometry):
> > catest=# select count(*) from wtm_sub_cell w, order_line_item x WHERE
> > x.bbox && w.geometry AND st_within(w.geometry,x.geometry) and
> > x.id_as_int = 114672;
> >  count
> > -------
> >      0
> > (1 row)
> >
> >
> >   b) anything I can do to speed things up ? I have tried boosting work
> > mem to 16 megs (from 1) and it made no apparent difference.
> >
> >
> >
> > I have a self contained test case that shows the same behavior -- the
> > one large poly and all the candidates in another table. Apologies for
> > the size; hopefully it's not been mangled in the transfers.
> >
> > Explain analyze of the sample (the sequential is sensible since there
> > is only one row in the table):
> > catest=# explain analyze select count(*) from wsc_candidates w,
> > oli_req x WHERE w.geometry && x.bbox AND
> > distance(w.geometry,x.oli_req_geom) > 0 AND x.oli_req_id = 114672;
> >                                                                      
> > QUERY PLAN                                     
> > 
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> >  Aggregate  (cost=20.28..20.29 rows=1 width=0) (actual
> > time=77232.858..77232.859 rows=1 loops=1)
> >    ->  Nested Loop  (cost=0.00..9.30 rows=4389 width=0) (actual
> > time=6.389..77221.506 rows=12056 loops=1)
> >          Join Filter: (distance(w.geometry, x.oli_req_geom) >
> > 0::double precision)
> >          ->  Seq Scan on oli_req x  (cost=0.00..1.01 rows=1
> > width=40602) (actual time=0.007..0.009 rows=1 loops=1)
> >                Filter: (oli_req_id = 114672)
> >          ->  Index Scan using wsc_c_spatial_ndx on wsc_candidates w 
> > (cost=0.00..8.27 rows=1 width=109) (actual time=0.022..25.991
> > rows=13168 loops=1)
> >                Index Cond: (w.geometry && x.bbox)
> >                Filter: (w.geometry && x.bbox)
> >  Total runtime: 77232.901 ms
> > (9 rows)
> >
> > Time: 77233.773 ms
> >
> >
> > And for the real thing:
> > catest=# explain analyze select count(*) from wtm_sub_cell w,
> > order_line_item x WHERE w.geometry && x.bbox AND
> > distance(w.geometry,x.geometry) = 0 AND x.id_as_int = 114672;
> >                                                                  QUERY
> > PLAN                                          
> > 
> --------------------------------------------------------------------------------------------------------------------------------------------
> >  Aggregate  (cost=141.83..141.84 rows=1 width=0) (actual
> > time=77457.587..77457.588 rows=1 loops=1)
> >    ->  Nested Loop  (cost=5.99..141.83 rows=1 width=0) (actual
> > time=15.682..77456.541 rows=1112 loops=1)
> >          Join Filter: (distance(w.geometry, x.geometry) = 0::double
> > precision)
> >          ->  Index Scan using oli_id_ndx on order_line_item x 
> > (cost=0.00..8.30 rows=1 width=383) (actual time=0.012..0.018 rows=1
> > loops=1)
> >                Index Cond: (id_as_int = 114672)
> >          ->  Bitmap Heap Scan on wtm_sub_cell w  (cost=5.99..132.97
> > rows=32 width=109) (actual time=2.988..21.796 rows=13168 loops=1)
> >                Filter: (w.geometry && x.bbox)
> >                ->  Bitmap Index Scan on wsc_geom_idx1 
> > (cost=0.00..5.98 rows=32 width=0) (actual time=2.828..2.828 rows=13168
> > loops=1)
> >                      Index Cond: (w.geometry && x.bbox)
> >  Total runtime: 77457.633 ms
> > (10 rows)
> >
> > Time: 77458.458 ms
> >
> >
> > The tables involved by size:
> > catest=# select count(*) from wsc_candidates;
> >  count
> > -------
> >  13168
> > (1 row)
> >
> > Time: 2.586 ms
> > catest=# select count(*) from oli_req;
> >  count
> > -------
> >      1
> > (1 row)
> >
> > Time: 0.193 ms
> > catest=# select count(*) from wtm_sub_cell;
> >   count
> > ---------
> >  6399928
> > (1 row)
> >
> > Time: 1776.508 ms
> > catest=# select count(*) from order_line_item;
> >  count
> > --------
> >  395921
> > (1 row)
> >
> > Time: 176.083 ms
> >
> >
> > Many thanks for your time and bandwidth!
> >
> > Greg Williamson
> > Senior DBA
> > GlobeXplorer LLC, a DigitalGlobe company
> >
> > Confidentiality Notice: This e-mail message, including any
> > attachments, is for the sole use of the intended recipient(s) and may
> > contain confidential and privileged information and must be protected
> > in accordance with those provisions. Any unauthorized review, use,
> > disclosure or distribution is prohibited. If you are not the intended
> > recipient, please contact the sender by reply e-mail and destroy all
> > copies of the original message.
> >
> > (My corporate masters made me say this.)
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
> >  
>
> --
> Martin Davis
> Senior Technical Architect
> Refractions Research, Inc.
> (250) 383-3022
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>   

-- 
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022

_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20070705/251fb848/attachment.html>


More information about the postgis-users mailing list