[postgis-users] How to improve performance of ST_Within in a small table with large geometries

Paul Ramsey pramsey at opengeo.org
Wed May 13 08:09:52 PDT 2009


Oh, and WRT to updates, don't do them. Instead try and cast your
problem into the space of creating a new table that has the updated
column in it.

P

On Wed, May 13, 2009 at 8:08 AM, Paul Ramsey <pramsey at opengeo.org> wrote:
> Make the big things smaller.
>
> - Break the multi-polygons into single polygons
> - Cut the single polygons up into even smaller things using a grid
> - Parallelize by having one query check for containment with one
> subset of polygons and others check for containment on a
> second/third/fourth (this approach might break down if you get I/O
> bound and start thrashing, but to the extent that the operation
> remains CPU bound it will help)
>
> P.
>
> 2009/5/13 Jorge Arévalo <jorge.arevalo at gmail.com>:
>> Hello,
>>
>> As I said in these threads:
>>
>> http://postgis.refractions.net/pipermail/postgis-users/2009-May/023369.html
>> http://postgis.refractions.net/pipermail/postgis-users/2009-May/023415.html
>>
>> We're working with a table of large geometries (multipolygons) and small
>> number of rows. We want to check in which multipolygon of this table are the
>> points of another table. This another table has ~8 million of points. The
>> SQL code is:
>>
>> --loop for all polygons. T1 has polygons, but we use the center, a point, to
>> do the Within
>> for i in select ogc_fid from T1 LOOP
>>     update T1
>>     set field_to_update = (select updating_field  from T2 where
>> ST_Within(ST_Centroid(T1.wkb_geometry), the_geom) LIMIT 1 OFFSET 0)
>>     where ogc_fid = i;
>> END LOOP
>> --end of loop
>>
>> (We have to update one table with a field of the another one, but the
>> important part is the ST_Within part)
>>
>> In the worst case (point is in the last multipolygon) the query takes 1sec.
>> In the best case, takes 16msec. If we take into account the worst case only
>> (this is, all the queries will take 1sec), we're talking about 93 days... We
>> need to improve the performance of this.  So, we tried:
>>
>> - Force using index instead of sequential scan (SET enable_seqscan TO off).
>> It didn't improve the speed
>> - Use bounding box of multipolygons instead the multipolygons (ST_Envelope).
>> We can't do this because bounding box doesn't take into account
>> InnerBoundaries inside multipolygons, and we have multipolygons with "holes"
>> filled of other multipolygons
>> - Clustering of indexes. It didn't increase the speed
>> - Force_2D(). It didn't increase the speed significantly
>>
>> We're using PostGIS 1.3.5 with GEOS and PROJ support, over PostgreSQL 8.1.1.
>> Our operating system is Suse Linux Enterprise. The hardware is an Intel XEON
>> 2.5 GHZ (64 bits) and 4GB RAM.
>>
>> How could we improve the performance of this? 93 days are too much time. Is
>> it possible to take advantage of the 4-core processor, for example?
>>
>> Thanks in advance
>> Best regards
>>
>> Jorge
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
>



More information about the postgis-users mailing list