[postgis-users] Query optimization using ST_DWithin

Mon Jul 6 15:28:55 PDT 2009

Hi!

I’m using a land use/land cover dataset with roughly 93000 polygons for 
purposes of ecological research with the following query:

SELECT a.gid AS a_gid,
b.gid AS b_gid,
b.gridcode AS b_class,
st_area(b.the_geom) AS b_area,
st_perimeter(b.the_geom) AS b_perimeter,
st_distance(a.the_geom,b.the_geom) AS patch_dist,
st_distance(st_centroid(a.the_geom),st_centroid(b.the_geom)) AS c2c_dist

INTO public.cantanhez03_ftable_fd0100

FROM public.lulc_cantanhez_c03 AS a,
public.lulc_cantanhez_c03 AS b

WHERE st_dwithin(a.the_geom, b.the_geom, 100) AND a.gid <> b.gid

ORDER BY a_gid, b_gid, b_class

The general idea is to retrieve for each patch/polygon the set of 
patches that are within a range of 100 m excluding the initial/focal 
patch from the analysis (i.e.: a.gid <> b.gid) and some area, perimeter 
and distance measurements. The geometry column is indexed with GisT 
(CREATE INDEX "lulc_cantanhez_c03_idx1" ON "public"."lulc_cantanhez_c03" 
USING gist ("the_geom" "public"."gist_geometry_ops");). Some geometries 
extend throughout the area with a large number of vertices (84121, 
25214, 9873, 9485, 8991, 7340, 5219, 5066, 4653, 4439, 4275, 4267, 3620, 
3556, 3288,…) which translates in this cases into a high number of 
vicinal patches. I’m using POSTGIS="1.3.6" GEOS="3.1.0-CAPI-1.5.0" 
PROJ="Rel. 4.6.1, 21 August 2008" USE_STATS; PostgreSQL 8.2.13 on 
i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special) on 
a Win32 XP Pro development machine.
This query can take 12 to 15H on a Pentium D 3.2GHz with 2 GB RAM to 
fully process the dataset. Is there a way to optimize/improve this? What 
should be the approach: server parameters optimization, query redesign, 
tilling, sub-setting?

Thanks in advance:

João G.