[postgis-users] Re: GiST index seems to be ignored?

Mark Cave-Ayland mark.cave-ayland at ilande.co.uk
Sun Mar 11 04:22:42 PDT 2007


On Fri, 2007-03-09 at 12:34 -0800, Josh Livni wrote:
> Mark,
> 
> Nice followup:
> 
> "Nested Loop  (cost=5.03..25.13 rows=1 width=586) (actual 
> time=56.236..183.194 rows=1734 loops=1)"
> "  Join Filter: within(centroid(p.the_geom), b.thebuffer)"
> "  ->  Index Scan using bg_fips on kc_block_groups  (cost=0.00..8.02 
> rows=1 width=1911) (actual time=53.812..53.818 rows=1 loops=1)"
> "        Index Cond: ((fips)::text = '530330001001'::text)"
> "  ->  Bitmap Heap Scan on parcels_06 p  (cost=5.03..17.04 rows=3 
> width=586) (actual time=2.356..56.674 rows=2038 loops=1)"
> "        Filter: (b.thebuffer && centroid(p.the_geom))"
> "        ->  Bitmap Index Scan on parcels_06_centroid_geom_gist  
> (cost=0.00..5.03 rows=3 width=0) (actual time=1.903..1.903 rows=2038 
> loops=1)"
> "              Index Cond: (b.thebuffer && centroid(p.the_geom))"
> "Total runtime: 186.480 ms"
> 
> 
> Sweet.
> 
>   -Josh

Okay, I didn't see the Materialize node I was expecting, but the planner
is now doing the right thing, which is effectively to build a table
containing a single row of your buffered geometry and then join it onto
the parcel table. That's probably about as fast as that query is going
to get.
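
If you ever want to double-check that the planner is still taking this
path after the data grows, running EXPLAIN ANALYZE over just the buffer
subquery is a quick sanity test. A minimal sketch, reusing the table and
fips value from your plan above:

EXPLAIN ANALYZE
SELECT buffer(kc_block_groups.the_geom, 1320) AS thebuffer
FROM kc_block_groups
WHERE kc_block_groups.fips = '530330001001';

This should show the same single-row Index Scan using bg_fips that
appears as the outer side of the Nested Loop in your output.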

Now that you've got this far, you can use this query to generate the
list of pin_keys to feed into your aggregates, and then perform an inner
join between the two datasets like this:


SELECT count(j."lfc_code") AS count,
       sum(j."gross_sqft") AS Sum_Gross_Sqft,
       sum(j."units") AS Sum_Units,
       j."lfc_code"
FROM (SELECT p.pin_key
      FROM (SELECT buffer(kc_block_groups.the_geom, 1320) AS thebuffer
            FROM kc_block_groups
            WHERE kc_block_groups.fips = '530330001001'
            -- the ORDER BY discourages the planner from flattening
            -- this subquery into the outer query
            ORDER BY kc_block_groups.fips) b
      INNER JOIN parcels_06 p ON b.thebuffer && centroid(p.the_geom)
      WHERE within(centroid(p.the_geom), b.thebuffer)) AS l
INNER JOIN lfc_comm AS j ON l.pin_key = j.pin_key
GROUP BY j."lfc_code";


You should find that the query above is close to optimal, and so
shouldn't be too far adrift from your temporary table workaround. As a
general note for people following this thread: I don't like doing this
too often, since it forces a particular plan on the planner, and that
plan is unlikely to scale as the datasets grow. In particular
applications such as yours, however, it can be a very powerful
technique.
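
For reference, the temporary table approach would look something like
the sketch below; the exact form of your workaround wasn't posted in
this thread, so the table name tmp_buffer is just an illustration, with
the columns taken from the query above:

-- tmp_buffer is a hypothetical name; any temp table will do
CREATE TEMPORARY TABLE tmp_buffer AS
SELECT buffer(kc_block_groups.the_geom, 1320) AS thebuffer
FROM kc_block_groups
WHERE kc_block_groups.fips = '530330001001';

ANALYZE tmp_buffer;  -- gives the planner an accurate row count

SELECT p.pin_key
FROM tmp_buffer b
INNER JOIN parcels_06 p ON b.thebuffer && centroid(p.the_geom)
WHERE within(centroid(p.the_geom), b.thebuffer);

Because the single-row table really exists at plan time, the planner
needs no nudging there, which is why that approach tends to behave more
predictably as the datasets grow.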


Kind regards,

Mark.




