[postgis-users] Re: GiST index seems to be ignored?

Josh Livni josh at livniconsulting.com
Fri Mar 9 11:46:50 PST 2007


Regina,

Yeah - it seems like it's not wanting to limit the buffering to the 
single fips result without some serious hand holding.

Regarding your first post:
We actually tried a similar query earlier (I love IRC) - but in any 
case, I swapped the AND in the query you posted in your last email for a 
WHERE, and it took about 115 seconds.

As for your point below:  You are right - my fips field is indeed a 
varchar.  I tried the query you suggest below (quoting the fips) but it 
did not help.  In addition, I made a fips_int column and added an index 
- same thing (each of these took about 115s).

Also, just want to reiterate thx again for everyone's input from this - 
I've learned quite a bit about query optimizing today (still quite a 
ways to go, however).

  -Josh


Obe, Regina wrote:
> Just noticed something else in your prior email which I think Reid may 
> have alluded to.
>  
> Is your fips stored as an integer or as a varchar or text?  It looked 
> from your query plan that it is probably being stored as text which 
> may explain why your fips index is not being used if it really is not 
> being used because it may be casting it to an int to match your where 
> clause thus not finding a suitable index.
>  
> What if you tried the original I gave you with quoting the fips code 
> instead of treating it as number. like below.  Note I also changed my 
> original p.the_geom to centroid(p.the_geom) because I am assuming you 
> have an index on centroid(p.the_geom) and not necessarily on p.the_geom.
>  
> SELECT
>   count(j."lfc_code") as count,
>      sum(J."gross_sqft") as Sum_Gross_Sqft,
>     sum(j."units") as Sum_Units,
>      j."lfc_code"
>     FROM kc_block_groups b, parcels_06 p inner join lfc_comm j on
>   p.pin_key = j.pin_key
>      WHERE expand(b.the_geom,1320) && centroid(p.the_geom)
>      and within(centroid(p.the_geom), buffer(b.the_geom,1320))
>      and b.fips = '530330001001'
>      GROUP BY j."lfc_code"
>  
>
> ------------------------------------------------------------------------
> *From:* postgis-users-bounces at postgis.refractions.net on behalf of 
> Obe, Regina
> *Sent:* Fri 3/9/2007 1:44 PM
> *To:* PostGIS Users Discussion
> *Subject:* RE: [postgis-users] Re: GiST index seems to be ignored?
>
> Interesting.  So was it not using the fips index at all? Just curious 
> can you try this query
>  
> SELECT p.*
> FROM (SELECT buffer(kc_block_groups.the_geom, 1320) as thebuffer FROM 
> kc_block_groups AND kc_block_groups.fips = 530330001001)  b INNER JOIN 
> parcels_06  p   ON b.thebuffer && centroid(p.the_geom)  
> WHERE within(centroid(p.the_geom), b.thebuffer)
>
> ------------------------------------------------------------------------
> *From:* postgis-users-bounces at postgis.refractions.net on behalf of 
> Josh Livni
> *Sent:* Fri 3/9/2007 1:29 PM
> *To:* PostGIS Users Discussion
> *Subject:* Re: [postgis-users] Re: GiST index seems to be ignored?
>
> Regina,
>
> Thanks a lot for your detailed response (and to Reid too).
>
> First, the 90 was a typo in my message, sorry -- the query was always
> using just 1320 with it's slowness.
>
> Also, I did have both a btree index on fips, and a GiST index on
> centroid(parcels_06.the_geom)
>
> I tried your query, and it too, took a couple minutes.
>
> Moving on to some good news for me (and my not having to use my hack
> workaround of temp tables with pre-calculated buffers):
>
> Turns out, Mark Cave-Ayland was exceedingly helpful on IRC and managed
> to solve this little issue, where it looks like forcing the planner's
> hand was the key.
>
> He narrowed it down, so we noticed that:
> SELECT p.*
> FROM kc_block_groups b, parcels_06 p   
> WHERE buffer(b.the_geom,1320) && centroid(p.the_geom)  
> and b.fips = 530330001001;
>
> took > 110,000 ms to run.
>
>
> whereas:
> SELECT *
> FROM parcels_06 AS p
> WHERE  
>      (SELECT buffer(the_geom, 1320)    AS the_geomx  
>                    FROM kc_block_groups     WHERE fips = 530330001001)  
>      && centroid(p.the_geom)
> AND within(centroid(p.the_geom),
>     (SELECT buffer(the_geom, 1320)  AS the_geomx
>                   FROM kc_block_groups  WHERE fips = 530330001001) );
>
> took ~300ms to run.
>
> Nice!
>
> I put this back in my orig query, and it took about 1500ms:
> SELECT
>    count(j."lfc_code") as count,
>    sum(J."gross_sqft") as Sum_Gross_Sqft,
>    sum(j."units") as Sum_Units,
>    j."lfc_code"
>    FROM kc_block_groups b, parcels_06 p inner join lfc_comm j on
> p.pin_key = j.pin_key
>    WHERE
>    (SELECT buffer(the_geom, 1320)
>    AS the_geomx
>    FROM kc_block_groups
>     WHERE fips = 530330001001)    && centroid(p.the_geom)
>    and within(centroid(p.the_geom), (SELECT buffer(the_geom, 1320)
>    AS the_geomx
>    FROM kc_block_groups
>     WHERE fips = 530330001001) )
>    and b.fips = 530330001001
>    GROUP BY j."lfc_code" ;
>
> However, this compared to a ~350ms return time for the query on my temp
> table of pre-buffered polygons:
> SELECT
>    count(j."lfc_code") as count,
>    sum(J."gross_sqft") as Sum_Gross_Sqft,
>    sum(j."units") as Sum_Units,
>    j."lfc_code"
>    FROM kc_bg_qm_buffer b, parcels_06 p inner join lfc_comm j on
> p.pin_key = j.pin_key
>    WHERE b.the_geom && centroid(p.the_geom)
>    and within(centroid(p.the_geom), b.the_geom)
>    and b.fips = 530330001001
>    GROUP BY j."lfc_code" ;
>
> As I think Mark noted, since my queries are pretty much spatial, further
> subselects to ensure the spatial stuff is done first might help.  This
> is my next step, but meanwhile here is a really significant improvement
> - I hope the notes above might aid someone in a similar situation.  
> Thanks Mark!
>
> Cheers,
>
>
>   -Josh
>
> Obe, Regina wrote:
> > Do you have a gist index on your p.the_geom.
> > 
> > I think expand is much more efficient than buffer for your && and I
> > think your centroid && instead of just using p.the_geom is totally
> > wasteful.  So I would change to do the falling
> > 
> > 1) Add a btree index on fips on your block groups  - a attribute index
> > is almost always more selective than a spatial index.
> > 2) I think you can drop that buffer index you have going, but if you
> > do need to keep it change it to expand(b.the_geom, 90) - for this
> > particular query, it probably won't be used anyway if you have an
> > index on fips.
> > 2) Make sure you have an index on p.the_geom
> > 3) Change your query to
> > 
> >  SELECT
> >    count(j."lfc_code") as count,
> >     sum(J."gross_sqft") as Sum_Gross_Sqft,
> >     sum(j."units") as Sum_Units,
> >     j."lfc_code"
> >    FROM kc_block_groups b, parcels_06 p inner join lfc_comm j on
> >  p.pin_key = j.pin_key
> >     WHERE expand(b.the_geom,90) && p.the_geom
> >     and within(centroid(p.the_geom), buffer(b.the_geom,1320))
> >     and b.fips = 530330001001
> >     GROUP BY j."lfc_code"
> >
> > ------------------------------------------------------------------------
> > *From:* postgis-users-bounces at postgis.refractions.net on behalf of
> > Reid Priedhorsky
> > *Sent:* Fri 3/9/2007 12:40 PM
> > *To:* PostGIS Users Discussion
> > *Subject:* [postgis-users] Re: GiST index seems to be ignored?
> >
> > Josh Livni wrote:
> >
> > > I am basically trying to find points within a buffered polygon.  I've
> > > created a GiST index on my polygons:
> > > CREATE INDEX kc_block_groups_qm_buffer_the_geom_gist
> > >  ON kc_block_groups
> > >  USING gist
> > >  (buffer(the_geom, 1320::double precision));
> >
> > > SELECT
> > >    count(j."lfc_code") as count,
> > >    sum(J."gross_sqft") as Sum_Gross_Sqft,
> > >    sum(j."units") as Sum_Units,
> > >    j."lfc_code"
> > >    FROM kc_block_groups b, parcels_06 p inner join lfc_comm j on
> > > p.pin_key = j.pin_key
> > >    WHERE buffer(b.the_geom,90) && centroid(p.the_geom)
> > >    and within(centroid(p.the_geom), buffer(b.the_geom,1320))
> > >    and b.fips = 530330001001
> > >    GROUP BY j."lfc_code"
> >
> > within() won't take advantage of the index, and with the operator that
> > does (&&), your buffer() call uses a different buffer distance. What
> > happens if you create the index on buffer(the_geom, 90)?
> >
> > My other guess is that you use a typecast in the index creation but not
> > the query.
> >
> > Good luck,
> >
> > Reid
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
> >
> > ------------------------------------------------------------------------
> >
> > *
> > The substance of this message, including any attachments, may be
> > confidential, legally privileged and/or exempt from disclosure
> > pursuant to Massachusetts law. It is intended solely for the
> > addressee. If you received this in error, please contact the sender
> > and delete the material from any computer.
> > *
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
> >  
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>   





More information about the postgis-users mailing list