[postgis-users] PostGIS supresses the "LIMIT"
Mark Cave-Ayland
mark.cave-ayland at ilande.co.uk
Sat Aug 18 01:52:36 PDT 2007
On Fri, 2007-08-17 at 16:59 -0700, Michael Smedberg wrote:
> Mark,
>
>
>
> Thanks for the reply!
>
>
>
> Yup, adding that WHERE clause DOES make a big difference, but not big
> enough (in a sense.) It doesn’t really explain the performance
> difference (it still seems like maybe the LIMIT is not optimized by
> PostGIS?):
>
>
>
>
>
> Adding a WHERE clause to the simple SELECT query which causes it to
> return EXACTLY the same number of rows causes it to be executed in
> about 1 second. Note that this is filtering on a column that is NOT
> indexed:
>
>
>
> # explain analyze select search_id from property_search_keys where
> search_id <= 699148 limit 1;
>
> QUERY PLAN
>
>
> Limit (cost=0.00..1.01 rows=1 width=4) (actual
> time=1091.551..1091.552 rows=1 loops=1)
>
> -> Seq Scan on property_search_keys (cost=0.00..711933.16
> rows=701567 width=4) (actual time=1091.537..1091.537 rows=1 loops=1)
>
> Filter: (search_id <= 699148)
>
> Total runtime: 1091.579 ms
>
> (4 rows)
>
>
>
>
>
>
>
> In comparison, doing a similar query against the same table, but
> LIMITing on a GIS indexed column (which returns the same number of
> rows) takes about 12 seconds:
>
>
>
>
>
> # explain analyze select search_id from property_search_keys where
>
> # GeomFromText('Polygon((47.339753313135965
>
> # -122.96722412109376,47.339753313135965
>
> # -121.63375854492188,47.877670790949324
>
> # -121.63375854492188,47.877670790949324
>
> # -122.96722412109376,47.339753313135965 -122.96722412109376))', 4269)
> &&
>
> # centroid limit 1;
>
> QUERY PLAN
>
>
> Limit (cost=0.00..1.67 rows=1 width=4) (actual
> time=11426.844..11426.845 rows=1 loops=1)
>
> -> Seq Scan on property_search_keys (cost=0.00..711933.16
> rows=425818 width=4) (actual time=11426.841..11426.841 rows=1 loops=1)
>
> Filter:
> ('0103000020AD100000010000000500000035505C097DAB474001000000E7BD5EC035505C097DAB4740000000808F685EC012E4378457F04740000000808F685EC012E4378457F0474001000000E7BD5EC035505C097DAB474001000000E7BD5EC0'::geometry && centroid)
>
> Total runtime: 11426.865 ms
>
> (4 rows)
>
>
>
>
>
>
>
> Note that they both return the same number of rows:
>
>
>
> # select count(*) from property_search_keys where search_id <= 699148;
>
> count
>
> --------
>
> 699148
>
>
>
> # select count(*) from property_search_keys where
>
> # GeomFromText('Polygon((47.339753313135965
>
> # -122.96722412109376,47.339753313135965
>
> # -121.63375854492188,47.877670790949324
>
> # -121.63375854492188,47.877670790949324
>
> # -122.96722412109376,47.339753313135965 -122.96722412109376))', 4269)
> &&
>
> # centroid;
>
> count
>
> --------
>
> 699148
>
>
>
>
>
> And they’re both a relatively small fraction of the table:
>
>
>
> # select count(*) from property_search_keys;
>
> count
>
> ----------
>
> 11507213
Hi Michael,
Looking at your queries above, I see that even your geometry && centroid
query is not using the index which means something is wrong somewhere. A
couple of questions: which version of PostgreSQL are you using? And have
you done a VACUUM ANALYZE on the database to update the geometry
statistics?
If so, the only unusual thing I see from your schema is this:
Indexes:
"psk_i_centroid_lsd_lsp_br_ba" gist (centroid, last_sale_date,
last_sale_price, num_bedrooms, num_bathrooms) CLUSTER
I think only very recent versions of PostgreSQL could do something
useful with a multi-column index where the first column can be used for
an index scan alone. So my other suggestion would be to add a gist index
solely on the centroid column and check the EXPLAIN to see whether your
12s GIS query starts to use the new index.
The other thing to do would be to post your query to the pgsql-general
list - I'm sure somewhere there would be able to give you a lot more
information (don't forget to include your PostgreSQL version).
HTH,
Mark.
--
ILande - Open Source Consultancy
http://www.ilande.co.uk
More information about the postgis-users
mailing list