[postgis-users] PostGIS supresses the "LIMIT"

Michael Smedberg Michael.Smedberg at redfin.com
Fri Aug 17 16:59:30 PDT 2007


Mark,

 

Thanks for the reply!

 

Yup, adding that WHERE clause DOES make a big difference, but not big enough (in a sense.)  It doesn’t really explain the performance difference (it still seems like maybe the LIMIT is not optimized by PostGIS?):

 

 

Adding a WHERE clause to the simple SELECT query which causes it to return EXACTLY the same number of rows causes it to be executed in about 1 second.  Note that this is filtering on a column that is NOT indexed:



# explain analyze select search_id from property_search_keys where search_id <= 699148 limit 1;

        QUERY PLAN         

 Limit  (cost=0.00..1.01 rows=1 width=4) (actual time=1091.551..1091.552 rows=1 loops=1)

   ->  Seq Scan on property_search_keys  (cost=0.00..711933.16 rows=701567 width=4) (actual time=1091.537..1091.537 rows=1 loops=1)

         Filter: (search_id <= 699148)

 Total runtime: 1091.579 ms

(4 rows)

 

 

 

In comparison, doing a similar query against the same table, but LIMITing on a GIS indexed column (which returns the same number of rows) takes about 12 seconds:

 

 

# explain analyze select search_id from property_search_keys where

# GeomFromText('Polygon((47.339753313135965

# -122.96722412109376,47.339753313135965

# -121.63375854492188,47.877670790949324

# -121.63375854492188,47.877670790949324

# -122.96722412109376,47.339753313135965 -122.96722412109376))', 4269) &&

# centroid limit 1;

        QUERY PLAN                                    

 Limit  (cost=0.00..1.67 rows=1 width=4) (actual time=11426.844..11426.845 rows=1 loops=1)

   ->  Seq Scan on property_search_keys  (cost=0.00..711933.16 rows=425818 width=4) (actual time=11426.841..11426.841 rows=1 loops=1)

         Filter: ('0103000020AD100000010000000500000035505C097DAB474001000000E7BD5EC035505C097DAB4740000000808F685EC012E4378457F04740000000808F685EC012E4378457F0474001000000E7BD5EC035505C097DAB474001000000E7BD5EC0'::geometry && centroid)

 Total runtime: 11426.865 ms

(4 rows)

 

 

 

Note that they both return the same number of rows:

 

# select count(*) from property_search_keys where search_id <= 699148;

 count

--------

 699148

 

# select count(*) from property_search_keys where

# GeomFromText('Polygon((47.339753313135965

# -122.96722412109376,47.339753313135965

# -121.63375854492188,47.877670790949324

# -121.63375854492188,47.877670790949324

# -122.96722412109376,47.339753313135965 -122.96722412109376))', 4269) &&

# centroid;

 count

--------

 699148

 

 

And they’re both a relatively small fraction of the table:

 

# select count(*) from property_search_keys;

  count

----------

 11507213

 

 

============================================================

 

 

 

Message: 3

Date: Thu, 16 Aug 2007 23:28:10 +0100

From: Mark Cave-Ayland <mark.cave-ayland at ilande.co.uk>

Subject: Re: [postgis-users] PostGIS supresses the "LIMIT"

      optimization?

To: PostGIS Users Discussion <postgis-users at postgis.refractions.net>

Message-ID: <1187303290.5711.3.camel at mca-desktop>

Content-Type: text/plain; charset=utf-8

 

On Thu, 2007-08-16 at 10:52 -0700, Michael Smedberg wrote:

> I believe that Postgres has an optimization that “short-circuits” plan

> execution when the SQL contains a LIMIT clause, but no “order by” or

> other function that requires the whole result set.  It seems like

> PostGIS turns this optimization off (I’m using PostGIS 1.2.1.)  Is

> this a known bug?  Is there a workaround that would allow me to have

> poorly constrained GIS queries return quickly (if it’s poorly

> constrained, I don’t care about the results, I just want to know that

> there are too many to display)?  In other words, is there a way for me

> to KNOW it’s poorly constrained without having to wait for the whole

> (large) result set to be generated?

 

(cut)

 

> I’m a newbie to this list, so please forgive me if this is a dumb

> question…

 

No, it's actually a good question :)

 

The first thing to notice is that your queries are different because

your second query has a WHERE clause - does your first query behave any

differently with a WHERE clause added? Make sure you choose a condition

that selects a similar percentage of the table as a good test.

 

 

HTH,

 

Mark.

 

-- 

ILande - Open Source Consultancy

http://www.ilande.co.uk <http://www.ilande.co.uk/> 

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20070817/ab43813c/attachment.html>


More information about the postgis-users mailing list