[postgis-users] ST_DWithin Performance

pcreso at pcreso.com pcreso at pcreso.com
Sat Feb 12 18:24:09 PST 2011


Hi Carl, 

If most of your data is reasonably static (which time series data often isn't) we have used this approach successfully with some 250,000,000 records in a table:

use Postgres inheritance facility to partition the table in chunks by time, eg: one year's data per partition. If your request is for data from one year, it can pretty much ignore all the other partitions when searching for data, which speeds things up.

Apply a Postgres clustered index to the timestamp column for each partition.This physically orders the data on disk by the column values, and given timeseries databases are often queried by time, this can significantly improve performance, as each disk read will normally grab sequential data needed by the query, reducing disk reads & seek times required. A clustered index does not need to be applied to the current year's partition unless really useful, as maintaining it requires the index to be rebuilt after every insert or update.

An order by/limit query can be significantly faster with data managed in this way.  

HTH,

  Brent Wood

--- On Sun, 2/13/11, Paul Ramsey <pramsey at opengeo.org> wrote:

From: Paul Ramsey <pramsey at opengeo.org>
Subject: Re: [postgis-users] ST_DWithin Performance
To: "PostGIS Users Discussion" <postgis-users at postgis.refractions.net>
Date: Sunday, February 13, 2011, 11:42 AM

How many records get returned when you drop the LIMIT? If you have to
sort 700K things into order, I can see that taking a second or two.

P.

On Sat, Feb 12, 2011 at 2:16 PM, Carl S. Yestrau Jr.
<carl at featureblend.com> wrote:
> When I removed the "ORDER BY created_on DESC" the results were almost
> instantaneous (which makes sense as it just returns the first 'x'
> result defined by LIMIT). I don't have access to the machine at the
> moment, but can get the plan for that if it's helpful.
>
> I'm pretty confused why things are so slow. The only thing I can think
> of is possibly the data density, I have over 700K records within a 20
> minute window.
>
> On Sat, Feb 12, 2011 at 12:27 AM, strk <strk at keybit.net> wrote:
>> On Fri, Feb 11, 2011 at 02:27:16PM -0800, Carl S. Yestrau Jr. wrote:
>>
>>> Table schema:
>>>                                        Table "public.geobits"
>>>     Column     |            Type             |
>>> Modifiers
>>> ---------------+-----------------------------+------------------------------------------------------
>>>  id            | integer                     | not null default
>>> nextval('geobits_id_seq'::regclass)
>>>  uuid          | uuid                        | not null
>>>  raw           | text                        | not null
>>>  search_config | regconfig                   |
>>>  ip            | inet                        | not null
>>>  user_agent    | text                        |
>>>  created_on    | timestamp without time zone | default now()
>>>  location      | geography(Point,4326)       | not null
>>>  parent_id     | integer                     |
>>>  language_code | text                        |
>>>  group_id      | integer                     |
>>> Indexes:
>>>     "geobits_pkey" PRIMARY KEY, btree (id)
>>>     "geobits_uuid_key" UNIQUE, btree (uuid)
>>>     "geobits_created_on_index" btree (created_on)
>>>     "geobits_group_id_index" btree (group_id)
>>>     "geobits_location_index" gist (location)
>>>     "geobits_parent_id_index" btree (parent_id)
>>>     "geobits_search_config_index" gin (to_tsvector(search_config, raw))
>>> Foreign-key constraints:
>>>     "geobits_group_id_fkey" FOREIGN KEY (group_id) REFERENCES groups(id)
>>>     "geobits_language_code_fkey" FOREIGN KEY (language_code)
>>> REFERENCES languages(code) ON DELETE SET NULL
>>>     "geobits_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES
>>> geobits(id) ON DELETE SET NULL
>>> Referenced by:
>>>     TABLE "geobits" CONSTRAINT "geobits_parent_id_fkey" FOREIGN KEY
>>> (parent_id) REFERENCES geobits(id) ON DELETE SET NULL
>>>
>>>
>>> Slow Query:
>>> EXPLAIN ANALYZE SELECT id, created_on FROM geobits WHERE
>>> ST_DWithin(location, ST_GeographyFromText('SRID=4326;POINT(180 90)'),
>>> 1000) ORDER BY created_on DESC LIMIT 10;
>>>
>>>
>>>                         QUERY PLAN
>>>
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>>  Limit  (cost=205352.10..205352.10 rows=1 width=12) (actual
>>> time=4078.057..4078.090 rows=10 loops=1)
>>>    ->  Sort  (cost=205352.10..205352.10 rows=1 width=12) (actual
>>> time=4078.053..4078.064 rows=10 loops=1)
>>>          Sort Key: created_on
>>>          Sort Method:  top-N heapsort  Memory: 17kB
>>>          ->  Seq Scan on geobits  (cost=0.00..205352.08 rows=1
>>> width=12) (actual time=0.015..3100.471 rows=708661 loops=1)
>>>                Filter: ((location &&
>>> '0101000020E610000000000000008066400000000000805640'::geography) AND
>>> ('0101000020E610000000000000008066400000000000805640'::geography &&
>>> _st_expand(location, 1000::double precision)) AND
>>> _st_dwithin(location, '0101000020E6100000000
>>> 00000008066400000000000805640'::geography, 1000::double precision, true))
>>>  Total runtime: 4078.127 ms
>>> (7 rows)
>>
>> The "geobits_location_index" gist (location) index is not being used,
>> for some reason. Why do you say ORDER BY is the culprit ? Does
>> the plan come out differently w/out that ?
>>
>> --strk;
>>
>>  ()   Free GIS & Flash consultant/developer
>>  /\   http://strk.keybit.net/services.html
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20110212/a0eae95e/attachment.html>


More information about the postgis-users mailing list