[postgis-users] Partitioning spatial table

Bill Binko bill at binko.net
Fri Feb 17 11:26:29 PST 2006


Arnaud Lesauvage wrote:

> ....

>
>
> Do you have any better idea on how to implement this ?
>
I think I do!  I know there is already another response, and his advice 
is good... but, I have found that the partitioning solution you 
mentioned really solves other problems such as data storage, archival 
support, etc, and not necessarily query performance.

There are many reasons why your query might be slow, but a few simple 
things I'd do are:
1) Make sure it's using the GiST index (as mentioned post a EXPLAIN 
ANALYZE and we'll help).  There are many reasons including some simple 
ones like the sort_mem and shared_buffers parameters in postgresql.conf 
(which by default are just plain wrong)
2) Create functional indexes on any spatial functions you run (such as 
centroid() or area()) and expect to join to
3) Create Conditional indexes on the fields you're considering 
partitioning on.  This is a Biggie, so let me explain:

Lets say you have a field like admin_area_id that you were going to 
partition on.  You usually add a clause like 'WHERE admin_area_id = 4" 
which will limit the rest of the query to 1/100 of the data.

Rather than partitioning the table, just create partitioned 
(conditional) indexes:

First create an index on the field you're going to split on
CREATE INDEX theTable_admin_id on theTable (admin_id);

Cluster it so that the rows are sorted on-disk by admin_id:
CLUSTER theTable on theTable_admin_id;

And create one GiST Index for each value:
CREATE INDEX theTable_admin_gist_1 on theTable USING GIST (theShape 
GIST_GEOMETRY_OPS) WHERE admin_area_id=1;
CREATE INDEX theTable_admin_gist_2 on theTable USING GIST (theShape 
GIST_GEOMETRY_OPS) WHERE admin_area_id=2;
...
CREATE INDEX theTable_admin_gist_100 on theTable USING GIST (theShape 
GIST_GEOMETRY_OPS) WHERE admin_area_id=100;

Now, whenever you apply the 'WHERE admin_area_id = 4' clause, the 
'theTable_admin_gist_4' index will be used.  This index will have a 
spatial index that only includes the shapes where the admin_area_id = 
4.  It will find the entries very quickly...because it only searches 1% 
of the shapes.  It will load them quickly due to the CLUSTER command.

One of the nicest things about this is that there is monthly 
administration of this or other cruft necessary with partitioning: just 
run VACUUM FULL ANALYZE regularly, and you're done.

Hope this helps.

Bill



this will keep the items close together on disk, so that you get most of 
the benefits of the




More information about the postgis-users mailing list