[postgis-users] Partitioning spatial table

Fri Feb 17 06:20:44 PST 2006

Hi, Arnaud,

Arnaud Lesauvage schrieb:

> I have to build a database to hold a very large spatial dataset.
> (all the roads of a country, ~5.000.000 MULTILINESTRINGS).
> I think this would make queries too slow, so I would like to implement
> partitioning.

>From my personal experience, 5 million rows do not yet justify
partitioning, especially on road data, which usually is write once and
then read-only.

> -build one child table for every road-class in the master table (~10
> different classes). Many queries are based on this road-class field.

This partitioning will give you benefits if you often fetch only a
single road class.

> -build one child table for every administrative area in the master table
> (~100 administrative areas).

This partitioning will give you benefits if you often fetch roads from a
single (or only a few) administrative area.

> I could use a constraint like CHECK ( Within(this_geometry,
> AdministrativeAreaGeometry) ), but then querying country-wide would be
> quite difficult...

It's difficult to find constraints that the planner can automatically
map to your queries. If it can't, then it will still query all the
partitions, and you don't benefit.

> At the present time, I have just one huge table with a BTree index on
> the road-class and a gist index on the spatial column, but simple
> queries like
> SELECT my_geom FROM theTable WHERE roadclass=2 AND (somebox_geom &&
> my_geom)
> take ages to run...

Which indices did you put on the table? (send us the output of psql
command "\d tablename").

Have you recently VACUUMed and ANALYZed the table?

Could you send us the output of "EXPLAIN ANALYZE <your query>;"? If
estimations and reality are very different, increasing the statistics
target will help.

In case you're using older PostgreSQL versions, recreating the indices
migth also boost performance (see the postgresql list archives under
"index bloat").

> Do you have any better idea on how to implement this ?

If you don't have NULL geometries, add a NOT NULL constraint to your
geometry column and CLUSTER your table on the geometry index.

You may also want to use partial indices in addition to the full one, e. G.

CREATE INDEX roadclass_2_ids ON theTable USING GIST (my_geom) WHERE
roadclass=2;

Having lots of indices slows down insertion and updates, and reduces
cache efficiency, but can drastically speed up some queries, especially
for road classes that are a few roads only.

HTH,
Markus