[postgis-users] DB design considerations : geoAttributes vs geometry

Tue Apr 12 06:07:54 PDT 2005

Hi All,

Maybe a bit OT, but I have this basic DB design question.  I have a
bunch of data sources (sensors, people ...) that will dump data in
postGIS.  I am wondering what the best trade-off is between redundant
attribute data on which I could build indexes and just using the_geom to
build an index.

Example :

Station 1 : the_geom; [geoInfoColumns] (redundant attribute data);
dataColumns 

Where 

- geoInfoColumns = string {country; province; adminRegion; city ...} 

- dataColumns = the real data

The_geom and dataColumns are hard data.  The question is about optional
geoInfoColumns that I put in square brackets.  When I upload a new
station location, is it worth adding geoInfoColmuns *to use in filters*
or can I rely on the_geom only ?  

The point is that I can make an arbitrary number of such optional
sub-regions available that I could index on, e.g. "select * from table
where sub-sub-sub-region = 'aRegion'", even though those optional
columns really represent an over-determination in the sense that they
are not needed to locate the station : they would be used to speed up
access.

I am not asking to what level the over-determination can go.  That I can
play with.  I am just asking if making the DB much bigger by adding
redundant information in the form of geoInfoColumns attributes is worth
it in terms of access time.  I would rather write utilities to translate
queries filtered by sub-region so that they map on the_geom (e.g. have
some table or function or stroed proc that returns a polygon when I feed
it the region name) rather than add an arbitrary number of attributes
(plus I would avoid being tied to geographical names that are very
likely to change with time, e.g. a political riding or some other
administrative region!).

Thanx for any (basic!) pointers.

Yves Moisan