[postgis-users] Database Design for many tables

Fri Jan 5 17:51:09 PST 2007

Thanks for the initial pointers Webb! 

My replies are in-line below:

On Friday 05 January 2007 16:58, Webb Sprague wrote:
> On 1/5/07, Dylan Beaudette <dylan.beaudette at gmail.com> wrote:
> > On Friday 05 January 2007 01:07, Mark Cave-Ayland wrote:
> > > On Fri, 2007-01-05 at 09:53 +0100, Christian Braun wrote:
> > > > Hi all,
> > > >
> > > > I have a question regarding the database design of a PostGIS
> > > > database.
>
> It might be really interesting if either of you could either direct us
> to entity relation models of your data or post prototype schemas or
> some other design documentation (any attempt would be better than
> nothing); then we would have something to work with.

Our database setup is quite simple, and based in-part on the USDA-NCSS digital 
soil survey products: SSURGO and STATSGO. 

> With respect to soils, I would imagine that you would create a table
> for every classification that partitions the space under question,
> giving each polygon in a table a unique id. If the soils data is
> county by county shapefiles, you probably want to append it to a
> single table, as long as it is the "same thing".

The SURRGO database consists of a single spatial table with polygons, and 
polygon ids, along with an assortment of non-spatial tables. All of the soil 
surveys for CA, AZ, and NV have been loaded into these tables: i.e. combined. 

The STATSGO database has a *nearly* identical structure, with non-spatial 
tables that have the exact same names as their SSURGO brethren. 

For the sake of clarity, I would like to keep the original table names (for 
SSURGO and STATSGO), yet keep them within the same database to allow 
cross-querying. A simple solution would be to prefix the names of the STATSGO 
tables such that 'component' becomes 'ss_component' ... but there must be a 
more elegant approach, no?

In the past (dark and murky) I have kept two separate databases (in MySQL). 
This allowed some form of organization, isolation of namespace, and the 
ability to do cross-DB queries by prefixing the tablenames with the db name. 
i have heard that there is a similar mechanism for postgresql, but I cannot 
figure out how to make tablespaces work nicely with the postgis extensions.

> For non geographic data, typical normalization etc etc stuff applies,
> but you will be able to bring in geographic information through joins
> on the various primary keys.

Right. I think that we have this bit sorted out, with the system providing 
excellent response time and flexibility (way to go postgis team!!).

> If you can keep everything in one big database and have people
> remotely access it (not a problem usually) that would be easiest.  If
> performance suffers, have a master and slave set up, pushing out
> updates after they are QA'ed, otherwise you will have a nightmare of
> version control.

Yes. We are using a single database, with local and remote access by the psql 
client application and QGIS.

> If you have people editing shared data, lets talk :)  ...  Give us
> your SPECIFIC use cases and we can probably help (or at least sound
> intelligent).

Fortunately there will be none of that. This data in question is strictly 
read-only by others on the team. They will be creating subsets (i.e. new 
tables) derived from the original tables.

> If you are using rasters, I have no idea how to integrate postgis with
> them, but you should mention it to the list or in your rough design
> docs.

We'll keep the raster stuff to GRASS + starspan for now.

> If you don't know what an entity relationship model is, or what
> database normalization is, you probably need to read up on database
> design, but that would be worthwhile anyway.  Start with wikipedia, as
> usual.
>
> W

Thanks again for the tips, and looking forward to any input!

Cheers,

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341