[postgis-users] Question about optimal PostGIS schema

Thu Jan 3 08:34:17 PST 2013

On 1/3/2013 10:36 AM, lynx86 wrote:
> Hello
> I'm very new to GIS and have a question concerning the optimal design of the
> schema of my PostGIS geo database.
> The data are about traffic models and consist of geometric objects (roads,
> parkings, POIs, traffic sign positions) and related scalar and textual data
> (e.g. name/description of traffic sign, lane number of a road,...).
> At the moment, the data are partitioned into different tables depending on
> the traffic object which they describe (one table for roads, one for lanes,
> one for traffic signs).
> Each of the tables containes about 50000 to 100000 entries, but probably the
> number will grow in future.
>
> The goal is a more generic schema in which it's easier to
> - adjust positions of objects (e.g. roads, signs)
> - add new (geometric) traffic objects (e.g. public transport stations/lines,
> deviations, sensors)
> - add new metadata to existing traffic objects (e.g. data from a traffic
> census)
> - describe routes
>
> Main purpose of the database:
> - presentation of the geometric data via a GIS server and web application
> (e.g. Geoserver, OpenLayers)
> 	(I need fine-grained partition of the data into different layers, e.g.
> successive addition of different road types or traffic signs in the
> presentation map should be possible)
> - extraction of traffic data and conversion in data formats of simulation
> applications
>
> I got inspired by the schema of OSM, especially by the idea of simple
> geometric objects (points, ways, relations) which are tagged by
> key-value-pairs (metadata).
> As there are three different schemas (apidb, pgsnapshot, osm2pgsql), I
> wonder which of them is the most suitable to my requirements:
> - osm2pgsql allows easy presentation via Geoserver because it uses
> geometries, but extension of the metadata is difficult (no key-value-pairs)
> - pgsnapshot seems to be more suitable for data extraction but I'm not sure
> about the efficiency disadvantage of the hstore columns (e.g. change of one
> single value)
> - how can apidb and pgsnapshot tables be transformed into layers which can
> be displayed by OpenLayers (is there a efficiency disadvantage if I use
> views to convert the apidb data to displayable views?).
>
> Can you give me any hints about best practices?

This is totally my opinion and it is not meant to denigrate the good 
work of the OSM team and contributors.

The OSM schema is very simple with its key-value pairs but I find this 
simplicity a serious down-side for the following reasons:

1. there is very little standardization of keys and values - this makes 
it extremely hard to collect similar items.

2. While I can see why in the beginning this made it easy to get 
started, it lacks any formal definition of structure and relationships. 
For example how is a sign related to a road segment, if I need to split 
the segment in half, can I easily check to see if the sign is associated 
with it? can I easily re-associate it with the correct new segment.

3. polygons are not directly defined, but implied by collecting all 
edges that belong to a given boundary and then constructing a polygon 
from that. What does it mean if an edge was split in two and only one 
part is still assigned to the polygon or an edge is deleted for some 
unrelated reason. These implied relationships are not rigorous and will 
cause issues as entropy takes hold over time.

While it is more work up front, I think you would be better advised to 
think about your use cases and build a model that supports them. You 
could start with a list of distinct objects that you need to model and 
each of these becomes a table with attributes and geometry. If you need 
to add relationships between objects, then add little tables like:

obj1_to_obj2 table with columns obj1_id, obj2_id. These are very fast to 
use in table joins.

YMMV, but this is my 2 cents.

-Steve