[postgis-devel] Some thoughts on Topology

Fri Oct 28 01:01:18 PDT 2005

On Thu, Oct 27, 2005 at 11:42:50AM -0600, Charles F. I. Savage wrote:
> Hi everyone,
> 
> I have a few questions/thoughts about the recently committed topology 
> support in PostGIS.  In no particular order:
> 
> 1.  I am a bit worried about duplication of information.  Each 
> topogeometry may include many topology primitives (edges, nodes), but is 
> always assigned to one layer.  Thus it seems to me the layer_id column 
> in the relations column contains redundant information (i.e., is not 
> normalized fully).  I think this matters because it's easy to imagine a 
> large database where the relation table would contain millions, if not 
> billions (well no more than 4 billion since the model uses int4) of 
> records.  What about doing something like this instead:
> 
> CREATE TABLE city_data.topogeometry
> (
>  topogeo_id int4 DEFAULT nextval('topogeometry_id_seq') NOT NULL,
>  layer_id int4,
>  CONSTRAINT topogeometry_key UNIQUE (topogeo_id)
> )
> 
> CREATE TABLE city_data.relation
> (
>  topogeo_id int4 NOT NULL,
>  element_id int4 NOT NULL,
>  element_type int4 NOT NULL,
>  CONSTRAINT relation_layer_id_key UNIQUE (topogeo_id, element_id, 
> element_type)
> )
> 
> I think this change would also remove the need for having the separate 
> topogeometry type.  In feature tables with topogeometry columns you 
> would just specify the topogeom id directly instead of using the more 
> complex topogeometry type (which also seems to me to contain a lot of 
> unnecessary, redundant information).  

Having both layer_id and topogeo_id in city_data.relation speeds up
index scans when a btree index is defined on both keys. Also, your
4 billion limit is really 4 billion for each layer in each topology.
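
As a rough sketch of the point above, assuming the city_data schema
already exists and that the relation table keeps both keys (the column
order inside the constraint is an assumption made for illustration, not
necessarily what topology.sql.in creates):

```sql
-- Sketch only: relation table carrying both layer_id and topogeo_id.
-- Assumes the city_data schema already exists.
CREATE TABLE city_data.relation
(
  topogeo_id   int4 NOT NULL,
  layer_id     int4 NOT NULL,
  element_id   int4 NOT NULL,
  element_type int4 NOT NULL,
  -- PostgreSQL backs a UNIQUE constraint with a btree index, so this
  -- also serves lookups on the leading (layer_id, topogeo_id) pair.
  CONSTRAINT relation_layer_id_key
    UNIQUE (layer_id, topogeo_id, element_id, element_type)
);

-- Fetch the components of one TopoGeometry without touching any other
-- table; the ids 1 and 42 are made-up example values.
SELECT element_id, element_type
  FROM city_data.relation
 WHERE layer_id = 1
   AND topogeo_id = 42;
```

With layer_id stored in the row, the lookup needs no join against a
separate topogeometry table to find out which layer an id belongs to.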

To get better performance we should keep separate RELATION
and TOPOGEOMETRY tables for each layer in a topology. I've been
thinking about using a TopoGeometry table, also for referential
integrity between feature tables and existing TopoGeometries.
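
One possible shape for such per-layer tables, purely as a sketch: the
table names topogeometry_1 and relation_1 (for a layer with id 1) are
hypothetical and not part of the current implementation, and the
city_data schema is assumed to exist.

```sql
-- Hypothetical per-layer TopoGeometry table for layer 1.
CREATE TABLE city_data.topogeometry_1
(
  topogeo_id int4 PRIMARY KEY
);

-- Hypothetical per-layer relation table for layer 1. No layer_id
-- column is needed because the layer is implied by the table itself.
CREATE TABLE city_data.relation_1
(
  topogeo_id   int4 NOT NULL
               REFERENCES city_data.topogeometry_1 (topogeo_id),
  element_id   int4 NOT NULL,
  element_type int4 NOT NULL,
  UNIQUE (topogeo_id, element_id, element_type)
);
```

The foreign key only guarantees that every relation row points at an
existing TopoGeometry; how feature tables would reference the same
table is left open here.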

About information duplication in the TopoGeometry object: the only
redundancy is really the TopoGeometry type, as TopologyId, LayerId and
TopoGeoId are needed to reference a specific TopoGeometry.

> I could also imagine you may wish to add "mbr" column to the 
> topogeometry column for quicker spatial scans.  This would work for all 
> types of topogeometries, instead of just faces as the current 
> implementation does (with the mbr column in the faces table, which would 
> be removed with this change).

I'm not sure this would be convenient; adding a separate column
in your feature table would be faster. The thing is that we should
not rush into implementation but rather proceed top-down from the
ER schema. We should document the ER schema, assign cardinalities
in the schema and define common operations. Without this information
it is hard to tell how to implement the physical schema (yes, it is
already implemented, but more as a proof of concept).

> 2.  Next, is it possible to remove the element_type column from relation 
> and move it to the proposed topogeometry table?  I can see the argument 
> that it belongs in the relation table if you have topogeometries that 
> consist of different types of topo primitives.  However, I'm not sure 
> I've ever seen the need for such a thing. 
> 
> From my experience, it's sufficient to support points (maps to nodes), 
> linear topology features (i.e., a topology made up of a number of edges) 
> and area topology features (also made up of edges).  Of course a linear 
> topology feature is made up of edges and nodes, but the edges already 
> know about the nodes so there is no reason to duplicate that information.
> 
> If this is correct, then I'd move the element_type column to the 
> topogeometry table and eliminate it from the relation table.  If this is 
> not correct, then maybe someone could provide an example where this is 
> needed so I could understand it better?

The TopoGeometry type can be seen as a cache.
Take the GeometryType(TopoGeometry) function (not implemented yet).
If you drop the type from the TopoGeometry table you'll be forced to
make assumptions about the type by looking at the components.
Still (apart from performance) a geometry composed of two faces
could be a Polygon (one face is contained within the other),
a MultiPolygon (the faces are disjoint), or a Collection (that currently
just happens to contain those two faces).
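
To make the ambiguity concrete, here is a small sketch. It assumes a
city_data.relation table with the four columns used below, and that
element_type 3 stands for "face"; the ids are made-up example values.

```sql
-- Two faces composing TopoGeometry 7 in layer 1.
INSERT INTO city_data.relation
  (topogeo_id, layer_id, element_id, element_type)
VALUES
  (7, 1, 10, 3),
  (7, 1, 11, 3);

-- Looking at the components alone only tells us "two faces":
SELECT element_type, count(*) AS n
  FROM city_data.relation
 WHERE layer_id = 1
   AND topogeo_id = 7
 GROUP BY element_type;
```

The rows are identical whether the feature was meant as a Polygon, a
MultiPolygon or a Collection, so the intended type has to be stored
somewhere or derived from the face geometries at query time.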

> 3.  I see the implementation quite closely matches what Oracle has 
> done.  Out of curiosity, is this done to meet the SQL MM standard 
> (unfortunately I don't have a copy of the standard and don't feel like 
> paying a few hundred dollars to buy it), or simply to stay close to what 
> Oracle has done?

If you read the initial part of topology.sql.in you'll see SQL/MM
functions and PostGIS-specific functions. The SQL/MM specification says
nothing about TopoGeometry types, just the primitives node, edge and face.

So we are doing a mix of the two.
SQL/MM interfaces are implemented:
	- topology model (a topology is a schema, containing well defined
	  edge,face,node tables)
	- topology operations (see functions with ST_ prefix)

> Interested in hearing feedback to these ideas,
> 
> Thanks,
> 
> Charlie

Thank you for participating in this, I was feeling very lonely ;)

--strk;