[postgis-users] Dilemma: is gid really useful in a geoDB?
Mark Millman
mmillman at markmillman.com
Thu Feb 22 10:40:50 PST 2007
The problem is that the original SHAPE file doesn't have any attributes. It
has attribution only by a one-to-one row correlation with an associated DBF
file. Therefore the only way to associated the geometries found in the
SHAPE file with the non-spatial attribution found in the DBF file is by the
row number. Blame ESRI and 30 years of history on this one.
The argument of using a automatic number or sequence as opposed to "natural"
data as a primary key is an old one and both sides of the argument have
validity. Simple numbers index faster, join faster, and are found in almost
every database I've ever seen; however they are intrinsically meaningless.
But there are a lot of "John Smith"s in the world so we invent things like
Social Security Numbers and GIDs.
Historically GIS has supported many data sets that have no attribution,
other than their feature name; or they do have a unique "natural" attribute.
There are a lot of edges to the Mississippi River and I'd hate to have to
come up with a meaningful naming convention to keep them all unique. One
might be tempted to use a distance criteria; until one remembers that rivers
move and the next elbow that is cut off would result in changes that would
be very difficult to manage. So the only opportunity for a primary key is
the something like a GID.
So something like GID is important if you want to associate records that do
not have "natural" keys or if you want to avoid seriously long primary keys
like "USA,WA,Whatcom,Bellingham,Mill Ave,1340" for a primary key.
There is another argument for the use of a GID-like key and that is the
separation of spatial data from attribution, in a manner similar to that
used by some CAD based GIS like Bentley's Geographics and Intergraph's old
MGE. These systems allowed for a single spatial instance to serve as a
topological boundary to many themes. A single polyline could be the
boundary of a River, Municipality, County, and State; all at the same time.
Adjusting the line changed the all the associated boundaries simultaneously.
Geographics created topology dynamically based on the boundaries and
requested themes. Themes had display priority so that the line was always
rendered appropriately for the senior most theme in the display list.
There are reasonable arguments in favor of such model in a purely database
solution too. Each of the themes could have a foreign key to a common
spatial data table rather than embedding the spatial data directly in the
"theme" record. This would avoid duplication of data, avoid double
plotting, and allow applications to manage shared boundaries consistently.
Mark Millman
Mizar, LLC
mark.millman at mizar.com
www.mizar.com
589 S Beach Rd.
Point Roberts, WA 98281
(360) 945-2643
-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Havard
Tveite
Sent: Thursday, February 22, 2007 3:07 AM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Dilemma: is gid really useful in a geoDB?
In the relational model, the primary key can be a combination
of several attributes. If you have such a combination of
attributes, there is (from the relational model's point of
view) no need for a separate gid.
Also, if geometry is considered fist class data type of the
(extended) relational model, location itself can, in principle,
act as (or be a part of) a primary key (as long as you can
guarantee that there is only one object per location).
As pointed out in several other postings, the "problem" is that
a lot of GIS software tend to rely on having a single attribute
primary key, and that this key has to be of a specific type.
The system (PostgreSQL) generated "oid" can also be used as a
primary key.
Håvard
Marc Compte wrote:
> Hi,
>
> Ermm ... wasn't one of Codd's rules "Each and every datum (atomic value)
> in a relational data base is guaranteed to be logically accessible by
> resorting to a combination of table name, primary key value and column
> name."?
>
> I find it surprising to see that "there's no need for a gid, from the
> database point of view" and that primary keys are only needed because
> "most applications want some way of identifying specific rows", as the
> very core definition of the relational model clearly identifies this as
> a crucial part of the model. Wanting to identify specific rows is not an
> application oddity, it's part of the very definition of the model.
>
> Every table needs to have a key, not only to conform with the relational
> model's theory, but for pragmatical purposes as well. Even (or
> specially) from the database point of view, no key means there is no way
> to update a unique field, there is no way to extract the information
> about one individual record, there is no way to cross-reference tables
> ... in short, no key means something like there is no relational
> database, only a flat list of data ... and PostgreSQL, even though it
> incorportes OO elements, has still a big part of relational, right?
>
> Whether you have to be using that particular field (gid) or not depends
> on your data. I don't think naming your key field "gid" is defined in
> any standard (it shouldn't be anyway). As other people said, shp2pgsql
> automatically generates this one for you, but I assume that is just to
> make your life easier and make sure there is at least one way to
> uniquely identify records.
>
> The only important thing to do should be to make sure that there's at
> least 1 field with values that can be used to uniquely identify any of
> the records of your table. If you are sure there's another field (or
> combination of fields) that serves the purpose right, you wouldn't
> actually need to have that particular gid field. You should be able to
> delete it and still be able to use on that table all of the features of
> the database and any other surrounding application by using your own
> self-made primary key.
>
> If an application has a problem with any of this (multiple field keys,
> using a different name for the key field other than gid, ...) then you
> could probably say it is not well designed and you might want to switch
> to something else.
>
> Marc
>
> En/na Markus Schaber ha escrit:
>> Hi, Antonello,
>>
>> Antonello <antonello.monetsen at gmail.com> wrote:
>>
>>
>>> I have a dilemma: is gid really useful in a geoDB?
>>> I am building a geodb for several harbor. When I create a layer(table
>>> with geometry) in postgis, I need a gid like a primary key, or I can
>>> use a different one? Can I use several field of from the table like
>>> a primary key?
>>> I am not new in the gis, but why in almost all geoDB I saw there is a
>>> gid?! Is it really necessary? Do it come from shapefile standards?
>>>
>>
>> Technically, there's no need for a gid, from the database point of
>> view, as SQL deals with sets of rows.
>>
>> However, most applications want some way of identifying specific rows,
>> especially when updating / editing data.
>>
>> GID is generated by shp2pgsql.
>>
>>
>>> For example QuantumGIS doesn't work if the layer has only one primary
>>> key, there are other application that don't work like qGis?
>>>
>>
>> QuantumGIS has the additional oddity that the primary key has to be of
>> type int4 - int8 or serial (which is optimal for primary keys) is not
>> accepted, AFAIK.
>>
>> For read-only access, you can try MezoGIS which makes very little
>> assumptions about your database, and you can specify layers via
>> arbitrary queries.
>>
>>
>> Regards,
>> Markus
>>
>>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
--
Håvard Tveite
Department of Mathematical Sciences and Technology, UMB
Drøbakveien 14, POBox 5003, N-1432 Ås, NORWAY
Phone: +47 64965483 Fax: +47 64965401 http://www.umb.no/imt
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list