Updating Shapefiles, and data integrity

Tue Oct 12 09:00:08 PDT 1999

see my comments below...

Brent Fraser

----- Original Message -----
From: Cameron Shorter <cshorter at optusnet.com.au>
To: mapserver <mapserver-users at lists.gis.umn.edu>
Sent: Sunday, October 10, 1999 4:14 AM
Subject: Updating Shapefiles, and data integrity

>
>
> -------- Original Message --------
> Subject: Re: request for comments...
> Date: Sun, 10 Oct 1999 19:16:24 +1000
> From: Cameron Shorter <cshorter at optusnet.com.au>
> Reply-To: camerons at cat.org.au
> To: Stephen Lime <steve.lime at dnr.state.mn.us>
> References: <s7ff017e.048 at smtp.dnr.state.mn.us>
>
>
>
> Stephen Lime wrote:
> >
> > Maintaining data integrity is going to be a big issue. I was at our
> > state GIS conference and got to chat with Jack Dangermond from ESRI
> > about the MapServer and their new ArcIMS product. Seems they're having
> > trouble with this editing stuff. Shapefiles just aren't a transactional
> > environment so unless you can assure yourself of single user access
> > there's always the potential for multiple concurrent edits. Then
> > there's the issue of quality control. I think the solution needs to
> > offer immediate update and delayed update. ArcIMS, as I understand it,
> > caches updates until an operator on the server site commits the edits
> > to the main database. This operator could be a cron process I suppose
> > that could handle locking while edits are processed. I think this may
> > be a good approach as you could do some simple transaction management-
> > review, edit and delete, once the initial work was done. Edits could be
> > stored in a shapefile along with attributes and enough additional
> > information to commit the shape - source shapefile, shape (or is a new
> > one), type of edit (replace, attribute change) etc.
> >
> > Anyway, just my thoughts...
> >
> > Steve
>
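
The "delayed update" idea Steve describes above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the names
PendingEdit and commit, the edit kinds, and the use of plain dicts as a
stand-in for shapefiles are all made up for the example and are not how
ArcIMS or MapServer does it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingEdit:
    """One cached edit, carrying enough information to commit it later."""
    source: str                        # which shapefile the edit targets
    shape_id: Optional[int]            # None means "this is a new shape"
    kind: str                          # "add", "replace", "attributes", "delete"
    geometry: Optional[list] = None
    attributes: Optional[dict] = None


def commit(layers, pending):
    """Apply cached edits in order; return the ones that could not be applied.

    `layers` maps a shapefile name to {shape_id: {"geometry", "attributes"}},
    a hypothetical in-memory stand-in for the real files.
    """
    rejected = []
    for edit in pending:
        layer = layers[edit.source]
        if edit.kind == "add":
            new_id = max(layer, default=-1) + 1
            layer[new_id] = {"geometry": edit.geometry,
                             "attributes": edit.attributes or {}}
        elif edit.shape_id not in layer:
            rejected.append(edit)      # target is gone, leave it for review
        elif edit.kind == "replace":
            layer[edit.shape_id]["geometry"] = edit.geometry
        elif edit.kind == "attributes":
            layer[edit.shape_id]["attributes"].update(edit.attributes or {})
        elif edit.kind == "delete":
            del layer[edit.shape_id]
        else:
            rejected.append(edit)      # unknown edit type
    return rejected
```

A cron job (or a human operator) would call commit() while holding whatever
lock protects the main data, and the rejected list is what gets reviewed.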

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
v

Enterprise-wide editing can require a lot of infrastructure to support it.
A large-scale implementation might include (this is only one scenario):

 o  a data repository  / warehouse / database
 o  project workspaces for editing
 o  a "view-only" copy of the data

Typical workflows would include:

1. Edit: operator identifies features in warehouse for editing, locks them,
    and extracts them to the project workspace.  The features are edited,
    possibly reviewed, then checked back into the warehouse.  This is
    sometimes known as a "long transaction".
    Some things that may be important:
        1. feature-level locking (as opposed to file locking) to prevent
           simultaneous editing
        2. feature lineage tracking: timestamps, feature "retirement"
           instead of deletion
        3. theme security: certain departments can edit only specific themes
2. Copy:  on a pre-determined schedule, the warehouse is copied to the
    "view-only" database.  This may include re-formatting, indexing and
    distributing the data to get better performance for viewing.  Depending
    on how often edits occur, the copy could be once a day, once a month,
    etc.  The good thing about this approach is that the user
    (viewer/querier) has a stable data set to operate on.  The bad thing is
    it might not be up to date.

3. Viewing: the data is queried and rendered for thick and thin client apps.

Of course all this might be unnecessary if you only have occasional edits
and a few viewers....
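
For what it's worth, the check-out / check-in part of the Edit workflow,
with feature-level locks and "retirement" instead of deletion, might look
something like the sketch below. The Warehouse class and its method names
are invented for illustration, and a real warehouse would keep this state
in a database rather than in memory.

```python
import copy
import time


class Warehouse:
    """Toy feature store: per-feature locks plus a version history."""

    def __init__(self):
        self.features = {}   # feature id -> list of versions, oldest first
        self.locks = {}      # feature id -> operator holding the lock

    def add(self, fid, geometry):
        self.features[fid] = [
            {"geometry": geometry, "created": time.time(), "retired": None}
        ]

    def current(self, fid):
        return self.features[fid][-1]

    def check_out(self, fids, operator):
        """Lock the features and return copies for the project workspace."""
        busy = [f for f in fids if f in self.locks]
        if busy:
            raise RuntimeError(f"features already locked: {busy}")
        for f in fids:
            self.locks[f] = operator
        return {f: copy.deepcopy(self.current(f)) for f in fids}

    def check_in(self, workspace, operator):
        """Retire the old versions, append the edited ones, release locks."""
        for fid in workspace:
            if self.locks.get(fid) != operator:
                raise PermissionError(f"feature {fid} not locked by {operator}")
        now = time.time()
        for fid, edited in workspace.items():
            self.current(fid)["retired"] = now      # keep it, just retire it
            self.features[fid].append(
                {"geometry": edited["geometry"], "created": now, "retired": None}
            )
            del self.locks[fid]
```

Because only the checked-out features are locked, two operators can work on
different features of the same theme at the same time, which file-level
locking would not allow.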

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
v

> I'm glad to hear I'm not the only one having problems with updating
> shapefiles. :)
>
> From looking at the shapefile definition paper, you can see that there is
> an index file .SHX which points to a .SHP file which has variable length
> records.
>
> There are a few problems that I can see.  Please verify if any of these
> are correct or not.
> 1. Deleting an old object.  I think this can be handled by setting the
> shapetype to a NULL shape.
>
> 2. Increasing the number of vertices of a shape, and hence increasing the
> record size.  I think the best way to handle this is to remove the old
> shape by setting its shapetype to NULL, and to add a new shape to the end
> of the .SHP file.  The pointer in the .SHX file will now have to be
> redirected to the end of the .SHP file.  This now means that the order of
> the .SHP file and the .SHX file will not match, which will reduce query
> speeds, so periodically the datafiles would need to be rebuilt.
>
> 3. There is an issue with the .SHP file and .SHX file becoming out of
> sync.  Basically, when a shape is updated, first the .SHP file will need
> to be updated, and some time later the .SHX file will be updated.  There
> is a window of opportunity where the files will be out of sync.  I was
> planning to address this by either putting in a lock file, or changing
> read/write permissions to lock the files while the database is out of
> sync.  This means that some reads of the database will fail because the
> database is updating.
>
> 4. I'm not sure what the best way is to link into a SQL database.  If the
> shapefile is only added to, then the best way to reference an object is
> by using the index in the .SHX file.  However, if you delete an object,
> should you rebuild the .SHX file?  This will keep the index file from
> blowing out, but all the indexes will change and hence the SQL database
> will reference the wrong indices.
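
Points 1 to 3 above (null out the old shape, append the replacement, point
the .SHX entry at it, and hold a lock file while the two files disagree)
could be sketched as below. This is a rough sketch under assumptions, not
tested against real data: the function names and the ".lock" naming are
made up, `content` is assumed to be an already-encoded record body, and the
bounding box in the file header is not updated. The byte layout (100-byte
headers, big-endian offsets and lengths in 16-bit words, little-endian
shape type) follows the ESRI shapefile description.

```python
import contextlib
import os
import struct

HEADER_BYTES = 100   # .shp and .shx both begin with a 100-byte header
NULL_SHAPE = 0       # shape type code for a null shape


@contextlib.contextmanager
def lock_file(path):
    """Hold an exclusive lock file; raises FileExistsError if already held."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)


def replace_shape(shp_path, shx_path, index, content):
    """Null out record `index` (0-based) and append `content` in its place."""
    if len(content) % 2:
        raise ValueError("record content must be a whole number of 16-bit words")
    with lock_file(shp_path + ".lock"), \
            open(shp_path, "r+b") as shp, open(shx_path, "r+b") as shx:
        # Each .shx entry is 8 bytes: offset, then content length (in words).
        entry_pos = HEADER_BYTES + 8 * index
        shx.seek(entry_pos)
        old_offset, _old_length = struct.unpack(">ii", shx.read(8))

        # The shape type sits right after the 8-byte record header.
        shp.seek(old_offset * 2 + 8)
        shp.write(struct.pack("<i", NULL_SHAPE))

        # Append the new record: record number (1-based) and content length.
        shp.seek(0, os.SEEK_END)
        new_offset = shp.tell()
        shp.write(struct.pack(">ii", index + 1, len(content) // 2))
        shp.write(content)

        # Header bytes 24-27 hold the total file length in 16-bit words.
        total_bytes = shp.tell()
        shp.seek(24)
        shp.write(struct.pack(">i", total_bytes // 2))

        # Finally redirect the index entry to the appended record.
        shx.seek(entry_pos)
        shx.write(struct.pack(">ii", new_offset // 2, len(content) // 2))
```

Note that the lock only helps if readers check for it too, and that a
program reading the .SHP sequentially without the .SHX will see both the
nulled record and its replacement until the files are rebuilt.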

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
v
How about a unique key stored in the dbf file used to join to the SQL
database?

This would allow for many shapefiles joining to a single SQL table (might be
useful if the data is tiled.)
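
A small sketch of what I mean, using SQLite purely as an example SQL
database. The table name, the PARCEL_KEY field and the lists standing in
for .dbf rows are all hypothetical; a real setup would read the key from
the .dbf record that sits at the same position as the shape.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parcels (parcel_key TEXT PRIMARY KEY, owner TEXT)")
db.executemany(
    "INSERT INTO parcels VALUES (?, ?)",
    [("P-001", "Smith"), ("P-002", "Jones"), ("P-003", "Lee")],
)

# Stand-ins for the .dbf rows of two tiled shapefiles.
# List position plays the role of the record index in the .SHX file.
tile_a = [{"PARCEL_KEY": "P-001"}, {"PARCEL_KEY": "P-002"}]
tile_b = [{"PARCEL_KEY": "P-003"}]


def owner_of(dbf_rows, record_index):
    """Join one shapefile record to the SQL table through its stored key."""
    key = dbf_rows[record_index]["PARCEL_KEY"]
    row = db.execute(
        "SELECT owner FROM parcels WHERE parcel_key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


# Deleting record 0 and rebuilding the files shifts every index by one,
# but the key travels with the record, so the join still works.
tile_a_rebuilt = tile_a[1:]
```

Here owner_of(tile_a, 1) and owner_of(tile_a_rebuilt, 0) find the same SQL
row even though the record index changed, and both tiles join to the one
parcels table.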
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
v

>
> Happy for any advice.
>
> Cameron.
>