Updating Shapefiles, and data integrity

Cameron Shorter cshorter at optusnet.com.au
Sun Oct 10 03:14:45 PDT 1999



-------- Original Message --------
Subject: Re: request for comments...
Date: Sun, 10 Oct 1999 19:16:24 +1000
From: Cameron Shorter <cshorter at optusnet.com.au>
Reply-To: camerons at cat.org.au
To: Stephen Lime <steve.lime at dnr.state.mn.us>
References: <s7ff017e.048 at smtp.dnr.state.mn.us>



Stephen Lime wrote:
> 
> Maintaining data integrity is going to be a big issue. I was at our state
> GIS conference and got to chat with Jack Dangermond from ESRI about the
> MapServer and their new ArcIMS product. Seems they're having trouble with
> this editing stuff. Shapefiles just aren't a transactional environment so
> unless you can assure yourself of single user access there's always the
> potential for multiple concurrent edits. Then there's the issue of quality
> control. I think the solution needs to offer immediate update and delayed
> update. ArcIMS, as I understand it, caches updates until an operator on the
> server site commits the edits to the main database. This operator could be
> a cron process I suppose that could handle locking while edits are
> processed. I think this may be a good approach as you could do some simple
> transaction management (review, edit and delete) once the initial work was
> done. Edits could be stored in a shapefile along with attributes and enough
> additional information to commit the shape - source shapefile, shape (or is
> a new one), type of edit (replace, attribute change) etc.
> 
> Anyway, just my thoughts...
> 
> Steve

I'm glad to hear I'm not the only one having problems with updating
shapefiles. :)

From the shapefile definition paper, you can see that there is an index file
(.SHX) which points into a .SHP file holding variable-length records.
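
To make that layout concrete, here is a minimal Python sketch of how the
index could be walked. It assumes the layout given in the ESRI shapefile
technical description: a 100-byte header on both files, 8-byte index entries
holding a big-endian offset and content length counted in 16-bit words, and
a little-endian shape type at the start of each record's content. The
function names are made up for illustration.

```python
import struct

HEADER_SIZE = 100  # both .shp and .shx start with a 100-byte header


def read_shx_entries(shx_bytes):
    """Return (byte_offset, byte_length) for each record listed in the .shx."""
    entries = []
    for pos in range(HEADER_SIZE, len(shx_bytes) - 7, 8):
        # Offset and content length are big-endian, counted in 16-bit words.
        offset_words, length_words = struct.unpack(">ii", shx_bytes[pos:pos + 8])
        entries.append((offset_words * 2, length_words * 2))
    return entries


def read_shape_type(shp_bytes, byte_offset):
    """Return the shape type of the .shp record starting at byte_offset."""
    # Skip the 8-byte record header (record number, content length).
    start = byte_offset + 8
    (shape_type,) = struct.unpack("<i", shp_bytes[start:start + 4])
    return shape_type
```

Because every index entry has the same size, entry N can be found without
scanning, which is what makes the .SHX file useful for the variable-length
records in the .SHP file.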

I can see a few problems.  Please let me know whether my understanding of
each one is correct.
1. Deleting an old object.  I think this can be handled by setting the
record's shape type to the NULL shape.

2. Increasing the number of vertices of a shape, and hence increasing the
record size.  I think the best way to handle this is to remove the old shape
by setting its shape type to NULL, and to append the new shape to the end of
the .SHP file.  The entry in the .SHX file then has to be redirected to the
new record at the end of the .SHP file.  The record order in the .SHP file
will no longer match the order of the .SHX file, which will reduce query
speeds, so the data files would need to be rebuilt periodically.

3. The .SHP file and .SHX file can get out of sync.  When a shape is
updated, the .SHP file has to be written first and the .SHX file some time
later, so there is a window during which the two files disagree.  I was
planning to address this by either creating a lock file, or changing the
read/write permissions, to lock the files while they are out of sync.  This
means that some reads of the database will fail while an update is in
progress.

4. I'm not sure what the best way is to link into a SQL database.  If the
shapefile is only ever added to, then the best way to reference an object is
by its index in the .SHX file.  However, if you delete an object, should you
rebuild the .SHX file?  Rebuilding stops the index file from growing
indefinitely, but all the indices will change and hence the SQL database
will reference the wrong shapes.
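
Points 1 to 3 could be combined roughly as in the Python sketch below. It is
only an illustration under several assumptions: the name replace_shape and
the ".lock" suffix are invented here, the new record reuses the old record
number, and readers are assumed to tolerate a NULL shape whose record is
longer than four bytes and to check for the lock file before reading. The
byte layout (100-byte headers, lengths in 16-bit words, file length stored
at byte 24) follows the ESRI technical description.

```python
import os
import struct


def replace_shape(shp_path, shx_path, index, new_content):
    """Replace record `index` (0-based); new_content is shape type + geometry."""
    lock_path = shp_path + ".lock"  # hypothetical lock-file name
    # O_EXCL makes creation fail if another writer already holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        with open(shp_path, "r+b") as shp, open(shx_path, "r+b") as shx:
            # Find the old record through its index entry.
            shx.seek(100 + 8 * index)
            old_offset_words, _ = struct.unpack(">ii", shx.read(8))
            # Point 1: mark the old record as a NULL shape (shape type 0).
            shp.seek(old_offset_words * 2 + 8)
            shp.write(struct.pack("<i", 0))
            # Point 2: append the new record to the end of the .shp file.
            shp.seek(0, os.SEEK_END)
            new_offset = shp.tell()
            length_words = len(new_content) // 2
            shp.write(struct.pack(">ii", index + 1, length_words))
            shp.write(new_content)
            # Keep the file length in the .shp header (byte 24) up to date.
            total_words = shp.tell() // 2
            shp.seek(24)
            shp.write(struct.pack(">i", total_words))
            shp.flush()
            # Point 3: redirect the index entry only after the .shp is written.
            shx.seek(100 + 8 * index)
            shx.write(struct.pack(">ii", new_offset // 2, length_words))
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A second writer calling replace_shape while the lock file exists gets a
FileExistsError instead of interleaving its edits.  The sketch does not
address point 4: the index of the shape stays the same here, but a later
rebuild that drops NULL shapes would still renumber everything.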

Happy for any advice.

Cameron.


