FW: Updating Shapefiles, and data integrity
Fawcett, David
david.fawcett at moea.state.mn.us
Wed Oct 13 06:19:18 PDT 1999
This may be fine as an "extension", but if it became integral, it would certainly change the product from a free program to a pretty expensive one.
Even if a non-commercial solution to this issue is a little kludgy and slower than real-time, it may serve most people's needs, and will avoid spendy Oracle or SDE licenses.
David Fawcett
Minnesota Office of Environmental Assistance
> ----------
> From: Sullivan, James R.[SMTP:SullivanJ at nima.mil]
> Sent: Wednesday, October 13, 1999 7:51 AM
> To: 'Stephen Lime'; camerons at cat.org.au; bfraser at geoanalytic.ab.ca; mapserver-users at lists.gis.umn.edu
> Subject: RE: Updating Shapefiles, and data integrity
>
> Believe a better approach would be to develop a MapServer interface to
> Oracle Spatial or ESRI SDE, and let the database do all the locking, etc.
> This would answer the mail on a number of other issues, too. Both have open
> C APIs that can be downloaded via the net.
>
>
> Jim Sullivan
> NIMA / TES
>
> -----Original Message-----
> From: Stephen Lime [SMTP:steve.lime at dnr.state.mn.us]
> Sent: Tuesday, October 12, 1999 5:37 PM
> To: camerons at cat.org.au; bfraser at geoanalytic.ab.ca;
> mapserver-users at lists.gis.umn.edu
> Subject: Re: Updating Shapefiles, and data integrity
>
> The more I think about this, the more I understand why I never attempted
> it. Locking is a real pain in the CGI world. When do you lock: when a
> record is requested, or when edits are submitted? If the latter, then there
> is a chance more than one person could request the same shape. I don't
> think on-the-fly edits are robustly possible. Somehow I think edits need to
> be cached and committed "behind the scenes". It's essential that the shp,
> shx and dbf records remain in sync. What about something like this:
>
> Assume there is a mechanism to request a shape and attributes (and a
> checkout time) and make changes. A user now sends back some edits. This
> causes a record to be written to a "pending" database. What actually gets
> saved are things like source shapefile, feature id (-1 for new), timestamp,
> etc. The actual edits get saved in some format (shapefile) as a file whose
> name can be reconstructed from elements in the pending database. Now,
> periodically a process could go through and commit the edits in the pending
> database to production (not web accessible) versions. When this is finished
> the updated stuff could be swapped in for the old stuff and the pending
> database purged (in part). The committing of the shapes would essentially
> involve rebuilding the shapefile from the user edits and the production
> version (i.e. pick the edited version if it exists). Put a lock in place
> while versions are being swapped and remove it when done, probably only a
> few seconds. You could even maintain a history by saving previous versions
> for some period of time or retiring shapes to some external format
> (shapefile).
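A minimal sketch of that pending database and commit process, in Python. The
table layout, file names and directory structure are all illustrative
assumptions, not anything MapServer defines:

    # Pending-edit queue: one row per edit, with enough information to
    # rebuild the shapefile later; the geometry itself is saved as a
    # one-shape shapefile whose name is derived from these fields.
    import os
    import sqlite3
    import time

    conn = sqlite3.connect("pending.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS pending (
                      source_shp TEXT,     -- shapefile being edited
                      feature_id INTEGER,  -- -1 for a brand-new shape
                      submitted  REAL,     -- timestamp of the edit
                      PRIMARY KEY (source_shp, feature_id))""")

    def queue_edit(source_shp, feature_id):
        """Record an edit and return the path where the edited geometry
        should be written (reconstructable from the row itself)."""
        conn.execute("INSERT OR REPLACE INTO pending VALUES (?, ?, ?)",
                     (source_shp, feature_id, time.time()))
        conn.commit()
        return os.path.join("edits", "%s_%d.shp" % (source_shp, feature_id))

    def commit_pending(live_dir):
        """Periodic process: rebuild each affected shapefile from the
        production copy plus the cached edits (preferring the edited
        version of a shape where one exists), then swap it in under a
        short-lived lock and purge the pending rows."""
        rows = conn.execute("SELECT source_shp, feature_id "
                            "FROM pending").fetchall()
        # ... rebuild the shapefiles named in `rows` here ...
        lock = os.path.join(live_dir, "LOCK")
        open(lock, "w").close()        # lock while versions are swapped
        try:
            pass                       # rename rebuilt files over live ones
        finally:
            os.remove(lock)            # held for only a few seconds
        conn.execute("DELETE FROM pending")
        conn.commit()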
>
> As requests for shapes come in, a quick check of the pending database could
> be used to identify re-edits. If a timestamp is set when a shape is
> requested, then it could be compared against edits in the pending database
> to identify possible problems. If a user requests an edited shape, just
> send the pending edits as if they were part of the current shapefile. New
> shapes are just added to the pending database and make their way into the
> main database as part of the update process.
>
> Sounds complicated, but really it is only 2 processes, 1 database and a
> bunch of cached edits. Timestamps can help alleviate simultaneous edits,
> and the worst thing a user would see would be a message like "The record
> you're submitting has changed since you requested it, cannot process the
> edit. Would you like to work from the edited version?".
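The timestamp comparison could be as simple as the following sketch, against
the same hypothetical pending table as above:

    # Detect a re-edit: if a pending edit is newer than the requester's
    # checkout time, the shape changed underneath them.
    import sqlite3

    conn = sqlite3.connect("pending.db")

    def check_conflict(source_shp, feature_id, checkout_time):
        row = conn.execute("SELECT submitted FROM pending WHERE "
                           "source_shp=? AND feature_id=?",
                           (source_shp, feature_id)).fetchone()
        if row is not None and row[0] > checkout_time:
            return ("The record you're submitting has changed since you "
                    "requested it, cannot process the edit. Would you "
                    "like to work from the edited version?")
        return None   # no conflict; queue the edit normally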
>
> Again, without some sort of a persistent connection I doubt that real-time
> editing is possible. One could shorten the commit interval to a few minutes
> or even seconds, though, so it would certainly seem real-time.
>
> (Cameron, this approach would involve no editing of Frank's shapelib at
> all, since all you're doing is reading and writing individual records. The
> effort goes into getting all the communication working right. It could even
> be a perl script with system calls to the shapelib utils for creating files
> and adding info.)
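For instance, a sketch (in Python rather than perl) of writing one cached
edit by shelling out to the standard shapelib utilities; the file name,
geometry and attribute field are made up for illustration:

    # shpcreate/shpadd/dbfcreate/dbfadd ship with Frank's shapelib.
    import subprocess

    def save_edit(basename, vertices, name):
        """Write a one-shape edit file, e.g. basename='edits/roads_17'."""
        subprocess.check_call(["shpcreate", basename, "polygon"])
        coords = [str(c) for x, y in vertices for c in (x, y)]
        subprocess.check_call(["shpadd", basename] + coords)
        subprocess.check_call(["dbfcreate", basename, "-s", "NAME", "32"])
        subprocess.check_call(["dbfadd", basename, name])

    save_edit("edits/roads_17", [(0, 0), (10, 0), (10, 5), (0, 0)], "Main St")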
>
> Steve
>
> Stephen Lime
> Internet Applications Analyst
> MIS Bureau - MN DNR
>
> (651) 297-2937
> steve.lime at dnr.state.mn.us
>
> >>> "bfraser" <bfraser at geoanalytic.ab.ca> 10/12 11:00 AM >>>
> see my comments below...
>
> Brent Fraser
>
>
> ----- Original Message -----
> From: Cameron Shorter <cshorter at optusnet.com.au>
> To: mapserver <mapserver-users at lists.gis.umn.edu>
> Sent: Sunday, October 10, 1999 4:14 AM
> Subject: Updating Shapefiles, and data integrity
>
>
> >
> >
> > -------- Original Message --------
> > Subject: Re: request for comments...
> > Date: Sun, 10 Oct 1999 19:16:24 +1000
> > From: Cameron Shorter <cshorter at optusnet.com.au>
> > Reply-To: camerons at cat.org.au
> > To: Stephen Lime <steve.lime at dnr.state.mn.us>
> > References: <s7ff017e.048 at smtp.dnr.state.mn.us>
> >
> >
> >
> > Stephen Lime wrote:
> > >
> > > Maintaining data integrity is going to be a big issue. I was at our
> > > state GIS conference and got to chat with Jack Dangermond from ESRI
> > > about MapServer and their new ArcIMS product. Seems they're having
> > > trouble with this editing stuff. Shapefiles just aren't a transactional
> > > environment, so unless you can assure yourself of single-user access
> > > there's always the potential for multiple concurrent edits. Then
> > > there's the issue of quality control. I think the solution needs to
> > > offer both immediate update and delayed update. ArcIMS, as I understand
> > > it, caches updates until an operator on the server site commits the
> > > edits to the main database. This operator could be a cron process, I
> > > suppose, that could handle locking while edits are processed. I think
> > > this may be a good approach, as you could do some simple transaction
> > > management - review, edit and delete - once the initial work was done.
> > > Edits could be stored in a shapefile along with attributes and enough
> > > additional information to commit the shape - source shapefile, shape id
> > > (or whether it is a new one), type of edit (replace, attribute change),
> > > etc.
> > >
> > > Anyway, just my thoughts...
> > >
> > > Steve
> >
>
>
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
>
> Enterprise-wide editing can require a lot of infrastructure to support it.
> A large-scale implementation might include (this is only one scenario):
>
> o a data repository / warehouse / database
> o project workspaces for editing
> o a "view-only" copy of the data
>
> Typical workflows would include:
>
> 1. Edit: operator identifies features in the warehouse for editing, locks
>    them, and extracts them to the project workspace. The features are
>    edited, possibly reviewed, then checked back into the warehouse. This
>    is sometimes known as a "long transaction". Some things that may be
>    important:
>       1. feature-level locking (as opposed to file locking) to prevent
>          simultaneous editing (see the sketch after this list)
>       2. feature lineage tracking: timestamps, feature "retirement"
>          instead of deletion
>       3. theme security: certain departments can edit only specific themes
>
> 2. Copy: at a pre-determined schedule, the warehouse is copied to the
>    "view-only" database. This may include re-formatting, indexing and
>    distributing the data to get better performance for viewing. Depending
>    on the edits, the copy could be once a day, once a month, etc. The good
>    thing about this approach is that the user (viewer/querier) has a
>    stable data set to operate on. The bad thing is it might not be up to
>    date.
>
> 3. Viewing: the data is queried and rendered for thick and thin client
>    apps.
>
> Of course all this might be unnecessary if you only have occasional edits
> and a few viewers....
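A sketch of feature-level locking for that check-out step; the lock table
and its columns are hypothetical:

    # Check-out / check-in for the "long transaction": one lock row per
    # feature, so two operators can edit different features of a theme.
    import sqlite3
    import time

    conn = sqlite3.connect("warehouse.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS feature_locks (
                      theme      TEXT,
                      feature_id INTEGER,
                      operator   TEXT,
                      locked_at  REAL,
                      PRIMARY KEY (theme, feature_id))""")

    def check_out(theme, feature_id, operator):
        """Lock one feature for editing; False if already checked out."""
        try:
            conn.execute("INSERT INTO feature_locks VALUES (?, ?, ?, ?)",
                         (theme, feature_id, operator, time.time()))
            conn.commit()
            return True
        except sqlite3.IntegrityError:
            return False

    def check_in(theme, feature_id, operator):
        """Release the lock when the edited feature is checked back in."""
        conn.execute("DELETE FROM feature_locks WHERE theme=? AND "
                     "feature_id=? AND operator=?",
                     (theme, feature_id, operator))
        conn.commit()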
>
>
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
>
>
> > I'm glad to hear I'm not the only one having problems with updating
> > shapefiles. :)
> >
> > From looking at the shapefile definition paper, you can see that there
> > is an index file .SHX which points to a .SHP file which has variable
> > length records.
> >
> > There are a few problems that I can see. Please verify whether any of
> > these are correct or not (see the sketch after this list for the record
> > layout involved).
> > 1. Deleting an old object. I think this can be handled by setting the
> > shapetype to a NULL shape.
> >
> > 2. Increasing the number of vertices of a shape, and hence increasing
> > the record size. I think the best way to handle this is to remove the
> > old shape by setting its shapetype to NULL, and to add a new shape to
> > the end of the .SHP file. The pointer in the .SHX file will now have to
> > be redirected to the end of the .SHP file. This means that the order of
> > the .SHP file and the .SHX file will no longer match, which will reduce
> > query speeds, so periodically the datafiles would need to be rebuilt.
> >
> > 3. There is an issue with the .SHP file and .SHX file becoming out of
> > sync. Basically, when a shape is updated, first the .SHP file will need
> > to be updated, and some time later the .SHX file will be updated. There
> > is a window of opportunity where the files will be out of sync. I was
> > planning to address this by either putting in a lock file, or changing
> > read/write permissions to lock the files while the database is out of
> > sync. This means that some reads of the database will fail because the
> > database is updating.
> >
> > 4. I'm not sure what the best way is to link into a SQL database. If
> > the shapefile is only added to, then the best way to reference an object
> > is by using the index in the .SHX file. However, if you delete an
> > object, should you rebuild the .SHX file? This will keep the index file
> > from blowing out, but all the indexes will change and hence the SQL
> > database will reference the wrong indices.
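A sketch of points 1 and 2 against the published record layout: find a
record through the .SHX index and overwrite its shape type with 0 (the NULL
shape), leaving the old record body in place as a tombstone:

    # Per the shapefile spec: both files have a 100-byte header; each
    # .SHX record is a big-endian offset and content length (both in
    # 16-bit words); a .SHP record is an 8-byte header followed by the
    # content, whose first 4 bytes are a little-endian shape type.
    import struct

    def null_out(basename, recno):       # recno is 1-based, as in the spec
        with open(basename + ".shx", "rb") as shx:
            shx.seek(100 + 8 * (recno - 1))
            offset_words, = struct.unpack(">i", shx.read(4))
        with open(basename + ".shp", "r+b") as shp:
            shp.seek(offset_words * 2 + 8)    # skip the record header
            shp.write(struct.pack("<i", 0))   # 0 == NULL shape
        # Note: the recorded content length still reflects the old size;
        # readers that honor the NULL type simply skip the stale bytes.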
>
>
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> How about a unique key stored in the dbf file, used to join to the SQL
> database?
>
> This would allow many shapefiles to join to a single SQL table (which
> might be useful if the data is tiled).
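A sketch of that join, assuming the .dbf carries a unique-id column and the
attributes live in a SQL table keyed the same way (all names here are
illustrative):

    # The uid stored in each feature's dbf record is the join key, so
    # record order and .SHX indices never matter, and many tiled
    # shapefiles can share one attribute table.
    import sqlite3

    conn = sqlite3.connect("attributes.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS parcels (
                      uid   INTEGER PRIMARY KEY,  -- same value as in the dbf
                      owner TEXT,
                      assessed REAL)""")

    def attributes_for(uid):
        """uid is read from the feature's dbf record (e.g. with
        shapelib's dbfdump); works across any number of tiles."""
        return conn.execute("SELECT owner, assessed FROM parcels "
                            "WHERE uid=?", (uid,)).fetchone()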
>
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
>
>
> >
> > Happy for any advice.
> >
> > Cameron.
> >
>