[gdal-dev] Re: Coding for appending to a database in OGR

Jukka Rahkonen jukka.rahkonen at mmmtike.fi
Thu Oct 28 18:37:39 EDT 2010


Peter J Halls <P.Halls <at> york.ac.uk> writes:

> 
> Dear Team,
> 
>      I've reviewed the change log for release 8 so far and do not think that 
> this question is version specific.
> 
>      I'm coding to support an incremental process for loading a large spatial 
> database from a set of distribution objects.  I have noted that my present 
> approach results in the OGR_FID column values restarting at 1 for each append. 
> This is clearly an error!  However, there does not appear to be a method for 
> initialising the 'nextfidtowrite' value in the API.  What is the best strategy 
> to adopt?
>

Try to study how people have solved this problem with WFS-T servers. With WFS
insert case system must be able to append to a tabel and to generate a valid new
fid and also send it back to WFS client after successful insert.

I know that Geoserver default behaviour is to read the max value of primary key
column (what is PK is queried from the database when the new layer is added into
service or then user can show it in a bit peculiar but documented way). Insert
is then done by using fid=(max+1). Of course primary key must be numeric and it
can not be a combined key. This is not reliable in production systems with other
users doing inserts as well because max+1 may not be valid any more when it is
used. And even Geoserver can select the same max value several times if insert
takes a long time. However normally it works OK because primary key constraint
prevents insert with an already used fid and it leads to WFS error and client
can make a new trial.

Better way is to tell Geoserver the name of the Oracle sequence where to select
the nextval and use that as a fid. This works even with concurrent Geoserver
inserts because each insert candidate has for sure an unique fid and it does not
matter which one gets inserted first. However, there will be troubles if some
other system is also making inserts and does not take the fids from the same
sequence. It is possible that some bigger number is already used in the database
but sequence goes behind. Inevidently there will be a collision when sequence
reaches this used number but it will just lead to one failing insert, WFS error,
new trial with next nextval, and finally success.

In theory the right way to do with Oracle is to make a trigger that reads new
fid value from a sequence on inserts. The trouble is that the value of the new
fid is known for sure only after an successful insert. With the Geoserver logic
trigger can not be used. Geoserver thinks "I took an unique value from sequence,
I tried to do an insert by using that as a new fid, database accepted insert and
therefore the new fid in the database must be the value selected from the
sequence and I can send it back to the client". Insert trigger can not be used
because Geoserver wants to know the fid candidate before insert. If there were
an insert trigger it would select again nextval from the sequence and the stored
fid would not be the same than Geoserver believes. I can guarantee that this
lead us to some funny situations before we noticed it and managed to tell the
db-admins that we just cannot use our standard trigger-sequence system with
WFS-T tables.

Inserts into PostGIS are easier because fid can be in a "serial" column and it
handles unique fids automatically. I do not know how Geoserver is getting back
the value of the new fid from PostGIS.

Ogr does not need to get real fids back like WFS-T and if you just want to get
the job done right with Oracle, create a sequence and make a trigger on OGR_FID
column and let the database do the rest. Ogr can send whatever it wants but
Oracle makes the final decision. But that is not the right way to do it with a
general tool like ogr2ogr because this sequence-trigger system is tricky to make
and in well managed systems ogr user does not even have enough rights for that.
I would suggest an approach of "select max OGR_FID and start appending with
fid=(max+1)".

With PostGIS it could be done rather automatic. Ogr might create the OGR_FID
column as serial type and then just do inserts without fids. But what if there
is an error and ogr wants to print it into a log or somewhere? I do not know.

-Jukka Rahkonen-




More information about the gdal-dev mailing list