[gdal-dev] Append a shapefile to a postgis table using the GDAL/OGR Python interface

Wed Jul 6 15:46:58 EDT 2011

On Wed, Jul 6, 2011 at 2:47 PM, Even Rouault
<even.rouault at mines-paris.org> wrote:
> Le mercredi 06 juillet 2011 17:11:31, Luke Peterson a écrit :
>> On Sat, Jul 24, 2010 at 10:31PM, Chaitanya kumar CH <chaitanya.ch at
>>
> Hum, wait, how come can you use CopyLayer() to append to an existing table ?
> CopyLayer() is supposed to create a new layer, so I don't see how it would
> work with an existing target layer.
>
> Actually I've just tested it to append to an existing Postgis table, and as
> expected, I get :
>
> ERROR 1: Layer XXXXXX already exists, CreateLayer failed.
> Use the layer creation option OVERWRITE=YES to replace it.
>
> So, I don't even understand how you manage to get to the point where features
> are about to be inserted...
>
> So I'm not sure at all the PRESERVE_FID option makes sense. Or do I miss
> something ??
>

You're right, an error should be expected from your append case -- but
I'm not using it to append, I'm running into this FID error on a
simple invocation of CopyLayer() in an empty schema, trying to copy a
layer from a Shapefile DS to a PostGIS DS.

What CopyLayer() is supposed to do (and the code I've written does) is
create a new layer based on an existing layer, copies a field set from
the source layer, then iterates through each of the features in the
source layer, creating a copy in the new layer. The problem is, if I
use it as-is with a shapefile as the source DS (for which OGR has
generated an FID), it will copy and create a new layer, but it will
only insert the first feature and then die with an error saying I need
to manually set the FID to -1. But I can't do that with CopyLayer(),
because CopyLayer() automatically pulls an FID into the destination
feature if one exists in the source DS in each CreateFeature iteration
(line 318). The other thing I've done with mine that's different from
CopyLayer()'s logic is I set the geometry to wkbUnknown -- I've
encountered errors with PostGIS in cases where the first feature being
copied is a wkbPolygon and the second feature is a wkbMultiPolygon,
for example.

The C++ for the CopyLayer() method is here in the DataSource class
source: http://trac.osgeo.org/gdal/browser/branches/1.8/gdal/ogr/ogrsf_frmts/generic/ogrdatasource.cpp

Here's the function I've written (working code snippet incorporating
some of your suggestions from earlier and follows pretty much the same
path as the C++) -- it's just a shorthand "import-a-shapefile" wrapper
in python for a fairly easy task in OGR2OGR that I find myself doing
often enough in python. It's a bit clunky, but has worked in all
encountered use cases thus far:

def loadShapes(serverDS, table, sourceFile, temp=False,
overwrite=False, geography=False):
    if serverDS.GetDriver().GetName() == 'PostgreSQL':
        ogr.RegisterAll()
        shapeDS = ogr.Open(sourceFile)
        if temp == True:
            import random
            table = table+str(random.randint(0,99999))
        options = []
        name = ''
        if overwrite == True:
            options.append('OVERWRITE=YES')
        if geography == True:
            options.append('GEOM_TYPE=geography')

        sourceLayer = shapeDS.GetLayerByIndex(0)
        newLayer =
serverDS.CreateLayer(table,sourceLayer.GetSpatialRef(),ogr.wkbUnknown,options)
        for x in xrange(sourceLayer.GetLayerDefn().GetFieldCount()):
            newLayer.CreateField(sourceLayer.GetLayerDefn().GetFieldDefn(x))

        newLayer.StartTransaction()
        for x in xrange(sourceLayer.GetFeatureCount()):
            newFeature = sourceLayer.GetNextFeature()
            newFeature.SetFID(-1)
            newLayer.CreateFeature(newFeature)
            if x % 128 == 0:
                newLayer.CommitTransaction()
                newLayer.StartTransaction()
        newLayer.CommitTransaction()
        return newLayer.GetName()

The "else" for the path above simply calls CopyLayer() -- the only
other driver I've used for the destination DS is MSSQL, and
CopyLayer() works fine for it. The "temp" option is there because I
couldn't figure out how to generate a temp table, so I figured a
random-integer append on the table name would help avoid collisions if
I needed to run a parallel implementation for the function at some
point. I have to rewrite to make it check to see whether the
randomly-generated name exists, though.

I've got another insertShapes() function defined that has similar
iterative feature insertion language that I will eventually collapse
together with this into a sub-function when I stop writing and start
optimizing code -- except in that one, because I'm always doing a
feature-insert, I simply add a line:

if postgres == True: tempFeature.SetFID(-1)

right above the targetLayer.CreateFeature(tempFeature) call.

where the "postgres" variable is set at the top of the insert function with:

if serverDS.GetDriver().GetName() == 'PostgreSQL': postgres = True

To address your earlier point about field definition collisions:
>       * it is risky to pass to CreateFeature() a feature from another layer. In
>cases the layer definition of the source and target layers are not strictly
>identical, the probability of having crashes is really high. Mismatch can
>easily happen when drivers don't support the exact data type of the source and
>choose a best fit (OFTString) for example. The safe procedure is to instantiate
>a new feature with the layer definition of the target layer and to use
>CopyFrom() to copy the content of the source feature to the target feature

... in my insert function I have a step before the insert that loads
the shapefile-to-be-inserted into a postGIS table (utilizing the
temp==True logic above), then compares its field definition to the
field definition of the target table, ensuring that each of the
fieldName-fieldType combinations in the table-to-be-inserted's
definition exist in the target table's definition, otherwise it errors
out, citing the fields as incompatible (which relies on another
function that returns a set() filled with each name:type pair found in
an input ogr.FeatureDefn).

-----
Luke Peterson