[gdal-dev] Major performance improvement when reading large multi-layer GML files

Even Rouault even.rouault at mines-paris.org
Mon Aug 15 14:25:37 EDT 2011


Hi,

I've just committed to trunk an improvement that will interest people who 
process GML files that are hundreds of megabytes in size or larger.

The commit is http://trac.osgeo.org/gdal/changeset/22939

Below is the inlined text of the new section that has been added to the GML 
help page (it will be refreshed in the online version in a few hours):

"""
*Performance issues with large multi-layer GML files.*

There is only one GML parser per GML datasource, shared among its layers. 
By default, the GML driver restarts reading from the beginning of the file 
each time a layer is accessed for the first time, which can lead to poor 
performance with large GML files.

Starting with OGR 1.9.0, the GML_READ_MODE configuration option can be set to 
SEQUENTIAL_LAYERS if all features belonging to the same layer are written 
sequentially in the file. The reader will then avoid unnecessary resets when 
layers are read completely one after the other. To get the best performance, 
the layers must be read in the order they appear in the file.
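
To make the cost difference concrete, here is a toy model in plain C++ (it 
does not use OGR at all). ToySource, Next() and ReadAllLayers() are 
hypothetical names invented for this illustration, and the scan-counting rule 
is an assumption about how a single shared parser behaves, not the driver's 
actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GML datasource: one shared cursor over a flat
// list of features, each tagged with the index of the layer it belongs to.
struct ToySource
{
    std::vector<int> featureLayers; // layer index of each feature, in file order
    std::size_t cursor = 0;         // position of the single shared parser
    std::size_t scanned = 0;        // number of features the parser stepped over

    void Rewind() { cursor = 0; }

    // Returns true if a feature of `layer` was found and consumed.
    // With stopAtOtherLayer set, the scan stops (without consuming anything)
    // at the first feature of another layer; this is only valid when all
    // features of a layer are contiguous in the file.
    bool Next(int layer, bool stopAtOtherLayer)
    {
        while (cursor < featureLayers.size())
        {
            if (featureLayers[cursor] == layer)
            {
                ++cursor;
                ++scanned;
                return true;
            }
            if (stopAtOtherLayer)
                return false;
            ++cursor; // skip a feature that belongs to another layer
            ++scanned;
        }
        return false;
    }
};

// Reads every layer completely, in file order, and returns how many features
// the parser had to step over in total.
std::size_t ReadAllLayers(ToySource &src, int nLayers, bool sequential)
{
    assert(nLayers >= 0);
    src.scanned = 0;
    src.Rewind();
    for (int layer = 0; layer < nLayers; ++layer)
    {
        if (!sequential)
            src.Rewind(); // modelled default: restart on first access of a layer
        while (src.Next(layer, sequential))
        {
        }
    }
    return src.scanned;
}
```

With six features laid out as layers 0,0,0,1,1,2, this model steps over 18 
features when restarting for each layer and over 6 when reading sequentially, 
i.e. the restart cost grows with the number of layers times the file size.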

If no .xsd and .gfs files are found, the parser will detect the layout of 
layers when building the .gfs file. If the layers are found to be sequential, a 
<SequentialLayers>true</SequentialLayers> element will be written in the .gfs 
file, so that GML_READ_MODE will be automatically initialized to 
SEQUENTIAL_LAYERS if not explicitly set by the user.

Starting with OGR 1.9.0, the GML_READ_MODE configuration option can be set to 
INTERLEAVED_LAYERS to be able to read a GML file whose features from different 
layers are interleaved. In that case, the semantics of GetNextFeature() are 
slightly altered: a NULL return does not necessarily mean that all features 
from the current layer have been read; it can also mean that there is still a 
feature to read, but that it belongs to another layer. The file should then be 
read with code similar to the following:

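    // bInterleaved is not declared in the original snippet; it is assumed
    // here to be TRUE when GML_READ_MODE is set to INTERLEAVED_LAYERS.
    int bInterleaved = TRUE;
    int nLayerCount = poDS->GetLayerCount();
    int bFoundFeature;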
    do
    {
        bFoundFeature = FALSE;
        for( int iLayer = 0; iLayer < nLayerCount; iLayer++ )
        {
            OGRLayer   *poLayer = poDS->GetLayer(iLayer);
            OGRFeature *poFeature;
            while((poFeature = poLayer->GetNextFeature()) != NULL)
            {
                bFoundFeature = TRUE;
                poFeature->DumpReadable(stdout, NULL);
                OGRFeature::DestroyFeature(poFeature);
            }
        }
    } while (bInterleaved && bFoundFeature);
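
The loop above can be tried without OGR against a toy model of the altered 
semantics. In this sketch, ToyInterleavedSource, Next() and ReadInterleaved() 
are hypothetical names, and -1 stands in for the NULL return; the rule that 
the reader never skips over a feature of another layer is an assumption made 
for the illustration, not a description of the driver's internals:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a datasource opened in an interleaved mode.
struct ToyInterleavedSource
{
    std::vector<int> featureLayers; // layer index of each feature, in file order
    std::size_t cursor = 0;         // position of the single shared parser

    // Returns the file position of the next feature if it belongs to `layer`,
    // or -1 (standing in for NULL) at end of file or when the next feature
    // belongs to another layer. Nothing is consumed in the -1 case.
    long Next(int layer)
    {
        if (cursor < featureLayers.size() && featureLayers[cursor] == layer)
            return static_cast<long>(cursor++);
        return -1;
    }
};

// Round-robin over the layers until a full pass yields no feature.
// Returns the file positions in the order the features were delivered.
std::vector<long> ReadInterleaved(ToyInterleavedSource &src, int nLayers)
{
    assert(nLayers >= 0);
    std::vector<long> order;
    bool foundFeature;
    do
    {
        foundFeature = false;
        for (int layer = 0; layer < nLayers; ++layer)
        {
            long pos;
            while ((pos = src.Next(layer)) != -1)
            {
                foundFeature = true;
                order.push_back(pos);
            }
        }
    } while (foundFeature);
    return order;
}
```

For a file laid out as layers 0,1,0,2,1, the model delivers the features at 
positions 0,1,2,3,4: every feature is read exactly once, in file order, and 
the outer loop ends on the first pass that finds nothing.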
"""

Testing appreciated,

Best regards,

Even


