[gdal-dev] Major performance improvement when reading large multi-layer GML files
Even Rouault
even.rouault at mines-paris.org
Mon Aug 15 14:25:37 EDT 2011
Hi,
I've just committed to trunk an improvement that will interest people who
process GML files that are hundreds of megabytes or larger.
The commit is http://trac.osgeo.org/gdal/changeset/22939
Below is the inlined text of the new section that has been added to the GML
help page (the online version will be refreshed in a few hours):
"""
*Performance issues with large multi-layer GML files.*
There is only one GML parser per GML datasource, shared among the various
layers. By default, the GML driver restarts reading from the beginning of the
file each time a layer is accessed for the first time, which can lead to poor
performance with large GML files.
Starting with OGR 1.9.0, the GML_READ_MODE configuration option can be set to
SEQUENTIAL_LAYERS if all features belonging to the same layer are written
sequentially in the file. The reader will then avoid unnecessary resets when
layers are read completely one after the other. To get the best performance,
the layers must be read in the order they appear in the file.
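To make the cost of these resets concrete, here is a rough cost model in C++.
It is a hypothetical sketch, not the GML driver's code: the file is reduced to
the list of layer names of its features in file order, the default mode is
assumed to rewind and scan the whole file for each layer, and
SEQUENTIAL_LAYERS is assumed to rewind only when the requested layer's block
lies behind the current position. CountScannedFeatures is an illustrative
name, not a GDAL function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cost model, not GDAL's implementation: `file` lists the layer
// name of each feature in file order, and one shared parser cursor walks it.
// Returns how many features the parser steps over to read every layer in
// `order` completely.
std::size_t CountScannedFeatures(const std::vector<std::string>& file,
                                 const std::vector<std::string>& order,
                                 bool bSequentialLayers)
{
    std::size_t nScanned = 0;
    std::size_t nPos = 0;  // current cursor position in `file`
    for (const std::string& osLayer : order)
    {
        if (!bSequentialLayers)
        {
            // Default mode: rewind and scan the whole file, because features
            // of this layer may appear anywhere.
            nScanned += file.size();
            continue;
        }
        // SEQUENTIAL_LAYERS: locate the contiguous block of this layer.
        std::size_t nStart = 0;
        while (nStart < file.size() && file[nStart] != osLayer)
            ++nStart;
        std::size_t nEnd = nStart;
        while (nEnd < file.size() && file[nEnd] == osLayer)
            ++nEnd;
        if (nStart < nPos)
            nPos = 0;  // block already passed: rewind to the beginning
        assert(nEnd >= nPos);
        nScanned += nEnd - nPos;  // skip ahead to the block, then read it
        nPos = nEnd;
    }
    return nScanned;
}
```

With a file laid out as a,a,b,b,b,c, reading layers a, b, c in file order
steps over 6 features in the sequential model versus 18 in the default model,
while reading them in reverse order (c, b, a) costs 13 even in the sequential
model, which is why the reading order matters.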
If no .xsd and .gfs files are found, the parser will detect the layout of
layers when building the .gfs file. If the layers are found to be sequential, a
<SequentialLayers>true</SequentialLayers> element will be written in the .gfs
file, so that GML_READ_MODE will be automatically initialized to
SEQUENTIAL_LAYERS if not explicitly set by the user.
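The layout check can be sketched as follows, assuming the layer name of every
feature has already been collected in file order. AreLayersSequential is a
hypothetical helper, not a GDAL function; it only illustrates the condition
under which <SequentialLayers>true</SequentialLayers> makes sense: no layer
reappears once its block of features has ended.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch, not GDAL's code: returns true if every layer's
// features form one contiguous block in `file` (layer names in file order).
bool AreLayersSequential(const std::vector<std::string>& file)
{
    std::set<std::string> oClosedLayers;  // layers whose block has ended
    for (std::size_t i = 0; i < file.size(); ++i)
    {
        if (oClosedLayers.count(file[i]) != 0)
            return false;  // layer reappears after its block ended
        if (i + 1 < file.size() && file[i + 1] != file[i])
            oClosedLayers.insert(file[i]);  // block of file[i] ends here
    }
    return true;
}
```

For example, a,a,b,c,c is sequential, while a,b,a and a,b,b,a are not.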
Starting with OGR 1.9.0, the GML_READ_MODE configuration option can also be
set to INTERLEAVED_LAYERS to read a GML file whose features from different
layers are interleaved. In that case, the semantics of GetNextFeature() are
slightly altered: a NULL return does not necessarily mean that all features of
the current layer have been read; it may also mean that the next feature to
read belongs to another layer. The file should then be read with code similar
to the following:
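    #include "ogrsf_frmts.h"
    #include "cpl_conv.h"   // CPLGetConfigOption()

    // poDS is an already opened GML datasource.
    // Only the interleaved mode needs more than one pass over the layers.
    const int bInterleaved =
        EQUAL(CPLGetConfigOption("GML_READ_MODE", ""), "INTERLEAVED_LAYERS");
    int nLayerCount = poDS->GetLayerCount();
    int bFoundFeature;
    do
    {
        bFoundFeature = FALSE;
        for( int iLayer = 0; iLayer < nLayerCount; iLayer++ )
        {
            OGRLayer *poLayer = poDS->GetLayer(iLayer);
            OGRFeature *poFeature;
            // NULL may only mean that the next feature is in another layer.
            while((poFeature = poLayer->GetNextFeature()) != NULL)
            {
                bFoundFeature = TRUE;
                poFeature->DumpReadable(stdout, NULL);
                OGRFeature::DestroyFeature(poFeature);
            }
        }
    } while (bInterleaved && bFoundFeature);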
"""
Testing appreciated,
Best regards,
Even