[gdal-dev] ogr2ogr problem converting UK Ordnance Survey

Peter J Halls P.Halls at york.ac.uk
Fri Jul 2 02:56:44 EDT 2010


Thanks, Frank.  I am pleased to hear that I have probably not missed much in my 
analysis of the current GDAL/OGR facilities.  I have not yet moved to 1.7, so it 
may be that there is a little more in the driver now than I have seen in 1.6.3.

I started out thinking that I could concatenate these multiple objects into 
simple strings ... until I discovered how long these could be.  Admittedly I 
opted to include the date along with the reason in the same string, but at 20 or 
so characters per entry - and some of the other multiples involve rather more 
characters per entry - I realised this was not realistic.  In my sample dataset 
(and, eventually, I need to work with national coverage) I have objects 
comprising several thousand 20-character entries - yes, they could be processed 
to become long long integers but that is the same as holding the numeric part as 
text!  I decided that it would be easier to write a parser than to meddle with 
the GML driver ... I think it probably was.
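
For illustration, a minimal sketch of the kind of parsing involved, not the 
parser described above.  It pairs each changeDate with the reasonForChange 
that follows it, using the changeHistory sample quoted further down.  The 
namespace URI bound to osgb: and the helper name change_history are 
assumptions; check the header of your own files.  Matching on the qualified 
name is also what keeps osgb: and gml: elements with the same local name apart.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the osgb: prefix; verify against your own data.
OSGB = "http://www.ordnancesurvey.co.uk/xml/namespaces/osgb"

SAMPLE = f"""<osgb:changeHistory xmlns:osgb="{OSGB}">
  <osgb:changeDate>2004-12-19</osgb:changeDate>
  <osgb:reasonForChange>Revised</osgb:reasonForChange>
  <osgb:changeDate>2002-09-07</osgb:changeDate>
  <osgb:reasonForChange>Revised</osgb:reasonForChange>
  <osgb:changeDate>2001-03-12</osgb:changeDate>
  <osgb:reasonForChange>New</osgb:reasonForChange>
</osgb:changeHistory>"""


def change_history(elem):
    """Hypothetical helper: pair each changeDate with the next reasonForChange."""
    pairs = []
    date = None
    for child in elem:
        # ElementTree exposes qualified names as "{namespace-uri}localname".
        if child.tag == f"{{{OSGB}}}changeDate":
            date = child.text
        elif child.tag == f"{{{OSGB}}}reasonForChange":
            pairs.append((date, child.text))
            date = None
    return pairs


history = change_history(ET.fromstring(SAMPLE))

# The same data as two parallel lists, one per repeated element.
dates = [d for d, _ in history]
reasons = [r for _, r in history]
```

Keeping the entries as a list of pairs, rather than one concatenated string, 
avoids any limit on field width when there are thousands of entries.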

I do not mind admitting that, over the past couple of years, I have spent 
several months' effort evaluating apparently promising solutions for ingesting 
Mastermap.  Snowflake's products are out of our budget.  The crunch came this 
spring when advice from Safe Software's support team suggested that I could only 
solve the problems by using the FME API to extend the FME product's capability 
to handle these multiple objects.  It seemed to me that if I had to code my own 
solution, I might as well use an API with which I was already familiar: writing 
interfaces is not my main job!

I think that we need a strategic assessment of whether the OSGB usage of GML is 
likely to be a 'one off' or whether others may also exploit the power of GML in 
a similar fashion.  Certainly in Europe where the INSPIRE Directive is beginning 
to take effect and requires the delivery of interoperable data services there is 
the possibility of others following suit: I am not familiar with the German NAS 
work, so do not know how that might relate, if at all.  If OSGB is a 'one off', 
then the amount of work that would appear to be involved for GDAL/OGR is 
possibly a show stopper; if, however, there is a need for a generic solution 
which will solve the OSGB issues and support a range of other products, such 
that there is a reasonably widespread application, then it is probably worth at 
least a proper assessment of the effort involved.

Best wishes,

Peter

PS my sample GML dataset covers just 50 km square and comes in at 2.8 GB: editing 
constructs to avoid their being processed is not easy with files of that size!
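
With files of that size, one option is to stream the document and select 
elements while reading, instead of editing the file first.  The following is 
only a sketch under assumptions: the osgb: namespace URI, the RoadLink feature 
element, its fid attribute and the miniature sample document are all 
illustrative, and change_counts is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET

# Assumed namespace URI for the osgb: prefix; verify against your own data.
OSGB = "http://www.ordnancesurvey.co.uk/xml/namespaces/osgb"

# Made-up miniature document standing in for a multi-gigabyte file.
SAMPLE = f"""<osgb:FeatureCollection xmlns:osgb="{OSGB}">
  <osgb:RoadLink fid="example1">
    <osgb:changeHistory>
      <osgb:changeDate>2004-12-19</osgb:changeDate>
      <osgb:reasonForChange>Revised</osgb:reasonForChange>
      <osgb:changeDate>2001-03-12</osgb:changeDate>
      <osgb:reasonForChange>New</osgb:reasonForChange>
    </osgb:changeHistory>
  </osgb:RoadLink>
  <osgb:RoadLink fid="example2">
    <osgb:changeHistory>
      <osgb:changeDate>2002-09-07</osgb:changeDate>
      <osgb:reasonForChange>Revised</osgb:reasonForChange>
    </osgb:changeHistory>
  </osgb:RoadLink>
</osgb:FeatureCollection>""".encode("utf-8")


def change_counts(source, feature_tag):
    """Yield (fid, number of changeDate entries) for each feature, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == feature_tag:
            fid = elem.get("fid")
            count = len(elem.findall(f".//{{{OSGB}}}changeDate"))
            # Drop the feature's children and attributes once it has been read.
            elem.clear()
            yield fid, count


counts = list(change_counts(io.BytesIO(SAMPLE), f"{{{OSGB}}}RoadLink"))
```

For a real file, pass an open file object or a path in place of the BytesIO 
buffer.  Note that the emptied feature elements remain attached to the root, 
so very large inputs may need those removed as well.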

Frank Warmerdam wrote:
> On Thu, Jul 1, 2010 at 4:55 PM, Peter J Halls <P.Halls at york.ac.uk> wrote:
>> Jez, Even,
>>
>>   there are actually several issues relating to using GDAL/OGR to read
>> Ordnance Survey of Great Britain (OSGB) GML files distributed as their
>> Mastermap product.  One of these I reported as bug #1604 - I now find that
>> this was against GDAL-1.4.0 - which concerns handling 'duplicate' tokens:
>> OGR ignores Namespace and so treats <osgb:point> as the starter and fouls
>> on the following <gml:point> which contains the geometry.  There is a
>> similar problem with polygon objects.
> 
> Peter,
> 
> Ouch, the namespace stripping issue is unfortunate.  I'm not sure
> of a cheap fix.
> 
>>   The data Jez describes below is simpler than much of the data in the file
> 
> I did implement some degree of "complex structure flattening" when
> I worked on the custom NAS (German GML profile) reader.  I thought
> perhaps it had made it into the mainline GML reader, but perhaps not.
> If so, I think it could be ported.
> 
> If someone files a ticket specifically on this issue I can try to address
> this or perhaps more likely have Chaitanya do it as he is now getting
> quite familiar with the GML driver.
> 
>> perhaps this next point is not an issue for him.  Several of the tokens are
>> described in the schema snippet as 'unbounded': this means that there can be
>> several instances
>>
>>        <osgb:changeHistory>
>>                <osgb:changeDate>2004-12-19</osgb:changeDate>
>>                <osgb:reasonForChange>Revised</osgb:reasonForChange>
>>                <osgb:changeDate>2002-09-07</osgb:changeDate>
>>                <osgb:reasonForChange>Revised</osgb:reasonForChange>
>>                <osgb:changeDate>2001-03-12</osgb:changeDate>
>>                <osgb:reasonForChange>New</osgb:reasonForChange>
>>        </osgb:changeHistory>
> 
> Hmm, that is also somewhat ugly.  OGR has the concept of a
> string list field type, so in theory this could be reduced to two
> string list fields:
> 
> changeHistory_changeDate: 2004-12-19, 2002-09-07,...
> changeHistory_reasonForChange: Revised, Revised
> 
> I also thought I had done something like this for the NAS driver,
> but perhaps it did not make it back into the mainstream GML
> driver.
> 
> Likewise, if a focused ticket is filed, I'll turn this over to
> Chaitanya.
> 
>>  I do not know whether the GDAL/OGR GML driver was designed primarily for
>> writing gml rather than for reading: maybe.
>>
>>   Where does this leave us?  As I mentioned, there are also problems with
>> most other gml readers: this is not solely an issue with GDAL/OGR.  I have
>> an immediate need for the ITN data and have written my own parser to
>> extract the information from the gml source: so far, so good.  However, as
>> I mentioned, there is now the problem of how to store these data:
>> shapefiles use the dBaseIV format and have no structure for handling these
>> multiple attributes.  In a sample dataset, I have a record with 49
>> changeHistory records, for example; some other multiple constructs have
>> several thousand entries.  I happen to have access to Oracle, although to
>> use GDAL/OGR to write to it requires that I do some significant work on
>> the oci driver: I've been trying to understand the code of that to assess
>> what I can reasonably do.  Alternatively, I could use oci directly and
>> bypass GDAL/OGR entirely.  All this, however, is non-trivial and holding
>> me back from doing what I am supposed to be doing ... but does seem to be
>> the only way forward, having exhausted FME, etc.
> 
> Even if we do the stringlist and related stuff you are quite
> right that there aren't many output formats that will support
> the esoteric arrangement well.  OGR was really intended
> to read *simplistic* GML files that match existing GIS
> type conventions (flat, non-repeating).  I am interested in
> extending it somewhat to read important GML profiles
> reasonably well, but there are limits to how much of this
> can be done without a fundamental rewrite.
> 
> I really do try to discourage GML generators from using
> some of these more esoteric practices.
> 
> Best regards,
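
On the storage question quoted above: in a relational database the repeated 
entries can live in a child table with one row per entry, which is the 
structure a flat dBase record lacks.  The sketch below uses SQLite purely as a 
stand-in for Oracle so that it runs anywhere; the table names, column names, 
the fid value and the store_history helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real database
conn.executescript("""
CREATE TABLE feature (
    fid TEXT PRIMARY KEY
);
CREATE TABLE change_history (
    fid         TEXT    NOT NULL REFERENCES feature(fid),
    seq         INTEGER NOT NULL,   -- preserves the order in the source file
    change_date TEXT    NOT NULL,
    reason      TEXT    NOT NULL,
    PRIMARY KEY (fid, seq)
);
""")


def store_history(conn, fid, history):
    """Insert one feature and one child row per (date, reason) pair."""
    conn.execute("INSERT INTO feature (fid) VALUES (?)", (fid,))
    conn.executemany(
        "INSERT INTO change_history (fid, seq, change_date, reason) "
        "VALUES (?, ?, ?, ?)",
        [(fid, i, date, reason) for i, (date, reason) in enumerate(history)],
    )


store_history(conn, "example1", [
    ("2004-12-19", "Revised"),
    ("2002-09-07", "Revised"),
    ("2001-03-12", "New"),
])

rows = conn.execute(
    "SELECT change_date, reason FROM change_history "
    "WHERE fid = ? ORDER BY seq",
    ("example1",),
).fetchall()
```

A record with 49 entries, or several thousand, is then simply that many rows 
keyed on the feature identifier, with no limit imposed by a field width.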

-- 
--------------------------------------------------------------------------------
Peter J Halls, GIS Advisor, University of York
Telephone: 01904 433806     Fax: 01904 433740
Snail mail: Computing Service, University of York, Heslington, York YO10 5DD
This message has the status of a private and personal communication
--------------------------------------------------------------------------------

