[gdal-dev] ogr2ogr problem converting UK Ordnance Survey MasterMap
Even Rouault
even.rouault at mines-paris.org
Tue Jul 13 13:37:19 EDT 2010
Peter,
answers below
On Tuesday 13 July 2010 09:16:45, Peter J Halls wrote:
> Even, Jez,
>
> sadly, I am not going to be able to try this out myself for at least a
> couple of weeks, due to other commitments.
>
> I think Even's solution to split the multiple entries into a large
> number of simple entries is workable, however I do have a couple of
> caveats. I think this is pragmatic and the only viable approach for the
> majority of output structures which, like Shapefiles / dBase IV, cannot
> support list columns. I think that this should be adequate for those who
> require to draw maps from these data - the cartographic instructions should
> be easily handled by this solution; my doubts concern the use of these data
> as inputs to analytic processes. My caveats are:
>
> 1) I would like to know how many such columns are present for each 'list'
> and how many were found. For example, the limit might be set at 80 but
> there be 85 in the source, so I would like to know that for a particular
> record only a subset / the first 'n' have been stored. This gives the
> possibility of making adjustments to the parameters. Of course, it also
> takes up a couple of precious columns per list entry, reducing further the
> proportion of the list that can be recorded ...
If you specify -splitlistfields -maxsubfields n, ogr2ogr will retain only the
first n items of each list and silently discard any extra items if the list is
longer. If -maxsubfields is not specified, it will scan the whole layer to
compute the maximum number of items available and create as many subfields as
necessary.
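As a rough illustration of that behaviour (not the actual ogr2ogr code), here
is a minimal pure-Python sketch. Records are plain dicts rather than OGR
features, and the function name split_list_fields and the subfield naming
(field name followed by a 1-based index) are assumptions made for the example.

```python
def split_list_fields(records, max_subfields=None):
    """Split list-valued attributes into numbered scalar subfields.

    records: list of dicts mapping field name -> scalar or list.
    max_subfields: keep at most this many items per list (None = keep all).
    """
    # Scan phase: find the longest list for each list-valued field.
    widths = {}
    for rec in records:
        for name, value in rec.items():
            if isinstance(value, list):
                widths[name] = max(widths.get(name, 0), len(value))

    # Optional cap: items beyond the cap are dropped without any warning.
    if max_subfields is not None:
        widths = {name: min(w, max_subfields) for name, w in widths.items()}

    flattened = []
    for rec in records:
        flat = {}
        for name, value in rec.items():
            if name not in widths:
                flat[name] = value  # ordinary scalar field, copied as-is
                continue
            items = value if isinstance(value, list) else []
            # Shorter lists are padded with None so every record has the
            # same set of subfields.
            for i in range(widths[name]):
                flat[f"{name}{i + 1}"] = items[i] if i < len(items) else None
        flattened.append(flat)
    return flattened
```

Calling split_list_fields(records, 2) keeps only the first two items of every
list, while split_list_fields(records) creates as many subfields as the
longest list needs.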
There's currently no API to get the maximum number of items for a column of
StringList, RealList or IntegerList type. However, this can be done with the
existing OGR API, as I did in the scan phase of ogr2ogr -splitlistfields
(it could also be done via Python scripting).
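A Python version of such a scan might look like the sketch below. It assumes
a layer object from the GDAL Python bindings (osgeo.ogr) and only uses its
standard Layer, FeatureDefn and Feature methods. The function name
max_list_lengths is made up for the example, and the numeric type codes are
written out so that the block does not need to import anything.

```python
# Field type codes from the OGRFieldType enum, mapped to the Feature method
# that returns the list value. In the bindings these codes are available as
# ogr.OFTIntegerList (1), ogr.OFTRealList (3) and ogr.OFTStringList (5).
LIST_GETTERS = {
    1: "GetFieldAsIntegerList",
    3: "GetFieldAsDoubleList",
    5: "GetFieldAsStringList",
}


def max_list_lengths(layer):
    """Return {field name: longest list length} for the list fields of a layer."""
    defn = layer.GetLayerDefn()
    list_fields = []  # (field index, field name, getter method name)
    for idx in range(defn.GetFieldCount()):
        field = defn.GetFieldDefn(idx)
        getter = LIST_GETTERS.get(field.GetType())
        if getter is not None:
            list_fields.append((idx, field.GetName(), getter))

    longest = {name: 0 for _, name, _ in list_fields}
    layer.ResetReading()
    feat = layer.GetNextFeature()
    while feat is not None:
        for idx, name, getter in list_fields:
            # Treat an unset field as an empty list.
            values = getattr(feat, getter)(idx) or []
            longest[name] = max(longest[name], len(values))
        feat = layer.GetNextFeature()
    layer.ResetReading()  # leave the layer ready for a second pass
    return longest
```

With the bindings installed, the layer would typically come from something
like ds = ogr.Open("some_file.gml") followed by ds.GetLayer(0), where the
file name is only a placeholder.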
>
> 2) This 'list' structure is used by OS for both 'informational' entries, eg
> changeHistory, and 'data' entries, eg referenceToTopograhicArea (which
> lists all the TOIDS that comprise a complex feature such as, for example,
> 'Station Road' in ITN). In this case, OS are adopting a similar approach
> to that used by ESRI for the old 'coverage' format, by not repeating the
> geometry but creating the geometry once and then making pointers to those
> parts of the geometry required for a specific purpose. Here, truncation of
> the list means that the output data may not be usable for the intended
> purpose. Unfortunately, other than using lists, I cannot see a viable
> alternative as there can be many thousands of these - which can easily
> exceed the maximum number of columns permitted in several of the output
> formats. I had originally been considering comma separated lists in a
> single string, but these can quickly exceed the maximum string length,
> which brings us back to the reasons for Even's solution.
>
> This is a form of topology embedded within the OS data and it might be that
> it is desirable to continue with the 'no topology here' principle. This is
> another of the 'problems', in that many (most today?) spatial data
> structures are not designed to store topology, however topology does have
> its uses.
I'm afraid your needs are a bit specific and go beyond the scope of OGR, which
isn't designed to address topology issues. Keep in mind that it's based on the
Simple Features model.
>
> 3) What to do when a limit is reached. As I have not had the chance to try
> Even's development yet I do not know what approach has been chosen. From
> the perspective of using the output, I guess that I want a list of the FIDs
> (TOIDS) which contain truncated data structures: this would permit some
> measure of choice when handling these data ... a sort of 'exceptions list'.
> Of course, this does not permit the recovery of the lost data ... nor does
> it allow me to differentiate between those columns that do not matter to me
> and those that do but it may be the most practical approach.
This could be done with OGR coding/scripting, but it becomes a fairly
specialized need and I don't see how it could fit into an existing OGR utility
such as ogr2ogr.
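For instance, an 'exceptions list' of the kind described in point 3 could be
built with a script along these lines. This is only a sketch under the same
assumptions as any OGR Python script: the layer comes from the GDAL Python
bindings, and the function name truncated_features is hypothetical. Note that
it reports the numeric OGR FID; the TOID itself would have to be read from
whichever attribute field holds it in the source layer.

```python
# Field type codes from the OGRFieldType enum (ogr.OFTIntegerList = 1,
# ogr.OFTRealList = 3, ogr.OFTStringList = 5) mapped to the Feature method
# that returns the list value.
LIST_GETTERS = {
    1: "GetFieldAsIntegerList",
    3: "GetFieldAsDoubleList",
    5: "GetFieldAsStringList",
}


def truncated_features(layer, max_subfields):
    """Return {FID: {field name: item count}} for lists longer than max_subfields."""
    defn = layer.GetLayerDefn()
    list_fields = []  # (field index, field name, getter method name)
    for idx in range(defn.GetFieldCount()):
        field = defn.GetFieldDefn(idx)
        getter = LIST_GETTERS.get(field.GetType())
        if getter is not None:
            list_fields.append((idx, field.GetName(), getter))

    exceptions = {}
    layer.ResetReading()
    feat = layer.GetNextFeature()
    while feat is not None:
        for idx, name, getter in list_fields:
            count = len(getattr(feat, getter)(idx) or [])
            if count > max_subfields:
                # Record which field overflowed and by how much, so that
                # columns that do not matter can be ignored afterwards.
                exceptions.setdefault(feat.GetFID(), {})[name] = count
        feat = layer.GetNextFeature()
    layer.ResetReading()
    return exceptions
```

Because the result is keyed by feature and then by field name, it also shows
which columns were affected for each feature, not just that something was
lost.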
>
> Enough: I must try Even's work out for myself ...
>
> Thanks and best wishes,
>
> Peter
>
> Even Rouault wrote:
> > Jez,
> >
> > if you check out the latest GDAL trunk, you'll find a new -splitlistfields
> > option for ogr2ogr that will split fields of type IntegerList, RealList
> > or StringList into as many subfields of simple type as necessary. You can
> > also specify -maxsubfields an_integer_value to limit the number of
> > subfields (this can be useful if you just want to keep the first element
> > of the list, or to keep the number of subfields reasonable, as some
> > features from your GML file have a large number of elements in the
> > list).
> >
> > Even
> >
> > On Monday 12 July 2010 20:04:00, Even Rouault wrote:
> >> Jez,
> >>
> >> Yes this is a limitation of the shapefile format (and most drivers,
> >> PostgreSQL databases being one of the exceptions).
> >>
> >> Try adding -fieldTypeToString IntegerList,RealList,StringList to your
> >> ogr2ogr command line. This will transform any field of those types into
> >> a String field by concatenating the values into a single string (what
> >> you can see with ogrinfo). Beware that if the list is longer than a
> >> few items, there will be a truncation at 80 characters.
> >>
> >> I'm looking into whether it's practical to add an option to
> >> ogr2ogr to split fields of type *List into several fields of simple
> >> type.
> >>
> >> Best regards,
> >>
> >> Even
> >>
> >> PS: For the record, in http://download.osgeo.org/gdal/daily/, you can
> >> find daily snapshots of the source code of the trunk (1.8.0dev) and the
> >> 1.7 stable branch.
> >>
> >> On Monday 12 July 2010 18:09:16, Jez Walters wrote:
> >>> Even,
> >>>
> >>>
> >>> I've just rebuilt GDAL/OGR using the latest code from the GDAL 'trunk',
> >>> but now I get the following error using ogr2ogr to convert an OS
> >>> MasterMap chunk (e.g.
> >>> http://www.ordnancesurvey.co.uk/oswebsite/products/innovations/sampledata/OSMasterMap_Topo/58116-SX9192-2c1.gz)
> >>> into ESRI shapefiles:
> >>>
> >>> "ERROR 6: Can't create fields of type StringList on shapefile layers."
> >>>
> >>> The various fields for which this error is reported do not appear to be
> >>> in the resultant shapefiles. Unfortunately this makes the new GDAL code
> >>> unusable for me. :-(
> >>>
> >>> Any thoughts?
> >>>
> >>>
> >>> Jez
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Even Rouault [mailto:even.rouault at mines-paris.org]
> >>> Sent: Sunday 11 July 2010 11:12
> >>> To: gdal-dev at lists.osgeo.org
> >>> Cc: Martin Daly; Peter J Halls; Jez Walters
> >>> Subject: Re: [gdal-dev] ogr2ogr problem converting UK Ordnance Survey
> >>> MasterMap
> >>>
> >>> Just to inform you that now that the NAS driver is in GDAL trunk, I've
> >>> been able to port its enhancements to the main GML driver. On the few
> >>> samples I've tested, OS Mastermap GML files seem to be read correctly
> >>> now.
> >>>
> >>> See http://trac.osgeo.org/gdal/ticket/3680
> >>>
> >>> On Friday 02 July 2010 09:04:38, Martin Daly wrote:
> >>>>> Here it is not only GDAL/OGR that has a problem! Currently, I
> >>>>> know of no importer that can handle this construct, other than the
> >>>>> tool (from Snowflake) used by OSGB to generate it - and there is also
> >>>>> the question of onwards storage.
> >>>>
> >>>> Not even close, I'm afraid.
> >>>>
> >>>> There are plenty of tools to read (all parts of) OS MM:
> >>>>
> >>>> http://www.ordnancesurvey.co.uk/oswebsite/products/osmastermap/information/technical/software.html
> >>>>
> >>>> e.g. (an excellent one, at a very reasonable price...)
> >>>>
> >>>> http://www.ordnancesurvey.co.uk/oswebsite/products/osmastermap/information/technical/software/cadcorp.html
> >>>>
> >>>> Also, as far as I am aware, OS GB use in-house software to generate
> >>>> the data.
> >>>>
> >>>> Martin
> >>>>
> >>>> _______________________________________________
> >>>> gdal-dev mailing list
> >>>> gdal-dev at lists.osgeo.org
> >>>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
> >>>
> >>
> >