[gdal-dev] Follow on to the "ISO Metadata" post
Tim Crook
tim.crook at sympatico.ca
Mon Oct 26 12:40:58 PDT 2015
Yes, I understand this. Namespace use can go a long way toward preventing the loss of source information while presenting it in the new target format. It can be verbose, though.
-------- Original message --------
From: Damian Dixon <damian.dixon at gmail.com>
Date: 26-10-2015 3:04 PM (GMT-05:00)
To: Tim Crook <tim.crook at sympatico.ca>
Cc: doug_newcomb at fws.gov,gdal dev <gdal-dev at lists.osgeo.org>
Subject: Re: Follow on to the "ISO Metadata" post
My thoughts on an XML encapsulation of metadata would be (I'll leave the exact layout and details to the experts):
<product>
<name>name of product</name>
<data>
<format>vpf/shape etc...</format>
<item key="k1" value="val1" />
<item key="k2" value="val2" />
<item key="kN" value="valN" />
</data>
</product>
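A minimal sketch of how the layout above could be generated, using Python's standard xml.etree.ElementTree. The function name build_product_xml and the sample values are made up for illustration; nothing here is an existing GDAL/OGR API.

```python
import xml.etree.ElementTree as ET


def build_product_xml(name, data_format, items):
    """Wrap key/value metadata in the <product>/<data>/<item> layout."""
    product = ET.Element("product")
    ET.SubElement(product, "name").text = name
    data = ET.SubElement(product, "data")
    ET.SubElement(data, "format").text = data_format
    for key, value in items.items():
        # Keys are written exactly as read, so no source information is dropped.
        ET.SubElement(data, "item", {"key": key, "value": str(value)})
    return product


# Hypothetical sample values, mirroring the placeholders in the layout.
product = build_product_xml(
    "name of product", "shape", {"k1": "val1", "k2": "val2"}
)
xml_text = ET.tostring(product, encoding="unicode")
print(xml_text)
```

Carrying the keys as attribute values rather than as element names means a key does not have to be a legal XML name, which matters when the keys come straight from the source data.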
Problems I can see with this:
Should the data wrap the product?
How do you encapsulate XML metadata?
What information should be captured?
Product
GDAL/OGR reads data and does not identify product.
A product tends to use a carrier format such as shape, S57, VPF, GML, etc... If you know what the product is you can derive additional information that can be very useful in automatic styling or handling of the data.
Encapsulating XML metadata
Some data formats may contain a mixture of binary and XML. JPEG, for example, can contain both binary information and XML data such as geospatial information.
I see no point translating native XML metadata to a different XML format. You risk losing information.
What information should be captured?
Some of the information can be derived from the data or the way the data is stored on the media.
Information that can be derived from the data can be just as important as the metadata stored in the data.
This in part refers to identifying the product, scale of the data, intended use, provenance, use restrictions, modification dates, creation dates, expiry dates, who created the data, etc...
The information may not be in the data itself but alongside the data in additional files.
Derived metadata should be created at the point that the data is read to generate the metadata. That sounds odd, but consider that metadata is used in the process of cataloging data so that you can find the data you need for your GIS application.
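As a rough illustration of deriving metadata at read time from how the data is stored, the sketch below collects a few facts from the file system and looks for files alongside the data. The function name derive_metadata, the returned keys and the roads.shp/roads.prj sample files are all assumptions made for the example.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def derive_metadata(path):
    """Collect metadata that is not stored inside the data file itself."""
    path = Path(path)
    stat = path.stat()
    # Files sharing the same stem (for example a .prj next to a .shp)
    # may carry additional metadata alongside the data.
    sidecars = sorted(
        p.name for p in path.parent.glob(path.stem + ".*")
        if p.name != path.name
    )
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "file_name": path.name,
        "size_bytes": stat.st_size,
        "modified_utc": modified.isoformat(),
        "sidecar_files": sidecars,
    }


# Demonstration on throwaway files; a real catalogue run would walk the media.
with tempfile.TemporaryDirectory() as tmp:
    shp = Path(tmp) / "roads.shp"
    shp.write_bytes(b"\x00" * 10)
    (Path(tmp) / "roads.prj").write_text("placeholder")
    derived = derive_metadata(shp)

print(derived)
```

Items such as product identity, scale or use restrictions would need product-specific rules on top of this; the sketch only covers what the storage layout gives away for free.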
Key/Value pairs
The keys are unique to the data and potentially to the product.
The aim should be to not lose information that is read from the data.
On 26 October 2015 at 12:59, Tim Crook <tim.crook at sympatico.ca> wrote:
Yes, it had occurred to me that XSLT would be a flexible way of handling a lot of the metadata mappings.
From: Damian Dixon
Sent: Monday, October 26, 2015 8:36 AM
To: Tim Crook
Cc: doug_newcomb at fws.gov ; gdal dev
Subject: Re: Follow on to the "ISO Metadata" post
Hi Tim,
Personally I would not use ISO 19115-1 as an internal format.
There are not a huge number of data formats/products that store metadata as XML out of the box. When they do store metadata it is usually specific to the data and data product (regardless of how the metadata is stored).
There have been attempts at adding metadata alongside data products, such as the UK MOD profile of ISO 19115 (the MOD profile has problems). The French equivalent of the MOD has for a number of years mandated a metadata format alongside all data products used by them (I wish I could find the actual standard for the metadata).
The biggest problem is actually mapping from data/'data product' metadata to the target metadata specification.
Just to highlight how much of a problem the mapping of fields from one metadata format to another is: we have been arguing off and on for more than a year internally about the meaning of dates and which date should be in which field. Two of our big customers do not agree on the meaning of some of the source data date fields or on the mappings we have done.
I believe ESRI have their own internal metadata format that they provide a tool to translate to other XML metadata specifications.
Where I work I have been pushing a per data/'data product' format that is XML-based and uses tag/value pairs. The tags would basically be a dump of all available information and would be specific to each data/'data product'. A set of XSLT scripts would then translate the information to whatever metadata standard you wanted to use, and if you needed to modify the mapping you could change the XSLT script for that data/'data product'.
We have found that hard-coding the mapping is too costly to maintain and very difficult to get right.
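To make the idea of an editable, per-product mapping concrete, here is a small sketch. Python's standard library has no XSLT processor, so a plain mapping table stands in for the XSLT script; the point it illustrates is only that the mapping lives in data that can be changed without touching the translation code. The source keys, the target element names and the translate function are all invented for the example and are not taken from any metadata standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical dump for one data product; every key is product-specific.
SOURCE_DUMP = """
<product>
  <name>example product</name>
  <data>
    <format>shape</format>
    <item key="CREATED" value="2015-10-26" />
    <item key="SCALE" value="50000" />
    <item key="VENDOR_NOTE" value="kept even though unmapped" />
  </data>
</product>
"""

# Per-product mapping held as data (the role the XSLT script would play).
# Target names are illustrative only.
MAPPING = {"CREATED": "creationDate", "SCALE": "scaleDenominator"}


def translate(dump_xml, mapping):
    """Translate a tag/value dump into a target layout using a mapping table."""
    source = ET.fromstring(dump_xml)
    target = ET.Element("metadata")
    unmapped = {}
    for item in source.iter("item"):
        key, value = item.get("key"), item.get("value")
        if key in mapping:
            ET.SubElement(target, mapping[key]).text = value
        else:
            # Report unmapped keys instead of silently dropping them.
            unmapped[key] = value
    return target, unmapped


target, unmapped = translate(SOURCE_DUMP, MAPPING)
print(ET.tostring(target, encoding="unicode"))
print(unmapped)
```

If two customers disagree about which date belongs in which field, each can be given its own mapping table while the dump and the translation code stay the same.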
Probably not the answer you are looking for.
Regards
Damian
On 22 October 2015 at 13:29, Tim Crook <tim.crook at sympatico.ca> wrote:
Hello Doug and Damian.
I saw your post about ISO 19103, ISO 19115 and ISO 19115-1. I am starting to look at ticket #3549 (https://trac.osgeo.org/gdal/ticket/3549). This ticket is a specific problem for metadata translation for image transformations to the PCIDSK format. The ticket references JPEG and TIFF.
The first thing I thought of when I saw your posts was mapping the XML metadata from different sources into a format internal to GDAL, then passing the information through for mapping to the destination format. I suppose there are some image source formats that don't use XML to store their metadata, so this would require additional handling.
I suppose the internal format to GDAL could be XML in the ISO 19115-1 format.
Am I completely off base here?