[MetaCRS] Standard (and simple) format for conversion tests.

Landon Blake lblake at ksninc.com
Wed Nov 4 16:13:41 EST 2009


I have some responses to Norm's excellent comments below.

Norm wrote: "1> I prefer a simple .CSV type of test file format.  The
test file would then be a totally portable, non-binary, text file
(limited to 8 bit characters for more portability?), be easily parsed in
any language or application, and it can be easily maintained using
something like Excel, MySql, etc (anything that can export a simple .CSV
file)."

I think CSV is a little less readable, but I don't have a problem using
it.
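
Just to make that concrete, here is roughly what I picture one record
looking like. This is only a sketch on my part (the field names and the
EPSG codes are my own guesses, not anything we've agreed on), but it
shows a plain CSV record stays readable and parses with Python's
standard csv module:

import csv
import io

# A single hypothetical test record.  Field names (test_id, test_type,
# src_crs, ...) are only my guesses at a format, not an agreed standard.
sample = (
    "test_id,test_type,src_crs,src_x,src_y,src_z,"
    "dst_crs,dst_x,dst_y,dst_z,tolerance,source\n"
    "utm11_wgs84,coordinate,EPSG:32611,398285.45,2654587.59,0.0,"
    "EPSG:4326,-118.0,24.0,0.0,0.00001,CS-MAP TEST.DAT\n"
)

for row in csv.DictReader(io.StringIO(sample)):
    print(row["test_id"], row["src_crs"], "->", row["dst_crs"],
          "tolerance", row["tolerance"])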

Norm wrote: " I would like to see a "test type" field in the record
format which will support testing things in addition to the basic "convert
this coordinate test".  Thus, datum shift tests, geoid height tests,
grid scale, convergence, vertical datum tests, etc. could all be
included in a single database."

Good idea Norm. Can you send me a quick list of the types you think
should be included? I think an informal file format description may be
in the works, and it would be good to list these types there.
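
For what it's worth, here is the sort of list I picture ending up in the
spec, pulled straight from your examples (the identifier spellings are
just mine):

# Hypothetical identifiers for the "test type" field, taken from Norm's
# examples; the actual names would come from his list.
TEST_TYPES = {
    "coordinate",      # the basic "convert this coordinate" test
    "datum_shift",
    "geoid_height",
    "grid_scale",
    "convergence",
    "vertical_datum",
}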

Norm wrote: "We should strive for a "Source of Test Data" field
requirement in the database which indicates the source of the test.
That is, where did the test case data come from.  The source should
always (?) be something outside of the MetaCRS project base."

Do you mean the expected coordinate values that verify a passing test?
This presents us with an interesting chicken-or-egg problem. The only
way to verify some conversions may be with our own code/calculations. I
agree we need to make sure tests aren't incorrect, but to some degree
this may have to come from good management of the test database. What
if we kept two sets of test data files? The first set would be
"official" and the second set would be "experimental". We wouldn't move
test data from the experimental set to the official set until it had
been through a peer review of some type.

Norm wrote: " Some sort of environment field would be nice.  That is, a
bit map sort of thing that would enable a program to skip certain tests
based on the environment (i.e. presence of the Canadian NTv2 data file
for example)."

It sounds like we might need some sort of "test execution requirements"
line that outlines the dependencies for proper test execution. I don't
know if we can do this with a bit map or not. I think I would favor
plain English names for requirements. We could agree on the requirement
names by consensus and keep a list of them in the file format spec.

I think as we package more of this information, CSV will become less
handy. How would we separate a list of requirements in a CSV file? By
using another delimiter? This can quickly get messy. Just a thought...
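
For example (and this is only a sketch, with requirement names I made
up), the requirements could sit in one quoted CSV field with a secondary
delimiter, and a test runner would skip records whose requirements the
environment can't satisfy:

import csv
import io

# Requirement names here ("ntv2_canada", "geoid03") are invented for the
# example; the real list would live in the file format spec.
AVAILABLE = {"geoid03"}  # what this particular environment provides

sample = (
    "test_id,requirements\n"
    'nad27_to_nad83_bc,"ntv2_canada;geoid03"\n'
    "orthometric_height,geoid03\n"
    "plain_utm,\n"
)

for row in csv.DictReader(io.StringIO(sample)):
    needed = {r for r in row["requirements"].split(";") if r}
    missing = needed - AVAILABLE
    if missing:
        print("skip", row["test_id"], "- missing:", ", ".join(sorted(missing)))
    else:
        print("run ", row["test_id"])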

Norm wrote: " A simpler database would result if we require separate
entries in the database to test both the forward and inverse cases.  I
prefer the latter, as inverse testing is not always appropriate and it
supports item 9 below."

I agree. I think we should keep "forward" and "backward" test data in
separate files.

Norm wrote: "Test data values should be entered in the form as the
source material (to the degree possible), implying (for example) that
geographic coordinates may be entered as degrees, minutes, and seconds
or decimal degrees."

I didn't think about units. Do we need to specify units as part of the
test data file, or will the units be defined by the CRS? Are there any
situations where a CRS doesn't specify a unit? If you are saying we
should require source and expected coordinate values to be in the same
units as defined by the CRS, then I think this is a good idea.

Norm wrote: "Tolerances in the test database should be based on the
quality or nature of the "Source of Test Data".  It could be a serious
legal issue if we publish something suggesting that this is the correct
result.

None of our projects will produce the exact same result, nor will any
other library match any of ours precisely.  At this level I do not think
it appropriate for MetaCRS to make the call as to which is the correct
one.  Therefore I suggest that the format be designed such that any
library (MetaCRS or otherwise) be able to simply publish a file with the
result produced by the library as opposed to a Boolean condition
indicating whether or not they meet the MetaCRS standard.  It
is then up to the consumer of that information to decide which one is
correct.  This may be an important legal issue as well.  (Notice that
EPSG has never included test cases in their database.)"

To be completely honest, I'm not worried about this liability very much.
We use all sorts of open source software with the understanding that we
use it at our own risk.

I think we should simply document how we determine whether a test
passes or fails, as you suggested, and then release the test data files
into the public domain or under a Creative Commons license. I know this
might be an issue for some larger corporations, but I'm not worried
about getting sued because someone who built a bridge in the wrong
place blames it on one of my test data files.
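
For what it's worth, the pass/fail rule I have in mind is nothing
fancier than the sketch below (the tolerance comes from the test record,
as both you and Frank suggest; how the conversion itself gets computed
is up to each project):

# Sketch of a pass/fail check: the test passes if every computed ordinate
# is within the record's tolerance of the expected value.
def test_passes(computed, expected, tolerance):
    """computed and expected are (x, y, z) tuples; tolerance is a number."""
    return all(abs(c - e) <= tolerance for c, e in zip(computed, expected))

# Reusing the numbers from Frank's UTM zone 11 example:
print(test_passes((-118.000004, 23.999997, 0.0), (-118.0, 24.0, 0.0), 0.00001))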

I will, however, follow the wishes of the majority in this regard. I
just don't want to take something simple and make it overbearing because
of liability concerns.

Norm wrote: " Coordinate system references should be by EPSG number
wherever possible.  I suggest a format of the "EPSG:3745" type.  In
cases where this won't work, the test database should include a
namespace qualifier and then the definition:

	CSMAP:LL84
	PROJ4:'+proj=utm +zone=11 +datum=WGS84'
	ORACLE:80114
	.
	.
	.
Test applications would, of course, skip any test whose referenced
CRSs they are incapable of deciphering.

The CS-MAP distribution includes a test data file named TEST.DAT which
includes a couple thousand test cases.  The comments in this file
usually indicate the "Source of Test Data" to some degree.  Many need to
be commented out due to environmental reasons, thus item 5 above."

I'd like to keep the file format for test data as simple as we can. I
don't really want to stick long-winded CRS definitions in them. I think
we should either (1) keep separate files for separate definition
systems, or (2) use a mapping, as Martin suggested.
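
To make option (2) a little more concrete, here is a rough sketch of
what I mean by a mapping. The namespaces are the ones from Norm's item
10; the lookup itself (and returning None to mean "skip this test") is
just my illustration:

# Each project would ship a small mapping from the namespace used in the
# test file to whatever its own API needs.  Tests whose namespace isn't in
# the mapping are simply skipped, as Norm suggests.
CRS_HANDLERS = {
    "EPSG":  lambda code: "EPSG:" + code,   # pass the code straight through
    "PROJ4": lambda defn: defn,             # already a PROJ.4 string
    # "CSMAP", "ORACLE", ... added only by projects that understand them.
}

def resolve_crs(reference):
    """reference looks like 'EPSG:3745' or 'PROJ4:+proj=utm +zone=11 +datum=WGS84'."""
    namespace, _, value = reference.partition(":")
    handler = CRS_HANDLERS.get(namespace)
    return handler(value) if handler else None   # None means "skip this test"

print(resolve_crs("EPSG:3745"))
print(resolve_crs("ORACLE:80114"))   # -> None, so that test would be skipped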

Only my 2 cents. I'm glad we got a conversation started. As an Autodesk
user, it is also good to see an Autodesk guy on this list. :]

Landon
 

-----Original Message-----
From: Norm Olsen [mailto:norm.olsen at autodesk.com] 
Sent: Wednesday, November 04, 2009 12:53 PM
To: Frank Warmerdam (External); Landon Blake
Cc: metacrs at lists.osgeo.org
Subject: RE: [MetaCRS] Standard (and simple) format for conversion
tests.

Hello All . . .

I too am interested in a general test format and universal test case
database.  I believe a set of standard test cases would be a great thing
for the MetaCRS project.  For legal and other reasons, I believe we
should simply qualify the "target" values of all of our test cases as
suggested results with some generous tolerance values; and issue a
disclaimer as to the accuracy of the published results.

Having wrestled with this problem for many years, I have some comments:

1> I prefer a simple .CSV type of test file format.  The test file would
then be a totally portable, non-binary text file (limited to 8 bit
characters for more portability?), be easily parsed in any language or
application, and be easily maintained using something like Excel,
MySql, etc. (anything that can export a simple .CSV file).

2> I would like to see a "test type" field in the record format which
will support testing things in addition to the basic "convert this
coordinate test".  Thus, datum shift tests, geoid height tests, grid
scale, convergence, vertical datum tests, etc. could all be included in
a single database.

3> We should strive for a "Source of Test Data" field requirement in the
database which indicates the source of the test.  That is, where did the
test case data come from.  The source should always (?) be something
outside of the MetaCRS project base.

4> Test cases derived from the various projects of MetaCRS could/should
be included and classified as being regression tests only.

5> Some sort of environment field would be nice.  That is, a bit map
sort of thing that would enable a program to skip certain tests based on
the environment (i.e. presence of the Canadian NTv2 data file for
example).

6> Separate tolerances on the source and target is a nice idea enabling
an automatic inverse test for each test case.  A simpler database would
result if we require separate entries in the database to test both the
forward and inverse cases.  I prefer the latter, as inverse testing is
not always appropriate and it supports item 9 below.

7> Test data values should be entered in the same form as the source material
(to the degree possible), implying (for example) that geographic
coordinates may be entered as degrees, minutes, and seconds or decimal
degrees.

8> Tolerances in the test database should be based on the quality or
nature of the "Source of Test Data".  It could be a serious legal issue
if we publish something suggesting that this is the correct result.

9> None of our projects will produce the exact same result, nor will any
other library match any of ours precisely.  At this level I do not think
it appropriate for MetaCRS to make the call as to which is the correct
one.  Therefore I suggest that the format be designed such that any
library (MetaCRS or otherwise) be able to simply publish a file with the
result produced by the library as opposed to a Boolean condition
indicating whether or not they meet the MetaCRS standard.  It
is then up to the consumer of that information to decide which one is
correct.  This may be an important legal issue as well.  (Notice that
EPSG has never included test cases in their database.)

10> Coordinate system references should be by EPSG number wherever
possible.  I suggest a format of the "EPSG:3745" type.  In cases where
this won't work, the test database should include a namespace qualifier
and then the definition:

	CSMAP:LL84
	PROJ4:'+proj=utm +zone=11 +datum=WGS84'
	ORACLE:80114
	.
	.
	.
Test applications would, of course, skip any test whose referenced
CRSs they are incapable of deciphering.

The CS-MAP distribution includes a test data file named TEST.DAT which
includes a couple thousand test cases.  The comments in this file
usually indicate the "Source of Test Data" to some degree.  Many need to
be commented out due to environmental reasons, thus item 5 above.

Norm

-----Original Message-----
From: metacrs-bounces at lists.osgeo.org
[mailto:metacrs-bounces at lists.osgeo.org] On Behalf Of Frank Warmerdam
Sent: Wednesday, November 04, 2009 11:50 AM
To: Landon Blake
Cc: metacrs at lists.osgeo.org
Subject: Re: [MetaCRS] Standard (and simple) format for conversion
tests.

Landon Blake wrote:
> I will be helping Martin Davis on some testing and improvements to
> Proj4J. One of my tasks will be to test some of the improvements we are
> making to the coordinate conversion calculations. I think this testing
> is currently being done with Java unit tests. A while back on this list
> I remember we discussed a simple format for test data that could be
> provided to software tests. I think the goal would be to assemble a
> standard library of test data files that could be used by different
> coordinate conversion projects.
>
> Is there still an interest in this?

Landon,

I am interested in such a thing existing.  In my Python script for
testing PROJ.4 (through OGRCoordinateTransformation) I have:

###############################################################################
# Table of transformations, inputs and expected results (with a threshold)
#
# Each entry in the list should have a tuple with:
#
# - src_srs: any form that SetFromUserInput() will take.
# - (src_x, src_y, src_z): location in src_srs.
# - src_error: threshold for error when src_x/y is transformed into dst_srs
#              and then back into src_srs.
# - dst_srs: destination srs.
# - (dst_x, dst_y, dst_z): point that src_x/y should transform to.
# - dst_error: acceptable error threshold for comparing to dst_x/y.
# - unit_name: the display name for this unit test.
# - options: eventually we will allow a list of special options here (like
#   one way transformation).  For now just put None.
# - min_proj_version: string with minimum proj version required or null if
#   unknown.

transform_list = [ \

     # Simple straight forward reprojection.
     ('+proj=utm +zone=11 +datum=WGS84', (398285.45, 2654587.59, 0.0), 0.02,
      'WGS84', (-118.0, 24.0, 0.0), 0.00001,
      'UTM_WGS84', None, None ),

     # Ensure that prime meridian changes are applied.
     ('EPSG:27391', (20000, 40000, 0.0), 0.02,
      'EPSG:4273', (6.397933, 58.358709, 0.000000), 0.00001,
      'NGO_Oslo_zone1_NGO', None, None ),

     # Verify that 26592 "pcs.override" is working well.
     ('EPSG:26591', (1550000, 10000, 0.0), 0.02,
      'EPSG:4265', (9.449316, 0.090469, 0.00), 0.00001,
      'MMRome1_MMGreenwich', None, None ),
..

I think one important thing is to provide an acceptable error threshold
with each test in addition to the expected output value.  I also think
each test should support a chunk of arbitrary text which could be used
to explain the purpose of the test (special issues being examined) and
point off to a ticket or other relevant document.

Actually one more thing is a name for the test, hopefully slightly
self-documenting.  I suppose if each test is a distinct file, we
could use meaningful filenames.

The other dilemma is how to define the coordinate systems.  I feel that
limiting things to EPSG defined coordinate systems is a problem, though
of course otherwise we have serious problems with defining the
coordinate system in an interoperable fashion.  So, perhaps starting
with EPSG codes is reasonable, with an understanding that eventually
some tests might need to be done another way - perhaps OGC WKT.
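
Just to illustrate why I'm comfortable starting with EPSG codes and
leaving the door open for other forms: on the OGR side,
SetFromUserInput() already accepts EPSG codes, PROJ.4 strings and WKT
alike, so a test runner can try whatever the record carries and skip
what it can't decipher.  A rough sketch (error handling kept minimal):

# Rough sketch: feed whatever CRS text a test record carries to OGR and
# skip the test if OGR can't make sense of it.
from osgeo import osr

def parse_crs(text):
    srs = osr.SpatialReference()
    try:
        err = srs.SetFromUserInput(text)
    except RuntimeError:      # some GDAL builds raise instead of returning a code
        return None
    return srs if err == 0 else None   # 0 == OGRERR_NONE; None -> skip the test

for candidate in ("EPSG:4326",
                  "+proj=utm +zone=11 +datum=WGS84",
                  "no such coordinate system"):
    print(candidate, "->", "ok" if parse_crs(candidate) else "skip")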

If you wanted to roll out something preliminary I would be interested in
writing a Python script that would run the tests against OGR/PROJ.4.

Best regards,
-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent

_______________________________________________
MetaCRS mailing list
MetaCRS at lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/metacrs



