[MetaCRS] Standard (and simple) format for conversion tests.

Wed Nov 4 16:45:32 EST 2009

Whew! Quite a shopping list, Norm.  But it all makes good sense.

I guess I agree about the CSV format.  It certainly works well for the 
CSMap tests - I found it fairly easy to parse and use the test data 
there.  One thing about CSV is that it doesn't make provision for 
comments.  The '#' commenting convention in the TEST.DAT file looked 
fairly useful, though, and it would be easy to adopt this convention and 
strip out comments when reading the files.  (I'll insert a plug here for 
JEQL - it is handy for reading/transforming/subsetting CSV files.)

The "test type" indicator is a very good point.  It is another attribute 
on which it might be nice to physically segment the unit test files, to 
make it easy to run only the test types of interest to (or supported by) 
a given lib.  But it's still good to have this as an explicit field in the 
test description, since this allows the logical model of the tests to be 
independent of their physical file organization.  (This also applies to 
the authority namespace identifier - and yes, I am reversing the position 
I took in my previous email!)
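To make the logical-vs-physical point concrete, here is a minimal Python sketch in which tests are selected by their test type and CRS namespace fields rather than by the file they came from. The field names, the test type vocabulary and the "NAMESPACE:definition" convention (from Norm's item 10) are assumptions for illustration, not an agreed format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class ConversionTest:
    # Hypothetical fields; no record layout has been agreed on yet.
    name: str
    test_type: str  # e.g. "convert", "datum_shift", "geoid_height"
    src_crs: str    # "NAMESPACE:definition", e.g. "EPSG:3745"
    dst_crs: str


def split_crs_ref(ref: str) -> Tuple[str, str]:
    """Split 'NAMESPACE:definition' on the first colon only, so that
    definitions which themselves contain colons are left intact."""
    namespace, sep, definition = ref.partition(":")
    if not sep:
        raise ValueError(f"missing namespace qualifier in {ref!r}")
    return namespace.upper(), definition


def select_tests(tests: Iterable[ConversionTest], types: Set[str],
                 namespaces: Set[str]) -> List[ConversionTest]:
    """Keep tests whose type and CRS namespaces (given in upper case)
    a library supports, regardless of which file they were loaded from."""
    selected = []
    for t in tests:
        used = {split_crs_ref(t.src_crs)[0], split_crs_ref(t.dst_crs)[0]}
        if t.test_type in types and used <= namespaces:
            selected.append(t)
    return selected


tests = [
    ConversionTest("utm_wgs84", "convert", "EPSG:32611", "EPSG:4326"),
    ConversionTest("ll84_geoid", "geoid_height", "CSMAP:LL84", "EPSG:4326"),
    ConversionTest("oracle_srid", "convert", "ORACLE:80114", "EPSG:4326"),
]
print([t.name for t in select_tests(tests, {"convert"}, {"EPSG", "CSMAP"})])
```

Here a lib that only does coordinate conversion and understands EPSG and CSMAP references would pick up just "utm_wgs84", however the files happen to be split.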

For the environment indicator, how about a string with a delimited set 
of keywords/tags?  This is likely to be somewhat freeform and 
lib-specific, isn't it?  Or is there a clear idea of what would go here? 
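A minimal Python sketch of what a test runner might do with such a tag string. It assumes ';' as the delimiter (a comma would collide with the CSV field separator unless the field is quoted), case-insensitive tags, and made-up tag names such as "ntv2_canada".

```python
from typing import FrozenSet, Set


def parse_env_tags(field: str, delimiter: str = ";") -> FrozenSet[str]:
    """Turn a delimited tag field such as 'NTv2_Canada; GEOID96' into a
    set of normalized tags. An empty field means no special requirements."""
    return frozenset(tag.strip().lower() for tag in field.split(delimiter)
                     if tag.strip())


def can_run(required_field: str, available: Set[str]) -> bool:
    """A test is runnable only if every tag it requires is available."""
    return parse_env_tags(required_field) <= {t.lower() for t in available}


available = {"ntv2_canada"}
print(can_run("NTv2_Canada", available))
print(can_run("NTv2_Canada;geoid96", available))
print(can_run("", available))
```

Because the runner only checks set containment, each lib can define whatever tags it needs without a central list having to be agreed up front.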

I assume the disclaimer would take the form of a README file associated 
with the test archive?

Now, how to move forward on this?  Perhaps:
- define a prototype format (on a wiki page)
- create an SVN location for this archive
- create some sample tests in the test format (perhaps by re-modelling 
the CSMap tests?  Or using extracts from the test suites from other libs)
- and then all the lib teams can get to work and start creating test runners!
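As a sketch of the kind of test runner core this might lead to, here is a minimal Python version that reads CSV test records, compares each result against a per-test tolerance, and writes out the computed values (per Norm's item 9) rather than just a pass/fail flag. The column names are hypothetical, and fake_transform is only a stand-in for whatever library binding (PROJ.4, CS-MAP, Proj4J) a real runner would wrap.

```python
import csv
import io
from typing import Callable, Dict, List, Sequence, Tuple

XYZ = Tuple[float, float, float]
# Hypothetical signature: (src_crs, dst_crs, point) -> transformed point.
Transform = Callable[[str, str, XYZ], XYZ]


def run_test(rec: Dict[str, str], transform: Transform) -> Dict[str, str]:
    """Run one test record; report computed values plus a tolerance flag."""
    src = tuple(float(rec[k]) for k in ("src_x", "src_y", "src_z"))
    expected = tuple(float(rec[k]) for k in ("dst_x", "dst_y", "dst_z"))
    got = transform(rec["src_crs"], rec["dst_crs"], src)
    # Largest per-axis difference, in the units of the target CRS.
    max_err = max(abs(g - e) for g, e in zip(got, expected))
    return {
        "name": rec["name"],
        "got_x": repr(got[0]), "got_y": repr(got[1]), "got_z": repr(got[2]),
        "max_error": repr(max_err),
        "within_tolerance": str(max_err <= float(rec["dst_tolerance"])),
    }


def write_results(results: Sequence[Dict[str, str]]) -> str:
    """Serialize results as CSV so they can be published and compared."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(results[0].keys()))
    writer.writeheader()
    writer.writerows(results)
    return out.getvalue()


def fake_transform(src_crs: str, dst_crs: str, xyz: XYZ) -> XYZ:
    # Stand-in for a real library call: just nudges x by 4e-6.
    x, y, z = xyz
    return (x + 0.000004, y, z)


sample = """name,src_crs,dst_crs,src_x,src_y,src_z,dst_x,dst_y,dst_z,dst_tolerance
demo,EPSG:4326,EPSG:4326,-118.0,24.0,0.0,-118.0,24.0,0.0,0.00001
"""
results: List[Dict[str, str]] = [
    run_test(rec, fake_transform) for rec in csv.DictReader(io.StringIO(sample))
]
print(write_results(results))
```

Publishing the raw computed values alongside the flag leaves the "which answer is right" judgement to whoever consumes the results, which seems to be what Norm is after.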

Martin

Norm Olsen wrote:
> Hello All . . .
>
> I too am interested in a general test format and universal test case database.  I believe a set of standard test cases would be a great thing for the MetaCRS project.  For legal and other reasons, I believe we should simply qualify the "target" values of all of our test cases as suggested results with generous tolerance values, and issue a disclaimer as to the accuracy of the published results.
>
> Having wrestled with this problem for many years, I have some comments:
>
> 1> I prefer a simple .CSV type of test file format.  The test file would then be a totally portable, non-binary text file (limited to 8-bit characters for more portability?) that can be easily parsed in any language or application and easily maintained using something like Excel, MySQL, etc. (anything that can export a simple .CSV file).
>
> 2> I would like to see a "test type" field in the record format, which would support testing things in addition to the basic "convert this coordinate" test.  Thus, datum shift tests, geoid height tests, grid scale, convergence, vertical datum tests, etc. could all be included in a single database.
>
> 3> We should strive to require a "Source of Test Data" field in the database which indicates the source of the test; that is, where the test case data came from.  The source should always (?) be something outside of the MetaCRS project base.
>
> 4> Test cases derived from the various projects of MetaCRS could/should be included and classified as being regression tests only.
>
> 5> Some sort of environment field would be nice; that is, a bitmap sort of thing that would enable a program to skip certain tests based on the environment (e.g., the presence of the Canadian NTv2 data file).
>
> 6> Separate tolerances on the source and target are a nice idea, enabling an automatic inverse test for each test case.  A simpler database would result if we require separate entries in the database to test the forward and inverse cases.  I prefer the latter, as inverse testing is not always appropriate, and it supports item 9 below.
>
> 7> Test data values should be entered in the same form as the source material (to the degree possible), implying (for example) that geographic coordinates may be entered as either degrees, minutes, and seconds or decimal degrees.
>
> 8> Tolerances in the test database should be based on the quality or nature of the "Source of Test Data".  It could be a serious legal issue if we publish something suggesting that this is the correct result.
>
> 9> None of our projects will produce exactly the same result, nor will any other library match any of ours precisely.  At this level I do not think it appropriate for MetaCRS to make the call as to which is the correct one.  Therefore I suggest that the format be designed such that any library (MetaCRS or otherwise) can simply publish a file with the results it produces, as opposed to a Boolean condition indicating whether or not it meets the MetaCRS standards.  It is then up to the consumer of that information to decide which one is correct.  This may be an important legal issue as well.  (Notice that EPSG has never included test cases in their database.)
>
> 10> Coordinate system references should be by EPSG number wherever possible.  I suggest a format of the "EPSG:3745" type.  In cases where this won't work, the test database should include a namespace qualifier followed by the definition:
>
> 	CSMAP:LL84
> 	PROJ4:'+proj=utm +zone=11 +datum=WGS84'
> 	ORACLE:80114
> 	.
> 	.
> 	.
> Test applications would, of course, skip any test whose referenced CRSs they are incapable of deciphering.
>
> The CS-MAP distribution includes a test data file named TEST.DAT which contains a couple of thousand test cases.  The comments in this file usually indicate the "Source of Test Data" to some degree.  Many need to be commented out for environmental reasons, hence item 5 above.
>
> Norm
>
> -----Original Message-----
> From: metacrs-bounces at lists.osgeo.org [mailto:metacrs-bounces at lists.osgeo.org] On Behalf Of Frank Warmerdam
> Sent: Wednesday, November 04, 2009 11:50 AM
> To: Landon Blake
> Cc: metacrs at lists.osgeo.org
> Subject: Re: [MetaCRS] Standard (and simple) format for conversion tests.
>
> Landon Blake wrote:
>   
>> I will be helping Martin Davis on some testing and improvements to 
>> Proj4J. One of my tasks will be to test some of the improvements we are 
>> making to the coordinate conversion calculations. I think this testing 
>> is currently being done with Java unit tests. A while back on this list 
>> I remember we discussed a simple format for test data that could be 
>> provided to software tests. I think the goal would be to assemble a 
>> standard library of test data files that could be used by different 
>> coordinate conversion projects.
>>
>>  
>>
>> Is there still an interest in this?
>>     
>
> Landon,
>
> I am interested in such a thing existing.  In my Python script for
> testing PROJ.4 (through OGRCoordinateTransformation) I have:
>
> ###############################################################################
> # Table of transformations, inputs and expected results (with a threshold)
> #
> # Each entry in the list should have a tuple with:
> #
> # - src_srs: any form that SetFromUserInput() will take.
> # - (src_x, src_y, src_z): location in src_srs.
> # - src_error: threshold for error when src_x/y is transformed into dst_srs and
> #              then back into src_srs.
> # - dst_srs: destination srs.
> # - (dst_x,dst_y,dst_z): point that src_x/y should transform to.
> # - dst_error: acceptable error threshold for comparing to dst_x/y.
> # - unit_name: the display name for this unit test.
> # - options: eventually we will allow a list of special options here (like one
> #   way transformation).  For now just put None.
> # - min_proj_version: string with minimum proj version required or null if unknown
>
> transform_list = [ \
>
>      # Simple straightforward reprojection.
>      ('+proj=utm +zone=11 +datum=WGS84', (398285.45, 2654587.59, 0.0), 0.02,
>       'WGS84', (-118.0, 24.0, 0.0), 0.00001,
>       'UTM_WGS84', None, None ),
>
>      # Ensure that prime meridian changes are applied.
>      ('EPSG:27391', (20000, 40000, 0.0), 0.02,
>       'EPSG:4273', (6.397933,58.358709,0.000000), 0.00001,
>       'NGO_Oslo_zone1_NGO', None, None ),
>
>      # Verify that 26592 "pcs.override" is working well.
>      ('EPSG:26591', (1550000, 10000, 0.0), 0.02,
>       'EPSG:4265', (9.449316,0.090469,0.00), 0.00001,
>       'MMRome1_MMGreenwich', None, None ),
> ...
>
> I think one important thing is to provide an acceptable error threshold with
> each test in addition to the expected output value.  I also think each test
> should support a chunk of arbitrary text which could be used to explain
> the purpose of the test (special issues being examined) and point off
> to a ticket or other relevant document.
>
> Actually one more thing is a name for the test, hopefully slightly
> self-documenting.  I suppose if each test is a distinct file, we
> could use meaningful filenames.
>
> The other dilemma is how to define the coordinate systems.  I feel that
> limiting things to EPSG-defined coordinate systems is a problem, though of
> course otherwise we have serious problems defining the coordinate
> system in an interoperable fashion.  So, perhaps starting with EPSG codes
> is reasonable, with an understanding that eventually some tests might need
> to be done another way - perhaps OGC WKT.
>
> If you wanted to roll out something preliminary, I would be interested in
> writing a Python script that would run the tests against OGR/PROJ.4.
>
> Best regards,
>   

-- 
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022