[gdal-dev] Shapefile: automatic retrieval of EPSG code

Even Rouault even.rouault at spatialys.com
Mon Sep 25 10:30:14 PDT 2017


Hi,

As part of a AWEOC (*), following a recent email thread about year-long issues when 
importing shapefiles, I've committed a first version of a OSRFindMatches() function that 
iterates through the EPSG database to check for one or several matches between the input 
SRS and the catalog.

The shapefile driver now uses this function, instead of the more simple AutoIdentifyEPSG(), 
which only worked on a tiny subset of SRS. When a single match is found by 
OSRFindMatches() with an excellent match quality (meaning IsSame() returning TRUE), it is 
automatically picked up by the Shapefile driver.

OSRFindMatches() returns both a list of candidate SRS and a confidence value: 0=bad, 
100=perfect . Currently if several candidates pass IsSame(), they will be returned with 90% 
confidence. If a SRS doesn't pass IsSame() but is the single one to have a PROJCS/GEOGCS 
name matching the input SRS name (name matching is done with a few hardcoded 
approximation rules), it is returned with 50% confidence.

OSRFindMatches() execution time is :
- first execution ever: around 500 ms. It will then create files in
~/.gdal/$(GDAL_VERSION_MAJOR).$(GDAL_VERSION_MINOR)/srs_cache to speed-up later 
executions
- first execution in a process, using the above cache: around 100ms
- later executions in the process: around 10 ms per SRS (55s to identify the 5216 entries of 
GEOGCS and PROJCS from the EPSG database that are valid for GDAL)

So I feel this is good enough to be enabled by default in the shapefile driver.

I've tested & tuned the function:
- the 5216 above mentionned SRS exported from OGC WKT to ESRI WKT, and reimported 
from ESRI WKT to OGC WKT. 100% confidence single match is obtained for most SRS. Except 
occurences using Polar_Stereographic where ESRI WKT lose the scale_factor and other using 
Hotine_Oblique_Mercator where the rectified_grid_angle parametr is lost. But those are 
probably issues with morphToESRI() / morphFromESRI()
- the SRS definitions at
http://help.arcgis.com/en/arcgisserver/10.0/apis/rest/pcs.html . Results are less good there. 
Sometimes for good reason: no entry in the EPSG code, different projection or datum 
parameters, etc.

It could certainly be improved. For example by using instead of the back&white IsSame() 
method a (to-be-created) GetSimilarityLevel() method that would compare how different 2 
SRS are. More relaxed SRS name comparisons would also help (as the 'tokens' making SRS 
names are separated with underscore once 'massaged', you could split on that, and then 
compare how many tokens are in common (or have partial match), and sum the number of 
identical characters, etc..).


The gdalsrsinfo utility is also enhanced to use the new OSRFindMatches() function if you use 
the -e switch (for epsg).

For example, with some.prj containing: 
PROJCS["NAD_1983_StatePlane_Texas_Central_FIPS_4203_Feet",GEOGCS["GCS_North_Ame
rican_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",
6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",
0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easti
ng",2296583.333333333],PARAMETER["False_Northing",
9842500.0],PARAMETER["Central_Meridian",-100.3333333333333],PARAMETER["Standard_P
arallel_1",30.11666666666667],PARAMETER["Standard_Parallel_2",
31.88333333333333],PARAMETER["Latitude_Of_Origin",
29.66666666666667],UNIT["Foot_US",0.3048006096012192]]

$ gdalsrsinfo -e ESRI::some.prj

EPSG:2277

PROJ.4 : +proj=lcc +lat_1=31.88333333333333 +lat_2=30.11666666666667 
+lat_0=29.66666666666667 +lon_0=-100.3333333333333 +x_0=699999.9998983998 
+y_0=3000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs 

OGC WKT :
PROJCS["NAD83 / Texas Central (ftUS)",
    GEOGCS["NAD83",
        DATUM["North_American_Datum_1983",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            TOWGS84[0,0,0,0,0,0,0],
            AUTHORITY["EPSG","6269"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4269"]],
    PROJECTION["Lambert_Conformal_Conic_2SP"],
    PARAMETER["standard_parallel_1",31.88333333333333],
    PARAMETER["standard_parallel_2",30.11666666666667],
    PARAMETER["latitude_of_origin",29.66666666666667],
    PARAMETER["central_meridian",-100.3333333333333],
    PARAMETER["false_easting",2296583.333],
    PARAMETER["false_northing",9842500.000000002],
    UNIT["US survey foot",0.3048006096012192,
        AUTHORITY["EPSG","9003"]],
    AXIS["X",EAST],
    AXIS["Y",NORTH],
    AUTHORITY["EPSG","2277"]]


It can also return several occurences if there isn't a single candidate

For example:
$ gdalsrsinfo  -e 
'PROJCS["undefined",GEOGCS["ETRS89",DATUM["European_Terrestrial_Reference_System_
1989",SPHEROID["GRS 1980",6378137,298.257222101]],PRIMEM["Greenwich",
0],UNIT["degree",
0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origi
n",0],PARAMETER["central_meridian",27],PARAMETER["scale_factor",
0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",
1]]'

Confidence in this match: 90 %

EPSG:3047

PROJ.4 : +proj=utm +zone=35 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 

OGC WKT :
PROJCS["ETRS89 / UTM zone 35N (N-E)",
    GEOGCS["ETRS89",
        DATUM["European_Terrestrial_Reference_System_1989",
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/gdal-dev/attachments/20170925/25ec8fb2/attachment-0001.html>


More information about the gdal-dev mailing list