[gdal-dev] Shapefile: automatic retrieval of EPSG code
Even Rouault
even.rouault at spatialys.com
Mon Sep 25 10:30:14 PDT 2017
Hi,
As part of a AWEOC (*), following a recent email thread about year-long issues when
importing shapefiles, I've committed a first version of a OSRFindMatches() function that
iterates through the EPSG database to check for one or several matches between the input
SRS and the catalog.
The shapefile driver now uses this function, instead of the more simple AutoIdentifyEPSG(),
which only worked on a tiny subset of SRS. When a single match is found by
OSRFindMatches() with an excellent match quality (meaning IsSame() returning TRUE), it is
automatically picked up by the Shapefile driver.
OSRFindMatches() returns both a list of candidate SRS and a confidence value: 0=bad,
100=perfect . Currently if several candidates pass IsSame(), they will be returned with 90%
confidence. If a SRS doesn't pass IsSame() but is the single one to have a PROJCS/GEOGCS
name matching the input SRS name (name matching is done with a few hardcoded
approximation rules), it is returned with 50% confidence.
OSRFindMatches() execution time is :
- first execution ever: around 500 ms. It will then create files in
~/.gdal/$(GDAL_VERSION_MAJOR).$(GDAL_VERSION_MINOR)/srs_cache to speed-up later
executions
- first execution in a process, using the above cache: around 100ms
- later executions in the process: around 10 ms per SRS (55s to identify the 5216 entries of
GEOGCS and PROJCS from the EPSG database that are valid for GDAL)
So I feel this is good enough to be enabled by default in the shapefile driver.
I've tested & tuned the function:
- the 5216 above mentionned SRS exported from OGC WKT to ESRI WKT, and reimported
from ESRI WKT to OGC WKT. 100% confidence single match is obtained for most SRS. Except
occurences using Polar_Stereographic where ESRI WKT lose the scale_factor and other using
Hotine_Oblique_Mercator where the rectified_grid_angle parametr is lost. But those are
probably issues with morphToESRI() / morphFromESRI()
- the SRS definitions at
http://help.arcgis.com/en/arcgisserver/10.0/apis/rest/pcs.html . Results are less good there.
Sometimes for good reason: no entry in the EPSG code, different projection or datum
parameters, etc.
It could certainly be improved. For example by using instead of the back&white IsSame()
method a (to-be-created) GetSimilarityLevel() method that would compare how different 2
SRS are. More relaxed SRS name comparisons would also help (as the 'tokens' making SRS
names are separated with underscore once 'massaged', you could split on that, and then
compare how many tokens are in common (or have partial match), and sum the number of
identical characters, etc..).
The gdalsrsinfo utility is also enhanced to use the new OSRFindMatches() function if you use
the -e switch (for epsg).
For example, with some.prj containing:
PROJCS["NAD_1983_StatePlane_Texas_Central_FIPS_4203_Feet",GEOGCS["GCS_North_Ame
rican_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",
6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",
0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easti
ng",2296583.333333333],PARAMETER["False_Northing",
9842500.0],PARAMETER["Central_Meridian",-100.3333333333333],PARAMETER["Standard_P
arallel_1",30.11666666666667],PARAMETER["Standard_Parallel_2",
31.88333333333333],PARAMETER["Latitude_Of_Origin",
29.66666666666667],UNIT["Foot_US",0.3048006096012192]]
$ gdalsrsinfo -e ESRI::some.prj
EPSG:2277
PROJ.4 : +proj=lcc +lat_1=31.88333333333333 +lat_2=30.11666666666667
+lat_0=29.66666666666667 +lon_0=-100.3333333333333 +x_0=699999.9998983998
+y_0=3000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
OGC WKT :
PROJCS["NAD83 / Texas Central (ftUS)",
GEOGCS["NAD83",
DATUM["North_American_Datum_1983",
SPHEROID["GRS 1980",6378137,298.257222101,
AUTHORITY["EPSG","7019"]],
TOWGS84[0,0,0,0,0,0,0],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4269"]],
PROJECTION["Lambert_Conformal_Conic_2SP"],
PARAMETER["standard_parallel_1",31.88333333333333],
PARAMETER["standard_parallel_2",30.11666666666667],
PARAMETER["latitude_of_origin",29.66666666666667],
PARAMETER["central_meridian",-100.3333333333333],
PARAMETER["false_easting",2296583.333],
PARAMETER["false_northing",9842500.000000002],
UNIT["US survey foot",0.3048006096012192,
AUTHORITY["EPSG","9003"]],
AXIS["X",EAST],
AXIS["Y",NORTH],
AUTHORITY["EPSG","2277"]]
It can also return several occurences if there isn't a single candidate
For example:
$ gdalsrsinfo -e
'PROJCS["undefined",GEOGCS["ETRS89",DATUM["European_Terrestrial_Reference_System_
1989",SPHEROID["GRS 1980",6378137,298.257222101]],PRIMEM["Greenwich",
0],UNIT["degree",
0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origi
n",0],PARAMETER["central_meridian",27],PARAMETER["scale_factor",
0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",
1]]'
Confidence in this match: 90 %
EPSG:3047
PROJ.4 : +proj=utm +zone=35 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
OGC WKT :
PROJCS["ETRS89 / UTM zone 35N (N-E)",
GEOGCS["ETRS89",
DATUM["European_Terrestrial_Reference_System_1989",
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/gdal-dev/attachments/20170925/25ec8fb2/attachment-0001.html>
More information about the gdal-dev
mailing list