[gdal-dev] ogr2ogr reprojection, features are not transformed

Thu Nov 17 18:09:39 EST 2011

Hello

On Wed, Nov 16, 2011 at 5:09 PM, Even Rouault
<even.rouault at mines-paris.org> wrote:
> Etienne,
>
>>
>> It seems that setting source srs is needed when using shapefiles, as
>> you said.  This should be documented somewhere (probably on the
>> ogr2ogr page and/or shapefile driver page).
>
> Feel free to add a warning. Logically, this should be more in the shapefile
> driver page. But this assumes that people actually read docs, which is dubious
> ;-)

It would be nice to put it in the ogr2ogr page too, but I understand
you wouldn't want to put format-specific stuff in there.
I'll update the shapefile driver docs.

>
>>
>> The more I use shapefiles the more I see the limitation in this file
>> format, and am quite puzzled as to why it is still so widespread...
>
> Yes, shapefiles suffer from a lot of deficiencies (limitations of dbf format, no
> native - documented - spatial indexing, prj files, ...). You might experiment
> with spatialite, which is far more capable but still less widespread.
>
>>
>> Any other ideas on how we can fix this?
>> Here is how I think it could be done:
>>
>> 1- for all EPSG projections, generate its ESRI WKT (and perhaps a few
>> variations)
>> 2- make a mapping from ESRI WKT (or its hash) to EPSG codes
>> 3- use the hash mapping to find the EPSG code from a given WKT.
>>
>> Does this make sense?
>>
>> An obvious hurdle is that WKTs can have small variations.
>>
>> For example,
>>
>> EPSG:4618 as output by GDAL:
>> $ gdalsrsinfo -o wkt_esri EPSG:4618
>> GEOGCS["SAD69",DATUM["D_South_American_1969",SPHEROID["GRS_1967_Truncated",
>> 6378160,298.25]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
>>
>> whereas an example file (brazil.prj) has:
>> GEOGCS["SAD69",DATUM["D_South_American_1969",SPHEROID["GRS_1967_Modified",
>> 6378160,298.25]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
>>
>> however, GDAL can deal with these variations:
>> $ gdalsrsinfo -o wkt_esri ESRI::brazil.prj
>> GEOGCS["SAD69",DATUM["D_South_American_1969",SPHEROID["GRS_1967_Truncated",
>> 6378160,298.25]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
>
> The conversion between GDAL WKT and ESRI WKT belongs to the field of
> experimental science certainly. There are some known rules, but a lot of
> particular cases, some still remaining to be unearthed. The version of
> ogr_srs_esri.cpp in 1.8-esri branch is far more complicated than the one in
> trunk.

Is this going to be merged into trunk eventually?

>
> As far as your above algorithm is concerned, I'm wondering how it could work,
> with the variations you gave above. Perhaps a statistical approach with fuzzy
> string matching would give better results than something based on hashing ;-)
> More seriously, I think that a campaign of collecting a lot of .PRJ files
> (ideally coming from ESRI software, and not produced by GDAL) would be needed
> first to see which rules can work in practice.

I have been playing around a bit and here is what I did that works (first try):

- take a given CRS definition (say, from EPSG or a .prj file) and find
its ESRI WKT or "simple" WKT.
- for every EPSG code in pcs.csv and gcs.csv, get its ESRI (or
simple) WKT and compare it to the target WKT
- if a WKT matches, get the full WKT for the EPSG code that matched.

The problem is that it's pretty inefficient as you can imagine, taking
a few seconds to find one single target.

A second iteration:

- generate full WKT, ESRI WKT and "simple" (StripCT) WKT for all EPSG
codes in pcs.csv and gcs.csv
- save these to a flat (gzipped) file in csv form
- use these tables to find the EPSG code that matches a given WKT (in
whatever WKT flavor you need)

This is rather efficient in terms of processing time.
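
To make the second iteration concrete, here is a minimal Python sketch of
the table lookup. It is only an illustration under stated assumptions, not
the code I actually ran: the function names (build_table, find_epsg) are
made up, the table holds only the ESRI WKT column, and it lives in memory
rather than in a file on disk. The SAD69 string is the gdalsrsinfo output
quoted above; the WGS84 row is an illustrative extra.

```python
import csv
import gzip
import io

# ESRI WKT for EPSG:4618, copied from the gdalsrsinfo output quoted above.
SAD69_ESRI = (
    'GEOGCS["SAD69",DATUM["D_South_American_1969",'
    'SPHEROID["GRS_1967_Truncated",6378160,298.25]],'
    'PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]'
)
# Illustrative second row, not taken from this thread; exact digits vary
# between software packages.
WGS84_ESRI = (
    'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",'
    'SPHEROID["WGS_1984",6378137,298.257223563]],'
    'PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]'
)


def build_table(rows):
    """Serialize (epsg_code, esri_wkt) rows into gzipped CSV bytes."""
    buf = io.BytesIO()
    with gzip.open(buf, "wt", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)
    return buf.getvalue()


def find_epsg(table_bytes, target_wkt):
    """Scan the table once; return the first EPSG code whose WKT matches."""
    with gzip.open(io.BytesIO(table_bytes), "rt", newline="",
                   encoding="utf-8") as fh:
        for code, wkt in csv.reader(fh):
            if wkt == target_wkt:
                return int(code)
    return None  # no exact match in the table


TABLE = build_table([(4326, WGS84_ESRI), (4618, SAD69_ESRI)])
print(find_epsg(TABLE, SAD69_ESRI))
```

The match is an exact string comparison, so the target has to be converted
to the same WKT flavor as the table column before the lookup.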

I thought that a hashing method could decrease the time to find a
matching string, but probably not: the entire dataset has to be loaded
anyway, and building a hash index does not pay off for a single scan.

This works for all the EPSG codes I tried (think of it as a reverse EPSG
lookup), and also for a few .prj files.
One problem I encountered is that ESRI WKT and OGC WKT differ in the
number of significant digits, so for now it works best when converting
to ESRI WKT.
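
One possible way around the significant-digits problem is to round every
decimal number in both strings to the same precision before comparing. The
sketch below is only an idea, not something I have tried against the full
EPSG tables: normalize_numbers and the choice of 10 significant digits are
assumptions, and the two UNIT fragments are just illustrative of the kind
of difference involved.

```python
import re

# Matches decimal literals such as 298.25 or 0.017453292519943295.
# Plain integers (e.g. 6378160) and digits inside names (e.g. GRS_1967)
# contain no decimal point and are left untouched.
_NUMBER = re.compile(r"-?\d+\.\d+(?:[eE][-+]?\d+)?")


def normalize_numbers(wkt, digits=10):
    """Rewrite each decimal literal in wkt with `digits` significant digits."""
    def _round(match):
        return format(float(match.group(0)), ".%dg" % digits)
    return _NUMBER.sub(_round, wkt)


# Two spellings of the same degree-to-radian factor (illustrative values).
long_form = 'UNIT["Degree",0.017453292519943295]'
short_form = 'UNIT["Degree",0.0174532925199433]'
print(normalize_numbers(long_form) == normalize_numbers(short_form))
```

This only addresses numeric precision; differences in names (datum,
spheroid, unit capitalization) would still need the ESRI conversion.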

I will file a bug about this, concerning the shapefile driver, and
also incorporate this into the gdalsrsinfo utility (with a new "EPSG"
output).
Should I create a sandbox for an experimental gdalsrsinfo util
implementing this idea?

I found a few "fuzzy string" algorithms floating around. The idea is
not bad, but it could be computationally expensive.  It could serve as a
backup if direct string matching fails.
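
As a rough illustration of the "exact match first, fuzzy match as backup"
idea, here is a small Python sketch using difflib from the standard
library. This is a stand-in for whichever fuzzy algorithm ends up being
used, not a recommendation: match_epsg and the 0.9 cutoff are assumptions
picked for the example. The two WKT strings are the GDAL output and the
brazil.prj contents quoted earlier in this thread.

```python
import difflib

# ESRI WKT for EPSG:4618 as output by GDAL (quoted earlier in the thread).
SAD69_GDAL = (
    'GEOGCS["SAD69",DATUM["D_South_American_1969",'
    'SPHEROID["GRS_1967_Truncated",6378160,298.25]],'
    'PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]'
)
# Contents of brazil.prj (quoted earlier): same CRS, different spheroid name.
BRAZIL_PRJ = SAD69_GDAL.replace("GRS_1967_Truncated", "GRS_1967_Modified")


def match_epsg(target_wkt, table, cutoff=0.9):
    """Return the EPSG code for target_wkt, or None if nothing is close.

    `table` maps WKT strings to EPSG codes. An exact dictionary lookup is
    tried first; the slower similarity search only runs when that fails.
    """
    code = table.get(target_wkt)
    if code is not None:
        return code
    close = difflib.get_close_matches(target_wkt, list(table),
                                      n=1, cutoff=cutoff)
    return table[close[0]] if close else None


table = {SAD69_GDAL: 4618}
print(match_epsg(BRAZIL_PRJ, table))
```

The similarity search compares the target against every entry, so on the
full EPSG table it would be much slower than the exact lookup, which is
why it only makes sense as a fallback.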

>
> Another point to keep in mind is that the TOWGS84 parameters proposed by GDAL
> do not always meet with consensus. The GRASS developers are not particularly happy
> with that : they would prefer that a list of possible transformations would be
> proposed when EPSG lists several of them, instead of just one picked up. See
> http://lists.osgeo.org/pipermail/gdal-dev/2011-September/030280.html

That's interesting also.  So which is best: using the TOWGS84 params
that GDAL chooses, or using none at all (as happens in this case)?

Thanks,
Etienne
>
> Best regards,
>
> Even
>