[PROJ] Using latest realization of a datum ensemble ?

Greg Troxel gdt at lexort.com
Wed Oct 14 11:45:44 PDT 2020


Thanks for your detailed comments and for taking the time to think about
my issue.  I am going to quote and reply to only the big-picture issue
in this message, to keep it of general interest.

Even Rouault <even.rouault at spatialys.com> writes:

> What is tricky in the suggestion to 'promote' to the latest realization of a datum ensemble is 
> that you might have both low accuracy transformations that exist like shown above for 
> NAD83 -> WGS84 and high accuracy for NAD83(2011) -> WGS84 (G1762) (here I assume that 
> NAD83 would be an ensemble, which it is not formally currently). Depending on the 
> situation, one or the other might be relevant.

I am not really trying to suggest promotion.  I am trying to separate
the concepts of

  what is the best transform

  what is the expected accuracy of the result, given the accuracy of the
  input data, the input datum/ensemble intrinsic error, the transform
  accuracy, and the output datum/ensemble intrinsic error (sketched
  just below)
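
To make that separation concrete, here is a rough Python sketch of the
second concept.  Treating the error sources as independent so that
variances add is my simplification for illustration, not anything PROJ
does today:

    def result_accuracy(input_acc, src_ensemble_acc,
                        transform_acc, dst_ensemble_acc):
        """All arguments in metres (1-sigma); returns the combined
        1-sigma accuracy, assuming independent error sources."""
        return (input_acc**2 + src_ensemble_acc**2
                + transform_acc**2 + dst_ensemble_acc**2) ** 0.5

    # e.g. 0.1 m data labeled with the WGS84 ensemble (~2 m spread),
    # through a centimetre-level transform into NAD83(2011):
    print(result_accuracy(0.1, 2.0, 0.01, 0.0))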

Let's take a concrete case where someone has data in WGS84 Web Mercator.
They aren't 100% sure which realization, and they aren't sure how
accurate it is.  But it could actually be accurate to, say, 0.1 m (the
MassGIS data, assuming they transformed, would be an example).

Then, assume they want to convert it to NAD83(2011) to examine it
relative to some data in that frame that is pretty accurate.  (This is a
real example that I want to do.)
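
As a starting point, this is how I would enumerate what PROJ currently
offers for the ensemble-level pair, using pyproj.  EPSG:3857 (WGS 84 /
Pseudo-Mercator) and EPSG:6318 (NAD83(2011) geographic 2D) are my
assumptions, and the list and accuracies depend on your PROJ version
and data files:

    from pyproj.transformer import TransformerGroup

    # List the candidate operations and their stated accuracies.
    group = TransformerGroup("EPSG:3857", "EPSG:6318", always_xy=True)
    for t in group.transformers:
        # accuracy is in metres; -1 means PROJ has no estimate
        print(f"{t.accuracy:>8.3f} m  {t.description}")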

What I'm suggesting is that we should find the transform from
WGS84(G1762) to NAD83(2011), because if the data is in the modern frame
and accurate, that's the right thing.  If the data is in a very old
frame, it's not really a bad transform, especially because data in
WGS84(TRANSIT) is extremely unlikely to be accurate.  In either case,
using this transform should go along with error estimates being
propagated (if we were to start storing those with coordinates) that
reflect the ensemble uncertainty.
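
In pyproj terms, the experiment is to pin the source to a specific
realization instead of the ensemble.  EPSG:9057 for WGS 84 (G1762)
geographic 2D is my assumption (worth double-checking), and whether a
concrete high-accuracy path turns up depends on the PROJ database
version:

    from pyproj import Transformer

    ensemble = Transformer.from_crs("EPSG:4326", "EPSG:6318",
                                    always_xy=True)
    pinned = Transformer.from_crs("EPSG:9057", "EPSG:6318",
                                  always_xy=True)

    lon, lat = -71.06, 42.36  # somewhere in Massachusetts
    print("ensemble:", ensemble.transform(lon, lat), ensemble.accuracy)
    print("pinned:  ", pinned.transform(lon, lat), pinned.accuracy)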

Another way of looking at this is that when data is labeled with an
ensemble, there is some probability distribution of what frame the data
really is in.  So one can consider all of those frames with their
probabilities and ask: which transform results in the lowest expected
squared error, averaged over all the possible input datums?  Here I'd
argue that
if the input is in the most recent realization, that precise transform
is much better.  And if it's in some old realization like
WGS84(TRANSIT), then the modern transform is not necessarily worse than
the null transform.
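
A toy 1-D version of that calculation (all numbers here are made up
for illustration; D stands in for the real shift between WGS84(G1762)
and NAD83(2011), which is roughly a metre or two in CONUS):

    D = 1.5  # m, G1762 -> NAD83(2011) shift (illustrative)

    realizations = {
        # name: (probability, offset from G1762 in metres)
        "G1762":   (0.90, 0.0),
        "G1150":   (0.08, 0.1),
        "TRANSIT": (0.02, 1.0),
    }

    def expected_sq_error(residual):
        return sum(p * residual(off) ** 2
                   for p, off in realizations.values())

    # The null transform leaves the full G1762->NAD83 shift in
    # place; the modern transform removes it exactly.
    null_rms = expected_sq_error(lambda off: off + D) ** 0.5
    modern_rms = expected_sq_error(lambda off: off) ** 0.5
    print(f"null:   {null_rms:.2f} m RMS")    # ~1.5 m
    print(f"modern: {modern_rms:.2f} m RMS")  # ~0.14 m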

With my proposed approach, the errors for the recent realization will
drop from 2 m to maybe 10 cm, and the errors for WGS84(TRANSIT) might or
might not change from roughly 2 m to a bit more.

I also believe that data that is actually in WGS84(TRANSIT) is extremely
unlikely, so it's best to optimize for the likely case of more recent
realizations.  (Similarly, I think accurate data in NAD83(1986) is
extremely unlikely.)

I find the justification for "ensemble has high intrinsic errors so null
transform is good" hard to understand.  I can certainly understand
"null transform error is comparable to the existing possible error", but
to me that amounts to picking zero as a magic number because it is a very
round number, rather than picking the value that minimizes errors overall.

Finally, while I realize we probably can't strictly have this property,
I think it would be good if the relative output positions of data in CRS
A and CRS B were consistent whether the project CRS is A, B, C, or D.
Right now we essentially "round to zero" differently depending on that
choice, and this seems avoidable.
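
That property is at least checkable.  A sketch of the check (the CRS
choices here are mine; EPSG:26919 is NAD83 / UTM zone 19N, so this
compares a WGS84-based and a NAD83-based project CRS):

    from pyproj import Transformer

    A, B = "EPSG:4326", "EPSG:6318"   # WGS84 ensemble, NAD83(2011)
    pt = (-71.06, 42.36)

    # The A-B offset of the same point should ideally agree across
    # project CRSs; today it depends on which null transform is used.
    for project_crs in ("EPSG:3857", "EPSG:26919"):
        ax, ay = Transformer.from_crs(A, project_crs,
                                      always_xy=True).transform(*pt)
        bx, by = Transformer.from_crs(B, project_crs,
                                      always_xy=True).transform(*pt)
        print(project_crs, "A-B offset (m):", ax - bx, ay - by)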

Does this make sense?

Does anyone know how this is handled in ESRI products or any context
other than qgis/proj?

Greg

