[pdal] Scaling filter and asking for dimensions

Mon Dec 19 12:50:43 EST 2011

Michael,

Consider a filter that arbitrarily scales data and takes options that define a from- and a to- dimension -- for example, taking the X dimension from a scale of 0.001 to a scale of 0.01 while keeping its datatype the same.  This scenario means touching the raw data to reapply the scaling, and you want to be able to address both X dimensions because one may be more precise than the other (if the scaling were to cause things to overflow, for example).
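
To make the arithmetic concrete, here is a minimal sketch of reapplying a scale to a raw int32 value. The rescale() helper is hypothetical and not part of PDAL; it only shows the decode/re-encode step and where an overflow would have to be caught.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper, not PDAL API: take a raw int32 that was encoded with
// fromScale and re-encode it with toScale. Returns false (and leaves `out`
// untouched) when the re-encoded value does not fit in an int32.
bool rescale(std::int32_t raw, double fromScale, double toScale, std::int32_t& out)
{
    assert(fromScale != 0.0 && toScale != 0.0);

    const double real = raw * fromScale;               // decode to real-world units
    const double encoded = std::round(real / toScale); // encode with the new scale

    if (encoded < std::numeric_limits<std::int32_t>::min() ||
        encoded > std::numeric_limits<std::int32_t>::max())
    {
        return false; // overflow: the original dimension remains the better copy
    }

    out = static_cast<std::int32_t>(encoded);
    return true;
}
```

Going to a coarser scale (0.001 to 0.01) rounds away a digit, while going to a finer one (0.01 to 0.001) multiplies the stored value by ten and can overflow, which is why it can matter which of the two X dimensions gets used downstream.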

Using the old schema/dimension semantics, having two dimensions with the same type on a Schema didn't work very well (you couldn't select for a specific one). For the scaling filter, it was up to the filter to make sure to stuff its output into "the only" X_i32 dimension on the schema if you wanted the data that were scaled by the filter to be written into the Writer.  The Scaling filter worked because it added data that were X_f64 in type, and the writer disregarded them.  

Using the new semantics I'm working on in the dimension_types branch, a Stage can now ask the Schema for the "X" dimension, or specifically for the "filters.scale.X" or "drivers.las.X" dimension -- that is, the Schema can have two dimensions named 'X' on it, but they can be discriminated by their namespace (and/or guid, which is available underneath for global addressability).  Having two 'X' dimensions with the same name and namespace is still the same problem as before, however.
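
A rough sketch of that lookup, using toy stand-in types rather than the actual classes in the branch. The Dimension and Schema structs here are assumptions for illustration, as is the rule of splitting the query at its last '.' to separate namespace from name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are not the PDAL classes.
struct Dimension
{
    std::string name; // e.g. "X"
    std::string ns;   // e.g. "drivers.las" or "filters.scale"
};

struct Schema
{
    std::vector<Dimension> dims;

    // Accepts a bare name ("X") or a qualified one ("filters.scale.X").
    // A bare name returns the first match in insertion order; a qualified
    // name must match both namespace and name. Returns nullptr if nothing
    // matches.
    const Dimension* getDimension(const std::string& query) const
    {
        assert(!query.empty());

        const std::size_t dot = query.rfind('.');
        const bool qualified = (dot != std::string::npos);
        const std::string ns = qualified ? query.substr(0, dot) : std::string();
        const std::string name = qualified ? query.substr(dot + 1) : query;

        for (const Dimension& d : dims)
        {
            if (d.name == name && (!qualified || d.ns == ns))
                return &d;
        }
        return nullptr;
    }
};
```

With both a drivers.las 'X' and a filters.scale 'X' on the schema, the bare query "X" returns whichever was added first, while the qualified queries pick out exactly one.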

We also have the problem of how to tell the writer which 'X' dimension it should be using.  As I have things currently implemented, schema.getDimension("X") is going to return the first 'X' dimension it comes across.  How should we tell a writer, while constructing a pipeline, "Hey, use filters.scale.X for your X dimension instead of drivers.las.X"?  We need some kind of mapping through which the user can specify "For your 'X' dimension, use 'filters.scale.X'".  The new Scaling filter I've implemented can arbitrarily scale any dimension, so it is not sufficient for us to specify this just in terms of the X, Y, and Z dimensions of the LAS writer.
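
One possible shape for such a mapping, purely as a sketch: a per-writer table from the writer's own dimension name to a qualified name. The resolveDimensionName() function and the idea of a per-writer table are assumptions here, not anything implemented in the branch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, not PDAL API: `overrides` maps a writer's dimension name
// (e.g. "X") to the qualified name the user wants it sourced from
// (e.g. "filters.scale.X"). Names with no entry pass through unchanged, so
// the usual unqualified lookup still applies to them.
std::string resolveDimensionName(const std::map<std::string, std::string>& overrides,
                                 const std::string& writerName)
{
    assert(!writerName.empty());

    const std::map<std::string, std::string>::const_iterator it =
        overrides.find(writerName);
    return (it != overrides.end()) ? it->second : writerName;
}
```

Because the table is keyed on arbitrary dimension names, it is not limited to the X, Y, and Z of the LAS writer.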

My idea to resolve this is to simply have a convention where each Writer/Filter/Reader looks up-pipeline for all namespaces ahead of it, and defaults to using dimensions from those namespaces in preference to dimensions whose namespaces are down-pipeline relative to it. It would involve adding two methods to StageBase to return a vector of up-pipeline and down-pipeline namespaces, and using this information to inform schema.getDimension invocations.
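
The convention could look roughly like the following. The findPreferred() function and the Dimension struct are toy stand-ins, and the ordering of the up-pipeline vector (nearest stage first) is my assumption; the fallback to the first match mirrors what getDimension("X") does today.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for illustration only; this is not the PDAL class.
struct Dimension
{
    std::string name; // e.g. "X"
    std::string ns;   // e.g. "drivers.las" or "filters.scale"
};

// Hypothetical lookup: `upstream` holds the namespaces of the stages ahead
// of the asking stage, assumed to be ordered nearest stage first. A dimension
// from the nearest up-pipeline namespace wins; if no up-pipeline namespace
// has the dimension, fall back to the first dimension with that name.
const Dimension* findPreferred(const std::vector<Dimension>& dims,
                               const std::string& name,
                               const std::vector<std::string>& upstream)
{
    assert(!name.empty());

    for (const std::string& ns : upstream)
    {
        for (const Dimension& d : dims)
        {
            if (d.name == name && d.ns == ns)
                return &d;
        }
    }

    for (const Dimension& d : dims)
    {
        if (d.name == name)
            return &d;
    }
    return nullptr;
}
```

A writer sitting after a scaling filter that itself sits after a LAS reader would pass {"filters.scale", "drivers.las"} and get the scaled 'X' without the user naming it explicitly.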

There may be instances where you would want to override this behavior, and I'm not sure of a clean way to do so, yet. Maybe there's a StageBase option to ask to turn off the namespace-aware dimension finding for a specific Stage.  

This email was FYI to see if it causes any thoughts about the subject to come to mind. I will continue trudging ahead in dimension_types and implement something like this.

Howard