[pdal] PLang status

Howard Butler hobu.inc at gmail.com
Wed Mar 14 09:09:58 EDT 2012


On Mar 13, 2012, at 8:39 PM, Michael P. Gerlek wrote:

> Those of you playing along at home will have noticed that the PLang support is being finished at last.  The status is this:

w00t!

> * The PLang system allows you to write scripts that get executed at run time inside a filter, so you can "trivially" write a complex, field-dependent filter operation.
> 
> * We are now calling into embedded Python, as opposed to using the hand-rolled Qi parser originally planned.  While writing the AST code was fun, the Qi parser was truly a horrific experience and not at all maintainable by the casual developer.  Python, on the other hand, rocks.
> 
> * You need to cmake with -DWITH_PYTHON=on.  (Howard, can you pls ask Jenkins to do his builds this way, if he's not already?)  You need to have python (probably 2.7) and numpy installed.

Done.  CMake config to find NumPy also added. I will update the configuration to complain if Python is found but the NumPy headers are not.

> 
> * There are good unit tests for the embedded python interpreter mechanism (PythonTests) and for the two derived filters (ProgrammableFilterTests and PredicateTests).  The tests are all passing and are leak-free right now.
> 
> * I still need to clean up more stuff, so it's not ready for general testing/using yet.  Probably tomorrow, if all goes well.
> 
> * I plan to add support for adding a script to the PipelineXML system.

Being able to reference external .py files would be helpful here too, especially since Python's syntax is whitespace-sensitive and XML does not preserve whitespace without CDATA sections.

> 
> * I will publish guidelines for writing the scripts, but the basic rules are these:
> 
> - you need to import numpy, and treat the fields as numpy arrays
> - the function expects two dictionaries passed in, "ins" and "outs", which contain the fields being passed in and returned
> - we use numpy so that each call to the script function processes a bunch of points at a time (a PointBuffer's worth) and so the execution can be way fast
> - if you define a bool numpy array "Result" and add it to the "outs" dictionary, it is treated as a mask; points whose Result entry is false will not be copied to the output PointBuffer
> - the script function should return a bool; if nontrue, it will be treated as an error condition and the system will halt and catch fire
> - the ins/outs dictionary only uses the "simple" name of the Dimension; if we wanted to add scoped names, it's trivial
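
A minimal predicate script following these rules might look like the sketch below.  The entry-point name "filter" is my assumption here, not settled API:

    import numpy as np

    def filter(ins, outs):
        # 'ins' holds one PointBuffer's worth of points as numpy
        # arrays, keyed by the simple Dimension name.
        z = ins['Z']

        # Keep only points with Z above 100.0; points whose Result
        # entry is false are not copied to the output PointBuffer.
        outs['Result'] = np.greater(z, 100.0)

        # Returning a non-true value is treated as an error condition.
        return True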

We will definitely need scoped names. The rules for fetching Dimension names from a Schema are:

- Search for the simple name
  * If more than one dimension with the same simple name is found, check the parent-child relationships among the same-named dimensions and walk them down to the terminal child dimension; return that dimension if exactly one such child exists
- If no simple name is found, try making a UUID out of the search string and searching for a dimension with it. If found, return it.
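
In pseudo-Python, those rules come out something like this (the schema and dimension attributes here are hypothetical stand-ins for the C++ API, just to make the lookup concrete):

    import uuid

    def get_dimension(schema, name):
        # 1) Search for the simple name.
        matches = [d for d in schema.dimensions if d.simple_name == name]
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            # Walk the parent-child relationships among the same-named
            # dimensions down to the terminal child; return it only if
            # it is unique.
            children = [d for d in matches
                        if not any(m.parent is d for m in matches)]
            if len(children) == 1:
                return children[0]
            raise KeyError("ambiguous dimension name: %r" % name)
        # 2) No simple-name match: try treating the search string as a
        # UUID and looking for a dimension that carries it.
        try:
            target = uuid.UUID(name)
        except ValueError:
            raise KeyError("no dimension named %r" % name)
        for d in schema.dimensions:
            if d.uuid == target:
                return d
        raise KeyError("no dimension with UUID %r" % name)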

I'm going to be adding aliases to this too. You'll be able to set an alias on the X dimension, such as 'x' or 'longitude', and Schema::getDimension will do the right thing when resolving it.  The same disambiguation rules will apply when multiple dimensions share an alias.
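
Under that scheme I'd expect something like the following to hold (again using the hypothetical get_dimension sketch above):

    # 'longitude' registered as an alias of X, so both lookups
    # resolve to the same dimension.
    assert get_dimension(schema, 'X') is get_dimension(schema, 'longitude')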

Is ins['X'] an array of dereferenceable numpy pointers into the X dimension of the PointBuffer, or is the data copied?

Is ins a reference and outs a copy?

> Note the ability to put comments and printfs in the script.

What happens is pure eval()'d Python, right? I would expect that any Python script valid in your environment should work.
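
If so, a script like this should be a fine way to poke at things (Python 2.7 print statement, per the build requirements above):

    import numpy as np

    def filter(ins, outs):
        # Comments work, and so does printing to stdout for debugging.
        n = len(ins['X'])
        print "processing %d points" % n
        outs['Result'] = np.ones(n, dtype=bool)  # keep every point
        return True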


