[pdal] PLang status

Michael P. Gerlek mpg at flaxen.com
Tue Mar 13 21:39:44 EDT 2012


Those of you playing along at home will have noticed that the PLang support is being finished at last.  The status is this:

* The PLang system allows you to write scripts that get executed at run time inside a filter, so you can "trivially" write a complex, field-dependent filter operation.

* We are now calling embedded Python as opposed to the hand-rolled Qi parser originally planned.  While writing the AST code was fun, the Qi parser was truly a horrific experience and not at all maintainable by the casual developer.  Python, on the other hand, rocks.

* You need to cmake with -DWITH_PYTHON=on.  (Howard, can you pls ask Jenkins to do his builds this way, if he's not already?)  You need to have python (probably 2.7) and numpy installed.

* There are good unit tests for the embedded python interpreter mechanism (PythonTests) and for the two derived filters (ProgrammableFilterTests and PredicateTests).  The tests are all passing and are leak-free right now.

* I still need to clean up more stuff, so it's not ready for general testing/using yet.  Probably tomorrow, if all goes well.

* I plan to add support for adding a script to the PipelineXML system.

* I will publish guidelines for writing the scripts, but the basic rules are these:

- you need to import numpy, and treat the fields as numpy arrays
- the function expects two dictionaries passed in, "ins" and "outs", which contain the fields being passed in and returned
- we use numpy so that each call to the script function processes a bunch of points at a time (a PointBuffer's worth) and so the execution can be way fast
- if you define a bool numpy array "Result" and add it to the "outs" dictionary, it is treated as a mask; any point whose element in the Result array is false will not be copied to the output PointBuffer
- the script function should return a bool; if nontrue, it will be treated as an error condition and the system will halt and catch fire
- the ins/outs dictionaries only use the "simple" name of the Dimension; if we wanted to add scoped names, it's trivial
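
To make the rules above concrete, here is a rough pure-Python sketch of how a caller could honor them. This is not the actual PDAL code: run_script and keep_positive_x are made-up names, and the choice to pass unexported fields through unchanged is my assumption for the sake of the example.

```python
import numpy as np

def run_script(func, ins):
    """Hypothetical driver: run a script function on one buffer's worth of fields."""
    outs = {}
    ok = func(ins, outs)
    if not ok:
        # A non-true return value is treated as an error condition.
        raise RuntimeError("script returned a non-true value")
    # Assumption: fields the script did not export pass through unchanged.
    fields = dict(ins)
    mask = outs.pop("Result", None)
    fields.update(outs)
    if mask is not None:
        # Keep only the points whose mask element is True.
        fields = {name: values[mask] for name, values in fields.items()}
    return fields

def keep_positive_x(ins, outs):
    # Example script: double X and drop the points whose original X was not positive.
    X = ins["X"]
    outs["X"] = X * 2.0
    outs["Result"] = X > 0.0
    return True

ins = {"X": np.array([-1.0, 2.0, 3.0]), "Y": np.array([10.0, 20.0, 30.0])}
result = run_script(keep_positive_x, ins)
print(result["X"])  # [4. 6.]
print(result["Y"])  # [20. 30.]
```

The mask is applied to every field, not just the ones the script touched, so the surviving points stay lined up across fields.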

Here is an example script that takes in the X,Y,Z fields, adds 10 to each X value, and sets every Z value to the constant 3.14.

       import numpy as np
       def yow(ins,outs):
         X = ins['X']
         Y = ins['Y']
         Z = ins['Z']
         #print ins['X']
         X = X + 10.0
         # Y: leave as-is, don't export back out
         # Z: goofiness to make it a numpy array of a constant
         Z = np.zeros(X.size) + 3.14
         outs['X'] = X
         #print outs['X']
         outs['Z'] = Z
         return True

Note the ability to put comments and printfs in the script.
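
If you want to try a script like this outside of PDAL first, you can call the function by hand with small numpy arrays. The sample values below are invented for illustration, and the function body is the script above with the comments trimmed.

```python
import numpy as np

def yow(ins, outs):
    X = ins['X']
    Z = ins['Z']
    X = X + 10.0                  # builds a new array; ins['X'] is left untouched
    Z = np.zeros(X.size) + 3.14   # constant array, one element per point
    outs['X'] = X
    outs['Z'] = Z
    return True

# Made-up sample data standing in for one PointBuffer's worth of fields.
ins = {
    'X': np.array([1.0, 2.0, 3.0]),
    'Y': np.array([4.0, 5.0, 6.0]),
    'Z': np.array([7.0, 8.0, 9.0]),
}
outs = {}
ok = yow(ins, outs)

print(ok)             # True
print(outs['X'])      # [11. 12. 13.]
print(outs['Z'])      # [3.14 3.14 3.14]
print(sorted(outs))   # ['X', 'Z']
print(ins['X'])       # [1. 2. 3.]
```

Y never shows up in outs because the script does not export it, and the input X array is unchanged because X + 10.0 allocates a new array rather than modifying the original in place.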

Here is a script which acts as a predicate, keeping only the points whose Y field is greater than 1.0:

        import numpy as np
        def yow(ins,outs):
          Y = ins['Y']
          Result = np.greater(Y, 1.0)
          #print Result
          outs['Result'] = Result
          return True
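
The same by-hand approach works for checking a predicate before wiring it into a filter. The sample values are invented, and indexing with the mask at the end only imitates what the filter would do with it; it is not the PDAL implementation.

```python
import numpy as np

def yow(ins, outs):
    Y = ins['Y']
    outs['Result'] = np.greater(Y, 1.0)
    return True

# Made-up sample data; note that one Y value is exactly 1.0.
ins = {
    'X': np.array([10.0, 20.0, 30.0, 40.0]),
    'Y': np.array([0.5, 1.0, 1.5, 2.0]),
}
outs = {}
yow(ins, outs)
mask = outs['Result']

print(mask.dtype)       # bool
print(mask.tolist())    # [False, False, True, True]

# Imitate the filter: keep only the points where the mask is True.
kept_x = ins['X'][mask]
print(kept_x)           # [30. 40.]

# np.greater(Y, 1.0) and the operator form Y > 1.0 give the same mask.
same = np.array_equal(mask, ins['Y'] > 1.0)
print(same)             # True
```

The comparison is strict, so the point with Y exactly equal to 1.0 is dropped along with the one below it.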

We may evolve this over time to get away from the icky ins/outs convention, but I'd like to let the code settle out for a while first and see how it goes.
 
Your considered opinions and kind thoughts would be most welcome.

_mpg



