[pdal] PLang status
Michael P. Gerlek
mpg at flaxen.com
Tue Mar 13 21:39:44 EDT 2012
Those of you playing along at home will have noticed that the PLang support is being finished at last. The status is this:
* The PLang system allows you to write scripts that get executed at run time inside a filter, so you can "trivially" write a complex, field-dependent filter operation.
* We are now calling embedded Python as opposed to the hand-rolled Qi parser originally planned. While writing the AST code was fun, the Qi parser was truly a horrific experience and not at all maintainable by the casual developer. Python, on the other hand, rocks.
* You need to run cmake with -DWITH_PYTHON=on. (Howard, can you pls ask Jenkins to do his builds this way, if he's not already?) You also need Python (probably 2.7) and numpy installed.
* There are good unit tests for the embedded python interpreter mechanism (PythonTests) and for the two derived filters (ProgrammableFilterTests and PredicateTests). The tests are all passing and are leak-free right now.
* I still need to clean up more stuff, so it's not ready for general testing/using yet. Probably tomorrow, if all goes well.
* I plan to add support for specifying a script in the PipelineXML system.
* I will publish guidelines for writing the scripts, but the basic rules are these:
- you need to import numpy, and treat the fields as numpy arrays
- the function expects two dictionaries passed in, "ins" and "outs", which contain the fields being passed in and returned
- we use numpy so that each call to the script function processes a bunch of points at a time (a PointBuffer's worth) and so the execution can be way fast
- if you define a bool numpy array "Result" and add it to the "outs" dictionary, it is treated as a mask; any point whose element of the Result array is false will not be copied to the output PointBuffer
- the script function should return a bool; if nontrue, it will be treated as an error condition and the system will halt and catch fire
- the ins/outs dictionary only uses the "simple" name of the Dimension; if we wanted to add scoped names, it's trivial
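To make the rules above a bit more concrete, here is a rough sketch of what a caller could do with a script function. This is not the actual PDAL code: run_script and double_z are made-up names, plain dictionaries of numpy arrays stand in for a PointBuffer, and I am assuming that fields the script does not export are carried through unchanged.

```python
import numpy as np

def run_script(func, fields):
    """Hypothetical stand-in for the filter; NOT the real PDAL mechanism."""
    ins = dict(fields)   # simple Dimension name -> numpy array
    outs = {}
    # Rule: a non-true return value is an error condition.
    if not func(ins, outs):
        raise RuntimeError("script returned a non-true value")
    # Assumption: fields that were not exported pass through unchanged.
    result = dict(ins)
    mask = outs.pop('Result', None)
    result.update(outs)
    # Rule: points whose Result element is false are not copied out.
    if mask is not None:
        result = dict((name, arr[mask]) for name, arr in result.items())
    return result

def double_z(ins, outs):
    # Hypothetical script: double Z, keep only points with Y > 1.0.
    outs['Z'] = ins['Z'] * 2.0
    outs['Result'] = ins['Y'] > 1.0
    return True

fields = {
    'X': np.array([1.0, 2.0, 3.0]),
    'Y': np.array([0.5, 1.5, 4.0]),
    'Z': np.array([10.0, 20.0, 30.0]),
}
filtered = run_script(double_z, fields)
```

With this sample data the first point is dropped by the mask, so each array in filtered has two elements, and Z holds the doubled values for the two surviving points.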
Here is an example script that takes in the X,Y,Z fields, adds 10 to each X value, and sets every Z value to the constant 3.14.
import numpy as np

def yow(ins,outs):
    X = ins['X']
    Y = ins['Y']
    Z = ins['Z']
    #print ins['X']

    X = X + 10.0

    # Y: leave as-is, don't export back out

    # Z: goofiness to make it a numpy array of a constant
    Z = np.zeros(X.size) + 3.14

    outs['X'] = X
    #print outs['X']
    outs['Z'] = Z

    return True
Note the ability to put comments and printfs in the script.
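If you want to poke at a script like this outside of PDAL, you can call it by hand with small numpy arrays. The sketch below repeats the script so it stands alone; the sample values and the names sample_ins, sample_outs and ok are just made up for the test.

```python
import numpy as np

def yow(ins, outs):
    X = ins['X']
    X = X + 10.0                      # builds a new array; ins['X'] is untouched
    Z = np.zeros(X.size) + 3.14       # constant array, same length as X
    outs['X'] = X
    outs['Z'] = Z                     # Y is deliberately not exported
    return True

# Made-up sample data standing in for one PointBuffer's worth of points.
sample_ins = {
    'X': np.array([1.0, 2.0, 3.0]),
    'Y': np.array([4.0, 5.0, 6.0]),
    'Z': np.array([7.0, 8.0, 9.0]),
}
sample_outs = {}
ok = yow(sample_ins, sample_outs)
```

After the call, sample_outs holds only 'X' and 'Z'; 'Y' is absent because the script never exported it.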
Here is a script which acts as a predicate, keeping only the points whose Y field is greater than 1.0:
import numpy as np

def yow(ins,outs):
    Y = ins['Y']

    Result = np.greater(Y, 1.0)
    #print Result

    outs['Result'] = Result

    return True
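As a quick sanity check of what the mask looks like, the predicate can be run by hand on a few values. The script is repeated here so the snippet stands alone; sample_ins, sample_outs and kept_y are made-up names, and the last line only mimics by hand what the filter is described as doing with the mask.

```python
import numpy as np

def yow(ins, outs):
    Y = ins['Y']
    Result = np.greater(Y, 1.0)   # elementwise Y > 1.0, gives a bool array
    outs['Result'] = Result
    return True

# Made-up sample data; note that 1.0 itself is not greater than 1.0.
sample_ins = {'Y': np.array([0.5, 1.0, 1.5, 4.0])}
sample_outs = {}
ok = yow(sample_ins, sample_outs)

# Boolean indexing keeps only the elements where the mask is true.
kept_y = sample_ins['Y'][sample_outs['Result']]
```

Only the last two points survive here, since the comparison is strictly greater-than.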
We may evolve this over time to get away from the icky ins/outs convention, but I'd like to let the code settle out for a while first and see how it goes.
Your considered opinions and kind thoughts would be most welcome.
_mpg