[pdal] PointBuffer/Stage Modifications

Andrew Bell andrew.bell.ia at gmail.com
Mon May 12 15:14:13 PDT 2014


All,

I've got a branch going that begins to implement a new storage mechanism
for points in PDAL.  The primary goal of the change is to greatly
reduce the copying of data as it moves through the library.  The essential
notion is that once data is read by a Reader, it never moves and never
needs to be copied.  In addition, the changes should allow us to remove all
of the random iterators and the Cache filter.

In order to facilitate this change, I introduced a few new classes that
isolate the PointBuffers from the actual points.  They're very simple for
now, but here's a brief description:

RawPtBuf: This is the actual storage for the points.  Nothing other than
the PointBuffer should touch it; it's not "public" as such.  Currently,
data is stored in the same manner as it used to be stored in a PointBuffer.

PointContext: Provides a container for information about a Schema and a
RawPtBuf.  A PointContext should be created for each set of points to be
dealt with in a pipeline.  Most uses of PDAL should require a single
PointContext.  Applications like "diff" and "delta" require two, as they
are comparing two sets of points.  PointContext objects are very simple and
can be treated as POD.
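
For illustration only (construction details may differ in the branch), a
tool like delta that compares two point sets would do roughly this:

    // Hypothetical sketch: one PointContext per point set being compared.
    pdal::PointContext sourceCtx;     // context for the first set of points
    pdal::PointContext candidateCtx;  // context for the second set of points
    // Each pipeline is then prepared and read with its own context.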

Changes of Note:

A PointBuffer will no longer store point data.  Instead, it contains a
simple index (currently just a vector) into the RawPtBuf.  When you
create a PointBuffer, you provide it a PointContext, which ties the
PointBuffer to the RawPtBuf and Schema.  Right now, you can still create
a PointBuffer with the old constructor.  In that case, it will use a
PointContext created in the GlobalEnvironment and things will probably
break if you aren't careful.  This is just a stopgap until we can get
things modified.
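
To make that concrete, the intended usage is roughly the following (a
sketch only; exact constructor signatures may differ):

    // Sketch: a PointBuffer is tied to a PointContext at construction.
    pdal::PointContext ctx;      // owns the Schema and the RawPtBuf
    pdal::PointBuffer buf(ctx);  // indexes points stored in ctx's RawPtBuf
    // The old constructor still exists, but it falls back to a PointContext
    // held in the GlobalEnvironment, so use it with care.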

Stages no longer accept a prevStage or prevStages argument.  After a Stage
is created, you can call setInput(Stage *) or setInput(vector<Stage *>) to
create a pipeline by hand.  I'm thinking this will change, but for now this
is how things work.
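
Building a small pipeline by hand then looks something like this (the
reader, filter and writer class names here are placeholders, not real
drivers):

    // Sketch: wire stages together with setInput() instead of passing
    // a prevStage to the constructor.
    MyReader reader;            // placeholder Reader subclass
    MyFilter filter;            // placeholder Filter subclass
    MyWriter writer;            // placeholder Writer subclass

    filter.setInput(&reader);
    writer.setInput(&filter);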

In the old code, you would call initialize() on the end stage before
reading points.  The notion of initialization has been replaced with
preparation.  Where you would have called "initialize()", you now call
"prepare(PointContext)".  prepare is a public function on a Stage which
calls private functions processOptions(Options&), initialize(), and
buildSchema(Schema *) on each of the stages in a pipeline.  You no longer
need to invoke the base class's initialization from a Filter/Reader/etc.
buildSchema is the single opportunity that a stage has to add dimensions
to the schema.  You shouldn't try to do this in an iterator (as some stages
currently do).  appendDimension() now returns a pointer to the actual
Dimension that's stored in the schema in case a stage needs to store that
away for later use.
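
As a sketch (names as described above; MyFilter and m_heightDim are
placeholders and the dimension construction itself is elided), preparing a
pipeline and adding a dimension now looks roughly like:

    // Driving code: prepare() replaces the old initialize() call.
    pdal::PointContext ctx;
    writer.prepare(ctx);   // runs processOptions(), initialize() and
                           // buildSchema() on every stage in the pipeline

    // Inside a stage: buildSchema() is the one place to add dimensions.
    void MyFilter::buildSchema(pdal::Schema *schema)
    {
        // appendDimension() returns a pointer to the Dimension actually
        // stored in the schema; stash it for later setField()/getField().
        m_heightDim = schema->appendDimension(/* dimension details elided */);
    }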

In the old code, you would read a specific number of points by creating a
PointBuffer with a capacity equal to the number of points you wanted to
read and pass that to read().  PointBuffers no longer have capacity (though
it's still hanging around for the time being).  In order to read N points
from an iterator into a PointBuffer, you just call read(PointBuffer, N).
When you use this form of read(), readBegin(), readBufferBegin(),
readEnd() and readBufferEnd() are not called.  In order for this not to
throw an error, you must implement readImpl(PointBuffer, point_count_t) in
your Reader/Filter.
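
So reading, say, 1000 points at a time from an iterator now looks roughly
like this (iterator creation elided; names per the description above, and
the return value is assumed to be the number of points actually read):

    // Sketch: the count goes to read() rather than into a buffer capacity.
    pdal::PointBuffer buf(ctx);
    point_count_t numRead = iter->read(buf, 1000);  // dispatches to readImpl()
    // Calling read(buf, 1000) again appends after the points already read.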

I created two types that should be used as things get modified:

point_count_t (currently uint32_t) is the equivalent of size_t for PDAL
points.  If we ever want to allow more than 2^32 points, this should save a
ton of time.

PointId (currently uint32_t) is essentially a point number, but this may
change.
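
For reference, both are presumably just typedefs for now, so widening them
later is a one-line change:

    typedef uint32_t point_count_t;  // count of points, analogous to size_t
    typedef uint32_t PointId;        // a point number; meaning may still change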


Implementing readImpl():

When changing/copying readBufferImpl() to readImpl(), there are a few
things to note (a rough sketch follows the list):

1) You should call size() on the PointBuffer in order to determine the
starting index at which to write data to the PointBuffer -- don't start
from 0.  This allows a point buffer to be reused without having its data
overwritten by subsequent calls to read().

2) You don't need to tell the PointBuffer how many points are read -- this
is tracked automatically.  setNumPoints() will go away soon.

3) You can no longer copy data from some raw buffer into a PointBuffer.
You should assume that you have no control over the way point data are
stored.  Call setField()/getField().

4) You can't treat a PointBuffer as random-access for writing.  You must
call setField on point N-1 before you can call setField on point N.  This
shouldn't be a real limitation as far as I can see.
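
Putting the four points together, a readImpl() might look roughly like the
sketch below.  Everything here is schematic: eof(), nextX()/nextY(), the
m_*Dim members, the return type and the exact setField() argument order
are stand-ins, not copied from the branch.

    point_count_t MyReader::readImpl(PointBuffer& buf, point_count_t count)
    {
        PointId idx = buf.size();       // (1) start after any existing points
        point_count_t numRead = 0;
        while (numRead < count && !eof())
        {
            // (3) go through setField(); don't memcpy into raw storage.
            // (4) points are written strictly in order, idx - 1 before idx.
            buf.setField(*m_xDim, idx, nextX());
            buf.setField(*m_yDim, idx, nextY());
            ++idx;
            ++numRead;                  // (2) no setNumPoints() call needed
        }
        return numRead;
    }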

This is a lot.  Thanks for reading.  If you're going to tackle
modifications to drivers/filters, you can look at BPF and the LAS reader,
as I've changed them, along with the associated tests, to work the new way.
If you're going to work on converting things, open a ticket or let me know
so that we keep duplicate effort to a minimum.

Let me know if you have questions/comments.

Thanks!

-- 
Andrew Bell
andrew.bell.ia at gmail.com