[pdal] questions about iterator and buffer semantics

Tue Jul 2 07:55:51 PDT 2013

On Jul 1, 2013, at 9:58 PM, Chris Foster <chris.foster at roames.com.au> wrote:
> Ok.  They would have to cache the dimensions by value though, unlike
> the las reader which stores pointers: There's no way to know when the
> PointBuffer from a previous call to read() may be deleted, which leads to
> stale pointers to Dimensions even if the next PointBuffer has identical
> layout.  This is what caused the crash in the Mosaic case.

/me nods. 

Please clean these up as you find them.

> 
> Having a look at the way getDimension() is implemented I can see several
> things which are convenient but probably make it go much slower than
> necessary, so I think a lot of performance can be clawed back by simply
> optimizing getDimension() and getDimensionOptional().  Things which stick out
> to me:
> 
> * Exceptions are thrown as a matter of course (even in getDimensionOptional,
>  where they are caught and used to return an empty optional).  I'm very
>  suspicious of any code which uses exceptions as part of normal flow control.
> 
> * _Lots_ of memory management traffic happens behind the scenes:
>  std::ostringstream is used to format an error message as a matter of course
>  (rather than only in error situations), several std::string objects are
>  created, etc.

Agreed. I'll spend some time trying to tighten this up a bit.

> * Complicated data structures are allocated for the dimension inheritance
>  stuff.  I don't know how this works yet so I don't know whether it could be
>  faster.
> 
> Ideally getDimension() can be made zero allocation, and getDimensionOptional
> should never cause an exception to be thrown and gobbled behind the scenes.
> I'll have a look at how to improve the performance there if I get a chance.

Excellent. I'll try to do some too.

> Side note: I expect it's too late to change now, but what benefit do we get by
> making getDimensionOptional() return a boost::optional rather than a pointer?
> It strikes me as very slightly safer but less convenient when you end up
> caching raw pointers in places like the las reader anyway.

The code is in transition. I want to move to boost::optional rather than naked pointers. They're more 
convenient and a bit safer. Mateusz may have some more to say on that account as well

> 
>>> This is a bit surprising as it indicates the cost of dimension lookup in the
>>> schema is rather large, unless there's something else I'm missing.  I'll have
>>> to do some additional performance testing to try to get to the bottom of this.
>> 
>> The cost of dimension lookup per-point would be prohibitive. It is
>> essentially a std::map (actually boost::multi_array) traverse to find a
>> dimension.
> 
> Definitely.  What I'm noting is that the cost is *still* noticeable
> even when amortized over buffer sizes of 1000 points.  Something must be
> suboptimal!

Agreed. This is something I haven't spent a lot of time performance testing. Our usage scenario is to 
have one giant PointBuffer for our entire read (millions of points, usually), so these costs aren't felt so much by us. There's definitely