[pdal] questions about iterator and buffer semantics

Chris Foster chris.foster at roames.com.au
Tue Jul 2 20:15:07 PDT 2013


On 3 July 2013 00:55, Howard Butler <hobu.inc at gmail.com> wrote:
>> * Exceptions are thrown as a matter of course (even in getDimensionOptional,
>>  where they are caught and used to return an empty optional).  I'm very
>>  suspicious of any code which uses exceptions as part of normal flow control.
>>
>> * _Lots_ of memory management traffic happens behind the scenes:
>>  std::ostringstream is used to format an error message as a matter of course
>>  (rather than only in error situations), several std::string objects are
>>  created, etc.
>
> Agreed. I'll spend some time trying to tighten this up a bit.

I did some performance testing last night - initial results indicate that
the major problem is throwing and gobbling exceptions unnecessarily.  For a
buffer size of 1000 and a large file this can be half the runtime!  Not that I
want small buffers per se, but why use extra memory and thrash the processor
cache if you don't have to ;-)

For good performance we really need to avoid

try
{
    Thing& x = some_function_which_may_throw();
    // do stuff with x
}
catch(Exception&)
{
    // Silently swallow the exception, since x not existing isn't
actually an error
}

This is much more efficiently (and conveniently!) written as

Thing* x = some_function_optional();
if (x)
{
    // do stuff with x
}

Some debate can be had about whether to use Thing* or boost::optional<Thing&>
here.  To some extent that's just a matter of taste and I tend to prefer a raw
pointer.  If you have functions named nicely like getDimensionOptional(), it's
super easy to remember that the returned pointer may not actually be valid and
must be checked.

>> Side note: I expect it's too late to change now, but what benefit do we get by
>> making getDimensionOptional() return a boost::optional rather than a pointer?
>> It strikes me as very slightly safer but less convenient when you end up
>> caching raw pointers in places like the las reader anyway.
>
> The code is in transition. I want to move to boost::optional rather than
> naked pointers. They're more convenient and a bit safer. Mateusz may have
> some more to say on that account as well

They are (arguably) slightly safer, but more conveient?

Thing* x = some_function_optional();
if (x)
{
    x->bar();
    x->foo();
    baz(x);
}

vs

boost::optional<Thing&> x = some_function_optional();
if (x)
{
    x->bar();
    x->foo();
    baz(*x);
}

or possibly

boost::optional<Thing&> x = some_function_optional();
if (x)
{
    Thing& x2 = *x;
    x2.bar();
    x2.foo();
    baz(x2);
}

boost::optional definitely demands more typing in this case, and more human
parsing effort when reading the code.  It's also very slightly less efficient
because you end up checking validity of x at least twice (four times in
example two).

Another reason for using optional is that value types don't naturally have a
special "invalid value".  In this case we're talking about reference semantics
however and you can always use the NULL pointer which also happens to be very
convenient to check in an if() statement.

IMO the only reason to choose optional for references is to generate
an exception rather than a segfault in the case of programming error.  But
this isn't even much of an advantage since the program will crash in a visible
way in either case.  I'll concede that if you're embedding PDAL as a plugin
into a larger program it may be nicer to catch the error and gracefully
terminate the plugin.  PDAL isn't shy about unsafe fiddling with pointers
though - PointBuffer.getField is a prime example.

>> Definitely.  What I'm noting is that the cost is *still* noticeable
>> even when amortized over buffer sizes of 1000 points.  Something must be
>> suboptimal!
>
> Agreed. This is something I haven't spent a lot of time performance testing.
> Our usage scenario is to have one giant PointBuffer for our entire read
> (millions of points, usually), so these costs aren't felt so much by us.

No problems at all, obviously the code will be more solid in areas which are
heavily used.

~Chris


More information about the pdal mailing list