[libpc] Re: Dimension/Schema discussion

Howard Butler hobu.inc at gmail.com
Mon May 9 12:05:52 EDT 2011


On May 9, 2011, at 11:01 AM, Michael P. Gerlek wrote:

> I agree -- this is something I've been aware of, but not thought through to
> any solutions yet.   What we have doesn't scale very well, and now that
> we've got more formats/features coming online, the problem is becoming
> clearer.  I guess you're just the first one to hit it hard enough to decide
> we need to fix it :-)
> 
> If we went with your "multi-index lookup, but called outside the PointBuffer
> loop" concept, one thing I'd like to see is we put back the concept of
> "readBegin/readEnd", which wrap the set of "read PointBuffer" calls.  In
> this readBegin function, then, we could put the multi-index lookup code, so
> it would be truly only once per full read operation, as opposed to once per
> chunk of the full read.

Agreed.
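Here is a minimal C++ sketch of the readBegin/readEnd bracket, assuming a reader that maps dimension names to slot indices. `SketchReader`, `readBuffer`, and the lookup counter are hypothetical names for illustration, not libpc's actual API. Name lookups happen once in `readBegin`, and every per-chunk `readBuffer` call reuses the cached slots.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader illustrating a readBegin/readEnd bracket around chunked reads.
class SketchReader {
public:
    explicit SketchReader(std::map<std::string, std::size_t> nameToSlot)
        : nameToSlot_(std::move(nameToSlot)) {}

    // Called once per full read: resolve dimension names to slots here.
    void readBegin(const std::vector<std::string>& wanted) {
        slots_.clear();
        for (const auto& name : wanted) {
            ++lookups_;  // count name lookups, for illustration only
            slots_.push_back(nameToSlot_.at(name));
        }
        active_ = true;
    }

    // Called once per chunk: uses only the cached slot indices, no name lookups.
    std::size_t readBuffer(std::size_t pointsInChunk) {
        assert(active_ && "readBuffer called outside readBegin/readEnd");
        std::size_t touched = 0;
        for (std::size_t i = 0; i < pointsInChunk; ++i)
            touched += slots_.size();  // stand-in for reading each cached field
        return touched;
    }

    void readEnd() { active_ = false; }

    std::size_t lookups() const { return lookups_; }

private:
    std::map<std::string, std::size_t> nameToSlot_;
    std::vector<std::size_t> slots_;
    std::size_t lookups_ = 0;
    bool active_ = false;
};
```

Reading several chunks after a single `readBegin` call performs the name lookups only once per full read, not once per chunk.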

> 
> Making this change would break a lot of code, but the breakage would be
> largely "syntactic" and we've got good unit tests, so I'd not be at all
> concerned about trying it.

Agreed.  It'll bust a bunch of stuff but AFAIK, we're our only users at this point.  Unit tests should preserve our sanity.

> 
> Unfortunately, I'm booked up all today (including the chunked zip code), am
> out of the office all day tomorrow, and have some other things on my plate
> for the next couple days after that.
> 
> How much of a block is this for you right now?  Do you want to just dive in
> now, or do you want to bounce some ideas around for a while, or wait for me
> to do it later on, or..?

Not such a blocker for me at the moment, but I'm going to add BAG, TerraSolid .bin, and start cribbing up a Spirit-based text parser in the not too distant future here (next month or so).  I wasn't trying to put this on your plate, and I'd be willing to take it on, but I wanted to get some rationale and pushback on it since my picture of it probably isn't as clear as yours.

> 
> -mpg
> 
> 
>> -----Original Message-----
>> From: Howard Butler [mailto:hobu.inc at gmail.com]
>> Sent: Monday, May 09, 2011 8:43 AM
>> To: Michael Gerlek
>> Cc: libpc at lists.osgeo.org
>> Subject: Dimension/Schema discussion
>> 
>> Michael,
>> 
>> An immediate thing I noticed while implementing a non-LAS-conforming
>> driver Friday is the desire to name my dimensions appropriately.  The
>> names, of course, are set in a static array right now, and there isn't any
>> way to override them.
>> 
>> My question is can you describe how what we have now is different than
>> what I had been cooking in libLAS?  How was this design arrived at?
>> 
>> In libLAS, I had a boost::multi_index that included a random lookup and a
>> name-based lookup.  Because of libLAS' point-at-a-time nature, this meant
>> doing map look ups in the critical path and killed performance.  But this
>> wouldn't be necessary with our PointBuffer-based approach, and dimension
>> positions within a schema could continue to be fetched from outside the
>> loop and passed into methods that would expect a position and a width to
>> fetch.
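For comparison, here is a sketch of a schema with the two lookups described here: positional (random access) and by name. It uses `std::vector` plus `std::unordered_map` as a stand-in for a `boost::multi_index` container with random-access and hashed indices. `SketchSchema` and `Dimension` are hypothetical types, not the libLAS or libpc classes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical dimension description: a name plus its byte width.
struct Dimension {
    std::string name;
    std::size_t byteSize;
};

// Stand-in for a two-index container: positional order plus hashed name lookup.
class SketchSchema {
public:
    // Appends a dimension; names must be unique. Returns its position.
    std::size_t add(const Dimension& d) {
        if (byName_.count(d.name))
            throw std::invalid_argument("duplicate dimension name: " + d.name);
        byName_.emplace(d.name, dims_.size());
        dims_.push_back(d);
        return dims_.size() - 1;
    }

    // Positional view: direct indexing by slot.
    const Dimension& at(std::size_t pos) const { return dims_.at(pos); }

    // Name view: a hash probe, meant to be done outside per-point loops.
    std::size_t positionOf(const std::string& name) const { return byName_.at(name); }

    std::size_t size() const { return dims_.size(); }

private:
    std::vector<Dimension> dims_;
    std::unordered_map<std::string, std::size_t> byName_;
};
```

The name view costs a hash lookup, so it belongs outside the per-point loop. The positional view is what the inner loop would use.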
>> 
>> We maintain an array of names, and a dimension type that is used to do
>> random lookups in the dimensions array of Schema.  The slots are named to
>> avoid doing so in the critical path, and we do our dimension name ->
>> dimension slot lookup outside loops where we're fetching data.  Could we
>> not continue to do that with the SchemaLayout maintaining the multi_index?
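Once the name-to-slot lookup has been hoisted out of the loop, the inner loop needs only a byte offset and a width. The sketch below assumes an interleaved buffer where every point occupies `pointSize` bytes and each dimension sits at a fixed offset within the point. `fetchField` is a hypothetical helper, and the width comes from `sizeof(T)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies one fixed-width field out of every point in an
// interleaved buffer. The offset is resolved by the caller (e.g. from a name
// lookup) before the loop; the loop itself does no lookups.
template <typename T>
std::vector<T> fetchField(const std::vector<std::uint8_t>& buffer,
                          std::size_t pointSize, std::size_t offset) {
    assert(offset + sizeof(T) <= pointSize);
    const std::size_t numPoints = buffer.size() / pointSize;
    std::vector<T> out(numPoints);
    for (std::size_t i = 0; i < numPoints; ++i) {
        // memcpy avoids alignment and aliasing problems when reading raw bytes.
        std::memcpy(&out[i], buffer.data() + i * pointSize + offset, sizeof(T));
    }
    return out;
}
```

For example, with a 6-byte point holding a 32-bit X at offset 0 and a 16-bit intensity at offset 4, `fetchField<std::int32_t>(buf, 6, 0)` and `fetchField<std::uint16_t>(buf, 6, 4)` each walk the buffer once with no name lookups inside the loop.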
>> 
>> My complaint is that as we add more and more data providers, the
>> Dimension Type system we've currently outlined is going to start feeling
>> like quite a straitjacket.   The QFIT driver really needs two "X" dimensions
>> because there's an X dimension of the sensor and X dimension of the
>> measurement, per-point.  So these are logically X,  if we're to follow the
>> dimension type system, but they can't be modeled that way because we
>> can't have two X's.
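To make the QFIT limitation concrete, here is a sketch contrasting a schema keyed by dimension type with one keyed by name. `DimensionType`, `addTyped`, `addNamed`, and the names `SensorX`/`MeasurementX` are hypothetical. The typed map refuses a second X. The named map accepts both and can still record that each is logically an X.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical enum mirroring a fixed dimension-type system.
enum class DimensionType { X, Y, Z, Time };

// A schema keyed by type holds at most one dimension per type.
// Returns false when that type slot is already taken.
bool addTyped(std::map<DimensionType, std::string>& schema,
              DimensionType type, const std::string& name) {
    return schema.emplace(type, name).second;
}

// A schema keyed by name only requires distinct names; several entries
// may share the same logical type.
bool addNamed(std::map<std::string, DimensionType>& schema,
              const std::string& name, DimensionType type) {
    return schema.emplace(name, type).second;
}
```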
>> 
>> What could happen if we threw out the dimension type system we have
>> now?  What would be the consequences?  What if the user "marked" their
>> XYZ dimensions (with the first three assumed to be XYZ if no dimensions
>> are marked) for use with bounds/windows/etc?
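Here is a sketch of the "mark your XYZ" fallback rule, assuming each dimension can carry an optional spatial role. `SpatialRole`, `MarkedDimension`, and `spatialPositions` are hypothetical. The proposed rule is implemented as stated: use the marked dimensions when any are marked, otherwise treat the first three dimensions as X, Y, and Z. Requiring all three roles once any is marked is an added assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical spatial mark a user may attach to a dimension.
enum class SpatialRole { None, X, Y, Z };

struct MarkedDimension {
    MarkedDimension(std::string n, SpatialRole r = SpatialRole::None)
        : name(std::move(n)), role(r) {}
    std::string name;
    SpatialRole role;
};

// Returns the positions to treat as X, Y and Z for bounds/window operations.
// Marked dimensions win; with no marks, the first three dimensions are used.
// Assumption: once any dimension is marked, all three roles must be marked.
std::array<std::size_t, 3> spatialPositions(const std::vector<MarkedDimension>& dims) {
    std::array<std::size_t, 3> pos{};
    std::array<bool, 3> found{{false, false, false}};
    bool anyMarked = false;
    for (std::size_t i = 0; i < dims.size(); ++i) {
        if (dims[i].role == SpatialRole::None) continue;
        anyMarked = true;
        // SpatialRole::X/Y/Z are 1/2/3, so subtract 1 to get index 0/1/2.
        const std::size_t k = static_cast<std::size_t>(dims[i].role) - 1;
        if (found[k]) throw std::invalid_argument("role marked twice: " + dims[i].name);
        pos[k] = i;
        found[k] = true;
    }
    if (!anyMarked) {
        if (dims.size() < 3) throw std::invalid_argument("need at least three dimensions");
        return {{0, 1, 2}};
    }
    if (!(found[0] && found[1] && found[2]))
        throw std::invalid_argument("X, Y and Z must all be marked");
    return pos;
}
```

With this rule, a QFIT-style schema could keep a sensor X as an ordinary named dimension and mark only the measurement X for spatial operations.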
>> 
>> Sorry for the still-a-bit-incompletely-formed-thoughts,
>> 
>> Howard
> 




More information about the pdal mailing list