[libpc] RE: Dimension/Schema discussion

Mon May 9 12:01:30 EDT 2011

I agree -- this is something I've been aware of, but not thought through to
any solutions yet.   What we have doesn't scale very well, and now that
we've got more formats/features coming online, the problem is becoming
clearer.  I guess you're just the first one to hit it hard enough to decide
we need to fix it :-)

If we went with your "multi-index lookup, but called outside the PointBuffer
loop" concept, one thing I'd like to see is that we put back the concept of
"readBegin/readEnd", which wrap the set of "read PointBuffer" calls.  The
multi-index lookup code could then go in readBegin, so it would truly run
only once per full read operation, as opposed to once per chunk of the full
read.
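
To make the readBegin/readEnd idea concrete, here is a rough C++ sketch.  It
is not the libpc API: Schema, Dimension, Reader, readChunk and lookups() are
hypothetical stand-ins, a plain std::unordered_map stands in for the
boost::multi_index, and a raw byte vector stands in for a PointBuffer.  The
only point is that the name -> slot lookup happens once in readBegin, and the
per-chunk loop uses nothing but a cached offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real libpc classes.
struct Dimension {
    std::string name;
    std::size_t offset;  // byte offset within one point record
    std::size_t width;   // size in bytes
};

class Schema {
public:
    // Dimensions are named freely by the driver; there is no fixed name array.
    void add(const std::string& name, std::size_t width) {
        assert(width > 0 && "zero-width dimension");
        if (m_byName.count(name))
            throw std::invalid_argument("duplicate dimension: " + name);
        m_byName[name] = m_dims.size();
        m_dims.push_back({name, m_pointSize, width});
        m_pointSize += width;
    }
    // Name-based lookup: hashes a string, so keep it out of per-point loops.
    std::size_t indexOf(const std::string& name) const {
        auto it = m_byName.find(name);
        if (it == m_byName.end())
            throw std::out_of_range("no such dimension: " + name);
        return it->second;
    }
    // Positional (random-access) lookup.
    const Dimension& at(std::size_t i) const { return m_dims.at(i); }
    std::size_t pointSize() const { return m_pointSize; }

private:
    std::vector<Dimension> m_dims;
    std::unordered_map<std::string, std::size_t> m_byName;
    std::size_t m_pointSize = 0;
};

class Reader {
public:
    explicit Reader(const Schema& schema) : m_schema(schema) {}

    // Called once per full read: resolve names to byte offsets here.
    void readBegin() {
        const Dimension& x = m_schema.at(m_schema.indexOf("X"));
        if (x.width != sizeof(std::int32_t))
            throw std::runtime_error("this sketch expects a 4-byte X");
        m_xOffset = x.offset;
        m_pointSize = m_schema.pointSize();
        ++m_lookups;
        m_ready = true;
    }

    // Called once per chunk: sums X using only the cached offset.
    std::int64_t readChunk(const std::vector<std::uint8_t>& chunk) const {
        if (!m_ready)
            throw std::logic_error("readBegin() must be called first");
        std::int64_t sum = 0;
        for (std::size_t pos = 0; pos + m_pointSize <= chunk.size();
             pos += m_pointSize) {
            std::int32_t x;
            std::memcpy(&x, chunk.data() + pos + m_xOffset, sizeof(x));
            sum += x;
        }
        return sum;
    }

    void readEnd() { m_ready = false; }
    int lookups() const { return m_lookups; }  // number of name lookups done

private:
    const Schema& m_schema;
    std::size_t m_xOffset = 0;
    std::size_t m_pointSize = 0;
    int m_lookups = 0;
    bool m_ready = false;
};
```

A caller would do readBegin(), then any number of readChunk() calls, then
readEnd(); lookups() stays at 1 no matter how many chunks are read.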

Making this change would break a lot of code, but the breakage would be
largely "syntactic" and we've got good unit tests, so I'd not be at all
concerned about trying it.

Unfortunately, I'm booked up all day today (including the chunked zip code),
am out of the office all day tomorrow, and have some other things on my
plate for the next couple of days after that.

How much of a block is this for you right now?  Do you want to just dive in
now, or do you want to bounce some ideas around for a while, or wait for me
to do it later on, or..?

-mpg

> -----Original Message-----
> From: Howard Butler [mailto:hobu.inc at gmail.com]
> Sent: Monday, May 09, 2011 8:43 AM
> To: Michael Gerlek
> Cc: libpc at lists.osgeo.org
> Subject: Dimension/Schema discussion
> 
> Michael,
> 
> An immediate thing I noticed while implementing a non-LAS-conforming
> driver Friday is the desire to name my dimensions appropriately.  The names
> of course are set in a static array right now, and there isn't any way to
> override them.
> 
> My question is can you describe how what we have now is different than
> what I had been cooking in libLAS?  How was this design arrived at?
> 
> In libLAS, I had a boost::multi_index that included a random lookup and a
> name-based lookup.  Because of libLAS' point-at-a-time nature, this meant
> doing map look ups in the critical path and killed performance.  But this
> wouldn't be necessary with our PointBuffer-based approach, and dimension
> positions within a schema could continue to be fetched from outside the
> loop and passed into methods that would expect a position and a width to
> fetch.
> 
> We maintain an array of names, and a dimension type that is used to do
> random lookups in the dimensions array of Schema.  The slots are named to
> avoid doing so in the critical path, and we do our dimension name ->
> dimension slot lookup outside loops where we're fetching data.  Could we
> not continue to do that with the SchemaLayout maintaining the multi_index?
> 
> My complaint is that as we add more and more data providers, the
> Dimension Type system we've currently outlined is going to start feeling
> like quite a straitjacket.  The QFIT driver really needs two "X" dimensions
> because there's an X dimension of the sensor and X dimension of the
> measurement, per-point.  So these are logically X, if we're to follow the
> dimension type system, but they can't be modeled that way because we
> can't have two X's.
> 
> What could happen if we threw out the dimension type system we have
> now?  What would be the consequences?  What if the user "marked" their
> XYZ dimensions (the first three being assumed if no dimensions are
> marked) for use with bounds/windows/etc?
> 
> Sorry for the still-a-bit-incompletely-formed-thoughts,
> 
> Howard