[Liblas-devel] Design/architecture questions

Tue Nov 30 11:57:14 EST 2010

On Nov 30, 2010, at 9:50 AM, Michael Gerlek wrote:

> Dear liblas-dev:
> 
> I've got some questions...  I've already got some partial answers from
> Hobu off-list, but he suggested I reflect this back to the list for
> wider discussion.
> 
> 
> 1. I need some info on the policies for memory management of Headers
> (and possibly Points).  It looks like the Header data is kept in a
> reference counted object (HeaderPtr), and I think the intent is that
> the Header data is kept alive with the Points returned, so they can be
> interpreted later.  Is that right?

That was the intent, although the situation is a slight bit more insidious than that :)  Previously, liblas::Point instances had no concept of a layout -- only an all-encompassing fixed set of fields that one would ever want to interact with on the liblas::Point.  Mateusz and I have now developed a liblas::Schema class that can be used to describe the layout of the points, including custom layouts of your own choosing.  A liblas::Schema describes a single point, and a liblas::Header describes a set of points.  With that in mind, upon construction, each liblas::Point carries a reference to a DefaultHeader singleton which has its own default (point record 3 in LAS-speak) schema.  A liblas::Reader can then come along and issue SetHeaderPtr and reset the Header (and implicitly Schema) of the point.  If the point is carrying a HeaderPtr, it will always use that for operations (like scaling/descaling, etc.) instead of the DefaultHeader.

In summary, you do not need to set the liblas::Point's HeaderPtr if you are OK with using the DefaultHeader.  Relying on the DefaultHeader might require you to rescale the xyz data so it fits within the confines of the DefaultHeader's scale/offset.  It also might mean carrying attribute information (like color, time, etc.) that you don't need.
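
A minimal sketch of what scaling/descaling and rescaling between two headers amounts to, assuming the usual LAS convention that a coordinate is stored as a 32-bit integer and recovered as raw * scale + offset.  ScaleOffset and the three helper functions are hypothetical names for illustration only, not the liblas::Header or liblas::Point API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the scale/offset pair a header carries for one
// axis. This is NOT the liblas::Header interface.
struct ScaleOffset {
    double scale;
    double offset;
};

// Stored integer -> real-world coordinate.
inline double scale_coord(std::int32_t raw, const ScaleOffset& so) {
    return raw * so.scale + so.offset;
}

// Real-world coordinate -> stored integer, rounded to the nearest step.
inline std::int32_t descale_coord(double value, const ScaleOffset& so) {
    assert(so.scale != 0.0);
    return static_cast<std::int32_t>(std::lround((value - so.offset) / so.scale));
}

// Re-express a stored integer under a different header's scale/offset,
// which is what moving a point between headers would require.
inline std::int32_t rescale_coord(std::int32_t raw,
                                  const ScaleOffset& from,
                                  const ScaleOffset& to) {
    return descale_coord(scale_coord(raw, from), to);
}
```

For example, a raw value of 12345 under scale 0.01 and offset 1000.0 represents 1123.45; under scale 0.001 and offset 0.0 the same coordinate is stored as 1123450.  Nothing here guards against the result overflowing 32 bits, which is the "fit within the confines" concern.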

> 2. Auto_ptr is used for ReaderImpl and some others.  What is the intent here?

I hope Mateusz can chime in on this one.  I never fully understood the intent here, but I'm kind of a dolt on these things.

> 
> 3. The hierarchy of Reader, ReaderImpl, ReaderI, CachedReader, etc, is
> a bit confusing, so a couple words of explanation would be helpful.  I
> am creating a new reader type for laszip files: I
> *think* I want my "LazReader" to be a ReaderI passed into the
> liblas::Reader object, and Howard agreed.  He also said
> 
>> ReaderI -- interface
>> Reader -- original implementation <-- Implements ReaderI
>> ReaderImpl -- concrete implementation that Reader uses to do its work and implement ReaderI
>> CachedReader(Impl) -- a point caching implementation (with a bad name)
> 
> So at some level, I should think of Reader and ReaderImpl as really
> being LasReader and LasReaderImpl?

Yep, although you should get in the habit of reading them as liblas::Reader and liblas::ReaderImpl.  I removed the redundant "LASReader" in a grand-renaming effort a number of months ago, but it means that there are now things like liblas::detail::reader::Header and liblas::Header and they are two very different things.

> 
> Is ReaderI intended to be the basis for all future extensions?  

Yep.  Although I don't think the dust has settled on the actual virtuals of both the ReaderI and WriterI.  I don't know of any implementations of those yet other than libLAS', and I suspect there's quite a bit of room for improvement.

> One
> could imagine (and Howard suggested) a factory which spits out
> ReaderI's based on given inputs (based on file extensions or magic
> numbers or whathaveyou, see next question).

Mateusz had taken on some of this effort, but backed off after it became apparent that it would require tearing apart a number of things.  I hope he can explain what he thinks might be needed to pick this back up.
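
A rough sketch of the factory idea, assuming C++11 (std::unique_ptr rather than auto_ptr).  PointReader, PlainLasReader, LazReader, Encoding and MakeReader are all hypothetical names: PointReader only stands in for ReaderI, whose actual virtuals are not spelled out in this thread.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the ReaderI interface.
class PointReader {
public:
    virtual ~PointReader() {}
    virtual std::string Name() const = 0;
};

// Hypothetical reader for uncompressed LAS data.
class PlainLasReader : public PointReader {
public:
    std::string Name() const override { return "las"; }
};

// Hypothetical reader for LASzip-compressed data.
class LazReader : public PointReader {
public:
    std::string Name() const override { return "laz"; }
};

// What the caller learned about the input (from the header, an extension,
// magic numbers, or whatever detection is eventually agreed on).
enum class Encoding { Plain, Compressed };

// The factory: callers only ever see the interface type.
inline std::unique_ptr<PointReader> MakeReader(Encoding encoding) {
    std::unique_ptr<PointReader> reader;
    if (encoding == Encoding::Compressed) {
        reader.reset(new LazReader());
    } else {
        reader.reset(new PlainLasReader());
    }
    assert(reader != nullptr);
    return reader;
}
```

The point of the shape is that code holding a PointReader never needs to know which concrete reader it was given; only the factory does.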

> 
> 
> 4. How to determine if a file is regular LAS or LASzip?  

It might not be a file.  It might be a stream of bytes from stdin or a database or something.

> I'm not sure
> what the canonical extension is (.laz?), but Howard suggests we should
> actually be cracking open the header and looking at it.
> 
> He said:
> 
>> A. The point data format id is 8 bits long. For a compressed data indicator, we will set the highest bit to one (in addition to setting the lower bits as
>> before based on the point type) for a compressed data point format id set of 128, 129, 130, 131, 132, ...
>> (1 << 7) + 1 # <-- point format 1
>> (1 << 7) + 2 # <-- point format 2, etc.
>> The rationale for this approach is that other software could potentially read the header but would not recognize the point format.
>> 
>> B. The point record length will be set to 0.  This should stop software that ignores the point format from reading any (mangled) data from the rest
>> of the file/stream/db entry.
>> 
>> C. VLR records (possibly) exist that inform the compression.  We will have at least one or two different VLR types to hint the compression.  One for a
>> chunk size, and maybe another for the compression type.
> 
> This logic likely belongs in a Factory, then?

Yep.
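
A minimal sketch of the check described in item A only, assuming the compressed indicator is bit 7 of the 8-bit point data format id, as in the (1 << 7) + n examples.  The function and constant names are hypothetical, and items B and C (record length of 0, VLR hints) are not handled here.

```cpp
#include <cassert>
#include <cstdint>

// Bit 7 of the 8-bit point data format id, per item A of the proposal.
const std::uint8_t kCompressedBit = static_cast<std::uint8_t>(1u << 7);

// True when the id carries the compressed-data indicator (ids 128 and up).
inline bool is_compressed_format(std::uint8_t format_id) {
    return (format_id & kCompressedBit) != 0;
}

// Strip the indicator to recover the underlying point format (0, 1, 2, ...).
inline std::uint8_t base_point_format(std::uint8_t format_id) {
    return static_cast<std::uint8_t>(format_id & 0x7F);
}

// Build the id a compressed file would carry for a given base point format.
inline std::uint8_t compressed_format_id(std::uint8_t base_format) {
    assert(base_format < 128);  // the top bit must be free for the indicator
    return static_cast<std::uint8_t>(base_format | kCompressedBit);
}
```

A factory could read the header from whatever stream it is handed, call is_compressed_format on the point data format id, and pick the reader accordingly, which works whether or not the bytes came from a file with a particular extension.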