[Liblas-devel] Dealing with "bad" data

Howard Butler hobu.inc at gmail.com
Thu Nov 4 12:07:45 EDT 2010


All,

There are a number of software packages that are quite lax in how they write LAS files.  Some of the things I've found software doing include:

* miswriting and generally screwing things up in the header, but having a legitimate offset so you could read points
* writing invalid point counts in the header (very common)
* following the extremely broken LAS 1.3 R10 specification that had a 7*long return count in the header instead of the required and expected 5*long

This email asks what our default stance should be in the face of bad data.  Some things, like an invalid point count, are partially recoverable, but attempts to reconcile many others will often result in proliferating bad data.  Should we be hard asses and always throw an error?  Or do our best to recover on a case-by-case basis?

The most common case of bad data that I've seen is invalid point counts in the header.  An accurate point count isn't so important for LAS 1.0-1.2 data because you can provide a calculated point count by measuring the size of the file, subtracting the bytes that precede the point data (the header), and dividing that value by the number of bytes each point takes.  It is very important for LAS 1.3 data because waveform data can exist after the point data, so the file size alone no longer tells you where the points end.
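A minimal sketch of that calculation, in Python for brevity.  This is not libLAS code: the function name and its parameters (file_size, data_offset, record_length) are hypothetical, and it assumes nothing follows the point data in the file, which is exactly the assumption that breaks for LAS 1.3.

```python
def calculated_point_count(file_size, data_offset, record_length):
    """Estimate the point count from byte sizes alone.

    file_size     -- total size of the file in bytes
    data_offset   -- byte offset at which the point data starts
    record_length -- size in bytes of one point record

    All three names are illustrative, not libLAS identifiers.
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    point_bytes = file_size - data_offset
    if point_bytes < 0:
        raise ValueError("data_offset lies beyond the end of the file")
    # Integer division: any trailing partial record is ignored here.
    return point_bytes // record_length
```

For example, a file of 2227 bytes whose points start at byte 227 and take 20 bytes each would yield 100 points.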

In this most common case, I propose the following:

For LAS 1.0-1.2 we will use a calculated point count if the header's value does not match the expected point count *and* the actual point data contains the exact number of bytes required to completely contain points (i.e., point_data % point_format == 0, where point_data is the size of the point data in bytes and point_format is the size of one point record in bytes).  For LAS 1.3 data, we're going to just blindly believe the header, and do no checking.  If the modulo check fails, an exception is going to be thrown with enough numbers in it that someone could do some simple math and have a chance at figuring out what's going on.
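The proposal could be sketched roughly as follows, in Python for brevity.  This is one reading of the rules above and not libLAS code: resolve_point_count and all of its parameters are hypothetical names, and it assumes the modulo check applies to every LAS 1.0-1.2 file, not only to files whose header count is already known to be wrong.

```python
def resolve_point_count(minor_version, header_count, file_size,
                        data_offset, record_length):
    """Pick the point count to use under the proposed policy.

    minor_version -- 0, 1, 2 or 3 for LAS 1.0 through 1.3
    header_count  -- point count as written in the header
    The remaining parameters are byte sizes; every name here is
    illustrative and not taken from libLAS.
    """
    if minor_version >= 3:
        # LAS 1.3: waveform data may follow the points, so the file
        # size proves nothing. Believe the header, do no checking.
        return header_count

    point_bytes = file_size - data_offset
    remainder = point_bytes % record_length
    if remainder != 0:
        # Report the raw numbers so someone can do the math by hand.
        raise ValueError(
            f"point data is {point_bytes} bytes, which is not a multiple "
            f"of the {record_length}-byte point record ({remainder} bytes "
            f"left over); header claims {header_count} points"
        )

    # The data holds a whole number of records. If the header agrees,
    # this equals header_count; if not, the calculated value wins.
    return point_bytes // record_length
```

With 2000 bytes of 20-byte points and a header claiming 90 points, this returns 100; with 2005 bytes it raises instead of guessing.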

Sound good?

Howard



