[Liblas-devel] Dealing with "bad" data

Thu Nov 4 12:29:51 EDT 2010

On 04/11/10 16:07, Howard Butler wrote:
> Should we be hard asses and always throw an error?

How about providing two modes for processing LAS files: strict and transitional?

> Do our best to recover on a case-by-case basis?

Sounds like GDAL's approach to WKT and such.

> The most common case of bad data that I've seen is invalid point
> counts in the header.  An accurate point count isn't so important for
> LAS 1.0-1.2 data because you can provide a calculated point count by
> measuring the size of the file, removing the header, and dividing
> that value by the number of bytes each point takes.

If number of points in header is invalid
    Read until one of the following is true
       End of file
       Number of consumed points equals number reported by header
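
As a rough illustration only, the loop above could be sketched in Python like this. The function and parameter names are made up for this example and are not libLAS API; the sketch also assumes fixed-size point records and treats a truncated trailing record as end of file.

```python
import io


def read_points(stream, record_size, header_count):
    """Read fixed-size point records until EOF or until header_count
    records have been consumed, whichever comes first.

    All names here are hypothetical; this is not the libLAS reader.
    """
    points = []
    while len(points) < header_count:
        record = stream.read(record_size)
        if len(record) < record_size:
            # End of file, or a truncated trailing record: stop reading.
            break
        points.append(record)
    return points


# Example: the header claims 5 points, but the data holds only 3 records
# of 20 bytes each, so the reader stops at end of file.
data = io.BytesIO(bytes(20 * 3))
points = read_points(data, record_size=20, header_count=5)
```

With this approach the header value only acts as an upper bound, so a count that is too large costs nothing extra, while a count that is too small silently leaves points unread.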

> In this most common case, I propose the following:
>
> For LAS 1.0-1.2 we will use a calculated point count if the header's
>  value does not match the expected point count *and* the actual point
>  data contains the exact number of bytes required to completely
> contain points (ie, point_data % point_format == 0).

This kind of implicit fixing of broken data conflicts with
performance requirements.

It could be applied in transitional mode. In strict mode, just give up.
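
To make the idea concrete, here is a minimal Python sketch of how the two modes might treat a mismatched point count. Everything in it is an assumption for illustration: `resolve_point_count`, `PointCountError` and the `strict` flag are hypothetical names, not libLAS API, and the sketch assumes LAS 1.0-1.2 files in which the point records run from the data offset to the end of the file.

```python
class PointCountError(ValueError):
    """Raised when the point count cannot be trusted or recovered."""


def resolve_point_count(file_size, data_offset, record_size,
                        header_count, strict=True):
    """Return the point count to use for reading.

    Hypothetical helper: compares the header's count with the count
    calculated from the size of the point data.
    """
    data_bytes = file_size - data_offset
    if data_bytes < 0 or record_size <= 0:
        raise PointCountError("invalid data offset or record size")

    if header_count * record_size == data_bytes:
        # The header agrees with the file contents.
        return header_count

    calculated, remainder = divmod(data_bytes, record_size)
    message = (
        f"header reports {header_count} points, but {data_bytes} bytes of "
        f"point data with {record_size}-byte records gives {calculated} "
        f"points with {remainder} bytes left over"
    )
    if strict:
        # Strict mode: never repair, just give up.
        raise PointCountError(message)
    if remainder != 0:
        # Transitional mode can only recover when the data holds a
        # whole number of records.
        raise PointCountError(message)
    return calculated


# Example: the header claims 5 points, but the file holds exactly 3 records.
count = resolve_point_count(227 + 60, 227, 20, 5, strict=False)
```

The error message carries the raw numbers so that a user can do the arithmetic by hand when recovery is not possible.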

> For LAS 1.3 data, we're going to just blindly believe the header, and
> do no checking.  If the modulo function fails, an exception is going
> to be thrown with some numbers that someone could do some simple math
> to maybe have a chance at figuring out what's going on.
>
> Sound good?

I don't know. Broken standards always suck as standards.

However, look at XHTML: it is not always die-hard, but it allows users to
consciously choose between strict and transitional mode, and to validate
their data against the selected mode.

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net
Charter Member of OSGeo, http://osgeo.org
Member of ACCU, http://accu.org