[Liblas-devel] Re: Liblas-devel Digest, Vol 35, Issue 4

Charles A. Cowart charliec at sdsc.edu
Fri Nov 5 17:46:27 EDT 2010


> For LAS 1.0-1.2 we will use a calculated point count if the header's  
> value does not match the expected point count *and* the actual point  
> data contains the exact number of bytes required to completely  
> contain points (ie, point_data % point_format == 0).  For LAS 1.3  
> data, we're going to just blindly believe the header, and do no  
> checking.  If the modulo function fails, an exception is going to be  
> thrown with some numbers that someone could do some simple math to  
> maybe have a chance at figuring out what's going on.

Hello everyone,

If I may, I think it would be preferable to have lasinfo explicitly
enforce the specification and return descriptive errors wherever the
data fails it; over time, it will be more valuable to the community to
have a standard to test against. As others have suggested, it may be
valuable to have an option to ignore inconsistencies between the
header metadata and the data itself, so long as the data segment
remains identifiable. Putting the above suggestion into, say, the libLAS
library code or lasinfo would force the user to understand the very
situational nature of what is being checked and when; over time it may
become difficult to remember why this code was inserted, but removing it
would be inherently bad for compatibility.
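
To make concrete how situational the quoted rule is, here is a rough
Python sketch of it as I read it. Nothing here is libLAS API:
resolve_point_count, PointCountError, and the strict flag are names made
up for illustration, and raising whenever the modulo check fails is my
interpretation of the proposal.

```python
class PointCountError(Exception):
    """Raised when the header count cannot be reconciled with the data."""


def resolve_point_count(header_count, point_data_bytes, record_length,
                        version_minor, strict=False):
    """Return the point count to use for a LAS 1.x file.

    All parameter names are hypothetical; this is not libLAS code.
    """
    # LAS 1.3 may carry waveform data after the points, so the size of
    # the data segment is not a reliable guide: believe the header.
    if version_minor >= 3:
        return header_count

    if record_length <= 0:
        raise PointCountError(f"invalid point record length {record_length}")

    calculated, leftover = divmod(point_data_bytes, record_length)

    # The data segment must hold a whole number of point records.
    if leftover != 0:
        raise PointCountError(
            f"{point_data_bytes} bytes of point data is not a multiple of "
            f"the {record_length}-byte record length ({leftover} bytes "
            f"left over); header claims {header_count} points"
        )

    if calculated != header_count:
        if strict:
            # Strict mode: report the inconsistency instead of hiding it.
            raise PointCountError(
                f"header claims {header_count} points but the data "
                f"segment holds {calculated}"
            )
        # Lenient mode: fall back to the calculated count.
        return calculated

    return header_count
```

Even in this small form there are three separate behaviours (1.3, lenient
1.0-1.2, strict 1.0-1.2) that a caller has to know about.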

I would suggest a new utility (as others have also proposed) such as  
LASRepair or LASValidate that could be a repository for all the  
situational code that would be needed to bring versions into  
alignment. As part of the OpenTopography group here at SDSC, I'd be  
happy to help write this utility if volunteers are needed.
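
As a starting point for discussion, a LASValidate-style check could look
roughly like the sketch below. It is only a sketch: validate_las is a
hypothetical name, the byte offsets are my reading of the LAS 1.0-1.2
public header block (227 bytes) and should be checked against the
specification, and it covers only the point-count case from this thread.

```python
import struct

HEADER_SIZE = 227  # assumed size of the LAS 1.0-1.2 public header block


def validate_las(data):
    """Return a list of descriptive problems found in LAS 1.0-1.2 bytes.

    An empty list means none of the checks below failed.
    """
    if len(data) < HEADER_SIZE:
        return [f"file is {len(data)} bytes; the header alone needs "
                f"{HEADER_SIZE}"]

    problems = []
    if data[0:4] != b"LASF":
        problems.append(f"file signature is {data[0:4]!r}, expected b'LASF'")

    # Little-endian fields at their assumed header offsets.
    major, minor = struct.unpack_from("<BB", data, 24)
    (data_offset,) = struct.unpack_from("<I", data, 96)
    (record_length,) = struct.unpack_from("<H", data, 105)
    (header_count,) = struct.unpack_from("<I", data, 107)

    if (major, minor) not in {(1, 0), (1, 1), (1, 2)}:
        problems.append(f"version {major}.{minor} is not handled here")
        return problems
    if data_offset < HEADER_SIZE or data_offset > len(data):
        problems.append(f"offset to point data {data_offset} is outside "
                        f"the valid range {HEADER_SIZE}..{len(data)}")
        return problems
    if record_length == 0:
        problems.append("point data record length is 0")
        return problems

    calculated, leftover = divmod(len(data) - data_offset, record_length)
    if leftover:
        problems.append(f"point data ends with a partial record "
                        f"({leftover} of {record_length} bytes)")
    elif calculated != header_count:
        problems.append(f"header claims {header_count} points but the "
                        f"file holds {calculated}")
    return problems
```

A repair mode would then be a separate step that rewrites the header
only for problems it knows how to fix, leaving the reader itself strict.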

Best Regards,

Charles Cowart
OpenTopography.org

On Nov 4, 2010, at 11:54 AM, liblas-devel-request at lists.osgeo.org wrote:

> Today's Topics:
>
>   1. Dealing with "bad" data (Howard Butler)
>   2. Re: Dealing with "bad" data (Mateusz Loskot)
>   3. Re: Dealing with "bad" data (Andrew Bell)
>   4. Re: Dealing with "bad" data (Volker Wichmann)
>   5. Re: Dealing with "bad" data (Mateusz Loskot)
>   6. Re: Dealing with "bad" data (Volker Wichmann)
>   7. Re: Dealing with "bad" data (Mike Grant)
>   8. Re: LAS 1.3 point support working for upcoming libLAS 1.6
>      (Mike Grant)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 4 Nov 2010 11:07:45 -0500
> From: Howard Butler <hobu.inc at gmail.com>
> Subject: [Liblas-devel] Dealing with "bad" data
> To: "Liblas-devel at lists.osgeo.org" <liblas-devel at lists.osgeo.org>
> Message-ID: <574F0E63-07AE-40A4-BD29-762A03CB70BD at gmail.com>
> Content-Type: text/plain; charset=us-ascii
>
> All,
>
> There are a number of softwares that are quite lax in how they write  
> LAS files.  Some of the things I've found softwares doing include:
>
> * miswriting and generally screwing things up in the header, but  
> having a legitimate offset so you could read points
> * writing invalid point counts in the header (very common)
> * following the extremely broken LAS 1.3 R10 specification that had  
> a 7*long return count in the header instead of the required and  
> expected 5*long
>
> This email asks what our default stance should be in the
> face of bad data.  Some things, like an invalid point count, are
> partially recoverable, but attempts to reconcile many others will
> often result in proliferating bad data.  Should we be hard asses and
> always throw an error?  Do our best to recover on a case-by-case
> basis?
>
> The most common case of bad data that I've seen is invalid point  
> counts in the header.  An accurate point count isn't so important  
> for LAS 1.0-1.2 data because you can provide a calculated point  
> count by measuring the size of the file, removing the header, and  
> dividing that value by the number of bytes each point takes.  It is  
> very important for LAS 1.3 data because waveform data can exist  
> after the point data.
>
> In this most common case, I propose the following:
>
> For LAS 1.0-1.2 we will use a calculated point count if the header's  
> value does not match the expected point count *and* the actual point  
> data contains the exact number of bytes required to completely  
> contain points (ie, point_data % point_format == 0).  For LAS 1.3  
> data, we're going to just blindly believe the header, and do no  
> checking.  If the modulo function fails, an exception is going to be  
> thrown with some numbers that someone could do some simple math to  
> maybe have a chance at figuring out what's going on.
>
> Sound good?
>
> Howard
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Thu, 04 Nov 2010 16:29:51 +0000
> From: Mateusz Loskot <mateusz at loskot.net>
> Subject: Re: [Liblas-devel] Dealing with "bad" data
> To: Howard Butler <hobu.inc at gmail.com>
> Cc: "Liblas-devel at lists.osgeo.org" <liblas-devel at lists.osgeo.org>
> Message-ID: <4CD2DF7F.2040005 at loskot.net>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 04/11/10 16:07, Howard Butler wrote:
>> Should we be hard asses and always throw an error?
>
> Providing two modes for processing LAS files: strict and transitional.
>
>> Do our best to recover on a case-by-case basis?
>
> Sounds like GDAL's approach to WKT and such.
>
>> The most common case of bad data that I've seen is invalid point
>> counts in the header.  An accurate point count isn't so important for
>> LAS 1.0-1.2 data because you can provide a calculated point count by
>> measuring the size of the file, removing the header, and dividing
>> that value by the number of bytes each point takes.
>
> If number of points in header is invalid
>    Read until one of the following is true
>       End of file
>       Number of consumed points equals number reported by header
>
>> In this most common case, I propose the following:
>>
>> For LAS 1.0-1.2 we will use a calculated point count if the header's
>> value does not match the expected point count *and* the actual point
>> data contains the exact number of bytes required to completely
>> contain points (ie, point_data % point_format == 0).
>
> This kind of implicit fixing of broken data stands in contradiction to
> performance requirements.
>
> Could be applied in transitional. In strict mode, just give up.
>
>> For LAS 1.3 data, we're going to just blindly believe the header, and
>> do no checking.  If the modulo function fails, an exception is going
>> to be thrown with some numbers that someone could do some simple math
>> to maybe have a chance at figuring out what's going on.
>>
>> Sound good?
>
> I don't know. Broken standards always suck as standards.
>
> However, see XHTML: it is not always die-hard, but allows users to
> consciously choose between strict and transitional mode, and validate
> their data against the selected mode.
>
> Best regards,
> -- 
> Mateusz Loskot, http://mateusz.loskot.net
> Charter Member of OSGeo, http://osgeo.org
> Member of ACCU, http://accu.org
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 4 Nov 2010 11:43:10 -0500
> From: Andrew Bell <andrew.bell.ia at gmail.com>
> Subject: Re: [Liblas-devel] Dealing with "bad" data
> To: Howard Butler <hobu.inc at gmail.com>
> Cc: "Liblas-devel at lists.osgeo.org" <liblas-devel at lists.osgeo.org>
> Message-ID:
> 	<AANLkTimwWuyJ5OCVCPTWGFBVucks_+WWwsdKLbiq9t1O at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Thu, Nov 4, 2010 at 11:07 AM, Howard Butler <hobu.inc at gmail.com>  
> wrote:
>> All,
>>
>> There are a number of softwares that are quite lax in how they  
>> write LAS files.  Some of the things I've found softwares doing  
>> include:
>>
>> * miswriting and generally screwing things up in the header, but  
>> having a legitimate offset so you could read points
>> * writing invalid point counts in the header (very common)
>> * following the extremely broken LAS 1.3 R10 specification that had  
>> a 7*long return count in the header instead of the required and  
>> expected 5*long
>>
>> This email asks what our default stance should be in the
>> face of bad data.  Some things, like an invalid point count, are
>> partially recoverable, but attempts to reconcile many others will
>> often result in proliferating bad data.  Should we be hard asses
>> and always throw an error?  Do our best to recover on a
>> case-by-case basis?
>
> If you are detecting the problem anyway, why not just throw the
> exception and if someone has the energy, write a utility that will try
> to coerce the bad data into good data.  That way you don't have to
> clutter the code with non-conforming crap but would still have a way
> to get to something useful.  This is pretty typical for DBs.
>
> -- 
> Andrew Bell
> andrew.bell.ia at gmail.com
>
>
> ------------------------------
>
> Message: 4
> Date: Thu, 04 Nov 2010 18:06:09 +0100
> From: Volker Wichmann <wichmann at laserdata.at>
> Subject: Re: [Liblas-devel] Dealing with "bad" data
> To: liblas-devel at lists.osgeo.org
> Message-ID: <4CD2E801.7010904 at laserdata.at>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Am 04.11.2010 17:43, schrieb Andrew Bell:
>> On Thu, Nov 4, 2010 at 11:07 AM, Howard Butler<hobu.inc at gmail.com>   
>> wrote:
>>> All,
>>>
>>> There are a number of softwares that are quite lax in how they  
>>> write LAS files.  Some of the things I've found softwares doing  
>>> include:
>>>
>>> * miswriting and generally screwing things up in the header, but  
>>> having a legitimate offset so you could read points
>>> * writing invalid point counts in the header (very common)
>>> * following the extremely broken LAS 1.3 R10 specification that  
>>> had a 7*long return count in the header instead of the required  
>>> and expected 5*long
>>>
>>> This email asks what our default stance should be in the
>>> face of bad data.  Some things, like an invalid point count, are
>>> partially recoverable, but attempts to reconcile many others will
>>> often result in proliferating bad data.  Should we be hard asses
>>> and always throw an error?  Do our best to recover on a
>>> case-by-case basis?
>>
>> If you are detecting the problem anyway, why not just throw the
>> exception and if someone has the energy, write a utility that will
>> try to coerce the bad data into good data.  That way you don't have to
>> clutter the code with non-conforming crap but would still have a way
>> to get to something useful.  This is pretty typical for DBs.
>>
>
> I'm in favour of this approach too - provide a utility which tries to
> fix broken files but always throw an exception in case a file is not
> compliant to the specification. This will allow users to have some
> control on how to handle such files.
>
> Volker
>
>
> ------------------------------
>
> Message: 5
> Date: Thu, 04 Nov 2010 17:22:18 +0000
> From: Mateusz Loskot <mateusz at loskot.net>
> Subject: Re: [Liblas-devel] Dealing with "bad" data
> To: Volker Wichmann <wichmann at laserdata.at>
> Cc: liblas-devel at lists.osgeo.org
> Message-ID: <4CD2EBCA.4060805 at loskot.net>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 04/11/10 17:06, Volker Wichmann wrote:
>> provide a utility which tries to fix broken files but always throw an
>> exception in case a file is not compliant to the specification
>
> For me, the two parts of this contradict each other.
> Always throwing when a file is not compliant means never accepting
> broken data. Never accepting broken data implies not trying to fix it.
>
> Whatever it is called, strict and transitional mode (enabled/disabled
> with a single switch for a libLAS utility) or a separate utility trying
> to recover whatever is recoverable from broken data...
> The job of implementing those utilities, repairing broken data produced
> by companies loosely interpreting ASPRS LAS, will come at the expense of
> libLAS.
>
> Best regards,
> -- 
> Mateusz Loskot, http://mateusz.loskot.net
> Charter Member of OSGeo, http://osgeo.org
> Member of ACCU, http://accu.org
>
>
> ------------------------------
>
> Message: 6
> Date: Thu, 04 Nov 2010 18:31:44 +0100
> From: Volker Wichmann <wichmann at laserdata.at>
> Subject: Re: [Liblas-devel] Dealing with "bad" data
> To: liblas-devel at lists.osgeo.org
> Message-ID: <4CD2EE00.5010901 at laserdata.at>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Am 04.11.2010 18:22, schrieb Mateusz Loskot:
>> On 04/11/10 17:06, Volker Wichmann wrote:
>>> provide a utility which tries to fix broken files but always throw
>>> an exception in case a file is not compliant to the specification
>>
>> For me, the two parts of this contradict each other.
>> Always throwing when a file is not compliant means never accepting
>> broken data.
>
> Yes, this is what I'm in favour of.
>
>> Never accepting broken data implies not trying to fix it.
>
> Yes, I think this is not something which libLAS needs to deal with.
>
>>
>> Whatever it is called, strict and transitional mode (enabled/disabled
>> with a single switch for a libLAS utility) or a separate utility trying
>> to recover whatever is recoverable from broken data...
>> The job of implementing those utilities, repairing broken data produced
>> by companies loosely interpreting ASPRS LAS, will come at the expense of
>> libLAS.
>
> I think this is a contradiction: if you implement some fallback
> mechanisms in libLAS, you already provide something like a "tool". I
> fully agree that it is impossible to provide a utility that can
> fix almost all broken files, but I think such a utility could be more
> easily extended to catch specific errors than the libLAS core. Why not
> start with a utility that catches the issues mentioned by Howard?
>
> best regards,
> Volker
>
>
> ------------------------------
>
> Message: 7
> Date: Thu, 04 Nov 2010 18:25:49 +0000
> From: "Mike Grant" <mggr at pml.ac.uk>
> Subject: Re: [Liblas-devel] Dealing with "bad" data
> To: "Liblas-devel at lists.osgeo.org" <liblas-devel at lists.osgeo.org>
> Message-ID: <4CD2FAAD.8000204 at pml.ac.uk>
> Content-Type: text/plain;	charset="ISO-8859-1"
>
> On 04/11/10 16:07, Howard Butler wrote:
>> This email asks what our default stance should be in the
>> face of bad data.  Some things, like an invalid point count, are
>> partially recoverable, but attempts to reconcile many others will
>> often result in proliferating bad data.  Should we be hard asses and
>> always throw an error?  Do our best to recover on a case-by-case
>> basis?
>
> I'd just throw an exception when a LAS file isn't standards compliant
> (+1 hard ass).  A separate tool can be written that catches these and
> tries to fix up files.  This keeps things clean and gives direct
> feedback on naughty LAS files.
>
> If you want to include additional code to handle bad data, it would
> definitely be nice to have a flag that enables or disables this
> behaviour (the strict/loose interpretation Mateusz suggests).
>
> Cheers,
>
> Mike.
>
> --------------------------------------------------------------------------------
> Plymouth Marine Laboratory
>
> Registered Office:
> Prospect Place
> The Hoe
> Plymouth  PL1 3DH
>
> Website: www.pml.ac.uk
> Registered Charity No. 1091222
> PML is a company limited by guarantee
> registered in England & Wales
> company number 4178503
>
> --------------------------------------------------------------------------------
> This e-mail, its content and any file attachments are confidential.
>
> If you have received this e-mail in error please do not copy,
> disclose it to any third party or use the contents or attachments in
> any way. Please notify the sender by replying to this e-mail or
> e-mail forinfo at pml.ac.uk and then delete the email without making any
> copies or using it in any other way.
>
> The content of this message may contain personal views which are not  
> the views of Plymouth Marine Laboratory unless specifically stated.
>
> You are reminded that e-mail communications are not secure and may  
> contain viruses. Plymouth Marine Laboratory accepts no liability for  
> any loss or damage which may be caused by viruses.
> --------------------------------------------------------------------------------
>
>
> ------------------------------
>
> Message: 8
> Date: Thu, 04 Nov 2010 18:54:35 +0000
> From: "Mike Grant" <mggr at pml.ac.uk>
> Subject: Re: [Liblas-devel] LAS 1.3 point support working for upcoming
> 	libLAS	1.6
> To: "Howard Butler" <hobu.inc at gmail.com>
> Cc: "Liblas-devel at lists.osgeo.org" <liblas-devel at lists.osgeo.org>
> Message-ID: <4CD3016B.7030003 at pml.ac.uk>
> Content-Type: text/plain;	charset="ISO-8859-1"
>
> On 28/07/10 14:19, Mike Grant wrote:
>> On 27/07/10 20:53, Howard Butler wrote:
>>> Your file is properly broken now :)
>>
>> Perfect - that gives me more incentive to nag for the new processor
>> release ;)
>>
>> It might be worth marking the sample I gave you as broken until we
>> get a new one, so as not to confuse people.
>
> I'm reminded by the talk of bad files that we got an update to our LAS
> 1.3 processor :)  See if this reprocessed sample blows anything else
> up..
>
> http://arsf-dan.nerc.ac.uk/files/NERC-ARSF-LAS1_3-sample-release2.tar.bz2
>
> Cheers,
>
> Mike.
>
>
>
> ------------------------------
>
> _______________________________________________
> Liblas-devel mailing list
> Liblas-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/liblas-devel
>
>
> End of Liblas-devel Digest, Vol 35, Issue 4
> *******************************************


