[Liblas-devel] ReadPointAt bug
Howard Butler
hobu.inc at gmail.com
Mon Apr 21 20:17:41 EDT 2008
Pete,
I have found the bug, but I'm not sure the fix for it is overly
desirable.
Here is the file 155mb I used for testing:
ftp://ftp.igsb.uiowa.edu/Lidar_FTP/Data/non_iowa/puget_sound/LAS/06144792.las
lasinfo says the following:
>
> ---------------------------------------------------------
> Header Summary
> ---------------------------------------------------------
> File Name: ..\test\data\06144792.las
> Version: 1.0
> Source ID: 0
> Reserved: 0
> Project ID/GUID: '00000000-0000-0000-0000-000000000000'
> System Identifier: ''
> Generating Software: 'CPS/RTW LAS Lib v1.08'
> File Creation Day/Year: 0/0
> Header Size 227
> Offset to Point Data 639
> Number Var. Length Records 3
> Point Data Format 1
> Point Data Record Length 28
> Number of Point Records 5847380
> Number of Points by Return 2920986 2926394 0 0 0
> Scale Factor X Y Z 0.001000 0.001000 0.001000
> Offset X Y Z 0.000000 4000000.000000 0.000000
> Min X Y Z 614000.000000 4791999.999000 311.676000
> Max X Y Z 615999.990000 4793999.989000 398.959000
Some strange things:
1) the Point Data Format is 1
2) the Point Data Record Length is 28
3) the Version is 1.0
I modified your simple Python script to be the following:
> from liblas import file
> f = file.File(r'..\test\data\06144792.las')
> i = 0
> for i in range(100):
> p = f.read(i)
> print p.x, p.y, p.z
> print 'return ', p.return_number, ' of ', p.number_of_returns
> print 'cls = ', p.classification, ' edge = ', p.flightline_edge
> if i ==3:
> break
> f.close()
> f = file.File(r'..\test\data\06144792.las')
> i = 0
> for p in f:
> print p.x, p.y, p.z
> print 'return ', p.return_number, ' of ', p.number_of_returns
> print 'cls = ', p.classification, ' edge = ', p.flightline_edge
> i=i+1
> if i ==3:
> break
> f.close()
There are two different reading methods that are possible with
libLAS. The first method, expressed in Python as the 'for i in
range(100)' call, actually calls a method down in the C++ library
called ReadPointAt. Every time a read is done, the stream is
repositioned to the point position specified (i==0, i==1, i==2, ...
i==n), and the point is read. The second reading method, expressed in
Python as the 'for i in f' call, calls GetNextPoint over and over
again. This reading method is going to be much faster than the
GetPointAt method because the stream does not need to be repositioned
for every read.
The problem with this particular file and our GetPointAt method is
that we were repositioning the stream as follows:
> std::streamsize pos = (static_cast<std::streamsize>(n) *
> LASHeader::ePointSize0) + m_offset;
For this file, it meant we moving through the file in 20 byte chunks
because the header said it was 1.0. The problem is that the point
record length for this file is actually 28 bytes, as you described.
A simple fix is to just cache the point record length in the reader
implementation and then reposition with:
> std::streamsize pos = (static_cast<std::streamsize>(n) *
> m_recordlength) + m_offset;
I'm not sure, and we would need a spec hound to verify, but I think
that the file in question is not following the spec. It *says* it is
1.0, but it has a 28 byte record length (this would imply a 1.1
version file, right)? So, who are we to believe? The file version,
like we were doing, or the record length, which in this case produces
the correct results?
Mateusz, do you have anything to add on this?
Howard
On Apr 21, 2008, at 2:24 PM, Kollasch, Pete [DNR] wrote:
> Howard,
>
> I am not finished testing, but I thought I should let you know that I
> have found something that appears to be anomalous. Because in the
> manual checks I was seeing anomalies in the return number and number
> of
> returns, I wrote a small python program that allowed me to look at a
> number of point records more easily. The program is attached. Some
> typical output follows below. Notice the cyclicity of similar values
> every 7 records. It appears that 20 bytes are being read for each
> point, where 28 bytes should be read (in the second record below,
> the z
> value displayed is actually an x value). I suspect (but can't be
> sure)
> that this is affecting only the python bindings (since I was getting
> what I felt were credible results from the utilities before). It is
> also possible that in the data we have received from Sanborn (which
> I am
> using for testing), the actual data format, and the specified data
> format, are at variance with each other.
>
> The results below were from our Sanborn data, but I got the same
> results
> with some Puget sound data that happens to be on our FTP site. The
> following link should get you to a file you can test with:
>
> ftp://ftp.igsb.uiowa.edu/Lidar_FTP/Data/non_iowa/puget_sound/LAS/
>
> I renamed the file Puget.las for ease of reference. Hit enter to see
> the next record.
>
> Pete.
>
>
> 397999.979 4807590.689 353.157
> return 1 of 1
> cls = 12 edge = 0
>
> 3440834.058 5092498.576 397999.929
> return 2 of 4
> cls = 48 edge = 0
>
> 206110.93 4007143.442 3441146.627
> return 6 of 3
> cls = 65 edge = 0
>
> 807589.969 4000353.347 206110.915
> return 5 of 5
> cls = 0 edge = 0
>
> 1092498.576 4397999.869 807589.679
> return 5 of 0
> cls = 0 edge = 0
>
> 7143.442 7441771.031 1092498.576
> return 0 of 7
> cls = 23 edge = 1
>
> 353.817 4206110.874 7143.443
> return 2 of 5
> cls = 205 edge = 0
>
> 397999.799 4807589.009 353.897
> return 1 of 1
> cls = 12 edge = 0
>
> 3442396.168 5092498.576 397999.769
> return 2 of 4
> cls = 48 edge = 0
>
> 206110.907 4007143.443 3442708.737
> return 6 of 3
> cls = 65 edge = 0
>
> 807588.369 4000354.177 206110.888
> return 5 of 5
> cls = 0 edge = 0
>
> 1092498.576 4397999.689 807588.049
> return 5 of 0
> cls = 0 edge = 0
>
> 7143.443 7443333.141 1092498.576
> return 0 of 7
> cls = 23 edge = 1
>
> 354.077 4206110.924 7143.443
> return 1 of 0
> cls = 205 edge = 0
>
> 397999.559 4807587.179 353.727
> return 1 of 1
> cls = 12 edge = 0
> <petes.py>
More information about the Liblas-devel
mailing list