[gdal-dev] Fast Pixel Access

David Baker (Geoscience) david.m.baker at chk.com
Mon Feb 10 05:14:38 PST 2014


Even,

No, not an i386...  A Dell Precision T3500 w/Intel W3680 @ 3.33 GHz, 6x2 cores, with 12.0 GB.  The data is on the network, though, not local, with 1 Gbps access.  Setting GDAL_DISABLE_READDIR_ON_OPEN = TRUE did significantly increase the speed.  Does the BIL driver read the whole file into memory first?  Might a direct read be faster?

And Even, please excuse my ignorance, but what is "gdb"?  I would really like to do the profiling.

David

-----Original Message-----
From: Even Rouault [mailto:even.rouault at mines-paris.org]
Sent: Sunday, February 09, 2014 6:36 AM
To: David Baker (Geoscience)
Cc: 'Brian Case'; 'gdal-dev at lists.osgeo.org'
Subject: Re: [gdal-dev] Fast Pixel Access

On Saturday 01 February 2014 15:04:46, David Baker (Geoscience) wrote:
> Even,
>
> I am not sure how to profile as I do not have access to the code to
> profile.  I did do a timing test...
>
> vrt file = 22,970 KB
> bil file = 35,180 KB * 55,501
>
> I piped five locations from the loc.txt file:
> -96.0 36.0
> -98.0 37.0
> -100.0 38.0
> -99.0 39.0
> -101.0 35.0
>
> gdallocationinfo -valonly -geoloc intermap.vrt < loc.txt
> 189.841857910156        25.5 sec
> 384.857452392578        22.6 sec
> 762.015930175781        22.9 sec
> 550.719116210938        23.6 sec
> 883.637023925781        22.9 sec
>
> Note: I used a lap timer on my iPhone to capture the split times as the
> results appeared in the console window.  Does this give any insight?

Woo, I agree that's utterly slow! When you mentioned slow, I thought it was
more on the order of 0.1 second! We can already exclude the parsing time of
the VRT, since you do all the lookups in the same gdallocationinfo session and
the VRT is parsed just once.
And I can't believe that the intersection test for 55,000 rectangles takes
~20 seconds, unless you have an old i386 at 5 MHz ;-)
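
To put a rough number on that claim, here is a minimal sketch in Python that
times a plain linear scan over about 55,000 rectangles. The grid origin, the
0.125 degree tile size and the function names are assumptions made for
illustration; this is not how the VRT driver stores or tests its sources.

```python
import time


def make_tiles(n_cols, n_rows, size=0.125, origin=(-125.0, 24.0)):
    """Build (xmin, ymin, xmax, ymax) rectangles on a regular grid (hypothetical layout)."""
    x0, y0 = origin
    return [
        (x0 + c * size, y0 + r * size, x0 + (c + 1) * size, y0 + (r + 1) * size)
        for r in range(n_rows)
        for c in range(n_cols)
    ]


def find_tiles(tiles, x, y):
    """Linear scan: return the index of every rectangle that contains (x, y)."""
    return [
        i
        for i, (xmin, ymin, xmax, ymax) in enumerate(tiles)
        if xmin <= x < xmax and ymin <= y < ymax
    ]


if __name__ == "__main__":
    tiles = make_tiles(n_cols=240, n_rows=232)  # 55,680 rectangles
    start = time.perf_counter()
    hits = find_tiles(tiles, -96.0, 36.0)
    elapsed = time.perf_counter() - start
    print(f"{len(tiles)} rectangles scanned, {len(hits)} hit(s), {elapsed:.4f} s")
```

Even this unindexed scan in an interpreted language should finish far faster
than the 20+ seconds reported per lookup, which suggests the time is being
spent somewhere other than the intersection test.
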
My usual way of profiling stuff that is slow on the order of more than one
second is to run under gdb, break with Ctrl+C, display the stack trace,
continue the run, break again, display the stack trace, etc. If you keep
breaking in the same function, then you've found the bottleneck.
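
With gdb itself, the usual session is "gdb --args <program and arguments>",
then "run", Ctrl+C, "bt" to print the stack trace, and "continue". The same
sampling idea can be sketched for a Python process, which may be easier to
try first. The code below is only an analogue of the technique, not a gdb
replacement: it relies on sys._current_frames(), a CPython implementation
detail, and slow_step, fast_step and workload are hypothetical stand-ins for
real work.

```python
import collections
import sys
import threading


def sample_stacks(target, interval=0.01):
    """Run target() and periodically record which function the calling thread is in."""
    counts = collections.Counter()
    caller_id = threading.get_ident()
    done = threading.Event()

    def sampler():
        # Wake up every `interval` seconds until the workload has finished.
        while not done.wait(interval):
            frame = sys._current_frames().get(caller_id)
            if frame is not None:
                counts[frame.f_code.co_name] += 1  # innermost function name

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        target()
    finally:
        done.set()
        thread.join()
    return counts


def slow_step():
    total = 0
    for i in range(5_000_000):
        total += i * i
    return total


def fast_step():
    return sum(range(1000))


def workload():
    fast_step()
    slow_step()


if __name__ == "__main__":
    for name, hits in sample_stacks(workload).most_common(3):
        print(name, hits)
```

The function that shows up in most samples is the likely bottleneck, which
is the same conclusion one draws from repeatedly breaking under gdb.
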

I see now that in that thread GDAL_DISABLE_READDIR_ON_OPEN = TRUE was
suggested and seems to improve things significantly. Perhaps we should try to
cache the result of the initial readdir so that later attempts can benefit
from it, but I haven't checked how easily that could be implemented. Or perhaps
we should just change the default value of GDAL_DISABLE_READDIR_ON_OPEN since
it causes problems from time to time.
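
For scripted lookups, one way to apply the option is to set it in the
environment of the gdallocationinfo process. The sketch below assumes that
gdallocationinfo is on the PATH and that every point falls on valid data; the
helper names build_env, format_locations and query_elevations are
hypothetical.

```python
import os
import subprocess


def build_env(base_env=None):
    """Copy the environment and turn off GDAL's directory scan on open."""
    env = dict(os.environ if base_env is None else base_env)
    env["GDAL_DISABLE_READDIR_ON_OPEN"] = "TRUE"
    return env


def format_locations(points):
    """One 'lon lat' pair per line, the same layout as the loc.txt file."""
    return "".join(f"{lon} {lat}\n" for lon, lat in points)


def query_elevations(vrt_path, points):
    """Feed the points to gdallocationinfo on stdin and parse the values."""
    result = subprocess.run(
        ["gdallocationinfo", "-valonly", "-geoloc", vrt_path],
        input=format_locations(points),
        env=build_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumes each output line is a plain number (no empty or nodata lines).
    return [float(value) for value in result.stdout.split()]
```

A call such as query_elevations("intermap.vrt", [(-96.0, 36.0)]) would then
run the same lookup as in the timing test, with the option applied.
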
But generally filesystems don't behave very well when there are a lot of files
in the same directory. You'd be better off organizing your tiles in
subdirectories. But still, 1 to 3 seconds sounds a bit slow to me. It would be
cool if you could try the above suggestion to identify where the time is spent.
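
Since the tiles are regularly named after their location, a subdirectory
layout can be computed straight from the coordinates. The sketch below
assumes 7.5 minute (0.125 degree) tiles grouped into one directory per
1 degree cell; the directory and file naming scheme is invented for
illustration and does not match the actual dataset.

```python
import math
from pathlib import PurePosixPath

TILE_DEG = 7.5 / 60.0  # 7.5 arc-minutes expressed in degrees


def tile_path(lon, lat, root="tiles"):
    """Map a lon/lat to a hypothetical <root>/<1-degree cell>/<tile>.bil path."""
    col = math.floor(lon / TILE_DEG)  # tile column, counted from longitude 0
    row = math.floor(lat / TILE_DEG)  # tile row, counted from latitude 0
    lon_deg = math.floor(lon)
    lat_deg = math.floor(lat)
    ns = "n" if lat_deg >= 0 else "s"
    ew = "e" if lon_deg >= 0 else "w"
    subdir = f"{ns}{abs(lat_deg):02d}{ew}{abs(lon_deg):03d}"
    return PurePosixPath(root) / subdir / f"r{row}_c{col}.bil"


if __name__ == "__main__":
    print(tile_path(-96.0, 36.0))
```

With this layout each directory holds at most 64 tiles instead of 55,501,
and a lookup can open the right file directly without listing a directory.
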

Even

>
> David
>
> -----Original Message-----
> From: gdal-dev-bounces at lists.osgeo.org
> [mailto:gdal-dev-bounces at lists.osgeo.org] On Behalf Of Even Rouault Sent:
> Saturday, February 01, 2014 1:28 AM
> To: Brian Case
> Cc: gdal-dev at lists.osgeo.org
> Subject: Re: [gdal-dev] Fast Pixel Access
>
> On Saturday 01 February 2014 00:23:13, Brian Case wrote:
> > evenr
> >
> >
> > what about the use of a tileindex?
>
> You really mean a tileindex as produced by gdaltindex ? Well, that's not
> exactly the same beast as a VRT, but yes if it was recognized as a GDAL
> dataset then you could potentially save the cost of XML parsing. One could
> imagine that the VRT driver would accept a tileindex as an alternate
> connection string.
>
> Anyway it would be interesting to first profile where the time is spent in
> David's use case. If it's in the XML parsing, then I can't see what could be
> easily improved in that area. If it's the intersection, then there's
> potential for improvement.
>
> > seems an intersection with a set of
> > polys first would be quick
> >
> >
> >
> > brian
> >
> > On Fri, 2014-01-31 at 19:30 +0100, Even Rouault wrote:
> > > On Friday 31 January 2014 17:15:53, David Baker (Geoscience) wrote:
> > > > Dev's,
> > > >
> > > > I have a set of 55,501 bil files in a single directory.  They are
> > > > DEM data that cover the US in 7.5 minute tiles.  I would like to
> > > > randomly access elevations at given lat/lons from the whole
> > > > dataset.  I created a vrt file from the directory of bil files, and
> > > > have been able to access the elevation at a given lat/lon using
> > > > gdallocationinfo, but because of the size of the dataset, this
> > > > operation is somewhat slow. Can the vrt be indexed?
> > >
> > > No, it isn't currently, although I think it could be improved to have an
> > > in-memory index with moderate effort.
> > >
> > > But are you sure the slowness is due to the lack of index ? 55,000 is a
> > > big number, but not that big. Maybe the slowness just comes from the
> > > opening time (XML parsing) of such a big VRT. That would need to be
> > > profiled to be sure where the bottleneck is.
> > >
> > > > Or, is there a faster, better way to access the pixels?  I would
> > > > first like to do this with the utilities before diving into code
> > > > (C#). The files are regularly named based on their location within a
> > > > 1 arc-second grid.
> > > >
> > > > Thanks,
> > > > David
> > > >
> > > > David M. Baker
> > > > Senior Advisor - Geoscience Technology
> > > > Chesapeake Energy Corporation
> > > > david.m.baker at chk.com<mailto:david.m.baker at chk.com>
> > > >
> > > >
> > > > ________________________________
> > > >
> > > > This email (and attachments if any) is intended only for the use of
> > > > the individual or entity to which it is addressed, and may contain
> > > > information that is confidential or privileged and exempt from
> > > > disclosure under applicable law. If the reader of this email is not
> > > > the intended recipient, or the employee or agent responsible for
> > > > delivering this message to the intended recipient, you are hereby
> > > > notified that any dissemination, distribution or copying of this
> > > > communication is strictly prohibited. If you have received this
> > > > communication in error, please notify the sender immediately by
> > > > return email and destroy all copies of the email (and attachments if
> > > > any).
>
> --
> Geospatial professional services
> http://even.rouault.free.fr/services.html
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>



