[GRASS-user] Does v.kernel have to take 16+ hours?

Aren Cambre aren at arencambre.com
Sat Nov 17 07:06:42 PST 2012


I have a dataset of just over 700,000 incidents that happened in square-ish
Texas county that's about 30 miles on each side.

Here's the parameters reported by v.kernel as it's executing:

*STDDEV: 1000.000000*
*RES: 111.419043 ROWS: 458   COLS: 447*
*
*
*Writing output raster map using smooth parameter=1000.000000.*
*
*
*Normalising factor=6482635.018778.*


I am running this on a Windows 7 x64 machine with 8 GB RAM and an Intel
Core i7 Q720 1.6 GHz with 4 physical cores. I notice that it's not
multithreaded, only using 1 core.

It takes about 16 hours to complete. Is this correct? I'd like to use this
on a dataset with closer to 5 million records, and I'm really concerned how
long it may take.

I tried the v.kernel again with about 10X more cells on each dimension,
just to see what would happen. It ran for several hours, and based on the
progress meter, I guess it would take less than a day to complete.
Unfortunately, it crashed mid-way. See
http://trac.osgeo.org/grass/ticket/1800.

I posted my question about the 16+ hours at
http://gis.stackexchange.com/questions/41058/how-do-i-compute-v-kernel-maps-in-less-than-16-hours/.
Bill Huber, who si apparently knowledgeable about kernel density
calculations in general, posted a response, and he felt like a kernel
density map shouldn't take much time at all. But digging more deeply, turns
out he had come up with a kernel density calculation method over a decade
ago using Fourier transforms. See
http://www.directionsmag.com/features/convolution/129753 and the next two
articles linked to it (they are short articles). Apparently this transforms
it from an O(n^2) problem to an O(n ln n) complexity problem.

I inspected v.kernel's main.c (
http://trac.osgeo.org/grass/browser/grass/trunk/vector/v.kernel/main.c),
and looks like v.kernel uses an output-centric method (using Bill's
wording) of calculating the output, which seems like O(n^2) complexity.

So I guess what I'm getting at is it appears to me that the algorithm
behind GRASS GIS's v.kernel is straightforward but is a greedy algorithm (
http://en.wikipedia.org/wiki/Greedy_algorithm), which is fine, but it make
take a while to execute. Is this true?

Is there not spatial indexing I could add to the dataset? I've done various
Google searches on that and can't come up with anything clear.

Aren
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/grass-user/attachments/20121117/8fcca7ae/attachment.html>


More information about the grass-user mailing list