[GRASS-user] Does v.kernel have to take 16+ hours?
Markus Metz
markus.metz.giswork at gmail.com
Fri Dec 7 08:16:37 PST 2012
On Thu, Dec 6, 2012 at 10:19 PM, Aren Cambre <aren at arencambre.com> wrote:
> OK, I think I get it now. After you've hit 4 SDs (or even arguably 3 SDs!)
> the contribution of more points to that square fades to almost nothing
> without extreme clusters of points.
Right.
>
> Seems like this ought to be clarified in the UI and documentation? Should I
> file an enhancement request?
I would rather have the option 'stddeviation' renamed to 'radius',
kernel radius in map units, which it actually is currently for all
kernels but gaussian, and internally adjust standard deviation instead
of radius for the gaussian kernel. I guess that the kernel radius is
of more importance to users than the standard deviation. Makes sense?
Markus M
>
> BTW, I dropped down to a SD of 200, and it ran in 212 minutes. I'm trying a
> SD of 300 now but on a grid with 1/4 as many squares as I cut the rows and
> cols by half. We'll see how it does.
>
> Aren
>
>
> On Thu, Dec 6, 2012 at 12:17 PM, Markus Metz <markus.metz.giswork at gmail.com>
> wrote:
>>
>> On Thu, Dec 6, 2012 at 3:13 PM, Aren Cambre <aren at arencambre.com> wrote:
>> > Thanks. I am using EPSG:3081, and its unit is meters. So is it the case
>> > that
>> > a SD of x on EPSG:3081 means a search radius of 4x meters? If so, why
>> > 4x?
>>
>> Because the gaussian function is infinite, a cutoff of 3 to 6 SDs is
>> usually applied when using a gaussian function as kernel density
>> function. In theory, an alternative is to combine the infinite
>> gaussian function with a finite function, e.g. uniform. I think
>> historically v.kernel had only one kernel function, gaussian. The
>> original authors decided to ask for SD and not for the search radius
>> as input and have set the cutoff to 4 x SD where the gaussian function
>> is reasonably close to zero.
>>
>> BTW, search radius = 4 x SD applies only to the gaussian kernel, for
>> all other kernel functions the search radius is equal to SD.
>>
>> Markus M
>>
>> >
>> > On Thu, Dec 6, 2012 at 2:29 AM, Markus Metz
>> > <markus.metz.giswork at gmail.com>
>> > wrote:
>> >>
>> >> On Thu, Dec 6, 2012 at 3:58 AM, Aren Cambre <aren at arencambre.com>
>> >> wrote:
>> >> > It's gotten slow again. This run will probably take more than 10
>> >> > hours.
>> >> > However, I am using a standard deviation of 1000. Is that what could
>> >> > be
>> >> > causing this?
>> >>
>> >> Yes. With a standard deviation of 1000, the search radius is now 4000,
>> >> that is, for each cell a 8000x8000 box is searched. With many densely
>> >> packed points, this can take quite some time.
>> >>
>> >> Markus M
>> >>
>> >> >
>> >> > v.kernel input=tickets at PERMANENT output=tickets_new_heatmap_1000
>> >> > stddeviation=1000
>> >> > STDDEV: 1000.000000
>> >> > RES: 18.290457 ROWS: 2370 COLS: 2650
>> >> >
>> >> > Writing output raster map using smooth parameter=1000.000000.
>> >> >
>> >> > Normalising factor=6482635.018778.
>> >> >
>> >> > On Sat, Nov 24, 2012 at 9:03 PM, Aren Cambre <aren at arencambre.com>
>> >> > wrote:
>> >> >>
>> >> >> I installed r53983. The v.kernel execution that took almost a day
>> >> >> now
>> >> >> executes in 25.5 minutes. Thank you!
>> >> >>
>> >> >> Aren
>> >> >>
>> >> >>
>> >> >> On Fri, Nov 23, 2012 at 12:51 PM, Markus Metz
>> >> >> <markus.metz.giswork at gmail.com> wrote:
>> >> >>>
>> >> >>> On Fri, Nov 23, 2012 at 5:35 PM, Aren Cambre <aren at arencambre.com>
>> >> >>> wrote:
>> >> >>> > Thanks!
>> >> >>> >
>> >> >>> > I am not familiar with GRASS's release customs. Will this become
>> >> >>> > part
>> >> >>> > of a
>> >> >>> > binary release soon, or should I just pull down the latest
>> >> >>> > release
>> >> >>> > in
>> >> >>> > the
>> >> >>> > 6.4.2 trunk? I'm assuming this has been merged into the trunk...
>> >> >>>
>> >> >>> It should be available as a binary for Windows by tomorrow in the
>> >> >>> nightly builds [0].
>> >> >>>
>> >> >>> Markus M
>> >> >>>
>> >> >>> [0] http://wingrass.fsv.cvut.cz/grass64/
>> >> >>>
>> >> >>> >
>> >> >>> > Aren
>> >> >>> >
>> >> >>> >
>> >> >>> > On Fri, Nov 23, 2012 at 7:32 AM, Markus Metz
>> >> >>> > <markus.metz.giswork at gmail.com>
>> >> >>> > wrote:
>> >> >>> >>
>> >> >>> >> On Fri, Nov 23, 2012 at 2:07 PM, Aren Cambre
>> >> >>> >> <aren at arencambre.com>
>> >> >>> >> wrote:
>> >> >>> >> > Isn't taking about 10,000% too much time considered a bug? :-)
>> >> >>> >>
>> >> >>> >> Hmm, yes. v.kernel is fixed in devbr6 and relbr6 with r53982 and
>> >> >>> >> r53983, respectively.
>> >> >>> >>
>> >> >>> >> Markus M
>> >> >>> >>
>> >> >>> >> >
>> >> >>> >> > On Nov 23, 2012 5:11 AM, "Markus Metz"
>> >> >>> >> > <markus.metz.giswork at gmail.com>
>> >> >>> >> > wrote:
>> >> >>> >> >>
>> >> >>> >> >> On Fri, Nov 23, 2012 at 4:14 AM, Aren Cambre
>> >> >>> >> >> <aren at arencambre.com>
>> >> >>> >> >> wrote:
>> >> >>> >> >> > I'm able to reproduce reliably here. I'll email you details
>> >> >>> >> >> > privately.
>> >> >>> >> >>
>> >> >>> >> >> Thanks. I can confirm that v.kernel takes a long time in
>> >> >>> >> >> GRASS 6
>> >> >>> >> >> with
>> >> >>> >> >> the settings provided by you. It does not crash, however.
>> >> >>> >> >>
>> >> >>> >> >> I can speed up v.kernel in GRASS 6 to complete in 10 minutes
>> >> >>> >> >> instead
>> >> >>> >> >> of 16+ hours, but I am not sure if this fix can/will go into
>> >> >>> >> >> GRASS
>> >> >>> >> >> 6.4
>> >> >>> >> >> because by now only bugs should be fixed.
>> >> >>> >> >>
>> >> >>> >> >> Markus M
>> >> >>> >> >>
>> >> >>> >> >> >
>> >> >>> >> >> > Aren
>> >> >>> >> >> >
>> >> >>> >> >> >
>> >> >>> >> >> > On Thu, Nov 22, 2012 at 9:02 AM, Markus Metz
>> >> >>> >> >> > <markus.metz.giswork at gmail.com>
>> >> >>> >> >> > wrote:
>> >> >>> >> >> >>
>> >> >>> >> >> >> On Sat, Nov 17, 2012 at 4:06 PM, Aren Cambre
>> >> >>> >> >> >> <aren at arencambre.com>
>> >> >>> >> >> >> wrote:
>> >> >>> >> >> >> > I have a dataset of just over 700,000 incidents that
>> >> >>> >> >> >> > happened
>> >> >>> >> >> >> > in
>> >> >>> >> >> >> > square-ish
>> >> >>> >> >> >> > Texas county that's about 30 miles on each side.
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > Here's the parameters reported by v.kernel as it's
>> >> >>> >> >> >> > executing:
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > STDDEV: 1000.000000
>> >> >>> >> >> >> > RES: 111.419043 ROWS: 458 COLS: 447
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > Writing output raster map using smooth
>> >> >>> >> >> >> > parameter=1000.000000.
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > Normalising factor=6482635.018778.
>> >> >>> >> >> >> >
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > I am running this on a Windows 7 x64 machine with 8 GB
>> >> >>> >> >> >> > RAM
>> >> >>> >> >> >> > and
>> >> >>> >> >> >> > an
>> >> >>> >> >> >> > Intel
>> >> >>> >> >> >> > Core
>> >> >>> >> >> >> > i7 Q720 1.6 GHz with 4 physical cores. I notice that
>> >> >>> >> >> >> > it's
>> >> >>> >> >> >> > not
>> >> >>> >> >> >> > multithreaded,
>> >> >>> >> >> >> > only using 1 core.
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > It takes about 16 hours to complete. Is this correct?
>> >> >>> >> >> >> > I'd
>> >> >>> >> >> >> > like
>> >> >>> >> >> >> > to
>> >> >>> >> >> >> > use
>> >> >>> >> >> >> > this
>> >> >>> >> >> >> > on a dataset with closer to 5 million records, and I'm
>> >> >>> >> >> >> > really
>> >> >>> >> >> >> > concerned
>> >> >>> >> >> >> > how
>> >> >>> >> >> >> > long it may take.
>> >> >>> >> >> >>
>> >> >>> >> >> >> The time required by v.kernel is a function of the number
>> >> >>> >> >> >> of
>> >> >>> >> >> >> cells
>> >> >>> >> >> >> and
>> >> >>> >> >> >> the input parameter stddeviation. The larger any of these
>> >> >>> >> >> >> values
>> >> >>> >> >> >> is,
>> >> >>> >> >> >> the more time v.kernel will need. Nevertheless, I think
>> >> >>> >> >> >> that
>> >> >>> >> >> >> the
>> >> >>> >> >> >> 16+
>> >> >>> >> >> >> hours are not correct. I tested with a vector with 3
>> >> >>> >> >> >> million
>> >> >>> >> >> >> points
>> >> >>> >> >> >> for a grid with 2700 rows and 1087 columns, magnitudes
>> >> >>> >> >> >> larger
>> >> >>> >> >> >> than
>> >> >>> >> >> >> the
>> >> >>> >> >> >> grid used by you. v.kernel completes in just over one
>> >> >>> >> >> >> minute.
>> >> >>> >> >> >>
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > I posted my question about the 16+ hours at
>> >> >>> >> >> >> >
>> >> >>> >> >> >> >
>> >> >>> >> >> >> >
>> >> >>> >> >> >> >
>> >> >>> >> >> >> >
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > http://gis.stackexchange.com/questions/41058/how-do-i-compute-v-kernel-maps-in-less-than-16-hours/.
>> >> >>> >> >> >> > Bill Huber, who si apparently knowledgeable about kernel
>> >> >>> >> >> >> > density
>> >> >>> >> >> >> > calculations in general, posted a response, and he felt
>> >> >>> >> >> >> > like a
>> >> >>> >> >> >> > kernel
>> >> >>> >> >> >> > density map shouldn't take much time at all. But digging
>> >> >>> >> >> >> > more
>> >> >>> >> >> >> > deeply,
>> >> >>> >> >> >> > turns
>> >> >>> >> >> >> > out he had come up with a kernel density calculation
>> >> >>> >> >> >> > method
>> >> >>> >> >> >> > over a
>> >> >>> >> >> >> > decade
>> >> >>> >> >> >> > ago using Fourier transforms. See
>> >> >>> >> >> >> > http://www.directionsmag.com/features/convolution/129753
>> >> >>> >> >> >> > and
>> >> >>> >> >> >> > the
>> >> >>> >> >> >> > next
>> >> >>> >> >> >> > two
>> >> >>> >> >> >> > articles linked to it (they are short articles).
>> >> >>> >> >> >> > Apparently
>> >> >>> >> >> >> > this
>> >> >>> >> >> >> > transforms
>> >> >>> >> >> >> > it from an O(n^2) problem to an O(n ln n) complexity
>> >> >>> >> >> >> > problem.
>> >> >>> >> >> >>
>> >> >>> >> >> >> The approach of Bill Huber is raster-based, not vector
>> >> >>> >> >> >> based,
>> >> >>> >> >> >> making
>> >> >>> >> >> >> some things easier, at the cost of precision. The
>> >> >>> >> >> >> coordinate
>> >> >>> >> >> >> precision, however, is only needed for kernel functions
>> >> >>> >> >> >> other
>> >> >>> >> >> >> than
>> >> >>> >> >> >> uniform. In GRASS, you could get something like a
>> >> >>> >> >> >> raster-based
>> >> >>> >> >> >> density
>> >> >>> >> >> >> map by
>> >> >>> >> >> >>
>> >> >>> >> >> >> - exporting the points with v.out.ascii
>> >> >>> >> >> >> - re-importing the points with r.in.xyz method=n to get
>> >> >>> >> >> >> the
>> >> >>> >> >> >> number
>> >> >>> >> >> >> of
>> >> >>> >> >> >> points per cell
>> >> >>> >> >> >> - running a neighborhood analysis using a circular window
>> >> >>> >> >> >> with
>> >> >>> >> >> >> r.neighbors method=sum -c
>> >> >>> >> >> >>
>> >> >>> >> >> >> Optionally you could use the gauss option of r.neighbors
>> >> >>> >> >> >> to
>> >> >>> >> >> >> get
>> >> >>> >> >> >> an
>> >> >>> >> >> >> equivalent to v.kernel kernel=gaussian
>> >> >>> >> >> >>
>> >> >>> >> >> >> HTH,
>> >> >>> >> >> >>
>> >> >>> >> >> >> Markus M
>> >> >>> >> >> >>
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > I inspected v.kernel's main.c
>> >> >>> >> >> >> >
>> >> >>> >> >> >> >
>> >> >>> >> >> >> >
>> >> >>> >> >> >> >
>> >> >>> >> >> >> >
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > (http://trac.osgeo.org/grass/browser/grass/trunk/vector/v.kernel/main.c),
>> >> >>> >> >> >> > and looks like v.kernel uses an output-centric method
>> >> >>> >> >> >> > (using
>> >> >>> >> >> >> > Bill's
>> >> >>> >> >> >> > wording)
>> >> >>> >> >> >> > of calculating the output, which seems like O(n^2)
>> >> >>> >> >> >> > complexity.
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > So I guess what I'm getting at is it appears to me that
>> >> >>> >> >> >> > the
>> >> >>> >> >> >> > algorithm
>> >> >>> >> >> >> > behind
>> >> >>> >> >> >> > GRASS GIS's v.kernel is straightforward but is a greedy
>> >> >>> >> >> >> > algorithm
>> >> >>> >> >> >> > (http://en.wikipedia.org/wiki/Greedy_algorithm), which
>> >> >>> >> >> >> > is
>> >> >>> >> >> >> > fine,
>> >> >>> >> >> >> > but
>> >> >>> >> >> >> > it
>> >> >>> >> >> >> > make
>> >> >>> >> >> >> > take a while to execute. Is this true?
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > Is there not spatial indexing I could add to the
>> >> >>> >> >> >> > dataset?
>> >> >>> >> >> >> > I've
>> >> >>> >> >> >> > done
>> >> >>> >> >> >> > various
>> >> >>> >> >> >> > Google searches on that and can't come up with anything
>> >> >>> >> >> >> > clear.
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > Aren
>> >> >>> >> >> >> >
>> >> >>> >> >> >> > _______________________________________________
>> >> >>> >> >> >> > grass-user mailing list
>> >> >>> >> >> >> > grass-user at lists.osgeo.org
>> >> >>> >> >> >> > http://lists.osgeo.org/mailman/listinfo/grass-user
>> >> >>> >> >> >> >
>> >> >>> >> >> >
>> >> >>> >> >> >
>> >> >>> >
>> >> >>> >
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>
>
More information about the grass-user
mailing list