[GRASS-user] Does v.kernel have to take 16+ hours?

Aren Cambre aren at arencambre.com
Fri Dec 7 10:12:17 PST 2012


I opened a ticket: http://trac.osgeo.org/grass/ticket/1828

On Fri, Dec 7, 2012 at 10:16 AM, Markus Metz
<markus.metz.giswork at gmail.com>wrote:

> On Thu, Dec 6, 2012 at 10:19 PM, Aren Cambre <aren at arencambre.com> wrote:
> > OK, I think I get it now. After you've hit 4 SDs (or even arguably 3
> SDs!)
> > the contribution of more points to that square fades to almost nothing
> > without extreme clusters of points.
>
> Right.
> >
> > Seems like this ought to be clarified in the UI and documentation?
> Should I
> > file an enhancement request?
>
> I would rather have the option 'stddeviation' renamed to 'radius',
> kernel radius in map units, which it actually is currently for all
> kernels but gaussian, and internally adjust standard deviation instead
> of radius for the gaussian kernel. I guess that the kernel radius is
> of more importance to users than the standard deviation. Makes sense?
>
> Markus M
>
> >
> > BTW, I dropped down to a SD of 200, and it ran in 212 minutes. I'm
> trying a
> > SD of 300 now but on a grid with 1/4 as many squares as I cut the rows
> and
> > cols by half. We'll see how it does.
> >
> > Aren
> >
> >
> > On Thu, Dec 6, 2012 at 12:17 PM, Markus Metz <
> markus.metz.giswork at gmail.com>
> > wrote:
> >>
> >> On Thu, Dec 6, 2012 at 3:13 PM, Aren Cambre <aren at arencambre.com>
> wrote:
> >> > Thanks. I am using EPSG:3081, and its unit is meters. So is it the
> case
> >> > that
> >> > a SD of x on EPSG:3081 means a search radius of 4x meters? If so, why
> >> > 4x?
> >>
> >> Because the gaussian function is infinite, a cutoff of 3 to 6 SDs is
> >> usually applied when using a gaussian function as kernel density
> >> function. In theory, an alternative is to combine the infinite
> >> gaussian function with a finite function, e.g. uniform. I think
> >> historically v.kernel had only one kernel function, gaussian. The
> >> original authors decided to ask for SD and not for the search radius
> >> as input and have set the cutoff to 4 x SD where the gaussian function
> >> is reasonably close to zero.
> >>
> >> BTW, search radius = 4 x SD applies only to the gaussian kernel, for
> >> all other kernel functions the search radius is equal to SD.
> >>
> >> Markus M
> >>
> >> >
> >> > On Thu, Dec 6, 2012 at 2:29 AM, Markus Metz
> >> > <markus.metz.giswork at gmail.com>
> >> > wrote:
> >> >>
> >> >> On Thu, Dec 6, 2012 at 3:58 AM, Aren Cambre <aren at arencambre.com>
> >> >> wrote:
> >> >> > It's gotten slow again. This run will probably take more than 10
> >> >> > hours.
> >> >> > However, I am using a standard deviation of 1000. Is that what
> could
> >> >> > be
> >> >> > causing this?
> >> >>
> >> >> Yes. With a standard deviation of 1000, the search radius is now
> 4000,
> >> >> that is, for each cell a 8000x8000 box is searched. With many densely
> >> >> packed points, this can take quite some time.
> >> >>
> >> >> Markus M
> >> >>
> >> >> >
> >> >> > v.kernel input=tickets at PERMANENT output=tickets_new_heatmap_1000
> >> >> > stddeviation=1000
> >> >> > STDDEV: 1000.000000
> >> >> > RES: 18.290457 ROWS: 2370 COLS: 2650
> >> >> >
> >> >> > Writing output raster map using smooth parameter=1000.000000.
> >> >> >
> >> >> > Normalising factor=6482635.018778.
> >> >> >
> >> >> > On Sat, Nov 24, 2012 at 9:03 PM, Aren Cambre <aren at arencambre.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> I installed r53983. The v.kernel execution that took almost a day
> >> >> >> now
> >> >> >> executes in 25.5 minutes. Thank you!
> >> >> >>
> >> >> >> Aren
> >> >> >>
> >> >> >>
> >> >> >> On Fri, Nov 23, 2012 at 12:51 PM, Markus Metz
> >> >> >> <markus.metz.giswork at gmail.com> wrote:
> >> >> >>>
> >> >> >>> On Fri, Nov 23, 2012 at 5:35 PM, Aren Cambre <
> aren at arencambre.com>
> >> >> >>> wrote:
> >> >> >>> > Thanks!
> >> >> >>> >
> >> >> >>> > I am not familiar with GRASS's release customs. Will this
> become
> >> >> >>> > part
> >> >> >>> > of a
> >> >> >>> > binary release soon, or should I just pull down the latest
> >> >> >>> > release
> >> >> >>> > in
> >> >> >>> > the
> >> >> >>> > 6.4.2 trunk? I'm assuming this has been merged into the
> trunk...
> >> >> >>>
> >> >> >>> It should be available as a binary for Windows by tomorrow in the
> >> >> >>> nightly builds [0].
> >> >> >>>
> >> >> >>> Markus M
> >> >> >>>
> >> >> >>> [0] http://wingrass.fsv.cvut.cz/grass64/
> >> >> >>>
> >> >> >>> >
> >> >> >>> > Aren
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > On Fri, Nov 23, 2012 at 7:32 AM, Markus Metz
> >> >> >>> > <markus.metz.giswork at gmail.com>
> >> >> >>> > wrote:
> >> >> >>> >>
> >> >> >>> >> On Fri, Nov 23, 2012 at 2:07 PM, Aren Cambre
> >> >> >>> >> <aren at arencambre.com>
> >> >> >>> >> wrote:
> >> >> >>> >> > Isn't taking about 10,000% too much time considered a bug?
> :-)
> >> >> >>> >>
> >> >> >>> >> Hmm, yes. v.kernel is fixed in devbr6 and relbr6 with r53982
> and
> >> >> >>> >> r53983, respectively.
> >> >> >>> >>
> >> >> >>> >> Markus M
> >> >> >>> >>
> >> >> >>> >> >
> >> >> >>> >> > On Nov 23, 2012 5:11 AM, "Markus Metz"
> >> >> >>> >> > <markus.metz.giswork at gmail.com>
> >> >> >>> >> > wrote:
> >> >> >>> >> >>
> >> >> >>> >> >> On Fri, Nov 23, 2012 at 4:14 AM, Aren Cambre
> >> >> >>> >> >> <aren at arencambre.com>
> >> >> >>> >> >> wrote:
> >> >> >>> >> >> > I'm able to reproduce reliably here. I'll email you
> details
> >> >> >>> >> >> > privately.
> >> >> >>> >> >>
> >> >> >>> >> >> Thanks. I can confirm that v.kernel takes a long time in
> >> >> >>> >> >> GRASS 6
> >> >> >>> >> >> with
> >> >> >>> >> >> the settings provided by you. It does not crash, however.
> >> >> >>> >> >>
> >> >> >>> >> >> I can speed up v.kernel in GRASS 6 to complete in 10
> minutes
> >> >> >>> >> >> instead
> >> >> >>> >> >> of 16+ hours, but I am not sure if this fix can/will go
> into
> >> >> >>> >> >> GRASS
> >> >> >>> >> >> 6.4
> >> >> >>> >> >> because by now only bugs should be fixed.
> >> >> >>> >> >>
> >> >> >>> >> >> Markus M
> >> >> >>> >> >>
> >> >> >>> >> >> >
> >> >> >>> >> >> > Aren
> >> >> >>> >> >> >
> >> >> >>> >> >> >
> >> >> >>> >> >> > On Thu, Nov 22, 2012 at 9:02 AM, Markus Metz
> >> >> >>> >> >> > <markus.metz.giswork at gmail.com>
> >> >> >>> >> >> > wrote:
> >> >> >>> >> >> >>
> >> >> >>> >> >> >> On Sat, Nov 17, 2012 at 4:06 PM, Aren Cambre
> >> >> >>> >> >> >> <aren at arencambre.com>
> >> >> >>> >> >> >> wrote:
> >> >> >>> >> >> >> > I have a dataset of just over 700,000 incidents that
> >> >> >>> >> >> >> > happened
> >> >> >>> >> >> >> > in
> >> >> >>> >> >> >> > square-ish
> >> >> >>> >> >> >> > Texas county that's about 30 miles on each side.
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > Here's the parameters reported by v.kernel as it's
> >> >> >>> >> >> >> > executing:
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > STDDEV: 1000.000000
> >> >> >>> >> >> >> > RES: 111.419043 ROWS: 458   COLS: 447
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > Writing output raster map using smooth
> >> >> >>> >> >> >> > parameter=1000.000000.
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > Normalising factor=6482635.018778.
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > I am running this on a Windows 7 x64 machine with 8 GB
> >> >> >>> >> >> >> > RAM
> >> >> >>> >> >> >> > and
> >> >> >>> >> >> >> > an
> >> >> >>> >> >> >> > Intel
> >> >> >>> >> >> >> > Core
> >> >> >>> >> >> >> > i7 Q720 1.6 GHz with 4 physical cores. I notice that
> >> >> >>> >> >> >> > it's
> >> >> >>> >> >> >> > not
> >> >> >>> >> >> >> > multithreaded,
> >> >> >>> >> >> >> > only using 1 core.
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > It takes about 16 hours to complete. Is this correct?
> >> >> >>> >> >> >> > I'd
> >> >> >>> >> >> >> > like
> >> >> >>> >> >> >> > to
> >> >> >>> >> >> >> > use
> >> >> >>> >> >> >> > this
> >> >> >>> >> >> >> > on a dataset with closer to 5 million records, and I'm
> >> >> >>> >> >> >> > really
> >> >> >>> >> >> >> > concerned
> >> >> >>> >> >> >> > how
> >> >> >>> >> >> >> > long it may take.
> >> >> >>> >> >> >>
> >> >> >>> >> >> >> The time required by v.kernel is a function of the
> number
> >> >> >>> >> >> >> of
> >> >> >>> >> >> >> cells
> >> >> >>> >> >> >> and
> >> >> >>> >> >> >> the input parameter stddeviation. The larger any of
> these
> >> >> >>> >> >> >> values
> >> >> >>> >> >> >> is,
> >> >> >>> >> >> >> the more time v.kernel will need. Nevertheless, I think
> >> >> >>> >> >> >> that
> >> >> >>> >> >> >> the
> >> >> >>> >> >> >> 16+
> >> >> >>> >> >> >> hours are not correct. I tested with a vector with 3
> >> >> >>> >> >> >> million
> >> >> >>> >> >> >> points
> >> >> >>> >> >> >> for a grid with 2700 rows and 1087 columns, magnitudes
> >> >> >>> >> >> >> larger
> >> >> >>> >> >> >> than
> >> >> >>> >> >> >> the
> >> >> >>> >> >> >> grid used by you. v.kernel completes in just over one
> >> >> >>> >> >> >> minute.
> >> >> >>> >> >> >>
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > I posted my question about the 16+ hours at
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> >
> http://gis.stackexchange.com/questions/41058/how-do-i-compute-v-kernel-maps-in-less-than-16-hours/
> .
> >> >> >>> >> >> >> > Bill Huber, who si apparently knowledgeable about
> kernel
> >> >> >>> >> >> >> > density
> >> >> >>> >> >> >> > calculations in general, posted a response, and he
> felt
> >> >> >>> >> >> >> > like a
> >> >> >>> >> >> >> > kernel
> >> >> >>> >> >> >> > density map shouldn't take much time at all. But
> digging
> >> >> >>> >> >> >> > more
> >> >> >>> >> >> >> > deeply,
> >> >> >>> >> >> >> > turns
> >> >> >>> >> >> >> > out he had come up with a kernel density calculation
> >> >> >>> >> >> >> > method
> >> >> >>> >> >> >> > over a
> >> >> >>> >> >> >> > decade
> >> >> >>> >> >> >> > ago using Fourier transforms. See
> >> >> >>> >> >> >> >
> http://www.directionsmag.com/features/convolution/129753
> >> >> >>> >> >> >> > and
> >> >> >>> >> >> >> > the
> >> >> >>> >> >> >> > next
> >> >> >>> >> >> >> > two
> >> >> >>> >> >> >> > articles linked to it (they are short articles).
> >> >> >>> >> >> >> > Apparently
> >> >> >>> >> >> >> > this
> >> >> >>> >> >> >> > transforms
> >> >> >>> >> >> >> > it from an O(n^2) problem to an O(n ln n) complexity
> >> >> >>> >> >> >> > problem.
> >> >> >>> >> >> >>
> >> >> >>> >> >> >> The approach of Bill Huber is raster-based, not vector
> >> >> >>> >> >> >> based,
> >> >> >>> >> >> >> making
> >> >> >>> >> >> >> some things easier, at the cost of precision. The
> >> >> >>> >> >> >> coordinate
> >> >> >>> >> >> >> precision, however, is only needed for kernel functions
> >> >> >>> >> >> >> other
> >> >> >>> >> >> >> than
> >> >> >>> >> >> >> uniform. In GRASS, you could get something like a
> >> >> >>> >> >> >> raster-based
> >> >> >>> >> >> >> density
> >> >> >>> >> >> >> map by
> >> >> >>> >> >> >>
> >> >> >>> >> >> >> - exporting the points with v.out.ascii
> >> >> >>> >> >> >> - re-importing the points with r.in.xyz method=n to get
> >> >> >>> >> >> >> the
> >> >> >>> >> >> >> number
> >> >> >>> >> >> >> of
> >> >> >>> >> >> >> points per cell
> >> >> >>> >> >> >> - running a neighborhood analysis using a circular
> window
> >> >> >>> >> >> >> with
> >> >> >>> >> >> >> r.neighbors method=sum -c
> >> >> >>> >> >> >>
> >> >> >>> >> >> >> Optionally you could use the gauss option of r.neighbors
> >> >> >>> >> >> >> to
> >> >> >>> >> >> >> get
> >> >> >>> >> >> >> an
> >> >> >>> >> >> >> equivalent to v.kernel kernel=gaussian
> >> >> >>> >> >> >>
> >> >> >>> >> >> >> HTH,
> >> >> >>> >> >> >>
> >> >> >>> >> >> >> Markus M
> >> >> >>> >> >> >>
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > I inspected v.kernel's main.c
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > (
> http://trac.osgeo.org/grass/browser/grass/trunk/vector/v.kernel/main.c),
> >> >> >>> >> >> >> > and looks like v.kernel uses an output-centric method
> >> >> >>> >> >> >> > (using
> >> >> >>> >> >> >> > Bill's
> >> >> >>> >> >> >> > wording)
> >> >> >>> >> >> >> > of calculating the output, which seems like O(n^2)
> >> >> >>> >> >> >> > complexity.
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > So I guess what I'm getting at is it appears to me
> that
> >> >> >>> >> >> >> > the
> >> >> >>> >> >> >> > algorithm
> >> >> >>> >> >> >> > behind
> >> >> >>> >> >> >> > GRASS GIS's v.kernel is straightforward but is a
> greedy
> >> >> >>> >> >> >> > algorithm
> >> >> >>> >> >> >> > (http://en.wikipedia.org/wiki/Greedy_algorithm),
> which
> >> >> >>> >> >> >> > is
> >> >> >>> >> >> >> > fine,
> >> >> >>> >> >> >> > but
> >> >> >>> >> >> >> > it
> >> >> >>> >> >> >> > make
> >> >> >>> >> >> >> > take a while to execute. Is this true?
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > Is there not spatial indexing I could add to the
> >> >> >>> >> >> >> > dataset?
> >> >> >>> >> >> >> > I've
> >> >> >>> >> >> >> > done
> >> >> >>> >> >> >> > various
> >> >> >>> >> >> >> > Google searches on that and can't come up with
> anything
> >> >> >>> >> >> >> > clear.
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > Aren
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >> > _______________________________________________
> >> >> >>> >> >> >> > grass-user mailing list
> >> >> >>> >> >> >> > grass-user at lists.osgeo.org
> >> >> >>> >> >> >> > http://lists.osgeo.org/mailman/listinfo/grass-user
> >> >> >>> >> >> >> >
> >> >> >>> >> >> >
> >> >> >>> >> >> >
> >> >> >>> >
> >> >> >>> >
> >> >> >>
> >> >> >>
> >> >> >
> >> >
> >> >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/grass-user/attachments/20121207/c312a1bc/attachment-0001.html>


More information about the grass-user mailing list