[GRASS-user] v.generalize: does it take forever?

Sat Feb 14 20:14:11 PST 2015

On Mon, Feb 9, 2015 at 4:52 PM, Fábio Dias <fabio.dias at gmail.com> wrote:

> I switched to postgis for data storage and the v.generalize time went
> down to 130ish minutes, all processes working in parallel.
>
> I'm happy now :) thank you guys very much.
>

Thanks for reporting this back. What about a blog post, or something like
that, on this topic? I believe there are a lot of people interested in some
benchmarks.

Vaclav

-=--=-=-
> Fábio Augusto Salve Dias
> ICMC - USP
> http://sites.google.com/site/fabiodias/
>
>
> On Tue, Jan 27, 2015 at 8:50 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
> > Hi,
> >
> > I've kept a cumulative iotop running while the generalization ran.
> > No disk IO involved, just a couple of postgres stats. I believe the OS
> > is keeping everything in the RAM cache. I don't believe the disk is a
> > bottleneck either: it is a 10-disk RAID of 15k rpm disks, so it's really
> > fast.
> >
> > I interrupted the processing, moved everything into postgres and
> > started over. I'm still loading the shapefiles (one at a time); I'll
> > start the 15 processes as soon as they are loaded. Once that
> > stabilizes, I'll report back.
> >
> >
> > On a related note, wouldn't it be interesting to always try to
> > simplify a line? As I understood the code, if the simplification fails
> > for any reason, the original line is used. It might be interesting to
> > have an option that makes the algorithm retry that line with, for
> > instance, half the threshold. It's kind of weird for me to see one side
> > of something really simplified while the other side is really complicated :)
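The retry idea above can be sketched in a few lines of Python. This is not v.generalize's code: it is a standalone Douglas-Peucker simplification (the algorithm behind v.generalize's method=douglas) plus a loop that halves the threshold whenever the simplified line is rejected. The acceptance test `is_acceptable` is a hypothetical stand-in for whatever check makes v.generalize fall back to the original line.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def _seg_dist(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamped to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(line: List[Point], threshold: float) -> List[Point]:
    """Classic recursive Douglas-Peucker simplification."""
    if len(line) < 3:
        return list(line)
    # Find the interior vertex farthest from the chord between the ends.
    idx, dmax = 0, -1.0
    for i in range(1, len(line) - 1):
        d = _seg_dist(line[i], line[0], line[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= threshold:
        return [line[0], line[-1]]
    left = douglas_peucker(line[: idx + 1], threshold)
    right = douglas_peucker(line[idx:], threshold)
    return left[:-1] + right  # drop the split vertex duplicated in both halves


def simplify_with_retry(
    line: List[Point],
    threshold: float,
    is_acceptable: Callable[[List[Point]], bool],  # hypothetical check
    max_retries: int = 3,
) -> List[Point]:
    """Simplify at `threshold`; if the result is rejected, halve and retry.

    Falls back to the original line if every attempt is rejected.
    """
    t = threshold
    for _ in range(max_retries + 1):
        candidate = douglas_peucker(line, t)
        if is_acceptable(candidate):
            return candidate
        t /= 2.0
    return list(line)
```

For example, with the hypothetical rule "keep at least 3 vertices", `simplify_with_retry(line, 4.0, lambda pts: len(pts) >= 3)` on `line = [(0, 0), (1, 0.2), (2, 3), (3, 0.2), (4, 0)]` rejects the 2-vertex result at threshold 4 and accepts `[(0, 0), (2, 3), (4, 0)]` at threshold 2. `max_retries` bounds the extra work, since every retry re-runs the simplification on the whole line.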
> >
> > F
> >
> >
> > On Tue, Jan 27, 2015 at 7:56 PM, Markus Metz
> > <markus.metz.giswork at gmail.com> wrote:
> >> On Mon, Jan 26, 2015 at 3:54 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
> >>> Hi,
> >>>
> >>> The machine has 128 GB of RAM. No matter what I do, I can't make a
> >>> dent in it. Even with everything cached in RAM (shp files, database,
> >>> the whole lot), there is still free memory.
> >>
> >> OK, it's not RAM.
> >>
> >>>
> >>> I'm asking about the database because the behavior I'm seeing in 'top'
> >>> looks like the one you get when mutexes are involved. The processes
> >>> don't all go to 100% at the same time (and the machine has 64
> >>> processors, so no dent there either), except for v.in.ogr.
> >>
> >> The v.generalize processes should be at 100% while generalizing,
> >> unless the disk cannot keep up with multiple simultaneous IO
> >> requests. The tables are copied only after the generalization has
> >> finished (100% reached).
> >>
> >>> What it looks like is that something is locking each process and
> >>> they are taking turns. Considering how 'lite' SQLite appears to be,
> >>> and the weird locking behavior that was mentioned in other threads
> >>> (I'm not getting the 'locked' message here... I did when I was
> >>> running 2 parallel v.in.ogr), isn't it likely to be the culprit?
> >>> Should I switch to a more 'non-lite' system, like postgres for instance?
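Whether SQLite is the culprit is the poster's guess at this point, not something the thread establishes. For context: by default GRASS 7 keeps a mapset's attribute tables in a single SQLite file, and SQLite allows only one writer per database file at a time. The sketch below uses plain Python `sqlite3` (nothing GRASS-specific, and it does not reproduce what v.generalize or v.in.ogr actually do) to show that rule: while one connection holds a write transaction, a second connection asking for the write lock gets "database is locked", or waits if given a busy timeout.

```python
import os
import sqlite3
import tempfile


def second_writer_blocked(path: str) -> bool:
    """Return True if a second connection cannot start a write transaction
    while a first connection holds one on the same SQLite file."""
    # isolation_level=None: we issue BEGIN/ROLLBACK ourselves.
    writer = sqlite3.connect(path, isolation_level=None)
    # timeout=0: fail immediately instead of waiting for the lock.
    other = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        writer.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        writer.execute("BEGIN IMMEDIATE")  # take the single write lock
        writer.execute("INSERT INTO t VALUES (1)")
        try:
            other.execute("BEGIN IMMEDIATE")
        except sqlite3.OperationalError as err:
            # SQLite reports "database is locked" here.
            return "locked" in str(err)
        other.execute("ROLLBACK")
        return False
    finally:
        writer.close()  # closing discards the uncommitted insert
        other.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(second_writer_blocked(os.path.join(d, "demo.db")))
```

With a nonzero `timeout`, the second writer would block until the first commits instead of failing. That produces the "taking turns" pattern described above rather than an error message.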
> >>
> >> That could make sense when running multiple processes in parallel. An
> >> alternative would be to create a separate mapset for each process and
> >> at the end copy the results back to the main mapset.
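The separate-mapset approach can be planned up front. The Python sketch below only builds argument lists and runs nothing. Each (input map, threshold) pair gets a private mapset, a v.generalize call that reads the shared input by its fully qualified `name@mapset`, and a g.copy call to bring the result back into the main mapset afterwards. The map and mapset names (`tc2008`, `gen_...`) are hypothetical, method=douglas is just one choice, and the sketch assumes each v.generalize call is run in its own GRASS session started in that job's mapset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    mapset: str                # private mapset this job writes into
    generalize_cmd: List[str]  # to run in a GRASS session in `mapset`
    copy_back_cmd: List[str]   # to run later from the main mapset


def plan_jobs(inputs: List[str], thresholds: List[float],
              source_mapset: str = "PERMANENT") -> List[Job]:
    """One job per (input map, threshold) pair, each in its own mapset."""
    jobs = []
    for name in inputs:
        for t in thresholds:
            # Avoid '.' in map/mapset names: 12.5 becomes '12p5'.
            t_tag = f"{t:g}".replace(".", "p")
            mapset = f"gen_{name}_t{t_tag}"
            out = f"{name}_t{t_tag}_gen"
            jobs.append(Job(
                mapset=mapset,
                generalize_cmd=[
                    "v.generalize",
                    f"input={name}@{source_mapset}",  # read the shared input
                    f"output={out}",
                    "method=douglas",
                    f"threshold={t:g}",
                ],
                copy_back_cmd=["g.copy", f"vector={out}@{mapset},{out}"],
            ))
    return jobs


if __name__ == "__main__":
    for job in plan_jobs(["tc2008", "tc2010"], [50.0]):
        print(job.mapset, " ".join(job.generalize_cmd))
```

Because every job writes only into its own mapset, no two processes share an output mapset (or its attribute database) while generalizing. The copy-back step runs once, after all jobs are done.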
> >>
> >> Technically, it is not possible that the new v.generalize version in
> >> trunk (G71) is slower than the old version as in relbr70 because the
> >> new version updates only those parts of the topology that actually get
> >> changed. The old version also updates components that do not get
> >> changed, which is quite time-consuming.
> >>
> >> I understand you'd like to go for the big nail immediately, but maybe
> >> it is worth testing on a smaller sample first?
> >>
> >> Markus M
> >>
> >>>
> >>> F
> >>>
> >>>
> >>> On Mon, Jan 26, 2015 at 7:22 AM, Markus Metz
> >>> <markus.metz.giswork at gmail.com> wrote:
> >>>> On Mon, Jan 26, 2015 at 9:30 AM, Markus Metz
> >>>> <markus.metz.giswork at gmail.com> wrote:
> >>>>> On Sun, Jan 25, 2015 at 6:11 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Running r64249, with a couple of things in parallel using &. It seems
> >>>>>> to be considerably slower.
> >>>>>
> >>>>> Very strange. Are you using trunk or GRASS 7.0?
> >>>>
> >>>> Here, v.generalize on a TerraClass tile is down from 25 minutes to 13 seconds.
> >>>>
> >>>>>
> >>>>>> More than 100h, and not even 1% printed. To be fair,
> >>>>>> I'm not entirely sure I'll see it when it prints, with 10 v.generalize
> >>>>>> running (5 for each year) + 1 v.in.ogr for 2012. That v.in.ogr has been
> >>>>>> running for almost 100h too. I'm loading the shps directly, as advised
> >>>>>> way, way back in this thread.
> >>>>>
> >>>>> What exactly do you mean by "loading shps directly"? For
> >>>>> v.generalize, you should import them with v.in.ogr.
> >>>>>
> >>>>> What about memory consumption on your system? With 10 v.generalize + 1
> >>>>> v.in.ogr on such a big dataset, quite a lot of memory would be used.
> >>>>>
> >>>>> Markus M
> >>>>>
> >>>>>>
> >>>>>> AFAIK, no disk is being used; the whole thing is cached (after more
> >>>>>> than 24h of processing, cumulative iotop shows only a few MB
> >>>>>> written/read). I'm no longer using a ramdisk for the grassdata dir.
> >>>>>>
> >>>>>> However, it appears to be considerably slower, probably because of the
> >>>>>> parallel running jobs.
> >>>>>>
> >>>>>> My question then would be, considering the thread I saw about sqlite,
> >>>>>> should I be using something else as the backend? When does it start to
> >>>>>> make sense to change it?
> >>>>>>
> >>>>>> F
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Jan 14, 2015 at 1:06 PM, Markus Neteler <neteler at osgeo.org> wrote:
> >>>>>>> On Wed, Jan 14, 2015 at 3:54 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
> >>>>>>> ...
> >>>>>>>> What would be the best way to do that in parallel? One mapset for each
> >>>>>>>> year? Can I run multiple v.generalize instances on the same input with
> >>>>>>>> different outputs?
> >>>>>>>
> >>>>>>> Yes, sure.
> >>>>>>>
> >>>>>>>> My first thought was to run completely separate grass processes for
> >>>>>>>> each simplification, but I didn't find a way to make it look somewhere
> >>>>>>>> other than .grass / .grass70 for the configuration stuff....
> >>>>>>>
> >>>>>>> Maybe take a look at this approach
> >>>>>>> http://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Grid_Engine
> >>>>>>>
> >>>>>>> but even sending different v.generalize jobs to the background (&)
> >>>>>>> should work if you have enough RAM.
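In a shell, the "&" approach means starting each command in the background and then calling `wait`. A Python equivalent that also caps how many jobs run at once (relevant given the RAM caveat above) could look like this. It is a sketch that assumes each command is a complete argument list, e.g. a v.generalize call with its own output= name, launched from inside a GRASS session. The demo below uses harmless `python -c` commands instead.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_parallel(commands: Sequence[List[str]], max_jobs: int = 4) -> List[int]:
    """Run each command as its own OS process, at most `max_jobs` at a time.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(cmd: List[str]) -> int:
        # Each call blocks one worker thread until its child process exits;
        # the real work happens in the child, not in Python.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in commands; in a GRASS session these would be v.generalize
    # calls that each write a different output map.
    demo = [[sys.executable, "-c", f"print({i})"] for i in range(3)]
    print(run_parallel(demo, max_jobs=2))
```

Unlike a bare `&` per job, `max_jobs` keeps only a fixed number of processes alive, so memory use stays bounded no matter how many jobs are queued.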
> >>>>>>>
> >>>>>>> markusN