[GRASS-user] v.generalize: does it take forever?

Markus Metz markus.metz.giswork at gmail.com
Tue Jan 27 13:56:51 PST 2015


On Mon, Jan 26, 2015 at 3:54 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
> Hi,
>
> The machine has 128Gb of ram. Doesn't matter what I do, I can't make a
> dent on it. Even with everything cached in ram (shp files, database,
> the whole lot), there is still free memory.

OK, it's not RAM.

>
> I'm asking about the database because the behavior I'm seeing on 'top'
> looks like the one you get when mutexes are involved. The processes
> don't go all to 100% processing at same time (and the machine has 64
> processors, so no dent there either), except for the v.in.ogr.

The v.generalize processes should be at 100% CPU while generalizing,
unless the disk cannot keep up with multiple simultaneous IO
requests. The tables are copied only after the generalization has
finished (100% reached).

> What it
> looks like is that something is locking each process and they are
> taking turns. Considering how 'lite' the sqlite appears to be, and the
> weird locking behavior that was mentioned in other threads (I'm not
> getting the locked message here... I did, when I was running 2
> parallel v.in.ogr), isn't it likely to be the culprit? Should I change
> it to a more 'non-lite' system, like postgres for instance?

That could make sense when running multiple processes in parallel. An
alternative would be to create a separate mapset for each process and
at the end copy the results back to the main mapset.
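
A minimal sketch of the one-mapset-per-process idea, for illustration
only. It assumes each job gets its own GISRC file (GRASS reads the
session settings from the file named by the GISRC environment
variable), so that each job also works on the sqlite database of its
own mapset. The helper names, the "gen_<year>_<threshold>" mapset
naming, the paths and the choice of method=douglas are hypothetical.
The per-job mapsets are assumed to exist already, and the commands are
only built here, not executed.

```python
import os


def gisrc_text(gisdbase, location, mapset):
    """Return the contents of a GISRC file that points at one mapset."""
    return (
        "GISDBASE: %s\n"
        "LOCATION_NAME: %s\n"
        "MAPSET: %s\n" % (gisdbase, location, mapset)
    )


def job_env(gisrc_path):
    """Copy the current environment and point GISRC at a per-job file."""
    env = dict(os.environ)
    env["GISRC"] = gisrc_path
    return env


def plan_job(gisdbase, location, main_mapset, input_map, year, threshold):
    """Describe one v.generalize job that runs in its own mapset.

    Mapset and output names are hypothetical; dots are replaced because
    vector map names must not contain them.
    """
    tag = str(threshold).replace(".", "_")
    mapset = "gen_%s_%s" % (year, tag)
    output = "%s_gen_%s" % (input_map, tag)
    return {
        "mapset": mapset,
        "gisrc": gisrc_text(gisdbase, location, mapset),
        # Run inside the per-job mapset, reading the input from the
        # main mapset.
        "run": [
            "v.generalize",
            "input=%s@%s" % (input_map, main_mapset),
            "output=%s" % output,
            "method=douglas",
            "threshold=%s" % threshold,
        ],
        # Run later from the main mapset to collect the result.
        "copy_back": [
            "g.copy",
            "vector=%s@%s,%s" % (output, mapset, output),
        ],
    }


if __name__ == "__main__":
    job = plan_job("/data/grassdata", "terraclass", "PERMANENT",
                   "tc2010", 2010, 50)
    print(job["gisrc"])
    print(" ".join(job["run"]))
    print(" ".join(job["copy_back"]))
```

Each "run" command would be started with the environment returned by
job_env() for that job's GISRC file, and the "copy_back" commands
would be run one after another from the main mapset once all jobs
have finished.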

Technically, the new v.generalize version in trunk (G71) cannot be
slower than the old version in relbr70, because the new version
updates only those parts of the topology that actually get changed.
The old version also updates components that do not get changed,
which is quite time-consuming.

I understand you like to go for the big nail immediately, but maybe it
is worth testing first on a smaller sample?
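
One possible way to get a smaller sample, sketched under assumptions:
import only a window in the middle of the full extent, using the
spatial= option of v.in.ogr (format xmin,ymin,xmax,ymax). The helper
names, the file name and the extent values are hypothetical, and the
name of the data source option (input= here) should be checked
against the v.in.ogr manual of the installed version. The command is
only built, not executed.

```python
def sub_bbox(west, south, east, north, fraction):
    """Return (xmin, ymin, xmax, ymax) of a window centred in the full
    extent whose sides are `fraction` of the full width and height."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    cx = (west + east) / 2.0
    cy = (south + north) / 2.0
    half_w = (east - west) * fraction / 2.0
    half_h = (north - south) * fraction / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def import_command(source, output, bbox):
    """Build a v.in.ogr command that imports only the given window."""
    spatial = ",".join("%.6f" % value for value in bbox)
    return [
        "v.in.ogr",
        "input=%s" % source,      # option name assumed, check the manual
        "output=%s" % output,
        "spatial=%s" % spatial,
    ]


if __name__ == "__main__":
    # Hypothetical full extent; half of each side gives a quarter of
    # the area.
    bbox = sub_bbox(0.0, 0.0, 100.0, 50.0, 0.5)
    print(" ".join(import_command("tile_2012.shp", "tile_2012_sample",
                                  bbox)))
```

Timing v.generalize on a few such windows of increasing size gives an
idea of how the run time grows before the full dataset is started.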

Markus M

>
> F
> -=--=-=-
> Fábio Augusto Salve Dias
> ICMC - USP
> http://sites.google.com/site/fabiodias/
>
>
> On Mon, Jan 26, 2015 at 7:22 AM, Markus Metz
> <markus.metz.giswork at gmail.com> wrote:
>> On Mon, Jan 26, 2015 at 9:30 AM, Markus Metz
>> <markus.metz.giswork at gmail.com> wrote:
>>> On Sun, Jan 25, 2015 at 6:11 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Running r64249, with a couple of stuff in parallel using &. It seems
>>>> to be considerably slower.
>>>
>>> Very strange. Are you using trunk or GRASS 7.0?
>>
>> Here, v.generalize on a TerraClass tile is down from 25 minutes to 13 seconds.
>>
>>>
>>>> More than 100h, no 1% printed. To be fair,
>>>> I'm not entirely sure I'll see it when it prints, 10 v.generalize
>>>> running (5 for each year) + 1 v.in.ogr for 2012. That v.in.ogr is
>>>> running for almost 100h too. I'm loading the shps directly, as advised
>>>> way, way back in this thread.
>>>
>>> What exactly do you mean by "loading shps directly"? For
>>> v.generalize, you should import them with v.in.ogr.
>>>
>>> What about memory consumption on your system? With 10 v.generalize + 1
>>> v.in.ogr on such a big dataset, quite a lot of memory would be used.
>>>
>>> Markus M
>>>
>>>>
>>>> AFAIK, no disk is being used, the whole thing is cached (after more
>>>> than 24h processing, cumulative iotop shows only a few MB
>>>> written/read). I'm no longer using a ramdisk for the grassdata dir.
>>>>
>>>> However, it appears to be considerably slower, probably because of the
>>>> parallel running jobs.
>>>>
>>>> My question then would be, considering the thread I saw about sqlite,
>>>> should I be using something else as backend? When does it start to
>>>> make sense to change it?
>>>>
>>>> F
>>>>
>>>> -=--=-=-
>>>> Fábio Augusto Salve Dias
>>>> ICMC - USP
>>>> http://sites.google.com/site/fabiodias/
>>>>
>>>>
>>>> On Wed, Jan 14, 2015 at 1:06 PM, Markus Neteler <neteler at osgeo.org> wrote:
>>>>> On Wed, Jan 14, 2015 at 3:54 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>>>> ...
>>>>>> What would be the best way to do that in parallel? One mapset for each
>>>>>> year? Can I run multiple v.generalizes on the same input with
>>>>>> different outputs?
>>>>>
>>>>> Yes sure.
>>>>>
>>>>>> My first thought was to run completely separated grass processes for
>>>>>> each simplification, but I didn't find a way to make it search
>>>>>> something different than .grass / .grass70 for the configuration
>>>>>> stuff....
>>>>>
>>>>> Maybe take a look at this approach
>>>>> http://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Grid_Engine
>>>>>
>>>>> but even sending different v.generalize jobs to background (&) should
>>>>> work if you have enough RAM.
>>>>>
>>>>> markusN

