[GRASS-user] v.generalize: does it take forever?

Mon Jan 26 06:54:23 PST 2015

Hi,

I'm running trunk. I was loading the shps into PostGIS and then
importing them into GRASS. Now I'm importing the shps directly.

The machine has 128 GB of RAM. No matter what I do, I can't make a
dent in it. Even with everything cached in RAM (shp files, database,
the whole lot), there is still free memory.

I'm asking about the database because the behavior I'm seeing in 'top'
looks like what you get when mutexes are involved. The processes
don't all go to 100% CPU at the same time (and the machine has 64
processors, so no dent there either), except for v.in.ogr. It looks
as if something is locking each process and they are taking turns.
Considering how 'lite' SQLite appears to be, and the weird locking
behavior that was mentioned in other threads (I'm not getting the
locked message here... I did when I was running 2 parallel
v.in.ogr), isn't it likely to be the culprit? Should I change to a
more 'non-lite' system, like Postgres for instance?
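
For reference, SQLite's one-writer-at-a-time behavior is easy to
reproduce outside GRASS. Below is a minimal sketch, assuming only
Python's standard sqlite3 module. The file name, table name and values
are made up for illustration, and it says nothing about how GRASS
itself opens or shares its database:

```python
import os
import sqlite3
import tempfile


def demo_lock():
    """Show that a second writer is refused while a first one holds the lock."""
    # Hypothetical throwaway database file, not a GRASS database.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # timeout=0: fail immediately instead of waiting for the lock.
    # isolation_level=None: autocommit, so transactions are explicit below.
    writer_a = sqlite3.connect(path, timeout=0, isolation_level=None)
    writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)

    writer_a.execute("CREATE TABLE attrs (cat INTEGER, year INTEGER)")

    # Writer A takes the write lock and keeps its transaction open.
    writer_a.execute("BEGIN IMMEDIATE")
    writer_a.execute("INSERT INTO attrs VALUES (1, 2012)")

    # Writer B asks for the write lock on the same file and is refused.
    try:
        writer_b.execute("BEGIN IMMEDIATE")
        blocked = False
    except sqlite3.OperationalError as exc:
        blocked = "locked" in str(exc)

    # Once A commits, B can take its turn.
    writer_a.execute("COMMIT")
    writer_b.execute("BEGIN IMMEDIATE")
    writer_b.execute("INSERT INTO attrs VALUES (2, 2013)")
    writer_b.execute("COMMIT")

    rows = writer_b.execute("SELECT COUNT(*) FROM attrs").fetchone()[0]
    writer_a.close()
    writer_b.close()
    return blocked, rows


if __name__ == "__main__":
    print(demo_lock())
```

With a non-zero timeout the second writer would wait rather than
fail, which from the outside would look like processes taking turns.
Whether that is what is happening here is still an open question.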

F
-=--=-=-
Fábio Augusto Salve Dias
ICMC - USP
http://sites.google.com/site/fabiodias/

On Mon, Jan 26, 2015 at 7:22 AM, Markus Metz
<markus.metz.giswork at gmail.com> wrote:
> On Mon, Jan 26, 2015 at 9:30 AM, Markus Metz
> <markus.metz.giswork at gmail.com> wrote:
>> On Sun, Jan 25, 2015 at 6:11 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>> Hi,
>>>
>>> Running r64249, with a couple of things in parallel using &. It seems
>>> to be considerably slower.
>>
>> Very strange. Are you using trunk or GRASS 7.0?
>
> Here, v.generalize on a TerraClass tile is down from 25 minutes to 13 seconds.
>
>>
>>> More than 100h, no 1% printed. To be fair,
>>> I'm not entirely sure I'll see it when it prints, with 10 v.generalize
>>> running (5 for each year) + 1 v.in.ogr for 2012. That v.in.ogr has
>>> been running for almost 100h too. I'm loading the shps directly, as
>>> advised way, way back in this thread.
>>
>> What exactly do you mean by "loading shps directly"? For
>> v.generalize, you should import them with v.in.ogr.
>>
>> What about memory consumption on your system? With 10 v.generalize + 1
>> v.in.ogr on such a big dataset, quite a lot of memory would be used.
>>
>> Markus M
>>
>>>
>>> AFAIK, no disk is being used, the whole thing is cached (after more
>>> than 24h processing, cumulative iotop shows only a few MB
>>> written/read). I'm no longer using a ramdisk for the grassdata dir.
>>>
>>> However, it appears to be considerably slower, probably because of the
>>> parallel running jobs.
>>>
>>> My question then would be, considering the thread I saw about sqlite,
>>> should I be using something else as backend? When does it start to
>>> make sense to change it?
>>>
>>> F
>>>
>>> -=--=-=-
>>> Fábio Augusto Salve Dias
>>> ICMC - USP
>>> http://sites.google.com/site/fabiodias/
>>>
>>>
>>> On Wed, Jan 14, 2015 at 1:06 PM, Markus Neteler <neteler at osgeo.org> wrote:
>>>> On Wed, Jan 14, 2015 at 3:54 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>>> ...
>>>>> What would be the best way to do that in parallel? One mapset for each
>>>>> year? Can I run multiple v.generalizes on the same input with
>>>>> different outputs?
>>>>
>>>> Yes sure.
>>>>
>>>>> My first thought was to run completely separate grass processes for
>>>>> each simplification, but I didn't find a way to make it look
>>>>> somewhere other than .grass / .grass70 for the configuration
>>>>> stuff....
>>>>
>>>> Maybe take a look at this approach
>>>> http://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Grid_Engine
>>>>
>>>> but even sending different v.generalize jobs to background (&) should
>>>> work if you have enough RAM.
>>>>
>>>> markusN
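
The "same input, different outputs" approach discussed above can also
be driven from a small script instead of plain & in the shell. This is
only a rough sketch, assuming it runs inside an active GRASS session
where v.generalize is on the PATH. The map name, output naming scheme,
thresholds and worker count are hypothetical:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(input_map, thresholds, method="douglas"):
    """Build one v.generalize command per threshold, each with its own output."""
    return [
        [
            "v.generalize",
            f"input={input_map}",
            # Hypothetical naming scheme: one output map per job.
            f"output={input_map}_gen_{i}",
            f"method={method}",
            f"threshold={threshold}",
        ]
        for i, threshold in enumerate(thresholds)
    ]


def run_parallel(commands, max_workers=4, runner=subprocess.call):
    """Run the commands concurrently and return their exit codes in order."""
    # Threads are enough here: each one only waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


if __name__ == "__main__":
    # "terraclass_2012" is a made-up map name for illustration.
    jobs = build_commands("terraclass_2012", [10, 50, 100])
    print(run_parallel(jobs))
```

Capping max_workers keeps the number of simultaneous jobs in line with
the available RAM, which is the caveat given above for the & approach.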