[GRASS-user] v.generalize: does it take forever?

Mon Feb 9 13:52:41 PST 2015

I switched to PostGIS for data storage and the v.generalize time went
down to 130ish minutes, with all processes working in parallel.

I'm happy now :) thank you guys very much.
-=--=-=-
Fábio Augusto Salve Dias
ICMC - USP
http://sites.google.com/site/fabiodias/

On Tue, Jan 27, 2015 at 8:50 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
> Hi,
>
> I kept a cumulative iotop running while the generalization ran.
> No disk IO involved, just a couple of postgres stats. I believe the OS
> is keeping everything in RAM cache. I don't believe the disk is a
> bottleneck either: it is a 10-disk RAID of 15k rpm disks, so it's really
> fast.
>
> I interrupted the processing, moved everything into postgres and
> started over. I'm still loading the shapefiles (which I'm doing one at
> a time); I'll start the 15 processes as soon as they are loaded. As soon
> as that stabilizes, I'll report back.
>
>
> On a related note, wouldn't it be interesting to always try to
> simplify a line? As I understood the code, if the simplification fails
> for any reason, the original line is used. Might be interesting to
> have an option that makes the algorithm retry that line, with half the
> threshold, for instance. It's kind of weird for me to see one side of
> something really simplified while the other side stays really complicated :)
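
The retry idea above can be sketched in a few lines. This is only an
illustration, not the v.generalize code: it uses a plain Douglas-Peucker
simplifier, and the is_acceptable callback is a hypothetical stand-in for
whatever check makes v.generalize reject a simplified line. The names and
the max_retries limit are assumptions too.

```python
import math


def _dist_to_segment(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, threshold):
    """Plain recursive Douglas-Peucker simplification."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _dist_to_segment(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= threshold:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], threshold)
    right = douglas_peucker(points[idx:], threshold)
    return left[:-1] + right


def simplify_with_retry(points, threshold, is_acceptable, max_retries=4):
    """Halve the threshold until the simplified line is accepted.

    Returns (line, threshold_used). If every attempt is rejected, the
    original line is returned with threshold_used set to None.
    """
    for _ in range(max_retries + 1):
        candidate = douglas_peucker(points, threshold)
        if is_acceptable(candidate):
            return candidate, threshold
        threshold /= 2.0
    return list(points), None
```

For example, calling simplify_with_retry(line, 1.0, check) tries thresholds
1.0, 0.5, 0.25 and so on, and only falls back to the original line once all
retries have been rejected.
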
>
> F
>
>
> On Tue, Jan 27, 2015 at 7:56 PM, Markus Metz
> <markus.metz.giswork at gmail.com> wrote:
>> On Mon, Jan 26, 2015 at 3:54 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>> Hi,
>>>
>>> The machine has 128 GB of RAM. No matter what I do, I can't make a
>>> dent in it. Even with everything cached in RAM (shp files, database,
>>> the whole lot), there is still free memory.
>>
>> OK, it's not RAM.
>>
>>>
>>> I'm asking about the database because the behavior I'm seeing in 'top'
>>> looks like what you get when mutexes are involved. The processes
>>> don't all go to 100% CPU at the same time (and the machine has 64
>>> processors, so no dent there either), except for the v.in.ogr.
>>
>> The v.generalize processes should be at 100% while generalizing,
>> unless the disk cannot keep up with multiple simultaneous IO
>> requests. The tables are copied only after the generalization has
>> finished (100% reached).
>>
>>> What it
>>> looks like is that something is locking each process and they are
>>> taking turns. Considering how 'lite' the sqlite appears to be, and the
>>> weird locking behavior that was mentioned in other threads (I'm not
>>> getting the locked message here... I did, when I was running 2
>>> parallel v.in.ogr), isn't it likely to be the culprit? Should I change
>>> it to a more 'non-lite' system, like postgres for instance?
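
For reference, the "taking turns" behavior is what SQLite's general
single-writer rule looks like from the outside. The sketch below uses only
Python's standard sqlite3 module and a throwaway database file; it says
nothing about how GRASS itself opens its SQLite database, which is an
assumption left unchecked here.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked():
    """Show that a second writer is refused while a first one holds the lock."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # timeout=0: fail immediately instead of waiting for the lock.
    # isolation_level=None: autocommit mode, transactions are explicit.
    a = sqlite3.connect(path, timeout=0, isolation_level=None)
    b = sqlite3.connect(path, timeout=0, isolation_level=None)
    a.execute("CREATE TABLE t (x INTEGER)")

    a.execute("BEGIN IMMEDIATE")          # connection a takes the write lock
    a.execute("INSERT INTO t VALUES (1)")
    try:
        b.execute("BEGIN IMMEDIATE")      # connection b wants to write too
        blocked = False
    except sqlite3.OperationalError as err:
        blocked = "locked" in str(err)
    a.execute("COMMIT")                   # a releases the lock

    b.execute("INSERT INTO t VALUES (2)")  # now b gets its turn
    count = b.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    a.close()
    b.close()
    return blocked, count
```

With a non-zero timeout the second writer would wait instead of failing,
which would look like processes taking turns rather than an error message.
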
>>
>> That could make sense when running multiple processes in parallel. An
>> alternative would be to create a separate mapset for each process and
>> at the end copy the results back to the main mapset.
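
A rough sketch of the one-mapset-per-process idea follows. It only builds
the pieces for one job and executes nothing. The mapset and map names, the
threshold value, and the choice of PERMANENT as the main mapset are all
hypothetical. It also assumes each job gets its own session file through
the GISRC environment variable and that the per-job mapset already exists.

```python
def plan_job(gisdbase, location, year, threshold):
    """Describe one isolated v.generalize job; nothing is executed here."""
    mapset = "gen_%d" % year          # hypothetical per-job mapset name
    src = "terraclass_%d" % year      # hypothetical input map name
    out = src + "_gen"

    # Contents of a private session file. Pointing GISRC at such a file
    # keeps parallel jobs from sharing one "current mapset" setting.
    rc_text = (
        "GISDBASE: %s\nLOCATION_NAME: %s\nMAPSET: %s\nGUI: text\n"
        % (gisdbase, location, mapset)
    )

    # Command to run inside the job's own mapset (assumes the input map
    # lives in PERMANENT).
    generalize = [
        "v.generalize",
        "input=%s@PERMANENT" % src,
        "output=%s" % out,
        "method=douglas",
        "threshold=%s" % threshold,
    ]

    # Command to run afterwards from the main mapset to collect the result.
    copy_back = ["g.copy", "vector=%s@%s,%s" % (out, mapset, out)]

    return {
        "mapset": mapset,
        "rc_text": rc_text,
        "generalize": generalize,
        "copy_back": copy_back,
    }
```

Each job's generalize command would then be started with GISRC set to its
own file, and the copy_back commands run one after another at the end.
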
>>
>> Technically, it is not possible that the new v.generalize version in
>> trunk (G71) is slower than the old version in relbr70, because the
>> new version updates only those parts of the topology that actually get
>> changed. The old version also updates components that do not get
>> changed, which is quite time-consuming.
>>
>> I understand you like to go for the big nail immediately, but maybe it
>> is worth testing first on a smaller sample?
>>
>> Markus M
>>
>>>
>>> F
>>>
>>>
>>> On Mon, Jan 26, 2015 at 7:22 AM, Markus Metz
>>> <markus.metz.giswork at gmail.com> wrote:
>>>> On Mon, Jan 26, 2015 at 9:30 AM, Markus Metz
>>>> <markus.metz.giswork at gmail.com> wrote:
>>>>> On Sun, Jan 25, 2015 at 6:11 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Running r64249, with a couple of things in parallel using &. It seems
>>>>>> to be considerably slower.
>>>>>
>>>>> Very strange. Are you using trunk or GRASS 7.0?
>>>>
>>>> Here, v.generalize on a TerraClass tile is down from 25 minutes to 13 seconds.
>>>>
>>>>>
>>>>>> More than 100h, no 1% printed. To be fair,
>>>>>> I'm not entirely sure I'll see it when it prints, with 10 v.generalize
>>>>>> running (5 for each year) + 1 v.in.ogr for 2012. That v.in.ogr has been
>>>>>> running for almost 100h too. I'm loading the shps directly, as advised
>>>>>> way, way back in this thread.
>>>>>
>>>>> What exactly do you mean by "loading shps directly"? For
>>>>> v.generalize, you should import them with v.in.ogr.
>>>>>
>>>>> What about memory consumption on your system? With 10 v.generalize + 1
>>>>> v.in.ogr on such a big dataset, quite a lot of memory would be used.
>>>>>
>>>>> Markus M
>>>>>
>>>>>>
>>>>>> AFAIK, no disk is being used, the whole thing is cached (after more
>>>>>> than 24h processing, cumulative iotop shows only a few MB
>>>>>> written/read). I'm no longer using a ramdisk for the grassdata dir.
>>>>>>
>>>>>> However, it appears to be considerably slower, probably because of the
>>>>>> parallel running jobs.
>>>>>>
>>>>>> My question then would be, considering the thread I saw about sqlite,
>>>>>> should I be using something else as backend? When does it start to
>>>>>> make sense to change it?
>>>>>>
>>>>>> F
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 14, 2015 at 1:06 PM, Markus Neteler <neteler at osgeo.org> wrote:
>>>>>>> On Wed, Jan 14, 2015 at 3:54 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>>>>>> ...
>>>>>>>> What would be the best way to do that in parallel? One mapset for each
>>>>>>>> year? Can I run multiple v.generalizes on the same input with
>>>>>>>> different outputs?
>>>>>>>
>>>>>>> Yes sure.
>>>>>>>
>>>>>>>> My first thought was to run completely separate GRASS processes for
>>>>>>>> each simplification, but I didn't find a way to make it look
>>>>>>>> somewhere other than .grass / .grass70 for the configuration
>>>>>>>> stuff....
>>>>>>>
>>>>>>> Maybe take a look at this approach
>>>>>>> http://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Grid_Engine
>>>>>>>
>>>>>>> but even sending different v.generalize jobs to background (&) should
>>>>>>> work if you have enough RAM.
>>>>>>>
>>>>>>> markusN