[GRASS-user] v.generalize: does it take forever?

Fábio Dias fabio.dias at gmail.com
Tue Jan 27 14:50:41 PST 2015


Hi,

I kept a cumulative iotop running while the generalization was running.
There is no disk I/O involved, just a couple of PostgreSQL stats writes.
I believe the OS is keeping everything in the RAM cache. I don't think
the disk is a bottleneck either; it is a 10-disk RAID of 15k RPM drives
and it's really fast.

I interrupted the processing, moved everything into PostgreSQL and
started over. I'm still loading the shapefiles (one at a time); I'll
start the 15 processes as soon as the loading finishes. Once things
stabilize, I'll report back.
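
For reference, switching the attribute backend of the current mapset to
PostgreSQL looks roughly like this (the database name is a placeholder
and has to exist already; db.login can be run first if the server
requires credentials):

  db.connect driver=pg database=grassattr
  db.connect -p    # print the current settings to verify

Vectors imported after that (e.g. with v.in.ogr) store their attribute
tables in PostgreSQL instead of SQLite.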


On a related note, wouldn't it be interesting to always try to
simplify a line? As I understood the code, if the simplification of a
line fails for any reason, the original line is kept. It might be
interesting to have an option that makes the algorithm retry that
line with, say, half the threshold. It looks odd to see one side of a
feature heavily simplified while the other side stays fully detailed :)

F
-=--=-=-
Fábio Augusto Salve Dias
ICMC - USP
http://sites.google.com/site/fabiodias/


On Tue, Jan 27, 2015 at 7:56 PM, Markus Metz
<markus.metz.giswork at gmail.com> wrote:
> On Mon, Jan 26, 2015 at 3:54 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>> Hi,
>>
>> The machine has 128 GB of RAM. No matter what I do, I can't make a
>> dent in it. Even with everything cached in RAM (shapefiles, database,
>> the whole lot), there is still free memory.
>
> OK, it's not RAM.
>
>>
>> I'm asking about the database because the behavior I'm seeing on 'top'
>> looks like the one you get when mutexes are involved. The processes
>> don't go all to 100% processing at same time (and the machine has 64
>> processors, so no dent there either), except for the v.in.ogr.
>
> The v.generalize processes should be at 100% while generalizing,
> unless the disk cannot keep up with multiple simultaneous I/O
> requests. The tables are copied only after the generalization has
> finished (100% reached).
>
>> What it looks like is that something is locking each process and
>> they are taking turns. Considering how 'lite' SQLite appears to be,
>> and the weird locking behavior mentioned in other threads (I'm not
>> getting the locked message here... I did when I was running 2
>> parallel v.in.ogr), isn't it likely to be the culprit? Should I
>> change to a more 'non-lite' system, like PostgreSQL for instance?
>
> That could make sense when running multiple processes in parallel. An
> alternative would be to create a separate mapset for each process and
> at the end copy the results back to the main mapset.
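
For anyone following along, the separate-mapset approach could look
roughly like the sketch below, using the GRASS_BATCH_JOB mechanism from
the Parallel_GRASS_jobs wiki page. The location path, mapset and map
names are made up, and the per-year mapsets are assumed to exist
already (created beforehand with g.mapset -c):

  #!/bin/sh
  # one mapset per job, so each process gets its own SQLite file and .gislock
  for year in 2008 2010 2012; do
      {
        echo '#!/bin/sh'
        echo "v.generalize input=tiles_${year}@PERMANENT output=tiles_${year}_gen \\"
        echo "    method=douglas threshold=0.0001 --overwrite"
      } > /tmp/gen_${year}.sh
      chmod +x /tmp/gen_${year}.sh
      # each GRASS session starts in its own mapset and runs the batch script
      GRASS_BATCH_JOB=/tmp/gen_${year}.sh grass70 \
          "$HOME/grassdata/terraclass_ll/gen_${year}" &
  done
  wait
  # afterwards, from the main mapset, copy the results back, e.g.
  #   g.copy vector=tiles_2008_gen@gen_2008,tiles_2008_gen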
>
> Technically, it is not possible that the new v.generalize version in
> trunk (G71) is slower than the old version in relbr70, because the
> new version updates only those parts of the topology that actually
> get changed. The old version also updates components that do not get
> changed, which is quite time-consuming.
>
> I understand you'd like to go for the big nail immediately, but maybe
> it is worth testing on a smaller sample first?
>
> Markus M
>
>>
>> F
>> -=--=-=-
>> Fábio Augusto Salve Dias
>> ICMC - USP
>> http://sites.google.com/site/fabiodias/
>>
>>
>> On Mon, Jan 26, 2015 at 7:22 AM, Markus Metz
>> <markus.metz.giswork at gmail.com> wrote:
>>> On Mon, Jan 26, 2015 at 9:30 AM, Markus Metz
>>> <markus.metz.giswork at gmail.com> wrote:
>>>> On Sun, Jan 25, 2015 at 6:11 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Running r64249, with several jobs in parallel using &. It seems
>>>>> to be considerably slower.
>>>>
>>>> Very strange. Are you using trunk or GRASS 7.0?
>>>
>>> Here, v.generalize on a TerraClass tile is down from 25 minutes to 13 seconds.
>>>
>>>>
>>>>> More than 100 h and no 1% printed yet. To be fair, I'm not
>>>>> entirely sure I'd see it when it prints: 10 v.generalize processes
>>>>> are running (5 for each year) + 1 v.in.ogr for 2012. That v.in.ogr
>>>>> has been running for almost 100 h too. I'm loading the shapefiles
>>>>> directly, as advised way back in this thread.
>>>>
>>>> What exactly do you mean by "loading shps directly"? For
>>>> v.generalize, you should import them with v.in.ogr.
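
For completeness, a plain shapefile import into the current mapset
looks something like this (the path and output name are placeholders):

  v.in.ogr input=/data/terraclass/tiles_2012.shp output=tiles_2012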
>>>>
>>>> What about memory consumption on your system? With 10 v.generalize + 1
>>>> v.in.ogr on such a big dataset, quite a lot of memory would be used.
>>>>
>>>> Markus M
>>>>
>>>>>
>>>>> AFAIK, no disk is being used; the whole thing is cached (after
>>>>> more than 24 h of processing, cumulative iotop shows only a few MB
>>>>> written/read). I'm no longer using a ramdisk for the grassdata
>>>>> directory.
>>>>>
>>>>> However, it appears to be considerably slower, probably because
>>>>> of the jobs running in parallel.
>>>>>
>>>>> My question then would be: considering the thread I saw about
>>>>> SQLite, should I be using something else as the backend? When does
>>>>> it start to make sense to change it?
>>>>>
>>>>> F
>>>>>
>>>>> -=--=-=-
>>>>> Fábio Augusto Salve Dias
>>>>> ICMC - USP
>>>>> http://sites.google.com/site/fabiodias/
>>>>>
>>>>>
>>>>> On Wed, Jan 14, 2015 at 1:06 PM, Markus Neteler <neteler at osgeo.org> wrote:
>>>>>> On Wed, Jan 14, 2015 at 3:54 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>>>>> ...
>>>>>>> What would be the best way to do that in parallel? One mapset for each
>>>>>>> year? Can I run multiple v.generalizes on the same input with
>>>>>>> different outputs?
>>>>>>
>>>>>> Yes sure.
>>>>>>
>>>>>> My first thought was to run completely separate GRASS processes
>>>>>> for each simplification, but I didn't find a way to make it look
>>>>>> somewhere other than .grass / .grass70 for the configuration
>>>>>> stuff....
>>>>>>
>>>>>> Maybe take a look at this approach
>>>>>> http://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Grid_Engine
>>>>>>
>>>>>> but even sending different v.generalize jobs to the background
>>>>>> (&) should work if you have enough RAM.
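
A minimal sketch of that, run from inside a single GRASS session (map
names and threshold are placeholders; each job writes to a different
output map):

  for year in 2008 2010 2012; do
      v.generalize input=tiles_${year} output=tiles_${year}_gen \
          method=douglas threshold=0.0001 --overwrite &
  done
  wait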
>>>>>>
>>>>>> markusN

