[GRASS-user] v.generalize: does it take forever?

Wed Jan 14 06:54:09 PST 2015

Hello,

Hopefully my last question regarding v.generalize and speeding up the process.

Context:

I have multiple years of data that need to be generalized. For each
year, I need a number of different generalizations (specific number
TBD).

Question:

What would be the best way to do that in parallel? One mapset for each
year? Can I run multiple v.generalize instances on the same input with
different outputs?

My first thought was to run completely separate GRASS processes for
each simplification, but I didn't find a way to make GRASS look
anywhere other than .grass / .grass70 for its configuration files.
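
One avenue that may be worth testing for the separate-process idea is
the GISRC environment variable, which tells a GRASS session which
session file to read. The sketch below is only an illustration under
that assumption: it writes one session file per year and builds the
v.generalize command lines, but does not start GRASS itself. The
mapset names (year_2008, ...), map names (landuse_2008, ...), paths
and thresholds are all hypothetical, the mapsets are assumed to exist
already, and a real run would also need GISBASE and the GRASS
bin/lib paths set in the environment.

```python
import os
import tempfile


def write_gisrc(directory, gisdbase, location, mapset):
    """Write a per-process GRASS session file and return its path."""
    path = os.path.join(directory, "gisrc_" + mapset)
    with open(path, "w") as f:
        f.write("GISDBASE: %s\n" % gisdbase)
        f.write("LOCATION_NAME: %s\n" % location)
        f.write("MAPSET: %s\n" % mapset)
        f.write("GUI: text\n")
    return path


def build_jobs(years, thresholds, gisdbase, location, rc_dir):
    """Return a list of (command, environment) pairs, one per output map.

    Each year gets its own (hypothetical) mapset and session file, so
    processes for different years never share a session file.
    """
    jobs = []
    for year in years:
        mapset = "year_%d" % year  # hypothetical mapset name
        gisrc = write_gisrc(rc_dir, gisdbase, location, mapset)
        env = dict(os.environ, GISRC=gisrc)
        for threshold in thresholds:
            cmd = [
                "v.generalize",
                "input=landuse_%d" % year,  # hypothetical map name
                "output=landuse_%d_t%d" % (year, threshold),
                "method=douglas",
                "threshold=%d" % threshold,
            ]
            jobs.append((cmd, env))
    return jobs


if __name__ == "__main__":
    rc_dir = tempfile.mkdtemp()
    for cmd, env in build_jobs([2008, 2010], [10, 50],
                               "/data/grassdata", "terraclass", rc_dir):
        # Inside a working GRASS environment, each pair could be passed
        # to subprocess.Popen(cmd, env=env) from a small worker pool.
        print(env["GISRC"], " ".join(cmd))
```

Whether several v.generalize runs can safely write to the same mapset
at the same time is not something this sketch settles; one mapset per
year, with the runs for a given year executed one after another, would
be the conservative variant.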

Thanks again

F
-=--=-=-
Fábio Augusto Salve Dias
ICMC - USP
http://sites.google.com/site/fabiodias/

On Sun, Jan 11, 2015 at 8:32 PM, Markus Metz
<markus.metz.giswork at gmail.com> wrote:
> On Sat, Jan 10, 2015 at 7:23 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>> I have optimized the GRASS vector library in trunk r64032 and added
>>> another topology check to v.generalize in trunk r64033. The profile of
>>> v.generalize now shows that it is limited by disk I/O speed (on my
>>> laptop with a standard laptop-like spinning HDD), which means that the
>>> algorithms are, under the test conditions, close to their optimum.
>>> This picture might change as soon as you use a high-performance server
>>> or an SSD.
>>
>>
>> Then I should do a profile on my current setup.
>
> I have updated v.generalize again in trunk r64067. Please test the
> latest version.
>
>>
>>> [...] the Terraclass
>>> shapefiles are full of errors. If you want to fix these errors, this
>>> will take some time.
>>
>> You know this dataset? The errors are really bugging me. They are mostly
>> due to the process/tools they usually use. We have passed over the
>> request for a more topologically correct approach. Maybe on the next
>> iteration. But I'll create another thread asking advice regarding
>> these errors shortly :)
>
> I know the Terraclass dataset a bit. I used some tiles for testing. I
> was not able to import any of my test tiles without errors (after
> years of thinking about the conversion of non-topological vectors to
> topological vectors). Terraclass data are based on PRODES data, which
> I know pretty well. The PRODES classification also comes as shapefiles
> which are also full of errors, but these I managed to remove by
> carefully choosing the snapping threshold for v.in.ogr.
>
>> By not dissolving beforehand and additionally running v.clean
>> tool=break on the original data, I've reduced the processing time from
>> more than 30h for 1% to 24h for 11%. With the latest release, 9% in 18h.
>
> 9% in 18h seems promising.
>
>>
>> However, this whole thing got me thinking about what you said in an earlier message:
>>
>>> The check_topo function can not be executed in parallel because 1)
>>> topology must not be modified for several boundaries in parallel, 2)
>>> data are written to disk, and disk IO is by nature not parallel.
>>
>> Well, disk IO, there's not much we can do about it.
>
> We can sometimes reduce disk IO here and there (which I did in some
> of my recent changes).
>
> Markus M