[GRASS-user] v.generalize: does it take forever?

Markus Metz markus.metz.giswork at gmail.com
Sun Jan 18 15:19:53 PST 2015


On Sun, Jan 11, 2015 at 11:32 PM, Markus Metz
<markus.metz.giswork at gmail.com> wrote:
> On Sat, Jan 10, 2015 at 7:23 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>> I have optimized the GRASS vector library in trunk r64032 and added
>>> another topology check to v.generalize in trunk r64033. The profile of
>>> v.generalize now shows that it is limited by disk I/O speed (on my
>>> laptop with a standard laptop-like spinning HDD), which means that the
>>> algorithms are, under the test conditions, close to their optimum.
>>> This picture might change as soon as you use a high-performance server
>>> or an SSD.
>>
>>
>> Then I should do a profile on my current setup.
>
> I have updated v.generalize again in trunk r64067. Please test the
> latest version.
>
>>
>>> [...] the Terraclass
>>> shapefiles are full of errors. If you want to fix these errors, this
>>> will take some time.
>>
>> You know this dataset? The errors are really bugging me. They are mostly
>> due to the process/tools they usually use. We have passed over the
>> request for a more topologically correct approach. Maybe on the next
>> iteration. But I'll create another thread asking advice regarding
>> these errors shortly :)
>
> I know the Terraclass dataset a bit. I used some tiles for testing. I
> was not able to import any of my test tiles without errors (after
> years of thinking about the conversion of non-topological vectors to
> topological vectors). Terraclass data are based on PRODES data, which
> I know pretty well. The PRODES classification also comes as shapefiles
> which are also full of errors, but these I managed to remove by
> carefully choosing the snapping threshold for v.in.ogr.
>
>> By skipping the prior dissolve and the subsequent v.clean tool=break on
>> the original data, I've reduced the processing time from more than 30h
>> for 1% to 24h for 11%. With the latest release, 9% in 18h.
>
> 9% in 18h seems promising.
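
For comparison, the three progress reports quoted above can be turned
into rough rates. This is only a back-of-the-envelope sketch: it assumes
progress is linear in time, which probably does not hold exactly, and the
first figure is a lower bound ("more than 30h"). The labels and the
function name are made up for illustration.

```python
# (description, hours elapsed, percent of boundaries processed)
runs = [
    ("with dissolve + v.clean tool=break first (at least)", 30.0, 1.0),
    ("without that preprocessing", 24.0, 11.0),
    ("latest release", 18.0, 9.0),
]


def rate_and_projection(hours, percent_done):
    """Return (percent per hour, projected hours for 100%).

    Assumes a constant processing rate, which is a simplification.
    """
    rate = percent_done / hours
    total_hours = 100.0 / rate
    return rate, total_hours


for label, hours, percent in runs:
    rate, total = rate_and_projection(hours, percent)
    print(f"{label}: {rate:.3f} %/h, about {total:.0f} h for 100%")
```

Under that assumption, the latest release would still need roughly 200
hours for the full dataset, compared with about 3000 hours for the
original workflow.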

As of trunk r64234, the simplification itself should be done within
minutes (heavy optimization, only updating those parts of the vector
topology that actually get changed). Please test.
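
For anyone who wants to time a test run, it could be scripted roughly as
follows. This is only a sketch: the map names and the threshold are
placeholders, method=douglas is just one of the available methods, and
the two helper functions are made up for illustration. run_timed only
works inside a GRASS session, so the example prints the command instead
of executing it.

```python
import shlex
import subprocess
import time


def generalize_command(input_map, output_map, method="douglas", threshold=10.0):
    """Build a v.generalize argument list; all values here are placeholders."""
    return [
        "v.generalize",
        f"input={input_map}",
        f"output={output_map}",
        f"method={method}",
        f"threshold={threshold}",
        "--overwrite",
    ]


def run_timed(cmd):
    """Run cmd and return the wall-clock time in seconds.

    Only works inside a GRASS session, where v.generalize is on PATH.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# Hypothetical map names; print the command rather than running it.
cmd = generalize_command("terraclass_tile", "terraclass_tile_gen")
print(shlex.join(cmd))
```

The threshold is given in map units, so it needs to be adapted to the
coordinate system of the data being tested.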

Markus M

>
>>
>> However, this whole thing got me thinking about what you said in an earlier message:
>>
>>> The check_topo function can not be executed in parallel because 1)
>>> topology must not be modified for several boundaries in parallel, 2)
>>> data are written to disk, and disk IO is by nature not parallel.
>>
>> Well, disk IO, there's not much we can do about it.
>
> We can sometimes reduce disk IO here and there (which I did in some of
> my recent changes).
>
> Markus M

