[GRASS-user] v.generalize: does it take forever?

Markus Metz markus.metz.giswork at gmail.com
Sun Jan 11 14:32:55 PST 2015


On Sat, Jan 10, 2015 at 7:23 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>> I have optimized the GRASS vector library in trunk r64032 and added
>> another topology check to v.generalize in trunk r64033. The profile of
>> v.generalize now shows that it is limited by disk I/O speed (on my
>> laptop with a standard laptop-like spinning HDD), which means that the
>> algorithms are, under the test conditions, close to their optimum.
>> This picture might change as soon as you use a high-performance server
>> or an SSD.
>
>
> Then I should do a profile on my current setup.

I have updated v.generalize again in trunk r64067. Please test the
latest version.

>
>> [...] the Terraclass
>> shapefiles are full of errors. If you want to fix these errors, this
>> will take some time.
>
> You know this dataset? The errors are really bugging me. They are
> mostly due to the process/tools they usually use. We have passed over the
> request for a more topologically correct approach. Maybe on the next
> iteration. But I'll create another thread asking advice regarding
> these errors shortly :)

I know the Terraclass dataset a bit. I used some tiles for testing. I
was not able to import any of my test tiles without errors (after
years of thinking about the conversion of non-topological vectors to
topological vectors). Terraclass data are based on PRODES data, which
I know pretty well. The PRODES classification also comes as shapefiles
that are full of errors as well, but those I managed to remove by
carefully choosing the snapping threshold for v.in.ogr.
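
For illustration, a minimal sketch of such an import with a snapping
threshold, written in Python so the command is easy to vary per tile.
The file name, map name and threshold value are hypothetical; a
suitable threshold depends on the data. Only the v.in.ogr options
input, output and snap are used.

```python
import subprocess


def v_in_ogr_command(source, output, snap):
    """Build a v.in.ogr call with a snapping threshold (map units)."""
    return ["v.in.ogr", f"input={source}", f"output={output}", f"snap={snap}"]


def run_import(source, output, snap):
    """Run the import; this only works inside a GRASS session."""
    subprocess.run(v_in_ogr_command(source, output, snap), check=True)


# Hypothetical tile and threshold, shown only to illustrate the call.
print(" ".join(v_in_ogr_command("prodes_tile.shp", "prodes_tile", 0.001)))
```

Trying a few thresholds on a single tile and comparing the number of
reported errors is probably the quickest way to settle on a value.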

> By not previously dissolving and further doing v.clean tool=break the
> original data, I've reduced the processing time from more than 30h for
> 1% to 24h for 11%. With the latest release, 9% in 18h.

9% in 18h seems promising.
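
To put the numbers from this thread side by side, here is a rough
extrapolation. It assumes that progress is linear in time, which need
not hold for v.generalize, so the totals are only an order of
magnitude. The labels are my shorthand for the three runs.

```python
# Progress reports from this thread: (hours elapsed, percent completed).
reports = {
    "with prior dissolve and v.clean tool=break": (30.0, 1.0),
    "without prior dissolve and break": (24.0, 11.0),
    "latest release": (18.0, 9.0),
}


def estimated_total_hours(hours, percent):
    """Extrapolate the total run time, assuming linear progress."""
    return hours * 100.0 / percent


for label, (hours, percent) in reports.items():
    total = estimated_total_hours(hours, percent)
    print(f"{label}: about {total:.0f} h in total")
```

The first figure is a lower bound, since that run took more than 30h
to reach 1%.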

>
> However, this whole thing got me thinking about what you said in an earlier message:
>
>> The check_topo function can not be executed in parallel because 1)
>> topology must not be modified for several boundaries in parallel, 2)
>> data are written to disk, and disk IO is by nature not parallel.
>
> Well, disk IO, there's not much we can do about it.

We can sometimes reduce disk IO here and there (which I did in some
of my recent changes).
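
The constraint quoted above can be pictured with a small sketch: the
per-line computation may run concurrently, while the check and the
write happen one line at a time. This is not how v.generalize is
implemented. The functions simplify and check_and_commit are
hypothetical toy stand-ins, and the dictionary stands in for the
vector map on disk.

```python
from concurrent.futures import ThreadPoolExecutor


def simplify(line):
    """Toy generalization: keep every second vertex plus the end point."""
    if len(line) <= 2:
        return list(line)
    return line[:-1:2] + [line[-1]]


def check_and_commit(store, line_id, candidate):
    """Toy serial step: accept only candidates with at least two vertices."""
    if len(candidate) >= 2:
        store[line_id] = candidate
        return True
    return False


def generalize(lines, workers=4):
    store = dict(lines)
    # Candidates are computed concurrently. Threads keep the sketch short;
    # CPU-bound work in CPython would need processes to run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        candidates = dict(zip(lines, pool.map(simplify, lines.values())))
    # The check and the write stay serial, one line after the other.
    for line_id, candidate in candidates.items():
        check_and_commit(store, line_id, candidate)
    return store


lines = {1: [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)], 2: [(0, 1), (5, 1)]}
print(generalize(lines))
```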

Markus M

