[GRASS-user] v.generalize: does it take forever?

Markus Metz markus.metz.giswork at gmail.com
Fri Jan 9 13:56:52 PST 2015


On Sun, Jan 4, 2015 at 10:45 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
> As promised, here is the profile of v.generalize as of r63952.
> (The data might not be exactly the same; I might have run v.clean somewhere.)

Thanks for your thorough code analysis!

My initial guess was wrong: Vect_line_intersection2() is not the
limiting factor. The R tree is also used to feed
Vect_line_intersection2(), but here it does not seem to be a
bottleneck. The limiting factor was Vect_rewrite_line() and the
functions it calls.
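A minimal sketch of the kind of candidate search an R tree provides when feeding an intersection test. Lines are assumed to be plain lists of (x, y) tuples. Only lines whose bounding boxes overlap the new line's box are passed on to the exact, more expensive intersection test. The linear scan below is a stand-in for a real spatial index, and `bbox`, `boxes_overlap` and `candidate_lines` are hypothetical helpers, not GRASS API.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox(line: Sequence[Point]) -> BBox:
    """Axis-aligned bounding box of a polyline."""
    xs = [x for x, _ in line]
    ys = [y for _, y in line]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """Closed-box test: boxes that only touch still count, so lines meeting at a node are kept."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_lines(new_line: Sequence[Point], existing: List[Sequence[Point]]) -> List[int]:
    """Indices of existing lines that may intersect new_line.

    Hypothetical stand-in for an R-tree query: a linear scan over bounding boxes.
    Only the returned candidates need the exact geometric intersection test.
    """
    box = bbox(new_line)
    return [i for i, line in enumerate(existing) if boxes_overlap(box, bbox(line))]


if __name__ == "__main__":
    existing = [[(0, 0), (1, 1)], [(5, 5), (6, 6)], [(1, 1), (2, 0)]]
    print(candidate_lines([(0.5, 0), (1.5, 1)], existing))  # [0, 2]
```

A real R tree answers the same query without visiting every line. That is why the candidate search itself is usually cheap compared to the exact tests and the line rewriting.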

I have optimized the GRASS vector library in trunk r64032 and added
another topology check to v.generalize in trunk r64033. The profile of
v.generalize now shows that it is limited by disk I/O speed (on my
laptop with a standard spinning laptop HDD), which means that, under
the test conditions, the algorithms are close to their optimum. This
picture might change on a high-performance server or with an SSD.

The speed improvement is non-linear: for small datasets such as the
official GRASS sample datasets, you won't notice a difference. For one
tile of Terraclass, processing should be about 2-4 times faster than
before. For the full Terraclass dataset, it could be more than 10 times
faster. You will need to wait until, say, 10% of the processing is done
before you can estimate the total processing time. Each line takes its
own time to simplify, so the time for 100% of the processing is not
equal to 100 x the time for 1%.
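To illustrate why early progress is a poor predictor when per-line costs differ, here is a small sketch with made-up per-line costs. Every 20th line is assumed to be 100 times more expensive than the others; these are not measurements. `linear_estimate` is a hypothetical helper that extrapolates the total from the time spent on the first fraction of lines.

```python
from typing import Sequence


def linear_estimate(costs: Sequence[float], fraction: float) -> float:
    """Extrapolate total runtime from the time spent on the first `fraction` of lines."""
    n_done = max(1, round(len(costs) * fraction))
    elapsed = sum(costs[:n_done])
    return elapsed * len(costs) / n_done


# Hypothetical per-line costs: 1000 lines, every 20th line costs 100 units, the rest 1 unit.
costs = [100 if i % 20 == 0 else 1 for i in range(1000)]

print(sum(costs))                    # actual total: 5950
print(linear_estimate(costs, 0.01))  # estimate after 1%: 10900.0
print(linear_estimate(costs, 0.10))  # estimate after 10%: 5950.0
```

In this made-up example, the 1% estimate is off by almost a factor of two. The 10% estimate happens to be exact because the expensive lines are evenly spread. With real data, the cost of each line depends on its geometry and position, so the extrapolation error is not predictable in general.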

Another user applied v.generalize to NLCD2011 and it took nearly 2
months. Your dataset is probably a bit smaller, but the Terraclass
shapefiles are full of errors, and fixing them will take some time.

I recommend testing the new v.generalize on a subregion of Terraclass
first. Proceed with the full dataset only if the processing speed and
the results are acceptable; otherwise, please report back.

Markus M

>
> I still have the raw profiles, if anyone wants them.
>
> F
> -=--=-=-
> Fábio Augusto Salve Dias
> http://sites.google.com/site/fabiodias/
>
>
> On Sun, Jan 4, 2015 at 6:01 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>> Attached is a PDF of a google-perf profile of v.generalize, using
>> g7b4. I'm running it again for trunk.
>>
>>
>> On Sun, Jan 4, 2015 at 5:54 PM, Markus Metz
>> <markus.metz.giswork at gmail.com> wrote:
>>> On Wed, Dec 31, 2014 at 5:20 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>>>
>>>> I poked around in the v.generalize code, thinking about pthread
>>>> parallelization. The geometry part of the code is *really* fast and
>>>> could easily be parallelized to run even faster. But according to
>>>> the profile google-perf gave me, the real bottleneck is inside the
>>>> check_topo function. (Despite its name, it uses static variables and
>>>> does not only check whether the new line breaks topology but also
>>>> inserts it into the vector; I got stuck in there for a while because
>>>> of the misnomer.) More specifically, the bottleneck is in the R-tree
>>>> function used to check whether one line intersects other lines.
>>>>
>>>
>>> The function used in check_topo is Vect_line_intersection(), which does
>>> much more than just test for intersections. The process could be
>>> made much faster if Vect_line_check_intersection() were modified
>>> so that connections at end points are ignored. But I don't know if
>>> this would break other modules or other functionality.
>>>
>>> Markus M
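A minimal sketch of the kind of check described above: a yes/no intersection test in which two lines that only touch at shared end points (nodes) do not count as intersecting. Lines are assumed to be lists of (x, y) tuples, and coordinates are compared exactly; real data would need a snapping tolerance. `lines_intersect_ignoring_nodes` and its helpers are hypothetical. They are neither the behaviour of the existing Vect_line_check_intersection() nor a proposed patch to it.

```python
from typing import Sequence, Set, Tuple

Point = Tuple[float, float]


def orient(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def within_box(p: Point, q: Point, r: Point) -> bool:
    """For r collinear with p-q: True if r lies on the segment p-q."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segment_contact(p1: Point, p2: Point, q1: Point, q2: Point) -> Tuple[bool, Set[Point]]:
    """Return (proper_crossing, contact_points) for segments p1-p2 and q1-q2.

    contact_points holds segment end points that lie on the other segment;
    two or more distinct points mean a collinear overlap of positive length.
    """
    o1, o2 = orient(p1, p2, q1), orient(p1, p2, q2)
    o3, o4 = orient(q1, q2, p1), orient(q1, q2, p2)
    if o1 * o2 < 0 and o3 * o4 < 0:
        return True, set()
    contacts: Set[Point] = set()
    if o1 == 0 and within_box(p1, p2, q1):
        contacts.add(q1)
    if o2 == 0 and within_box(p1, p2, q2):
        contacts.add(q2)
    if o3 == 0 and within_box(q1, q2, p1):
        contacts.add(p1)
    if o4 == 0 and within_box(q1, q2, p2):
        contacts.add(p2)
    return False, contacts


def lines_intersect_ignoring_nodes(a: Sequence[Point], b: Sequence[Point]) -> bool:
    """True if polylines a and b intersect anywhere except at end points shared by both."""
    shared_nodes = {a[0], a[-1]} & {b[0], b[-1]}
    for i in range(len(a) - 1):
        for j in range(len(b) - 1):
            crossing, contacts = segment_contact(a[i], a[i + 1], b[j], b[j + 1])
            if crossing or len(contacts) > 1:
                return True  # proper crossing or collinear overlap
            if contacts and not contacts <= shared_nodes:
                return True  # touch at a point that is not a shared end node
    return False
```

With this rule, two boundaries meeting at a common node pass the check. A T-junction, a crossing, or a collinear overlap is still reported. The test can also stop at the first hit instead of computing every intersection point.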

