[GRASS-user] v.generalize: does it take forever?

Fábio Dias fabio.dias at gmail.com
Wed Jan 7 08:12:55 PST 2015


Another interesting update...
I believed that doing a dissolve before generalizing would speed up
the process, because it would remove a lot of edges. The data is very
segmented; "stitched" would be the right term, I suppose.

Turns out, that belief is really wrong. The really expensive part of
the code is checking whether the new line intersects other lines. To
reduce the number of comparisons, it checks the bounding boxes first.
By dissolving, I turned all the small lines into really, really big
ones. Now all the bounding boxes intersect, and the algorithm does a
whole lot more comparisons...
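A minimal Python sketch of why longer lines can cost more, under stated assumptions: it counts the segment-vs-segment tests left after a bounding-box prefilter, assuming a naive all-pairs exact check for every pair of lines whose boxes overlap. The real check_topo compares each new line against existing lines through an R-tree query, and the GRASS intersection routines may index segments as well, so this only illustrates the effect. The staircase geometry and all function names are hypothetical.

```python
from itertools import combinations

def bbox(line):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a polyline."""
    xs = [x for x, _ in line]
    ys = [y for _, y in line]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    """True if two bounding boxes touch or overlap (the cheap prefilter)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def segment_tests(lines):
    """Count segment-vs-segment tests left after the bbox prefilter,
    assuming a naive all-pairs exact check for each overlapping pair."""
    boxes = [bbox(line) for line in lines]
    total = 0
    for i, j in combinations(range(len(lines)), 2):
        if boxes_overlap(boxes[i], boxes[j]):
            total += (len(lines[i]) - 1) * (len(lines[j]) - 1)
    return total

def staircase(x0, steps):
    """Hypothetical boundary: a unit staircase starting at (x0, 0)."""
    pts = [(x0, 0)]
    for i in range(steps):
        pts.append((x0 + i + 1, i))      # horizontal step
        pts.append((x0 + i + 1, i + 1))  # vertical step
    return pts

def split_into_segments(line):
    """Store the same geometry as many two-vertex lines."""
    return [[p, q] for p, q in zip(line, line[1:])]

# Ten long, non-touching staircases versus the same geometry cut into pieces.
stairs = [staircase(3 * j, 30) for j in range(10)]
pieces = [seg for s in stairs for seg in split_into_segments(s)]
print("long lines:  ", segment_tests(stairs))
print("short pieces:", segment_tests(pieces))
```

With the ten staircases stored as ten long lines, every pair of boxes overlaps and the sketch counts 162,000 segment tests. Storing the same geometry as 600 two-vertex pieces leaves only 590, all between neighboring pieces that touch at a shared end point.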

750 minutes of processing, 5% progress, reduction method.
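For scale, the two v.info reports quoted below can be compared directly. This small Python snippet simply holds the counts copied from those reports (tc10 before v.dissolve, tc10d after it) and prints the change for each counter; it shows that the dissolve removed 247,791 boundaries, only about 1.9% of the original 12,889,264.

```python
# Topology counts copied from the v.info reports quoted in this thread:
# tc10 (before v.dissolve) and tc10d (after v.dissolve).
before = {"boundaries": 12889264, "areas": 5573197,
          "centroids": 5323741, "islands": 1332382}
after = {"boundaries": 12641473, "areas": 5369494,
         "centroids": 5120039, "islands": 1366321}

def change(key):
    """Absolute and percentage change of one v.info counter after the dissolve."""
    diff = after[key] - before[key]
    return diff, 100.0 * diff / before[key]

for key in before:
    diff, pct = change(key)
    print(f"{key:>10}: {diff:+,} ({pct:+.2f}%)")
```

Areas and centroids also fall by under 4%, while the island count actually grows by 33,939.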

F


-=--=-=-
Fábio Augusto Salve Dias
http://sites.google.com/site/fabiodias/


On Tue, Jan 6, 2015 at 3:48 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
> Original:
> GRASS 7.1.svn (brasil):~ > v.info map=tc10
>  +----------------------------------------------------------------------------+
>  | Name:            tc10                                                      |
>  | Mapset:          terraclass                                                |
>  | Location:        brasil                                                    |
>  | Database:        /home/externo/fabioasd/grass                              |
>  | Title:                                                                     |
>  | Map scale:       1:1                                                       |
>  | Name of creator: fabioasd                                                  |
>  | Organization:                                                              |
>  | Source date:     Sat Jan  3 23:38:40 2015                                  |
>  | Timestamp (first layer): none                                              |
>  |----------------------------------------------------------------------------|
>  | Map format:      native                                                    |
>  |----------------------------------------------------------------------------|
>  |   Type of map: vector (level: 2)                                           |
>  |                                                                            |
>  |   Number of points:       0               Number of centroids:  5323741    |
>  |   Number of lines:        0               Number of boundaries: 12889264   |
>  |   Number of areas:        5573197         Number of islands:    1332382    |
>  |                                                                            |
>  |   Map is 3D:              No                                               |
>  |   Number of dblinks:      1                                                |
>  |                                                                            |
>  |   Projection: Latitude-Longitude                                           |
>  |                                                                            |
>  |               N:   5:16:18.443667N    S:  18:02:29.687783S                 |
>  |               E:  43:59:58.760386W    W:  73:59:29.009623W                 |
>  |                                                                            |
>  |   Digitization threshold: 0                                                |
>  |   Comment:                                                                 |
>  |                                                                            |
>  +----------------------------------------------------------------------------+
>
>
> After dissolve:
>
>  +----------------------------------------------------------------------------+
>  | Name:            tc10d                                                     |
>  | Mapset:          terraclass                                                |
>  | Location:        brasil                                                    |
>  | Database:        /home/externo/fabioasd/grass                              |
>  | Title:                                                                     |
>  | Map scale:       1:1                                                       |
>  | Name of creator: fabioasd                                                  |
>  | Organization:                                                              |
>  | Source date:     Sat Jan  3 23:38:40 2015                                  |
>  | Timestamp (first layer): none                                              |
>  |----------------------------------------------------------------------------|
>  | Map format:      native                                                    |
>  |----------------------------------------------------------------------------|
>  |   Type of map: vector (level: 2)                                           |
>  |                                                                            |
>  |   Number of points:       0               Number of centroids:  5120039    |
>  |   Number of lines:        0               Number of boundaries: 12641473   |
>  |   Number of areas:        5369494         Number of islands:    1366321    |
>  |                                                                            |
>  |   Map is 3D:              No                                               |
>  |   Number of dblinks:      1                                                |
>  |                                                                            |
>  |   Projection: Latitude-Longitude                                           |
>  |                                                                            |
>  |               N:   5:16:18.443667N    S:  18:02:29.687783S                 |
>  |               E:  43:59:58.760386W    W:  73:59:29.009623W                 |
>  |                                                                            |
>  |   Digitization threshold: 0                                                |
>  |   Comment:                                                                 |
>  |                                                                            |
>  +----------------------------------------------------------------------------+
>
>
> On Mon, Jan 5, 2015 at 5:49 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>> Just for further reference, v.dissolve takes around 24 h on this
>> dataset. I'll post the v.info output of both maps as soon as it finishes.
>>
>> Any other ideas? I have a fairly powerful server at my disposal, but
>> I'm out of ideas...
>>
>>
>> On Sun, Jan 4, 2015 at 7:45 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>> As promised, here is the profile of v.generalize as of r63952.
>>> (The data might not be exactly the same; I might have run v.clean somewhere.)
>>>
>>> I still have the raw profiles, if anyone wants them.
>>>
>>> F
>>>
>>>
>>> On Sun, Jan 4, 2015 at 6:01 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>>> Attached is a PDF of the google-perf profile of v.generalize, using
>>>> g7b4. I'm running it again for trunk.
>>>>
>>>>
>>>> On Sun, Jan 4, 2015 at 5:54 PM, Markus Metz
>>>> <markus.metz.giswork at gmail.com> wrote:
>>>>> On Wed, Dec 31, 2014 at 5:20 PM, Fábio Dias <fabio.dias at gmail.com> wrote:
>>>>>>
>>>>>> I fussed around with the v.generalize code, thinking about pthread
>>>>>> parallelization. The geometry part of the code is *really* fast and
>>>>>> can easily be parallelized so it runs even faster. But, according
>>>>>> to the profile google-perf gave me, the real bottleneck is inside the
>>>>>> check_topo function (which uses static variables and, despite its
>>>>>> name, not only checks whether the new line breaks topology but also
>>>>>> inserts it into the vector; the misnomer had me stuck there for a
>>>>>> while). More specifically, it is in the R-tree function used to check
>>>>>> whether one line intersects other lines.
>>>>>>
>>>>>
>>>>> The function used in check_topo is Vect_line_intersection(), which does
>>>>> much more than just test for intersections. The process could be
>>>>> made much faster if Vect_line_check_intersection() were modified
>>>>> so that connections at end points are ignored. But I don't know
>>>>> whether this would break other modules or other functionality.
>>>>>
>>>>> Markus M
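The idea of ignoring connections at end points can be illustrated with a small Python sketch. This is not Vect_line_check_intersection() or any GRASS API. It is a hypothetical, naive O(n·m) polyline check (all function names are made up) that reports a conflict for any contact between two lines except a touch at an end point both lines share, which is how adjacent boundaries legitimately meet at a node. Segment intersection uses the standard orientation (cross-product) test.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def on_segment(p, q, r):
    """For collinear r: True if r lies within the extent of segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(p1, p2, q1, q2):
    """Orientation test: True if closed segments p1p2 and q1q2 share any point."""
    o1, o2 = orientation(p1, p2, q1), orientation(p1, p2, q2)
    o3, o4 = orientation(q1, q2, p1), orientation(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and on_segment(p1, p2, q1))
            or (o2 == 0 and on_segment(p1, p2, q2))
            or (o3 == 0 and on_segment(q1, q2, p1))
            or (o4 == 0 and on_segment(q1, q2, p2)))

def touches_only_at(c, seg_a, seg_b):
    """For two segments that both end at point c: True if c is their only common point."""
    pa = seg_a[1] if seg_a[0] == c else seg_a[0]
    pb = seg_b[1] if seg_b[0] == c else seg_b[0]
    if orientation(c, pa, pb) != 0:
        return True  # not collinear: straight segments sharing c meet only at c
    # Collinear: they overlap beyond c unless they point in opposite directions.
    dot = (pa[0] - c[0]) * (pb[0] - c[0]) + (pa[1] - c[1]) * (pb[1] - c[1])
    return dot <= 0

def lines_conflict(a, b):
    """True if polylines a and b meet anywhere other than at an end point they share."""
    shared_ends = {a[0], a[-1]} & {b[0], b[-1]}
    for seg_a in zip(a, a[1:]):
        for seg_b in zip(b, b[1:]):
            if not segments_intersect(*seg_a, *seg_b):
                continue
            common = set(seg_a) & set(seg_b) & shared_ends
            if any(touches_only_at(c, seg_a, seg_b) for c in common):
                continue  # connection at a shared end point: ignored
            return True
    return False

print(lines_conflict([(0, 0), (1, 0), (2, 0)], [(2, 0), (2, 1)]))  # False: only a shared end node
print(lines_conflict([(0, 0), (1, 0), (2, 0)], [(1, -1), (1, 1)]))  # True: they cross at (1, 0)
```

Collinear segments meeting at a shared end point count as harmless only when they point away from each other; if one line doubles back over the other, the sketch still reports a conflict.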

