[GRASS-dev] Re: [GRASS-user] GRASS 7 vector topology format changed

Markus Metz markus.metz.giswork at googlemail.com
Sun Jul 3 09:02:57 EDT 2011


Hamish wrote:
> Markus Metz wrote:
>> the GRASS 7 vector topology format changed a bit. I have
>> removed redundant information (bounding boxes) from vector
>> topology
>
> Hi,
>
> just wondering how redundant that is.. for point data completely,
> but for a polygon of 500,000 vertices (forest boundary or the
> coastline of Florida) knowing the bbox before you touch the data
> array can be a huge speed up. Many of PostGIS's fns work that
> way IIRC.
>
The bounding boxes were redundant because they are also stored in the
spatial index, i.e. if you want to get the bounding box of an area of
500,000 vertices distributed over 1000 boundaries, you do not need to
read 1000 boundaries but can fetch the corresponding box from the
spatial index. For this purpose, I have implemented this functionality
in the spatial index. IOW, the bounding boxes are not gone; they are
still there, but they are now stored in one location instead of two.
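To make "stored once, in the spatial index" concrete, here is a minimal Python sketch. Everything in it (`Box`, `SpatialIndex`, `VectorMap`, `build_area`, `area_box`) is hypothetical and is not the GRASS C API. It only assumes what is described above: each feature's box lives in the spatial index, so an area's box is looked up there instead of being recomputed from the vertices of its boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (2D for brevity)."""
    west: float
    south: float
    east: float
    north: float

    def union(self, other: "Box") -> "Box":
        return Box(min(self.west, other.west), min(self.south, other.south),
                   max(self.east, other.east), max(self.north, other.north))


def box_of(vertices):
    """Compute a box by scanning every vertex (the expensive path)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return Box(min(xs), min(ys), max(xs), max(ys))


class SpatialIndex:
    """Hypothetical index: the single place where feature boxes are kept."""

    def __init__(self):
        self._boxes = {}

    def insert(self, key, box):
        self._boxes[key] = box

    def box(self, key):
        return self._boxes[key]


class VectorMap:
    def __init__(self):
        self.boundaries = {}   # boundary id -> list of (x, y) vertices
        self.index = SpatialIndex()
        self.vertex_reads = 0  # counts vertices touched, to show the saving

    def add_boundary(self, bid, vertices):
        self.boundaries[bid] = vertices
        self.vertex_reads += len(vertices)
        self.index.insert(("boundary", bid), box_of(vertices))

    def build_area(self, aid, boundary_ids):
        # The area box is the union of its boundary boxes taken from the
        # index; no vertex array is read here.
        box = self.index.box(("boundary", boundary_ids[0]))
        for bid in boundary_ids[1:]:
            box = box.union(self.index.box(("boundary", bid)))
        self.index.insert(("area", aid), box)

    def area_box(self, aid):
        # A plain index lookup: cost does not depend on the vertex count.
        return self.index.box(("area", aid))
```

After two boundaries are added and an area is built from them, `area_box` returns the area's extent while `vertex_reads` stays unchanged, which is the saving described for the 500,000-vertex case.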

> so are per-feature bounding boxes completely gone? or just some
> double-storing of them, or redundant data stored within them?
>
Double-storing is gone (for points, the same box was stored 4 times,
i.e. the point's coordinates were stored 8 times).
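The "4 boxes = 8 copies" arithmetic can be spelled out as follows. This sketch assumes a box stores a minimum and a maximum per axis (west/east, south/north, bottom/top); for a point both equal the point's own coordinate, so every box holds the coordinates twice. The byte count additionally assumes 3D boxes of 8-byte doubles and is an illustration, not a measurement of the 6.x format.

```python
# Assumption: a box stores (min, max) per axis. For a single point, min and
# max are both the point's own coordinate on every axis.
DOUBLE_BYTES = 8  # size of a C double on common platforms


def point_box(x, y, z):
    """Box of one point: west == east == x, south == north == y, etc."""
    return {"W": x, "E": x, "S": y, "N": y, "B": z, "T": z}


def coordinate_copies(n_boxes):
    """Times the point's coordinates are stored across n_boxes boxes."""
    box = point_box(0.0, 0.0, 0.0)
    copies_per_box = len(box) // 3  # 6 values over 3 axes: 2 copies per box
    return n_boxes * copies_per_box


def box_bytes(n_boxes):
    """Bytes spent on boxes for one point, assuming 6 doubles per box."""
    return n_boxes * len(point_box(0.0, 0.0, 0.0)) * DOUBLE_BYTES
```

With four boxes per point this gives `coordinate_copies(4) == 8` and `box_bytes(4) == 192` bytes per point, against 48 bytes for a single box.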

> are small vectors (ie points) sped up at the cost of worse
> performance of scattered large non-point datasets?
>
No, the new, reduced format performs better with larger datasets. For
small datasets, there should not be much of a difference in speed, only
in memory requirements.

> is time to run d.vect in a sub-region of the overall map a good
> way to test the performance difference?
>
Maybe, but d.vect is not very efficient: it goes through all features,
checks whether each feature is inside the current region, and, e.g.,
reads every area and the area's isles twice instead of only once.
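The difference between a d.vect-style loop over all features and an index-driven region query can be sketched like this. Features are reduced to hypothetical (west, south, east, north) tuples, and the uniform grid is a simple stand-in for a real spatial index such as an R-tree; none of these names come from GRASS. The point is only the number of features tested: the scan tests all of them, the index tests only candidates near the region.

```python
import math
from collections import defaultdict


def overlaps(a, b):
    """True if two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def scan_region(boxes, region):
    """d.vect-style: test every feature; returns (hits, features tested)."""
    hits = [fid for fid, box in boxes.items() if overlaps(box, region)]
    return hits, len(boxes)


class GridIndex:
    """Hypothetical uniform-grid index standing in for an R-tree."""

    def __init__(self, boxes, cell):
        self.boxes = boxes
        self.cell = cell
        self.cells = defaultdict(set)  # (col, row) -> feature ids
        for fid, box in boxes.items():
            for key in self._cells_for(box):
                self.cells[key].add(fid)

    def _cells_for(self, box):
        """Yield every grid cell that a box touches."""
        c = self.cell
        for i in range(math.floor(box[0] / c), math.floor(box[2] / c) + 1):
            for j in range(math.floor(box[1] / c), math.floor(box[3] / c) + 1):
                yield (i, j)

    def query(self, region):
        """Return (hits, features tested) for features overlapping region."""
        candidates = set()
        for key in self._cells_for(region):
            candidates |= self.cells.get(key, set())
        hits = [fid for fid in candidates if overlaps(self.boxes[fid], region)]
        return sorted(hits), len(candidates)
```

For 100 small features along a diagonal and a region covering three of them, the scan tests all 100 while the grid query tests only the handful sharing the region's cell; both return the same hits.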

v.what, v.build, and v.in.* are also good candidates for testing
performance differences. Note that because of substantially reduced
memory requirements, modules may fail with out-of-memory errors in 6.x
but complete successfully in 7. Also note that database operations can
mask the effects of the changed topology management, because database
management can be the main time-consuming factor (e.g. [r|v].in.lidar).
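For timing comparisons between 6.x and 7, a small harness such as the following Python sketch may help. It assumes the module is run from a shell where the respective GRASS session is active; the map name in the comment is a placeholder. Taking the median of several runs reduces the influence of a single unusually slow or fast run (e.g. a cold disk cache), and `check=True` makes a failing run, such as an out-of-memory abort, raise an error instead of being recorded as a fast one.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd (a list of arguments) `runs` times; return median seconds.

    Raises subprocess.CalledProcessError if any run exits non-zero.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Inside a GRASS session one would pass e.g. ["v.build", "map=mymap"]
    # (mymap is a placeholder); a trivial Python process stands in here so
    # the sketch runs anywhere.
    print(f"{time_command([sys.executable, '-c', 'pass']):.3f} s")
```

Since database work can dominate the runtime, comparing modules that do little attribute handling gives a clearer picture of the topology changes themselves.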

>
> (hoping this means we can have the best of both worlds!)
>
That's the aim.

Markus M

