[GRASS-dev] [GRASS GIS] #542: grass7 vector libraries modifications

GRASS GIS trac at osgeo.org
Mon Mar 30 10:46:21 EDT 2009

#542: grass7 vector libraries modifications
 Reporter:  mmetz        |       Owner:  grass-dev at lists.osgeo.org
     Type:  enhancement  |      Status:  new                      
 Priority:  minor        |   Milestone:  7.0.0                    
Component:  Vector       |     Version:  svn-trunk                
 Keywords:               |    Platform:  All                      
      Cpu:  All          |  
 I want to suggest some more profound changes to the vector model for
 grass7. These changes would affect topology, spatial index and maybe
 category index, but not the coor file. That means that there will be
 limited forward/backward compatibility: topology would need to be rebuilt
 before vectors can be accessed. Vector modules would not need to be
 rewritten, but more efficient library functions could be made available.

 My general idea/complaint is that the current topology layout is not
 tailored towards vector object types; instead several (very) different
 types (points, lines, boundaries, centroids, faces, kernels) are stored in
 the same structure. Working with one particular type is a bit inefficient
 because the desired type has to be selected out of everything stored in
 this universal structure every single time. I am sure that a lot of time
 and space can be saved with a redesigned topology layout and vector
 libraries that make use of it. As an example, what I want to get rid of is

 for (line = 1; line <= nlines; line++) {
    if (!Vect_line_alive(Map, line))
        continue;    /* skip deleted primitives */
    type = Vect_read_line(Map, points, cats, line);
    if (!(type & otype))
        continue;    /* skip primitives of other types */
    /* process line */
 }

 The whole coor file is read, in the worst case just to get the few
 centroids in it. This cannot always be avoided, but it could often be
 replaced with something like

 for (centroid = 1; centroid <= ncentroids; centroid++) {
    /* process centroid */
 }
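
 To make the contrast concrete, here is a minimal sketch in plain C. It
 does not use the GRASS vector library: prim_t, the PRIM_* flags,
 visit_by_filter() and visit_by_index() are made-up names that only model
 the idea of a per-type id list next to the mixed list of primitives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type flags; these are NOT the GRASS definitions. */
enum { PRIM_POINT = 1, PRIM_LINE = 2, PRIM_BOUNDARY = 4, PRIM_CENTROID = 8 };

/* Hypothetical record: one entry per primitive, all types mixed. */
typedef struct {
    int type;
    int alive;
} prim_t;

/* Status quo: touch every primitive and filter by type.
 * Returns the number of processed primitives; *nvisited counts
 * how many entries had to be looked at. */
size_t visit_by_filter(const prim_t *prims, size_t nprims, int otype,
                       size_t *nvisited)
{
    size_t i, hits = 0;

    assert(prims != NULL && nvisited != NULL);
    *nvisited = 0;
    for (i = 0; i < nprims; i++) {
        (*nvisited)++;
        if (!prims[i].alive)
            continue;
        if (!(prims[i].type & otype))
            continue;
        hits++;    /* process primitive */
    }
    return hits;
}

/* Idea of the proposal: a per-type list of ids, so only primitives
 * of the wanted type are looked at. */
size_t visit_by_index(const prim_t *prims, const size_t *ids, size_t nids,
                      size_t *nvisited)
{
    size_t i, hits = 0;

    assert(prims != NULL && ids != NULL && nvisited != NULL);
    *nvisited = 0;
    for (i = 0; i < nids; i++) {
        (*nvisited)++;
        if (prims[ids[i]].alive)
            hits++;    /* process primitive */
    }
    return hits;
}
```

 With a few centroids among many boundaries, the first function looks at
 every entry while the second only looks at the centroid ids.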

 The current implementation has some consequences that I am not sure are
 actually desired. E.g. when cleaning a vector with tool=snap
 (snapping vertices of lines and boundaries), lines and boundaries may be
 snapped together at the same time: a boundary may be snapped to a line and
 vice versa. Maybe this is sometimes desired, but maybe this should be
 avoided? Another example is removing duplicates: currently it is possible
 to do that for points and centroids together, and if there are a point and
 a centroid with identical coordinates, one of them is deleted (random

 With the changes I have in mind, the size of support structures should
 generally go down, most for point datasets, least for areas. Massive point
 datasets like LIDAR could be processed more easily on level 2 with topology,
 because support structures for massive point datasets would be reduced in
 size by about 70% (rough estimates: spatial index reduced down to 25%,
 topology reduced down to 40%).
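
 How the two rough estimates combine depends on how much of the support
 size is spatial index and how much is topology. The following sketch only
 does that arithmetic; remaining_fraction() is a made-up helper, and it
 assumes that spatial index and topology are the only support structures
 counted.

```c
#include <assert.h>

/* Fraction of the original support size that remains after the change.
 * sidx_share:     share of the spatial index in the original size (0..1)
 * sidx_remaining: remaining fraction of the spatial index, e.g. 0.25
 * topo_remaining: remaining fraction of the topology, e.g. 0.40
 * Assumption: spatial index + topology make up the whole support size. */
double remaining_fraction(double sidx_share, double sidx_remaining,
                          double topo_remaining)
{
    assert(sidx_share >= 0.0 && sidx_share <= 1.0);
    return sidx_share * sidx_remaining +
           (1.0 - sidx_share) * topo_remaining;
}
```

 For example, if the spatial index made up about two thirds of the support
 size, 2/3 * 0.25 + 1/3 * 0.40 = 0.30 would remain, i.e. a reduction by
 about 70%. With an even split, 0.325 would remain.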

 There are however some problems with my suggestions: 1) IMHO nobody should
 decide on that alone, 2) the coding is too much for one person alone, i.e.
 I can't do all that without help, 3) I'm not really a programmer, 4) I
 don't know enough about vector geometry algorithms.

 Below are more technical details:

 == Status quo ==

 the coor file holds lines (better: primitives) of types[[BR]]
 face (3D boundary, not yet implemented)[[BR]]
 kernel (3D centroid, not yet implemented)[[BR]]

 structures derived from these types are[[BR]]
 edges (3D areas, not yet implemented)[[BR]]
 volumes (3D shapes, not yet implemented)[[BR]]
 holes (3D volumes within volumes, like isles in areas, not yet implemented)

 topology holds information about[[BR]]

 where lines can be points, lines, boundaries, centroids, faces, or kernels


 points, lines, boundaries, centroids, faces, kernels are obviously
 different things, but the current topology layout squeezes all of them
 into the same structure with information about:
 start node (assigned for all types, but not needed for points, centroids,
 end node (used for lines and boundaries, otherwise unused)[[BR]]
 area to left (for boundary, area for centroid, unused for all other types)[[BR]]
 area to right (for boundary, unused for all other types)[[BR]]
 3D bounding box (completely redundant for points, centroids,
 offset (into coor file)[[BR]]
 type (point, line, boundary, centroid, face, or kernel)
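
 The list above can be pictured as one C struct plus a table of which
 fields each type really needs. This is an illustrative sketch, not the
 definition from the GRASS headers: struct universal_prim, the T_* and F_*
 names and the field types are assumptions, and only the four 2D types are
 covered.

```c
#include <assert.h>

/* Hypothetical type codes and field flags (not the GRASS definitions). */
enum { T_POINT = 1, T_LINE = 2, T_BOUNDARY = 4, T_CENTROID = 8 };
enum {
    F_START_NODE = 1 << 0, F_END_NODE = 1 << 1,
    F_LEFT = 1 << 2, F_RIGHT = 1 << 3,
    F_BBOX = 1 << 4, F_OFFSET = 1 << 5, F_TYPE = 1 << 6
};

/* One record layout for every type, as in the status quo. */
struct universal_prim {
    int n1, n2;        /* start node, end node */
    int left, right;   /* area to left / right; area for centroid */
    double box[6];     /* 3D bounding box */
    long offset;       /* offset into coor file */
    int type;
};

/* Fields that carry information for a given type, following the list. */
unsigned needed_fields(int type)
{
    unsigned f = F_OFFSET | F_TYPE;

    assert(type == T_POINT || type == T_LINE ||
           type == T_BOUNDARY || type == T_CENTROID);
    switch (type) {
    case T_LINE:
        f |= F_START_NODE | F_END_NODE | F_BBOX;
        break;
    case T_BOUNDARY:
        f |= F_START_NODE | F_END_NODE | F_LEFT | F_RIGHT | F_BBOX;
        break;
    case T_CENTROID:
        f |= F_LEFT;   /* the area the centroid belongs to */
        break;
    default:           /* T_POINT: offset and type only */
        break;
    }
    return f;
}

/* Number of flags set in a field mask. */
int count_fields(unsigned mask)
{
    int n = 0;

    while (mask) {
        n += (int)(mask & 1u);
        mask >>= 1;
    }
    return n;
}
```

 In this model a point uses 2 of the 7 fields, a centroid 3, a line 5, and
 only a boundary uses all of them.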

 == Proposed new layout ==

 the coor file would hold the same types as before. To avoid confusion, all
 coordinate strings would be referred to as primitives (like in the output
 of current v.build), but that's just naming. IMHO anything but "line" is
 fine. "A line can be a line or boundary or point or ..." is too
 philosophical for my taste.

 topology would have a separate data structure for each of[[BR]]
 nodes (only needed for lines, boundaries, and faces)[[BR]]

 An additional small data structure would be needed that would be a boiled
 down replacement of current P_Line with information about primitives.
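
 A rough sketch of what such a split could look like, assuming
 hypothetical record layouts: none of the struct names, fields or helper
 functions below exist in GRASS, and actual sizes depend on platform and
 padding. Points get no per-type record at all in this sketch, only the
 small primitive entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-type topology records. */
struct topo_line     { int n1, n2; double box[6]; };
struct topo_boundary { int n1, n2, left, right; double box[6]; };
struct topo_centroid { int area; };

/* Hypothetical boiled-down primitive record, one per primitive. */
struct prim_info {
    long offset;   /* offset into coor file */
    int type;
    int topo_id;   /* index into the per-type array, 0 if none */
};

/* Stand-in for the current all-in-one record. */
struct all_in_one {
    int n1, n2, left, right;
    double box[6];
    long offset;
    int type;
};

/* Bytes needed when every primitive uses the all-in-one record. */
size_t bytes_all_in_one(size_t nprims)
{
    return nprims * sizeof(struct all_in_one);
}

/* Bytes needed with the split layout sketched above. */
size_t bytes_split(size_t npoints, size_t nlines, size_t nboundaries,
                   size_t ncentroids)
{
    size_t nprims = npoints + nlines + nboundaries + ncentroids;

    return nprims * sizeof(struct prim_info) +
           nlines * sizeof(struct topo_line) +
           nboundaries * sizeof(struct topo_boundary) +
           ncentroids * sizeof(struct topo_centroid);
}
```

 In this model the saving is largest for point-only data and smallest for
 boundaries, which still need nodes, areas and a bounding box.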

 Similarly, a separate spatial index would be created for each type,
 instead of lumping all points, lines, boundaries, centroids, faces, and
 kernels into the same spatial index. Maintaining separate spatial indices
 is more efficient with regard to both time and space.

 I'm reaching limits on what I can change in the vector libs without
 breaking compatibility, and I'm sometimes getting frustrated with the
 waste of time and space for large vectors. IIUR grass7 is an opportunity
 to introduce changes like these, so I hope to initiate a discussion and
 collect more ideas on how to improve grass vector handling.


 Markus M

Ticket URL: <http://trac.osgeo.org/grass/ticket/542>
GRASS GIS <http://grass.osgeo.org>
