[GRASS5] Problems with vector import, and a suggestion

David D Gray ddgray at armadce.demon.co.uk
Thu Feb 7 10:22:10 EST 2002

Aleksey Naumov wrote:

 > Hi GRASS developers,
 > I just killed 3 days to import a rather simple (86 polygons, 6 holes)
 > file into GRASS. In the process I lost some hair :-), but hopefully also
 > gained some insight which I would like to share with anyone who is
 > willing to read this...
 > I am using GRASS 5.0, the HEAD branch from CVS, compiled on Feb 4 2002.
 > 1. I first tried v.in.shape (and v.in.shape.pg). v.in.shape failed for
 > both a line and a polygon shape file. It generated a lot of messages like:
 >     WARNING: line 5466 label: 217 matched another label: 455.
 >     Failed to attach an attribute (category 471) to a line.
 > but, more importantly, the geometry was completely scrambled for both
 > line and polygon shapes.

v.in.shape has until now really been an experimental module whose
development has been driven by ongoing attempts to deal with the many
errors in linework that are so common in desktop formats based on whole
polygon coverages, e.g. ESRI shapefiles and MapInfo MIF. That isn't to
say that the errors caused in using these modules are always or solely
due to bad files. The modules have been hacked over and over as we found,
and had to deal with, new layers of problems in the structure of shape
files, etc. (Question: should we bother trying to fix bad linework at
all?) So by now the existing modules are really in an unmaintainable
state. The good news is that replacements in the stable branch for the
v.*.shape and v.*.mif modules are nearly ready, and should be available
in the next snapshot of the pre-release sequence for GRASS 5.0. At first
these will probably just duplicate the errors in the current modules, but
they should be much easier to debug and maintain.

 > 2. Next I tried v.in.arc (and v.in.arc.pg). I had to do some work in
 > to convert the polygon shape file to a correct coverage, then UNGENERATE
 > lines and polygon labels. Here I got correct geometry, but polygon ids
 > (dig_att file) were screwed up:

Using Arc/Info is the critical step here. Ungen' files are in a
structured topological format, like GRASS files, so import should be a
simple matter. I don't know why it doesn't handle atts properly; maybe
we need to look at this as well.

 > 3. Finally, "m.in.e00" did the job for me. It created correct geometry
 > AND built correct dig_att file. Together with "pg.in.dbf" for the
 > associated attribute file (ARC's .PAT file saved in TABLES with
 > INFODBASE, then and re-exported in ArcView) I now got a complete vector
 > file with data. I am finally able to map the attributes as described in
 > the GRASS/Postgres tutorial.

Again, a direct transfer from the A/I format, which is similar to GRASS's.

 > Resume and suggestions
 > It's been quite a hair-pulling experience. Of course, it's possible that
 > I did some things wrong, but I tried these and related commands
 > etc) in many different ways and quite a few times. I am trying to think
 > what to do to make this sort of vector import easier...
 > Here are some suggestions. I apologize in advance if I missed something
 > or am completely off on something -- let me know.
 > 1. My experience has been that quality of modules varies greatly. Some
 > work fine, some are buggy and some just do not work. What makes the
 > situation worse for the user is that modules seem to overlap and
 > duplicate each other, and it's not clear which one to use (for example
 > in my case, besides the 5 modules mentioned above there are also
 > v.import and v.in.arc.poly -- confusing to say the least!)

These stem from early attempts at integration, or at dealing with
problems arising from polygon coverages, which were a new format at the
time.

 > In the long run modules with similar functionality will have to be
 > some discarded. In the meantime, it seems a useful clean-up strategy
 > would be to:
 > 	(a) Establish sort of a standard (e.g. for GRASS 5.1) -- a set of
 > requirements that modules have to comply with (coding standards, 
 > up-to-date and detailed help page, etc.)
 > 	(b) Select a few most useful modules and pull them up to this standard
 > 	(c) Identify those modules that conform to the standard in the help 
 > They will be seen as reliable, get more testing, while others may be
 > merged/upgraded gradually.

It is timely that you should raise these points. A plan is beginning to
take shape for GRASS 5.1 to move much of the processing functionality
into library routines, and to have the 'modules' serve just as
high-level interfaces that combine these functions to perform specific
tasks. It has also been suggested that the modules be written in a
scripting or 'macro' language like Python, for easier development.

Standards relating to such things as options are also being developed.

 > 2. Some sort of functional listing of modules in the help pages would be
 > nice, maybe even based on the classification used for the TclTk interface.
 > 3. No specific suggestion here, just complaining :-) about handling of
 > vector attributes. Associating vectors with attributes through point
 > markers used in dig_att files just seems too difficult and unnatural to
 > me. Maybe I am missing something here... I don't know how it's done in
 > 5.1.

It is, admittedly, a weak procedure. It just used to be standard, and
was traditionally used by Arc/Info, which was the standard GRASS had to
look to for compatibility with its main competitors. This, by the way,
is why you get the

WARNING: line 5466 label: 217 matched another label: 455

warnings. The labels get muddled up with the wrong lines/points/areas if
something goes wrong. A single bad line can contaminate many good ones
in this way, so you can get a silent error without noticing.
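
To make the failure mode concrete, here is a minimal Python sketch of label-point matching. It is not GRASS's actual code: the function names are hypothetical, areas are assumed to be simple rings without holes, and each label is a (category, point) pair. A label that lands in an area already claimed by another label produces a warning in the spirit of the one above, which is what happens when a missing boundary merges two areas into one.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def point_in_ring(p: Point, ring: Ring) -> bool:
    """Even-odd ray casting: is p inside the closed ring?"""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from p towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


def attach_labels(areas: Dict[int, Ring], labels: List[Tuple[int, Point]]):
    """Give each area the category of the label point that falls inside it.

    Hypothetical helper. Returns (area_cat, warnings): an area hit by a
    second label keeps its first category and a warning is recorded; a
    label outside every area is reported as unattached.
    """
    area_cat: Dict[int, int] = {}
    warnings: List[str] = []
    for cat, pt in labels:
        hits = [aid for aid, ring in areas.items() if point_in_ring(pt, ring)]
        if not hits:
            warnings.append(f"label {cat} is not inside any area")
            continue
        for aid in hits:
            if aid in area_cat:
                warnings.append(
                    f"area {aid} label: {cat} matched another label: {area_cat[aid]}"
                )
            else:
                area_cat[aid] = cat
    return area_cat, warnings
```

With the shared boundary between two unit squares present, labels 10 and 20 attach to their own areas. If that boundary is lost and the two squares become one area, category 20 disappears, and the only trace is the warning.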

GRASS 5.1 does away with this, and codes the categories (or at least
indices) directly into the main binary file that contains the vector
lines. The actual attribute data is stored in an RDBMS, accessed through
GRASS's built-in dbmi interface, which will be the default.
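
A rough illustration of the difference, in Python: here the category lives on the geometry record itself and is used as the key into an attribute table. The in-memory SQLite database is only a stand-in for whatever RDBMS sits behind dbmi, and the `Line` class, `roads` table and `attributes_for` function are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Line:
    """A vector line that carries its own category number."""
    cat: int
    coords: List[Tuple[float, float]]


def attributes_for(lines: List[Line], conn: sqlite3.Connection) -> Dict[int, Optional[str]]:
    """Fetch each line's attribute row by category; None if no row exists."""
    result: Dict[int, Optional[str]] = {}
    for line in lines:
        row = conn.execute(
            "SELECT name FROM roads WHERE cat = ?", (line.cat,)
        ).fetchone()
        result[line.cat] = row[0] if row else None
    return result


# Stand-in attribute table, keyed by category.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roads (cat INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO roads VALUES (?, ?)", [(1, "Main St"), (2, "High St")])

lines = [Line(2, [(0.0, 0.0), (1.0, 1.0)]),
         Line(1, [(1.0, 1.0), (2.0, 0.0)])]
```

Because the link is the category stored on each line, deleting or reordering lines cannot attach a row to the wrong line; a line whose category has no row simply comes back as `None`.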

We should retain compatibility with older versions, and still be able to
apply area points and their attributes, since the idea is that an `area
point' is a representative point of the interior region of a polygon.
Say, with v.in.atts and v.out.atts.
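
For an exporter along those lines, the awkward part is choosing a point that really is inside the area: a polygon's centroid can fall outside a concave shape or inside a hole. One common approach, sketched here in Python with a hypothetical `area_point` function (not taken from any GRASS module), casts a horizontal scanline that avoids every vertex and takes the midpoint of the widest interior span under the even-odd rule.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def area_point(rings: List[List[Point]]) -> Point:
    """Return a point strictly inside an area given as an outer ring plus
    optional hole rings (hypothetical helper, even-odd fill rule)."""
    ys = sorted({y for ring in rings for _, y in ring})
    if len(ys) < 2:
        raise ValueError("degenerate area: no vertical extent")
    # Scanline halfway across the widest gap between distinct vertex
    # heights, so it never passes through a vertex.
    lo, hi = max(zip(ys, ys[1:]), key=lambda gap: gap[1] - gap[0])
    y = (lo + hi) / 2
    # Collect the x coordinates where the scanline crosses any ring edge.
    xs: List[float] = []
    for ring in rings:
        n = len(ring)
        for i in range(n):
            (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
            if (y1 > y) != (y2 > y):
                xs.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
    xs.sort()
    # Even-odd rule: [xs[0], xs[1]], [xs[2], xs[3]], ... lie inside the area.
    spans = list(zip(xs[0::2], xs[1::2]))
    a, b = max(spans, key=lambda s: s[1] - s[0])
    return ((a + b) / 2, y)
```

For a 10 by 10 square with a centered 6 by 6 hole, the centroid (5, 5) lies in the hole, whereas this scanline method returns (1.0, 5.0), a point in the remaining ring of the area.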

The lamest choice of all, however, would be to go down the road of
co-sequencing, as MIF/MID and shapefiles do, where the nth attribute
record simply belongs to the nth geometry record. This is unstable as a
means of transferring data (i.e. as an interchange format), and
disastrous when used as the main method of data storage in your
application.
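
The fragility of co-sequencing is easy to demonstrate. In this minimal Python sketch (hypothetical names and data), deleting one shape without touching the attribute file shifts every later attribute onto the wrong shape with no error raised, while a join on an explicit key, like the categories described above, stays correct.

```python
from typing import Dict, List


def join_by_position(shapes: List[str], records: List[str]) -> Dict[str, str]:
    """Co-sequenced join: record i is assumed to describe shape i.
    zip() silently drops any surplus records, so nothing complains."""
    return dict(zip(shapes, records))


def join_by_key(shapes: List[str], table: Dict[str, str]) -> Dict[str, str]:
    """Keyed join: each shape carries an id that indexes the attribute table."""
    return {s: table[s] for s in shapes if s in table}


# Three parcels and their owners, stored in the same order.
shapes = ["parcel A", "parcel B", "parcel C"]
owners = ["Ann", "Bob", "Cy"]
keyed_owners = dict(zip(shapes, owners))

# An editor deletes parcel B from the geometry file but leaves the
# attribute file untouched.
edited = ["parcel A", "parcel C"]

positional = join_by_position(edited, owners)  # parcel C now gets "Bob"
keyed = join_by_key(edited, keyed_owners)      # parcel C still gets "Cy"
```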

