[GRASS5] Re: Vector points and topology

Wed Jul 6 09:18:18 EDT 2005

On Mon, Jul 04, 2005 at 11:25:31AM +0200, Radim Blazek wrote:
> On 7/2/05, Helena Mitasova <hmitaso at unity.ncsu.edu> wrote:
> > On Jul 1, 2005, at 4:29 AM, Radim Blazek wrote:
> > 
> > > There is no big difference between points on level 1 and sites, is there?
> > 
> > I read your 2002 paper to find out more about the vector data structure,
> > and from that I am guessing that there probably is something more in the
> > points on level 1. For example, importing a
> > 
> > 64000 point file:  id x y z
> > leads to a 2.019MB old ASCII sites file x|y|#cat %z
> > 
> > but when imported by
> > v.in.ascii -zt .... x=2 y=3 z=4 cat=1
> > it gets
> > 0.770MB cidx
> > 2.375MB coor
> > 3.097MB topo
> > 
> > So I assume that coor would be the level1 and topo would be added for
> > level 2.
> 
> Yes.
> 
> > What else is in the coor file that makes the binary file larger than
> > the original ASCII?
> > From your paper I understand that there is much more than just
> > coordinates in this file
> 
> 1. The coor file also stores the 'line' (element) type, the number
>     of attributes and the layer number for each category.
> 2. Coordinates in the binary file are stored as doubles (8 bytes each),
>     while in a text file a coordinate can take less or more space.

I have added this to:

 lib/vector/vector_arch.dox

However, it would be much better to have that added by a person
who actually knows it :-)

> > for lines, but do we need it for points too?
> 
> Yes.
> 
> > Is the topology file useful for the point data
> 
> Often it is not. I did not pay special attention to large point datasets.
> It is a pity that you did not raise these problems 2 years ago.

Please keep in mind that such large data sets (almost) didn't exist/were
unavailable two years ago.

Instead of complaining, why not try to find a solution?

Example: spending 230 minutes on exporting 100k points to a SHAPE file
is far from ideal. But we need a suggestion on where to search for the
problem. And 100k points isn't that large, and neither are 3 million
points. 500 million points are large.

> I have to repeat my suggestion: don't build topology if you don't need it
> (add the -b option in v.in.ascii) and write modules using vectors on level 1
> (without topology), reading lines by Vect_read_next_line().
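The idea of level 1 sequential reading can be illustrated with a small, self-contained sketch. It does not use the GRASS library. The record layout below is hypothetical and is not the real coor format, and read_points_sequentially() only plays the role that Vect_read_next_line() plays in a real module. The point is that one pass over the file needs constant memory, with no topology or index.

```python
import io
import struct

# Hypothetical fixed-size record: category (int32) followed by x, y, z doubles.
# This is NOT the real GRASS coor layout, only a stand-in for the idea.
RECORD = struct.Struct("<i3d")


def write_points(points):
    """Pack (x, y, z) tuples into an in-memory binary stream."""
    buf = io.BytesIO()
    for cat, (x, y, z) in enumerate(points, start=1):
        buf.write(RECORD.pack(cat, x, y, z))
    buf.seek(0)
    return buf


def read_points_sequentially(stream):
    """Yield (cat, x, y, z) one record at a time, in file order."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            return  # end of stream
        yield RECORD.unpack(chunk)


def z_range(stream):
    """Single pass over all points; only three values are kept in memory."""
    z_min, z_max, count = float("inf"), float("-inf"), 0
    for _cat, _x, _y, z in read_points_sequentially(stream):
        z_min = min(z_min, z)
        z_max = max(z_max, z)
        count += 1
    return z_min, z_max, count


demo = write_points([(0.0, 0.0, 1.0), (2.0, 4.0, 3.0), (1.0, -1.0, 5.0)])
print(z_range(demo))
```

Tasks that only need every point once (statistics, export, gridding input) fit this pattern. Tasks that need to jump to a particular feature do not.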

For a prototype this may be viable. Later on, once understood, the
GIS should do the job.

> > (We certainly need a spatial index for working with the site data, but as
> > I understand it, it is created on the fly and not stored, so it would not
> > be part of the topology file.) If the topology file makes running other
> > modules faster, then it should be there; disk space usually is not a
> > problem, and then just the build function needs to be modified for point
> > data so that it does not eat up the memory. But if we don't need the topo
> > file for anything for point data, then we can skip the build function for
> > point data and the other modules should be modified to read point data at
> > level 1?
> 
> Either you need the topology or not, we cannot build topology 
> without points. Topology for points is necessary for example 
> for network modules because the graph is built using topology
> information about lines and points.

Aha :-) I have added this piece of important knowledge to
vector_arch.dox as well.

> > > Which vector modules do you need to work with LIDAR?
> > 
> > This is not so much about just the lidar data; it is about point files
> > in general. Below is a message that I just got from Jaro, and they do not
> > work with lidar (I assume that they have digitized contours or
> > photogrammetric data). And there are other types of sensors or models
> > that produce millions of points, so all vector modules that support point
> > data need to be able to handle them in a reasonable way. I created a list,
> > but it is practically everything except for modules that deal with
> > networks and polygon operations.
> > GRASS6 is really great, everybody is impressed, so we just need to get
> > this point data issue resolved.
> 
> Some modules can probably use vector on level 1 and sequential read.
> I wrote most of the modules so that they use level 2 and random access
> because it is simpler and less work.
> 
> The 'topology' structure stores not only the topology but also the
> 'line' bounding box and the line offset in the coor file (index).
> The existing spatial index uses the line ID in the 'topology' structure
> to identify lines in the 'coor' file. Currently it is not possible to build
> a spatial index without topology.
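The dependency described above (spatial index -> line ID -> offset in coor) can be sketched as follows. This is an illustration only: the record layout is hypothetical, and a naive linear scan stands in for the real spatial index. It shows why a spatial search that returns line IDs is of no use without the ID-to-offset table, and that this table holds one entry per point.

```python
import io
import struct

RECORD = struct.Struct("<3d")  # hypothetical point record: x, y, z


def build_index(stream):
    """One pass over the stream; returns {line_id: (offset, bbox)}."""
    index = {}
    line_id = 0
    stream.seek(0)
    while True:
        offset = stream.tell()
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        x, y, _z = RECORD.unpack(chunk)
        line_id += 1
        # The bounding box of a single point is degenerate: (x, y, x, y).
        index[line_id] = (offset, (x, y, x, y))
    return index


def query(index, west, south, east, north):
    """Naive scan standing in for a spatial index search; returns line IDs."""
    return [
        line_id
        for line_id, (_offset, (x0, y0, x1, y1)) in index.items()
        if x0 >= west and x1 <= east and y0 >= south and y1 <= north
    ]


def read_by_id(stream, index, line_id):
    """Random access: resolve the ID to an offset, then seek and read."""
    offset, _bbox = index[line_id]
    stream.seek(offset)
    return RECORD.unpack(stream.read(RECORD.size))


points = [(0.0, 0.0, 1.0), (5.0, 5.0, 2.0), (1.0, 1.0, 3.0)]
coor = io.BytesIO(b"".join(RECORD.pack(*p) for p in points))
topo = build_index(coor)
hits = query(topo, -1, -1, 2, 2)
print(hits, [read_by_id(coor, topo, i) for i in hits])
```

In this toy version the index costs an offset plus four doubles per point, all held in memory, which is the kind of per-point overhead that becomes noticeable with millions of points.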

Also added to vector_arch.dox.
The more knowledge in this file, the fewer questions on the list (IMHO).

> I think that the problem with memory consumption is in the spatial
> index. Can you verify how much memory the GRASS spatial index is using
> compared to other spatial indexes (try a 3D index)?
> Maybe it is possible to lower memory consumption by tuning
> MAXCARD (number of branches) in the spatial index. Can anybody try that?
> 
> It is necessary to verify which modules can be changed to sequential read,
> also because of OGR formats (currently v.external). Some OGR
> formats are very slow in random access, e.g. PostGIS, and it would
> be useful to use sequential read.
> 
> When I was writing the new vector code, I discovered that vector
> processing is just a bunch of exceptions.
> GRASS 6 was written first so that it works for all data types and
> various tasks. I do not work with large point datasets, so I did not
> add 'exceptions' for large point files. Unfortunately nobody who needs
> to process such data joined the development.

In order to change the situation, please document your knowledge.
Otherwise it's nearly impossible to join the development...

Thanks

 Markus