[GRASS5] Re: Vector points and topology

Markus Neteler neteler at itc.it
Wed Jul 6 09:18:18 EDT 2005


On Mon, Jul 04, 2005 at 11:25:31AM +0200, Radim Blazek wrote:
> On 7/2/05, Helena Mitasova <hmitaso at unity.ncsu.edu> wrote:
> > On Jul 1, 2005, at 4:29 AM, Radim Blazek wrote:
> > 
> > > There is no big difference between points on level 1 and sites, is there?
> > 
> > I read your 2002 paper to find out more about the vector data structure
> > and from that
> > I am guessing that there probably is something more in the points on
> > level 1, for example importing a
> > 
> > 64000 point file:  id x y z
> > leads to a 2.019MB old ASCII sites file x|y|#cat %z
> > 
> > but when imported by
> > v.in.ascii -zt .... x=2 y=3 z=4 cat=1
> > it gets
> > 0.770MB cidx
> > 2.375MB coor
> > 3.097MB topo
> > 
> > So I assume that coor would be the level1 and topo would be added for
> > level 2.
> 
> Yes.
> 
> > What else is in the coor file that makes the binary file larger than
> > the original ASCII?
> > From your paper I understand that there is much more than just
> > coordinates in this file
> 
> 1. The coor file also stores: 'line' (element) type, number
>     of attributes and layer number for each category.
> 2. Coordinates in the binary file are stored as doubles (8 bytes), while
>     in a text file a coordinate can take more or less space.

I have added this to:

 lib/vector/vector_arch.dox

However, it would be much better to have that added by a person
who actually knows it :-)
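
As a rough cross-check of the sizes quoted above, here is a small
back-of-the-envelope sketch. It is not a description of the real coor
layout: it assumes that "MB" means 10^6 bytes, ignores any file header,
and simply treats everything beyond the raw doubles as per-point overhead
(element type, category/layer information). All names are made up for
illustration.

```python
import struct

N_POINTS = 64000                       # point count quoted in the thread
DIMS = 3                               # x, y, z (imported with -z)
DOUBLE_SIZE = struct.calcsize("<d")    # 8 bytes per coordinate

# Sizes quoted in the thread; assumption: 1 MB = 10**6 bytes
ASCII_BYTES = 2.019e6
COOR_BYTES = 2.375e6

# Space needed for the coordinates alone when stored as doubles
raw_coord_bytes = N_POINTS * DIMS * DOUBLE_SIZE

# Whatever is left in coor is attributed to per-point bookkeeping
overhead_per_point = (COOR_BYTES - raw_coord_bytes) / N_POINTS

# Average length of one text line in the old sites file
ascii_bytes_per_point = ASCII_BYTES / N_POINTS

print(raw_coord_bytes, overhead_per_point, ascii_bytes_per_point)
```

Under these assumptions the doubles alone account for about 1.5 MB of
coor, and the remainder works out to roughly 13 bytes per point.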

 
> > for lines, but do we need it for points too?
> 
> Yes.
> 
> > Is the topology file useful for the point data
> 
> Often it is not. I did not pay special attention to large point datasets.
> It is a pity that you did not come with these problems 2 years ago.

Please keep in mind that such large data sets (almost) didn't exist/were
unavailable two years ago.

Instead of complaining, why not try to find a solution?

Example: spending 230 minutes on the export of 100k points into a
SHAPE file is far from ideal. But we need a suggestion on where to look
for the problem. And 100k points isn't that large, nor are 3 million
points. 500 million points are large.

> I have to repeat my suggestion: don't build topology if you don't need it
> (add -b option in v.in.ascii) and write modules using vectors on level 1
> (without topology), reading lines by Vect_read_next_line().

For a prototype this may be viable. Later on, once understood, the
GIS should do the job.
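
To illustrate why sequential reading keeps memory use flat, here is a toy
sketch. It does not use the GRASS library: the flat record format (three
doubles per point) and the helper read_next_point() are hypothetical
stand-ins for a coor file and Vect_read_next_line(). The point is only
that one record is held in memory at a time.

```python
import io
import struct

# Hypothetical record layout: x, y, z as little-endian doubles
RECORD = struct.Struct("<3d")

def write_points(stream, points):
    """Append each (x, y, z) tuple to the stream as one fixed-size record."""
    for x, y, z in points:
        stream.write(RECORD.pack(x, y, z))

def read_next_point(stream):
    """Return the next (x, y, z) tuple, or None at end of stream."""
    chunk = stream.read(RECORD.size)
    if len(chunk) < RECORD.size:
        return None
    return RECORD.unpack(chunk)

def z_range(stream):
    """One sequential pass: count points and track min/max z."""
    count, zmin, zmax = 0, float("inf"), float("-inf")
    while True:
        point = read_next_point(stream)
        if point is None:
            break
        count += 1
        zmin = min(zmin, point[2])
        zmax = max(zmax, point[2])
    return count, zmin, zmax

buf = io.BytesIO()
write_points(buf, [(0.0, 0.0, 10.0), (1.0, 2.0, 12.5), (3.0, 1.0, 7.25)])
buf.seek(0)
result = z_range(buf)
print(result)
```

Any module whose work can be phrased as a single pass like z_range()
is a candidate for level 1 access.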

> > (we certainly need spatial index for working with the site data but as
> > I understand it it is created
> > on the fly but not stored, so that would not be part of the topology
> > file). If the topology file
> > makes running other modules faster than it should be there, disk space
> > usually is not
> > a problem and then just  the build function needs to be modified for
> > point data
> > so that it does not eat up the memory. But if we don't need the topo
> > file for anything for point data
> > then we can skip the build function for point data and the other
> > modules should be
> > modified to read point data at level 1?
> 
> Either you need the topology or you do not; we cannot build topology
> without points. Topology for points is necessary, for example,
> for network modules because the graph is built using topology
> information about lines and points.

Aha :-) I have added this piece of important knowledge to
vector_arch.dox as well.

> > > Which vector modules do you need to work with LIDAR?
> > 
> > this is not that much about just the lidar data - it is about point
> > files in general, below is a message that
> > I just got from Jaro and they do not work with lidar (I assume that
> > they have digitized contours
> > or photogrammetric data). And there are other types of sensors or
> > models that produce millions
> > of points, so all vector modules that support point data need to be
> > able to handle them in a reasonable
> > way. I created a list but it is practically everything except for
> > modules that deal with
> > networks and polygon operations.
> > GRASS6 is really great, everybody is impressed so we just need to get
> > this point data issue
> > resolved.
> 
> Some modules can probably use vectors on level 1 and sequential read.
> I wrote most of the modules so that they use level 2 and random access
> because it is simpler and less work.
> 
> The 'topology' structure does not store only the topology but also
> the 'line' bounding box and the line offset in the coor file (index).
> The existing spatial index uses the line ID in the 'topology' structure
> to identify lines in the 'coor' file. Currently it is not possible to build
> a spatial index without topology.

Also added to vector_arch.dox.
The more knowledge in this file, the fewer questions on the list (IMHO).
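
A toy sketch of the idea described above, where a per-line entry keeps a
bounding box and an offset into the coordinate file, and the line ID is
what ties a spatial query back to the stored geometry. This is not the
GRASS topo format: the record layout, the function names and the linear
scan (standing in for a real spatial index) are all assumptions made for
illustration.

```python
import io
import struct

# Hypothetical record layout: x, y, z as little-endian doubles
RECORD = struct.Struct("<3d")

def build_index(stream):
    """One pass over the data: store (offset, bbox) per point.

    The line ID is the 1-based position in the returned list.
    """
    index = []
    stream.seek(0)
    while True:
        offset = stream.tell()
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        x, y, _z = RECORD.unpack(chunk)
        # A point has a degenerate box: west, south, east, north
        index.append((offset, (x, y, x, y)))
    return index

def read_by_id(stream, index, line_id):
    """Random access: jump straight to the record for a given line ID."""
    offset, _bbox = index[line_id - 1]
    stream.seek(offset)
    return RECORD.unpack(stream.read(RECORD.size))

def ids_in_box(index, west, south, east, north):
    """Return IDs whose bbox lies inside the query box (linear scan)."""
    return [
        i + 1
        for i, (_offset, (w, s, e, n)) in enumerate(index)
        if w >= west and e <= east and s >= south and n <= north
    ]

buf = io.BytesIO()
for point in [(0.0, 0.0, 10.0), (1.0, 2.0, 12.5), (3.0, 1.0, 7.25)]:
    buf.write(RECORD.pack(*point))

index = build_index(buf)
hits = ids_in_box(index, 0.5, 0.5, 3.5, 2.5)
second = read_by_id(buf, index, 2)
print(hits, second)
```

The sketch also shows the cost: the index holds one entry per point in
memory, which is cheap for lines and areas but adds up for millions of
points.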

> I think that the problem with
> memory consumption is in the spatial index. Can you
> verify how much memory the GRASS spatial index uses
> compared to other spatial indexes (try 3D index)?
> Maybe it is possible to lower memory consumption by tuning
> MAXCARD (number of branches) in the spatial index. Can anybody try that?
> 
> It is necessary to verify which modules can be changed to sequential read,
> also because of OGR formats (currently v.external). Some OGR
> formats are very slow in random access, e.g. PostGIS, and it would
> be useful to use sequential read.
> 
> When I was writing the new vector code, I
> discovered that vector processing is just a bunch of exceptions.
> GRASS 6 was written first so that it works for all data types and
> various tasks. I do not work with large point datasets, so I did not
> add 'exceptions' for large point files. Unfortunately nobody who needs
> to process such data joined the development.

In order to change the situation, please document your knowledge.
Otherwise it's rather impossible to join the development...

Thanks

 Markus
 



