[GRASS5] Re: Vector points and topology

Mon Jul 4 05:25:31 EDT 2005

On 7/2/05, Helena Mitasova <hmitaso at unity.ncsu.edu> wrote:
> On Jul 1, 2005, at 4:29 AM, Radim Blazek wrote:
> 
> > There is no big difference between points on level 1 and sites, is it?
> 
> I read your 2002 paper to find out more about the vector data structure
> and from that
> I am guessing that there probably is something more in the points on
> level 1, for example importing a
> 
> 64000 point file:  id x y z
> leads to a 2.019MB  old asci sites file x|y|#cat %z
> 
> but when imported by
> v.in.ascii -zt .... x=2 y=3 z=4 cat=1
> it gets
> 0.770MB cidx
> 2.375MB coor
> 3.097MB topo
> 
> So I assume that coor would be the level1 and topo would be added for
> level 2.

Yes.

> what else is in the coor file that makes the binary file larger than
> the original asci?
>  From your paper I understand that there is much more than just
> coordinates in this file

1. The coor file also stores the 'line' (element) type, the number 
    of attributes, and the layer number for each category.
2. Coordinates in the binary file are stored as doubles (8 bytes each),
    while in a text file a coordinate can take more or less space.
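
The sizes quoted above can be checked with a little arithmetic. This is only
a rough sketch: it assumes that "MB" means 10**6 bytes and it ignores the
coor file header, so the per-point figures are approximate.

```python
import struct

N_POINTS = 64000        # point count quoted in the thread
COOR_BYTES = 2.375e6    # reported coor size, ASSUMING 1 MB = 10**6 bytes
SITES_BYTES = 2.019e6   # reported ASCII sites size, same assumption

DOUBLE = struct.calcsize("<d")      # a binary double is 8 bytes
raw_xyz = N_POINTS * 3 * DOUBLE     # bytes needed for x, y, z alone

coor_per_point = COOR_BYTES / N_POINTS
sites_per_point = SITES_BYTES / N_POINTS
# Whatever is left after the three doubles: type, category data, etc.
overhead_per_point = coor_per_point - 3 * DOUBLE

print(f"raw x,y,z doubles : {raw_xyz} bytes")
print(f"coor per point    : {coor_per_point:.1f} bytes")
print(f"non-coordinate    : {overhead_per_point:.1f} bytes per point")
print(f"ASCII per point   : {sites_per_point:.1f} bytes")

# A short coordinate is cheaper as text, a long one is more expensive.
print(len("12.5"), len("634512.123456"), DOUBLE)
```

Under these assumptions the three doubles account for 1,536,000 bytes, and the
coor file works out to about 37 bytes per point, of which about 13 are not
coordinates.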

> for lines, but do we need it for points too?

Yes.

> Is the topology file useful for the point data

Often it is not. I did not pay special attention to large point datasets.
It is a pity that you did not raise these problems two years ago.

I have to repeat my suggestion: don't build topology if you don't need it
(add the -b option in v.in.ascii), and write modules that use vectors on
level 1 (without topology), reading lines with Vect_read_next_line().
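
To illustrate the idea of a level 1 sequential read, here is a minimal sketch
in Python. It does not use the GRASS library or the real coor format: the
record layout (three doubles per point) and the helper names write_points,
read_next_point and bounding_box are hypothetical, chosen only to show that
a single forward pass needs no topology and keeps one point in memory at a time.

```python
import io
import struct

# HYPOTHETICAL record layout: x, y, z as little-endian doubles.
# This is NOT the real GRASS coor format.
RECORD = struct.Struct("<ddd")

def write_points(stream, points):
    """Append each (x, y, z) tuple to the stream as one fixed-size record."""
    for x, y, z in points:
        stream.write(RECORD.pack(x, y, z))

def read_next_point(stream):
    """Return the next (x, y, z) tuple, or None at end of file."""
    chunk = stream.read(RECORD.size)
    if len(chunk) < RECORD.size:
        return None
    return RECORD.unpack(chunk)

def bounding_box(stream):
    """One sequential pass; only the current point is held in memory."""
    lo = [float("inf")] * 3
    hi = [float("-inf")] * 3
    count = 0
    while (pt := read_next_point(stream)) is not None:
        for i, value in enumerate(pt):
            lo[i] = min(lo[i], value)
            hi[i] = max(hi[i], value)
        count += 1
    return count, tuple(lo), tuple(hi)

buf = io.BytesIO()
write_points(buf, [(1.0, 2.0, 3.0), (4.0, 0.5, 9.0), (-2.0, 7.0, 1.0)])
buf.seek(0)
count, lo, hi = bounding_box(buf)
print(count, lo, hi)
```

Any task that can be written as such a loop (statistics, reprojection,
filtering by attribute or window) is a candidate for level 1.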

> (we certainly need spatial index for working with the site data but as
> I understand it it is created
> on the fly but not stored, so that would not be part of the topology
> file). If the topology file
> makes running other modules faster than it should be there, disk space
> usually is not
> a problem and then just  the build function needs to be modified for
> point data
> so that it does not eat up the memory. But if we don't need the topo
> file for anything for point data
> then we can skip the build function for point data and the other
> modules should be
> modified to read point data at level 1?

Either you need the topology or you do not; we cannot build topology 
that leaves out the points. Topology for points is necessary, for example, 
for the network modules, because the graph is built using topology 
information about lines and points.

> > Which vector modules do you need to work with LIDAR?
> 
> this is not that much about just the lidar data - it is about point
> files in general, below is a message that
> I just got from Jaro and they do not work with lidar (I assume that
> they have digitized contours
> or photogrammetric data). And there are other types of sensors or
> models that produce millions
> of points, so all vector modules that support point data need to be
> able to handle them in a reasonable
> way. I created a list but it is practically everything except for
> modules that deal with
> networks and polygon operations.
> GRASS6 is really great, everybody is impressed so we just need to get
> this point data issue
> resolved.

Some modules can probably use vectors on level 1 with sequential read.
I wrote most of the modules so that they use level 2 and random access
because it is simpler and less work. 

The 'topology' structure does not store only the topology but also
the 'line' bounding box and the line offset in the coor file (index). 
The existing spatial index uses the line ID in the 'topology' structure
to identify lines in the 'coor' file. Currently it is not possible to build 
a spatial index without topology. I think that the problem with 
memory consumption is in the spatial index. Can you
verify how much memory the GRASS spatial index uses 
compared to other spatial indexes (try a 3D index)?
Maybe it is possible to lower memory consumption by tuning 
MAXCARD (number of branches) in the spatial index. Can anybody try that?
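
The role of the per-line offset and bounding box can be sketched as follows.
This is a toy model, not the GRASS topo format: the record layout, the
0-based list position used as "line ID", and the functions build_index,
read_by_id and ids_in_window are all hypothetical, and a linear scan over the
boxes stands in for the real tree search.

```python
import io
import struct

# HYPOTHETICAL record layout: x, y, z as little-endian doubles.
RECORD = struct.Struct("<ddd")

def build_index(stream):
    """Scan once; entry i holds (file offset, bounding box) of record i."""
    index = []
    stream.seek(0)
    while True:
        offset = stream.tell()
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        x, y, z = RECORD.unpack(chunk)
        # For a point the box is degenerate: min corner == max corner.
        index.append((offset, (x, y, z, x, y, z)))
    return index

def read_by_id(stream, index, line_id):
    """Random access: seek straight to the record with the given ID."""
    offset, _bbox = index[line_id]
    stream.seek(offset)
    return RECORD.unpack(stream.read(RECORD.size))

def ids_in_window(index, xmin, ymin, xmax, ymax):
    """Return IDs whose box lies in the 2D window (linear scan, no tree)."""
    return [i for i, (_off, b) in enumerate(index)
            if xmin <= b[0] and b[3] <= xmax and ymin <= b[1] and b[4] <= ymax]

buf = io.BytesIO()
for pt in [(1.0, 2.0, 3.0), (4.0, 0.5, 9.0), (-2.0, 7.0, 1.0)]:
    buf.write(RECORD.pack(*pt))

index = build_index(buf)
hits = ids_in_window(index, 0.0, 0.0, 5.0, 5.0)
print(hits, [read_by_id(buf, index, i) for i in hits])
```

Even in this toy layout, one index entry (an offset plus six doubles) would
take 56 bytes if packed, more than twice the 24 bytes of the point it
describes, which gives a feeling for why point data is the expensive case.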

It is necessary to verify which modules can be changed to sequential read, 
also because of OGR formats (currently v.external). Some OGR
formats, e.g. PostGIS, are very slow in random access, and it would 
be useful to use sequential read.

When I was writing the new vector code, I 
discovered that vector processing is just a bunch of exceptions.
GRASS 6 was written first so that it works for all data types and 
various tasks. I do not work with large point datasets, so I did not
add 'exceptions' for large point files. Unfortunately, nobody who needs 
to process such data joined the development. 

> Thank you both for looking into this,
> 
> Helena
> 
> ------------------------------------------------------------------------
> Subject: Re: [Fwd: Re: [GRASS5] r.random broken]
> Date: Fri, 01 Jul 2005 12:16:45 +0200
> From: Tomas Cebecauer <tomas.cebecauer at savba.sk>
> To: Jaro Hofierka <hofierka at geomodel.sk>
> CC: Marcel Suri <marcel.suri at jrc.it>
> 
> Koza migrujem na GRASS5.3.  :-(   [Slovak: "I am migrating to GRASS5.3."]

Good approach to move things forward.

Radim