[postgis-users] PostGIS and LIDAR Massive Point Sets

Thu Feb 19 06:47:25 PST 2004

We were wondering why a sample 200MB text file of 8 million points 
ballooned to over a gigabyte when loaded into PostgreSQL.  I for one 
would vote for making two data structures: one for point data and one 
general structure for all other types.  Point data is somewhat unique 
in its simplicity and commonality.  There is also the potential to 
represent grids using stored point data. 

With the current generation of remote sensing devices, data resolution 
is so high that an overflight of one county (Napa, CA, US) produces 160 
gigabytes of raw LIDAR data.  I don't look forward to causing that to 
go up over a terabyte after loading.  This is only one county, one 
project.  After a hundred of these kinds of projects...

Databases are really the best way to store this data, but if there isn't 
an efficient way to do it, I think we may stick to flat binary files or 
to loading the data into a simple non-spatial table with a fixed 
projection.   So, for point data, a space- and performance-efficient 
custom structure would have a clear benefit, at least from my point of 
view :)

Collin Bode
GIS Informatics Researcher, UCB

Paul Ramsey wrote:

> David Blasby wrote:
>
>> The WKB representation for a 2d point has 5 bytes of overhead (endian 
>> flag and a 4-byte type tag) for a total of 21 bytes.  WKB of a 3d 
>> point also has 5 bytes of overhead, for a total of 29 bytes.  
>> PostgreSQL has a further 4 bytes of overhead.
>
>
> You know, one of the reasons WKB (and PostGIS internals) structures 
> carry extra overhead is because they are general structures -- all 
> geometry types are represented with the same structure. Which is kind 
> of wasteful when you think that PostgreSQL itself is also taking 
> overhead to describe the object type.  If we actually had all 7 
> geometry types in PostgreSQL as full-fledged types we could have the 
> optimal storage for each (why store a bounding box for a point? it's a 
> *point*!). The main argument against this is a code maintenance one -- 
> index bindings for each type, instead of just one.
>
> It is a hard call, since the only reasons to do this kind of reworking 
> are potential performance increases, and we have no quantification of 
> performance gains to expect. The only thing we *do* know for sure is 
> that this kind of optimization would save some disk space.
>
> P.
>
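
To make the byte counts quoted above concrete, here is a rough sketch in 
Python that packs a point the way David describes WKB (1-byte endian 
flag, 4-byte type tag, then the coordinates as doubles) and compares it 
with bare packed doubles, i.e. the kind of flat binary record mentioned 
earlier.  The function names are made up for illustration, and the type 
tag used for the 3d point is only a placeholder: its width matters here, 
not its value.  The totals count only these per-point bytes plus the 4 
bytes of PostgreSQL overhead David mentions; they ignore row headers, 
bounding boxes and indexes, so they are a lower bound and not an 
explanation of the gigabyte we saw.

```python
import struct

ENDIAN_LITTLE = 1        # WKB byte-order flag for little-endian
WKB_POINT = 1            # OGC WKB type code for Point
PLACEHOLDER_3D_TAG = 1   # assumption: stand-in value, only the 4-byte width matters
PG_OVERHEAD = 4          # the "further 4 bytes" quoted in the thread


def wkb_point_2d(x, y):
    """Endian flag (1 byte) + type tag (4 bytes) + two doubles (16 bytes)."""
    return struct.pack("<BIdd", ENDIAN_LITTLE, WKB_POINT, x, y)


def wkb_like_point_3d(x, y, z):
    """Same 5-byte header as above, followed by three doubles (24 bytes)."""
    return struct.pack("<BIddd", ENDIAN_LITTLE, PLACEHOLDER_3D_TAG, x, y, z)


def raw_point_3d(x, y, z):
    """Bare coordinates with no header, as in a flat binary file."""
    return struct.pack("<ddd", x, y, z)


def total_bytes(per_point, n_points, overhead=PG_OVERHEAD):
    """Lower-bound storage for n_points, counting only per-point bytes."""
    return n_points * (per_point + overhead)


if __name__ == "__main__":
    n = 8_000_000  # the sample file size from the original message
    size_2d = len(wkb_point_2d(1.0, 2.0))
    size_3d = len(wkb_like_point_3d(1.0, 2.0, 3.0))
    size_raw = len(raw_point_3d(1.0, 2.0, 3.0))
    print("2d WKB point:", size_2d, "bytes")
    print("3d WKB-style point:", size_3d, "bytes")
    print("3d raw doubles:", size_raw, "bytes")
    print("8M 3d points, WKB-style + 4:", total_bytes(size_3d, n), "bytes")
    print("8M 3d points, raw doubles:", total_bytes(size_raw, n, overhead=0), "bytes")
```

The per-point header is small next to the coordinates themselves, which 
suggests most of the growth comes from somewhere other than the WKB 
framing, but that would need to be measured on a real table.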