[postgis-users] PostGIS and LIDAR Massive Point Sets
collin
collin at socrates.Berkeley.EDU
Thu Feb 19 06:47:25 PST 2004
We were wondering why a sample 200 MB text file of 8 million points
ballooned to over a gigabyte when loaded into PostgreSQL. I for one
would vote for two data structures: one for point data and a general
one for all other types. Point data is somewhat unique in its
simplicity and commonality. There is also the potential to represent
grids using stored point data.
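To put the growth in per-point terms, here is a back-of-envelope calculation. It assumes "200MB" and "over a gigabyte" are decimal units, and it treats one gigabyte as a lower bound on the loaded size.

```python
# Back-of-envelope arithmetic for the load described above.
# Assumption: sizes are decimal units (10**6 and 10**9 bytes);
# the loaded size is a lower bound ("over a gigabyte").
POINTS = 8_000_000
TEXT_BYTES = 200 * 10**6
LOADED_BYTES = 1 * 10**9

def bytes_per_point(total_bytes: int, points: int) -> float:
    """Average storage cost of one point."""
    return total_bytes / points

text_cost = bytes_per_point(TEXT_BYTES, POINTS)
loaded_cost = bytes_per_point(LOADED_BYTES, POINTS)
print(f"text: {text_cost:.1f} B/point, loaded: >= {loaded_cost:.1f} B/point")
print(f"growth factor: >= {LOADED_BYTES / TEXT_BYTES:.1f}x")
```

That works out to about 25 bytes per point as text and at least 125 bytes per point once loaded, a growth of at least 5x.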
With the current generation of remote sensing devices, data resolution
is so high that the raw LIDAR data from an overflight of one county
(Napa, CA, US) is 160 gigabytes. I don't look forward to that growing
to over a terabyte after loading. And this is only one county, one
project. After a hundred projects of this kind....
Databases are really the best way to store this data, but if there isn't
an efficient way to do it, I think we may stick to flat binary files or
load the data into a simple non-spatial table with a fixed
projection. So, for point data, a space- and performance-efficient custom
structure would have a clear benefit, at least from my point of view :)
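For comparison, a flat binary file with a fixed projection can store each point as a fixed-width record. The sketch below assumes a hypothetical layout of x, y, z as little-endian 64-bit floats (24 bytes per point), with no per-point type tag, bounding box, or projection. The function names are illustrative only.

```python
import io
import struct

# Hypothetical record layout: x, y, z as little-endian float64 = 24 bytes.
# The projection is NOT stored per point; it is assumed fixed for the file.
RECORD = struct.Struct("<3d")

def write_points(stream, points):
    """Write each (x, y, z) tuple as one fixed-width record."""
    for x, y, z in points:
        stream.write(RECORD.pack(x, y, z))

def read_points(stream):
    """Yield (x, y, z) tuples until fewer than RECORD.size bytes remain."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            return
        yield RECORD.unpack(chunk)

buf = io.BytesIO()
write_points(buf, [(552000.5, 4240000.25, 12.75), (552001.0, 4240001.0, 13.0)])
print(RECORD.size, len(buf.getvalue()))
buf.seek(0)
print(list(read_points(buf)))
```

Because every record has the same width, point i starts at byte offset i * 24, so random access needs no index. The tradeoff is that you lose spatial indexing and querying, which is what the database would provide.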
Collin Bode
GIS Informatics Researcher, UCB
Paul Ramsey wrote:
> David Blasby wrote:
>
>> The WKB representation for a 2d point has 5 bytes of overhead (endian
>> flag and a 4-byte type tag) for a total of 21 bytes. WKB of a 3d
>> point also has 5 bytes of overhead, for a total of 29 bytes.
>> Postgresql has a further 4 bytes of overhead.
>
>
> You know, one of the reasons WKB (and PostGIS internals) structures
> carry extra overhead is because they are general structures -- all
> geometry types are represented with the same structure. Which is kind
> of wasteful when you think that PostgreSQL itself is also taking
> overhead to describe the object type. If we actually had the full 7
> geometry types in PostgreSQL as full fledged types we could have the
> optimal storage for each (why store a bounding box for a point? it's a
> *point*!). The main argument against this is a code maintenance one --
> index bindings for each type, instead of just one.
>
> It is a hard call, since the only reasons to do this kind of reworking
> are potential performance increases, and we have no quantification of
> performance gains to expect. The only thing we *do* know for sure is
> that this kind of optimization would save some disk space.
>
> P.
>
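The byte counts in the quoted message can be checked by encoding a point as WKB directly. The encoder below is a minimal sketch. Its layout is a 1-byte byte-order flag (1 = little-endian), a 4-byte type code, then 8 bytes per coordinate. The 3D type code 1001 is the ISO convention, while PostGIS's extended WKB sets a high-bit Z flag instead. Either choice is a 4-byte integer, so the size is unaffected. The 4-byte PostgreSQL overhead is modeled as a constant.

```python
import struct

VARLENA_HEADER = 4  # PostgreSQL's per-value length header, per the quoted message

def wkb_point(*coords: float) -> bytes:
    """Encode a 2D or 3D point as little-endian WKB."""
    if len(coords) not in (2, 3):
        raise ValueError("expected 2 or 3 coordinates")
    # 1 = Point; 1001 = Point Z in ISO WKB (an assumption; EWKB differs).
    type_code = 1 if len(coords) == 2 else 1001
    # '<' = little-endian, no padding: B (1 byte) + I (4 bytes) + n doubles.
    return struct.pack(f"<BI{len(coords)}d", 1, type_code, *coords)

for coords in [(1.0, 2.0), (1.0, 2.0, 3.0)]:
    raw = 8 * len(coords)           # bare coordinates only
    wkb = len(wkb_point(*coords))   # plus 5 bytes of WKB overhead
    print(f"{len(coords)}D: raw {raw} B, WKB {wkb} B, stored {wkb + VARLENA_HEADER} B")
```

This gives 21 and 29 bytes of WKB for 2D and 3D points, or 25 and 33 bytes with the 4-byte header. That is against 16 and 24 bytes for the bare coordinates. These figures cover only the value itself. PostgreSQL also stores a header for each row and an entry in each index, which helps explain why the loaded size per point is far larger than any of these numbers.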