[postgis-users] Storage efficiency of point and line data
Michael Graff
explorer at flame.org
Mon Nov 4 14:26:58 PST 2002
Hmm, so it almost looks like I should consider just storing the data
in a different format, and using a BOX3D to store the actual bounding
box. For 47 million data points, with about another 15 to 20 million
to be added, the overhead is pretty high. But, that's for another
day, since I need the index more than the space.
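For a rough sense of scale, the sizes in the quoted table can be totalled and compared with a compact record. The sketch below assumes a hypothetical record made of a 48-byte bounding box (BOX3D taken to be six doubles) plus two 32-bit integers per point. SizeRow, ROWS and the function names are made up for illustration, and PostgreSQL's own per-row overhead and the index are ignored. Only the table rows shown (2 to 10 points) are included, so lines with more points are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the quoted table: how many lines have `points` points,
 * and the mem_size() value reported for each of them. */
typedef struct {
    uint64_t cnt;
    uint32_t points;
    uint32_t size;
} SizeRow;

static const SizeRow ROWS[] = {
    {23333966,  2, 172},
    { 6789516,  3, 196},
    { 3712433,  4, 220},
    { 2438493,  5, 244},
    { 1749440,  6, 268},
    { 1346119,  7, 292},
    {  976198,  8, 316},
    {  806865,  9, 340},
    {  658199, 10, 364},
};
#define NROWS (sizeof ROWS / sizeof ROWS[0])

/* Hypothetical compact record (an assumption, not a PostGIS format):
 * a 48-byte bounding box plus one pair of 32-bit integers per point. */
uint64_t compact_record_bytes(uint32_t points) {
    assert(points >= 1);
    return 48u + 8u * (uint64_t)points;
}

/* Total bytes over the listed rows, using the reported mem_size() values. */
uint64_t total_reported_bytes(void) {
    uint64_t total = 0;
    for (size_t i = 0; i < NROWS; i++)
        total += ROWS[i].cnt * ROWS[i].size;
    return total;
}

/* Total bytes over the same rows if each line used the compact record. */
uint64_t total_compact_bytes(void) {
    uint64_t total = 0;
    for (size_t i = 0; i < NROWS; i++)
        total += ROWS[i].cnt * compact_record_bytes(ROWS[i].points);
    return total;
}
```

Under these assumptions a two-point line takes 64 bytes instead of 172; calling the two total functions from a small main() and printing the results gives the comparison for the listed rows.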
Thanks,
--Michael
Paul Ramsey <pramsey at refractions.net> writes:
> typedef struct
> {
>     int32  size;          // postgres variable-length type requirement
>     int32  SRID;          // spatial reference system id
>     double offsetX;       // for precision grid (future improvement)
>     double offsetY;       // for precision grid (future improvement)
>     double scale;         // for precision grid (future improvement)
>     int32  type;          // this type of geometry
>     bool   is3d;          // true if the points are 3d (only for output)
>     BOX3D  bvol;          // bounding volume of all the geo objects
>     int32  nobjs;         // how many sub-objects in this object
>     int32  objType[1];    // type of object
>     int32  objOffset[1];  // offset (in bytes) into this structure where
>                           // the object is located
>     char   objData[1];    // store for actual objects
> } GEOMETRY;
>
> There's the structure, so above and beyond the actual ordinates, we
> are storing about 100 bytes of metadata. A bit more fluffy than a
> shapefile, but not a lot. Admittedly though, when storing single points
> (or two-point lines), it is a pretty massive overhead.
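
As a rough cross-check of the "about 100 bytes" figure, the header can be mirrored with fixed-width types and measured with offsetof. This is only a sketch: GeometrySketch, Point3D, Box3D and the two helper functions are hypothetical names, BOX3D is assumed to be two corner points of three doubles each, and bool is assumed to occupy one byte. Padding depends on the compiler and ABI, so the result may differ from what the server actually allocates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: BOX3D holds two 3D corner points of three doubles each. */
typedef struct { double x, y, z; } Point3D;
typedef struct { Point3D llb, urt; } Box3D;

/* Mirror of the quoted GEOMETRY header using fixed-width stand-ins
 * (int32 -> int32_t, bool -> one byte). Not the real PostGIS type. */
typedef struct {
    int32_t size;
    int32_t srid;
    double  offsetX;
    double  offsetY;
    double  scale;
    int32_t type;
    char    is3d;
    Box3D   bvol;
    int32_t nobjs;
    int32_t objType[1];
    int32_t objOffset[1];
    char    objData[1];
} GeometrySketch;

/* Bytes occupied before the first byte of object data,
 * including any padding the compiler inserts. */
size_t geometry_header_bytes(void) {
    return offsetof(GeometrySketch, objData);
}

/* Bytes left once the ordinates (three doubles per point) are
 * subtracted from a reported mem_size() value. */
size_t non_ordinate_bytes(size_t reported_size, size_t npoints) {
    assert(reported_size >= npoints * sizeof(Point3D));
    return reported_size - npoints * sizeof(Point3D);
}
```

With the figures from the original message, non_ordinate_bytes(172, 2) and non_ordinate_bytes(364, 10) both give 124, which is consistent with a fixed per-geometry overhead plus 24 bytes (three doubles) per point. On a typical 64-bit ABI geometry_header_bytes() should land close to the 100 bytes mentioned above; the remainder is not explained by this sketch.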
>
> Michael Graff wrote:
> > It seems there is a large overhead to storing point and line data in
> > a geometry type. mem_size() returns 172 bytes for a two-point line,
> > and goes up by 24 bytes per additional point. Returning the data in
> > binary form seems to show only 6 bytes per point, so perhaps this
> > is twice the actual storage.
> > I thought about storing only the bounding boxes in a table, and
> > storing the actual shape in a flat binary file (probably storing
> > each lat/long pair as a pair of 32-bit signed integers) but it
> > turns out that wouldn't be a huge win, as most of the data I have
> > consists of 2 points:
> >    cnt    | points | size
> > ----------+--------+-------
> >  23333966 |      2 |   172
> >   6789516 |      3 |   196
> >   3712433 |      4 |   220
> >   2438493 |      5 |   244
> >   1749440 |      6 |   268
> >   1346119 |      7 |   292
> >    976198 |      8 |   316
> >    806865 |      9 |   340
> >    658199 |     10 |   364
> > Is the storage format fairly efficient, and I'm simply storing a whole
> > lot of data?
> >
>
>
> --
> __
> /
> | Paul Ramsey
> | Refractions Research
> | Email: pramsey at refractions.net
> | Phone: (250) 885-0632
> \_