[postgis-users] Storage efficiency of point and line data

Michael Graff explorer at flame.org
Mon Nov 4 14:26:58 PST 2002


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hmm, so it almost looks like I should consider just storing the data
in a different format, and using a BOX3D to store the actual bounding
box.  For 47 million data points, with about another 15 to 20 million
to be added, the overhead is pretty high.  But, that's for another
day, since I need the index more than the space.
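
For a rough sense of scale, the sizes in the table quoted further down fit
a straight line: 172 bytes for two points and 24 more for each extra point,
which is 124 fixed bytes plus 24 per point (consistent with three 8-byte
doubles per vertex). The sketch below applies that fit to the quoted counts
and compares it against a hypothetical compact layout. The compact layout
is an assumption, not anything PostGIS provides: a 48-byte BOX3D (assumed
to be six doubles) plus two 32-bit integers per point. Row and index
overhead are ignored in both cases, and all function names are made up for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the quoted histogram: `count` lines that have `points` vertices. */
struct bucket {
    uint64_t count;
    unsigned points;
};

/* Counts copied from the table in the quoted message. */
static const struct bucket histogram[] = {
    {23333966u, 2}, {6789516u, 3}, {3712433u, 4},
    {2438493u, 5},  {1749440u, 6}, {1346119u, 7},
    {976198u, 8},   {806865u, 9},  {658199u, 10},
};

enum {
    FIXED_BYTES     = 124, /* fitted from the table: 172 - 2 * 24 */
    BYTES_PER_POINT = 24   /* consistent with three doubles: x, y, z */
};

/* Size predicted by the linear fit for a line with n points. */
static uint64_t geometry_bytes(unsigned n)
{
    return FIXED_BYTES + (uint64_t)BYTES_PER_POINT * n;
}

/* Hypothetical compact layout (an assumption for comparison only):
 * a 48-byte BOX3D plus a pair of 32-bit integers per point. */
static uint64_t compact_bytes(unsigned n)
{
    return 48u + (uint64_t)8 * n;
}

/* Sum a per-geometry size function over every bucket in the histogram. */
static uint64_t total_bytes(uint64_t (*size_of)(unsigned))
{
    uint64_t total = 0;
    for (size_t i = 0; i < sizeof histogram / sizeof histogram[0]; i++)
        total += histogram[i].count * size_of(histogram[i].points);
    return total;
}
```

Calling total_bytes(geometry_bytes) and total_bytes(compact_bytes) gives the
two totals to compare. Even in the compact layout the bounding box accounts
for 48 of the 64 bytes of a two-point line, so the saving comes mostly from
dropping the fixed header rather than from shrinking the ordinates.
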

Thanks,

- --Michael

Paul Ramsey <pramsey at refractions.net> writes:

> typedef struct
> {
>    int32  size;     // postgres variable-length type requirement
>    int32  SRID;     // spatial reference system id
>    double offsetX;  // for precision grid (future improvement)
>    double offsetY;  // for precision grid (future improvement)
>    double scale;    // for precision grid (future improvement)
>    int32  type;     // this type of geometry
>    bool   is3d;     // true if the points are 3d (only for output)
>    BOX3D  bvol;     // bounding volume of all the geo objects
>    int32  nobjs;    // how many sub-objects in this object
>    int32  objType[1];   // type of object
>    int32  objOffset[1]; // offset (in bytes) into this structure where
>                         // the object is located
>    char   objData[1];   // store for actual objects
> 
> } GEOMETRY;
> 
> There's the structure, so above and beyond the actual ordinates, we
> are storing about 100 bytes of metadata. A bit more fluffy than a
> shapefile, but not a lot. Admittedly though, when storing single points
> (or two-point lines), it is a pretty massive overhead.
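
One way to check the "about 100 bytes" figure is to let the compiler report
where objData starts. This is a minimal sketch that mirrors the quoted
struct under stated assumptions: int32 is taken to be int32_t, bool is
taken to be one byte, and BOX3D is assumed to be two corner points of three
doubles each. None of these definitions appear in the message, and the
helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layouts, not shown in the quoted message. */
typedef struct { double x, y, z; } POINT3D;
typedef struct { POINT3D LLB, URT; } BOX3D; /* two corners, six doubles */

/* Mirror of the quoted GEOMETRY struct with the assumed member types. */
typedef struct {
    int32_t size;         /* postgres variable-length type requirement */
    int32_t SRID;         /* spatial reference system id */
    double  offsetX;      /* for precision grid */
    double  offsetY;      /* for precision grid */
    double  scale;        /* for precision grid */
    int32_t type;         /* this type of geometry */
    char    is3d;         /* bool, assumed to occupy one byte */
    BOX3D   bvol;         /* bounding volume of all the geo objects */
    int32_t nobjs;        /* how many sub-objects in this object */
    int32_t objType[1];   /* type of object */
    int32_t objOffset[1]; /* byte offset of the object in this structure */
    char    objData[1];   /* store for actual objects */
} GEOMETRY;

/* Bytes that precede the object data, including alignment padding. */
static size_t header_bytes(void)
{
    return offsetof(GEOMETRY, objData);
}

/* Sum of the member sizes alone, with no padding: six int32 fields,
 * three doubles, one flag byte, and the bounding box. */
static size_t packed_header_bytes(void)
{
    return 6 * sizeof(int32_t) + 3 * sizeof(double) + 1 + sizeof(BOX3D);
}
```

Under these assumptions the members add up to 97 bytes before padding, and
header_bytes() can only be equal or larger once the compiler aligns bvol,
which is in line with the estimate above.
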
> 
> Michael Graff wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > It seems there is a large overhead to storing point and line data in
> > a geometry type.  mem_size() returns 172 bytes for a two-point line,
> > and goes up by 24 bytes per additional point.  Returning the data in
> > binary form seems to show only 6 bytes per point, so perhaps this
> > is twice the actual storage.
> >
> > I thought about storing only the bounding boxes in a table, and
> > storing the actual shape in a flat binary file (probably storing
> > each lat/long pair as a pair of 32-bit signed integers) but it
> > turns out that wouldn't be a huge win, as most of the data I have
> > consists of 2 points:
> >    cnt    | points | size
> > ----------+--------+-------
> >  23333966 |      2 |   172
> >   6789516 |      3 |   196
> >   3712433 |      4 |   220
> >   2438493 |      5 |   244
> >   1749440 |      6 |   268
> >   1346119 |      7 |   292
> >    976198 |      8 |   316
> >    806865 |      9 |   340
> >    658199 |     10 |   364
> > Is the storage format fairly efficient, and I'm simply storing a
> > whole lot of data?
> >
> 
> 
> -- 
>        __
>       /
>       | Paul Ramsey
>       | Refractions Research
>       | Email: pramsey at refractions.net
>       | Phone: (250) 885-0632
>       \_
> 
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (NetBSD)
Comment: See http://www.flame.org/~explorer/pgp for my keys

iD8DBQE9xvQxl6Nz7kJWYWYRAiJ/AJ9dvYL/kBYOqFmJBN2gpAmM6mkgOACeK40j
hwjjEvjMAD181o7gVekNdRU=
=KKtj
-----END PGP SIGNATURE-----