[postgis-devel] Light Weight Geometry (LWGEOM) Proposal
David Blasby
dblasby at refractions.net
Fri Feb 20 12:22:36 PST 2004
As per our current discussions, I'm proposing a new Light-Weight
Geometry. This will exist alongside the current PostGIS geometry type,
so no one will lose anything. I'm not going to back-port this, so we'll
only support it in PostgreSQL 7.4+.
Disk Representation (serialized form)
int32 size; //postgresql variable-length requirement
char type; // this type (see below)
<data>
Where the 8-bit 'type' is defined bit-wise as:
xSBDtttt
WHERE
x = unused
S = 4 byte SRID attached (0= not attached (-1), 1= attached)
B = bounding box attached (0=no, 1=yes) (32 bytes)
D = dimensionality (0=2d, 1=3d)
tttt = actual type (as per the WKB type):
enum wkbGeometryType {
wkbPoint = 1,
wkbLineString = 2,
wkbPolygon = 3,
wkbMultiPoint = 4,
wkbMultiLineString = 5,
wkbMultiPolygon = 6,
wkbGeometryCollection = 7
};
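A minimal sketch of packing and reading the type byte. It assumes that
xSBDtttt is read from the most significant bit to the least significant
bit, which the layout above suggests but does not state outright. The
macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions for xSBDtttt, most significant bit first.
 * These names and masks are illustrative, not part of the proposal. */
#define LWTYPE_HAS_SRID 0x40 /* S */
#define LWTYPE_HAS_BBOX 0x20 /* B */
#define LWTYPE_IS_3D    0x10 /* D */
#define LWTYPE_GEOMMASK 0x0F /* tttt */

/* Pack the three flags and a WKB geometry type (1..7) into one byte. */
static uint8_t lwtype_pack(int has_srid, int has_bbox, int is_3d, int geomtype)
{
    assert(geomtype >= 1 && geomtype <= 7);
    return (uint8_t)((has_srid ? LWTYPE_HAS_SRID : 0) |
                     (has_bbox ? LWTYPE_HAS_BBOX : 0) |
                     (is_3d ? LWTYPE_IS_3D : 0) |
                     (geomtype & LWTYPE_GEOMMASK));
}

/* Accessors for the individual fields of a packed type byte. */
static int lwtype_has_srid(uint8_t t) { return (t & LWTYPE_HAS_SRID) != 0; }
static int lwtype_has_bbox(uint8_t t) { return (t & LWTYPE_HAS_BBOX) != 0; }
static int lwtype_is_3d(uint8_t t)    { return (t & LWTYPE_IS_3D) != 0; }
static int lwtype_geomtype(uint8_t t) { return t & LWTYPE_GEOMMASK; }
```

Under that assumption, a 2D point with SRID and bounding box would carry
the type byte 0x61.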
In general, data will be exactly like the 3d-extended WKB representation
(except there's no endian flag and the WKB type is defined as above).
The bounding box is an optional component intended for large geometries;
for small geometries (<1000 points) it will not be present. This keeps
small geometries compact (their bounding boxes can be calculated quickly
on the fly) while still giving better performance for large
geometries. This will probably be a compile-time option.
Examples (cf. OGC SF SQL definition of WKB (section 3.3.2.6))
-------------------------------------------------------------
A. 2D point w/o bounding box
<int32> size = 21 bytes
<char> type: S=0,B=0,D=0, tttt= 1
<double> X
<double> Y
B. 3D point w/o bounding box
<int32> size = 29 bytes
<char> type: S=0,B=0,D=1, tttt= 1
<double> X
<double> Y
<double> Z
C. 2D point WITH bounding box
(you would never put one on a point, but this is just an example)
<int32> size = 53 bytes
<char> type: S=0,B=1,D=0, tttt= 1
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<double> X
<double> Y
D. 2D line String w/o bounding box
<int32> size = npoints*16 + 9
<char> type: S=0,B=0,D=0, tttt= 2
<uint32> npoints
<double> X1
<double> Y1
<double> X2
<double> Y2
...
E. 2D line String with bounding box
<int32> size = npoints*16 + 9 + 32
<char> type: S=0,B=1,D=0, tttt= 2
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<uint32> npoints
<double> X1
<double> Y1
<double> X2
<double> Y2
...
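The size formulas in examples D and E can be sketched as a single
function. This is only an illustration: the function name is
hypothetical, and the 3D and SRID cases are extrapolated from the point
examples (24 bytes per 3D point, 4 bytes for an SRID).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total serialized size in bytes of a line string, including the
 * leading int32 size field itself. Hypothetical helper. */
static size_t lwline_serialized_size(uint32_t npoints, int is_3d,
                                     int has_bbox, int has_srid)
{
    size_t size = 4 + 1 + 4;   /* int32 size + type byte + uint32 npoints */

    if (has_srid)
        size += 4;             /* int32 SRID */
    if (has_bbox)
        size += 32;            /* xmin, ymin, xmax, ymax as doubles */
    size += (size_t)npoints * (is_3d ? 24 : 16);

    /* The size field is an int32, so the total has to fit in one. */
    assert(size <= (size_t)INT32_MAX);
    return size;
}
```

For a two-point 2D line this gives 41 bytes without a bounding box and
73 bytes with one, matching npoints*16 + 9 and npoints*16 + 9 + 32.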
F. 2D polygon w/o bounding box
NOTE: I haven't explicitly put in the ogcLinearRing
<int32> size =
<char> type: S=0,B=0,D=0, tttt= 3
<uint32> nrings
<uint32> npoints in ring1
<double> X1
<double> Y1
<double> X2
<double> Y2
...
<uint32> npoints in ring2
<double> X1
<double> Y1
<double> X2
<double> Y2
...
...
G. 2D multi-line string w/o bounding boxes
NOTE: this is like the OGC spec - we duplicate type info
in the sub-geometries. This is arguably not a good idea,
but it does allow us to treat all the multi* and
geometrycollection types equivalently.
It also allows us to represent GeometryCollections of
GeometryCollections (which PostGIS doesn't support).
<int32> size =
<char> type: S=0,B=0,D=0, tttt= 5
<uint32> nlines
<char> type: S=0,B=0,D=0, tttt= 2
<uint32> npoints in line 1
<double> X1
<double> Y1
<double> X2
<double> Y2
....
<char> type: S=0,B=0,D=0, tttt= 2
<uint32> npoints in line 2
<double> X1
<double> Y1
<double> X2
<double> Y2
....
G. 2D multi-line string with main bounding box
<int32> size =
<char> type: S=0,B=1,D=0, tttt= 5
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<uint32> nlines
<char> type: S=0,B=0,D=0, tttt= 2
<uint32> npoints in line 1
<double> X1
<double> Y1
<double> X2
<double> Y2
....
<char> type: S=0,B=0,D=0, tttt= 2
<uint32> npoints in line 2
<double> X1
<double> Y1
<double> X2
<double> Y2
....
G. 2D multi-line string with main and sub bounding boxes
NOTE: since our types are defined in a recursive manner, this
type is possible. I don't think we should construct them in
general.
<int32> size =
<char> type: S=0,B=1,D=0, tttt= 5
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<uint32> nlines
<char> type: S=0,B=1,D=0, tttt= 2
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<uint32> npoints in line 1
<double> X1
<double> Y1
<double> X2
<double> Y2
....
<char> type: S=0,B=1,D=0, tttt= 2
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<uint32> npoints in line 2
<double> X1
<double> Y1
<double> X2
<double> Y2
....
H. 2D point w/o bounding box (with SRID)
<int32> size = 25 bytes
<char> type: S=1,B=0,D=0, tttt= 1
<int32> SRID
<double> X
<double> Y
I. 3D point w/o bounding box (with SRID)
<int32> size = 33 bytes
<char> type: S=1,B=0,D=1, tttt= 1
<int32> SRID
<double> X
<double> Y
<double> Z
J. 2D point WITH bounding box (with SRID)
(you would never put one on a point, but this is just an example)
(note: SRID comes before bounding box)
<int32> size = 57 bytes
<char> type: S=1,B=1,D=0, tttt= 1
<int32> SRID
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<double> X
<double> Y
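Examples A, C, H and J can be sketched as one serializer for a 2D
point, which makes the field order (size, type, SRID, bounding box,
coordinates) explicit. This is a sketch only: the function name is
hypothetical, the type-byte masks assume xSBDtttt is read from the most
significant bit, and values are written in host byte order because the
format has no endian flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 2D point into buf and return the number of bytes written.
 * Field order: int32 size, type byte, [SRID], [bounding box], X, Y. */
static size_t lwpoint2d_serialize(uint8_t *buf, size_t buflen,
                                  double x, double y,
                                  int has_srid, int32_t srid, int has_bbox)
{
    int32_t size = 4 + 1 + (has_srid ? 4 : 0) + (has_bbox ? 32 : 0) + 16;
    /* Assumed masks: S = 0x40, B = 0x20, D = 0x10 (unset), tttt = 1. */
    uint8_t type = (uint8_t)((has_srid ? 0x40 : 0) | (has_bbox ? 0x20 : 0) | 1);
    uint8_t *p = buf;

    assert(buflen >= (size_t)size);

    memcpy(p, &size, 4); p += 4;
    *p++ = type;
    if (has_srid) {
        memcpy(p, &srid, 4); p += 4;       /* SRID comes before the box */
    }
    if (has_bbox) {
        double box[4] = { x, y, x, y };    /* xmin, ymin, xmax, ymax */
        memcpy(p, box, sizeof box); p += sizeof box;
    }
    memcpy(p, &x, 8); p += 8;
    memcpy(p, &y, 8); p += 8;

    return (size_t)(p - buf);
}
```

With neither flag set this writes 21 bytes (example A); with both SRID
and bounding box it writes 57 bytes (example J).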
Other notes:
Canonical form will be very much like the current WKB type (i.e. looks
like '000000FF001A..'). This means your pg_dumps will look strange, but
you'll find it faster and there will not be any numeric drift as you
move to and from WKT.
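A rough sketch of producing that kind of hex string from a serialized
buffer. The function name is hypothetical, and the use of uppercase
digits is only inferred from the '000000FF001A..' sample above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode len bytes of buf as a NUL-terminated uppercase hex string.
 * out must have room for 2*len + 1 characters. Hypothetical helper. */
static void lwgeom_to_hex(const uint8_t *buf, size_t len,
                          char *out, size_t outlen)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i;

    assert(outlen >= 2 * len + 1);

    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[buf[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[buf[i] & 0x0F];  /* low nibble */
    }
    out[2 * len] = '\0';
}
```

Because every byte is copied out exactly, the doubles survive a dump
and reload bit for bit, which is where the lack of drift comes from.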
We'll need to write WKB-to-LWGEOM and LWGEOM-to-WKB converters. Since
LWGEOM is very close to WKB, this should be simple.
To get to WKT, we can do LWGEOM->WKB->PostGIS GEOMETRY->WKT. For
parsing WKT, we can do WKT->PostGIS GEOMETRY->WKB->LWGEOM. The PostGIS
conversion functions already exist, so this will be very easy.
We'll also need to write something to convert a LWGEOM to a bounding box
plus all the indexing support functions (based on BOX2DFLOAT4s).
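A minimal sketch of the bounding box step for 2D coordinates laid out
as X1, Y1, X2, Y2, ... as in the examples. The struct and function
names are hypothetical, and the box is kept in doubles; narrowing it to
a BOX2DFLOAT4 (which would have to round outward so the box still
covers the geometry) is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative double-precision box, not the PostGIS BOX2DFLOAT4. */
typedef struct {
    double xmin, ymin, xmax, ymax;
} box2d_sketch;

/* Compute the bounding box of npoints 2D points stored as x,y pairs. */
static box2d_sketch box_from_points2d(const double *xy, uint32_t npoints)
{
    box2d_sketch b;
    uint32_t i;

    assert(npoints > 0);

    b.xmin = b.xmax = xy[0];
    b.ymin = b.ymax = xy[1];
    for (i = 1; i < npoints; i++) {
        double x = xy[2 * i];
        double y = xy[2 * i + 1];
        if (x < b.xmin) b.xmin = x;
        if (x > b.xmax) b.xmax = x;
        if (y < b.ymin) b.ymin = y;
        if (y > b.ymax) b.ymax = y;
    }
    return b;
}
```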
One of the design issues I have with PostGIS is that all the analysis
functions deal directly with the serialized GEOMETRY form. This makes
them more complex and difficult to maintain. I suggest we have souped-up
versions of the current PostGIS geometry types (i.e. POLYGON3D,
LINESTRING3D, POINT3D) for LWGEOM (i.e. LW_POLYGON, LW_LINE, LW_POINT)
which would hide things like 2d vs 3d and make construction easier.
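One way such types might look, purely as a sketch: only the names
LW_POINT and LW_LINE come from the proposal; the fields and the
accessor are assumptions made for illustration. The accessor shows how
a caller could read a point without caring whether the underlying
coordinates are 2D or 3D.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory forms; the field layout is an assumption. */
typedef struct {
    double x, y, z;        /* z is 0.0 when the source is 2D */
    int is_3d;
    int32_t srid;          /* -1 when no SRID is attached */
} LW_POINT;

typedef struct {
    uint32_t npoints;
    int is_3d;
    int32_t srid;
    const double *coords;  /* x,y or x,y,z tuples, as serialized */
} LW_LINE;

/* Fetch point i of a line, hiding the 2D vs 3D coordinate stride. */
static LW_POINT lw_line_get_point(const LW_LINE *line, uint32_t i)
{
    size_t stride = line->is_3d ? 3 : 2;
    const double *c;
    LW_POINT p;

    assert(i < line->npoints);

    c = line->coords + (size_t)i * stride;
    p.x = c[0];
    p.y = c[1];
    p.z = line->is_3d ? c[2] : 0.0;
    p.is_3d = line->is_3d;
    p.srid = line->srid;
    return p;
}
```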
What think?
dave