[postgis-users] Light Weight Geometry (LWGEOM) Proposal

Fri Feb 20 12:22:36 PST 2004

As per our current discussions, I'm proposing a new Light-Weight 
Geometry.  It will exist alongside the current PostGIS geometry type, so 
no one will lose anything.  I'm not going to back-port this, so we'll 
only support it in PostgreSQL 7.4+.

Disk Representation (serialized form)

int32 size; //postgresql variable-length requirement
char  type; // this type (see below)
<data>

Where the 8-bit 'type' is defined bit-wise as:

xSBDtttt

WHERE
	x = unused
	S = 4-byte SRID attached (0 = not attached, SRID taken as -1; 1 = attached)
	B = bounding box attached (0=no, 1=yes) (32 bytes)
	D = dimensionality (0=2d, 1=3d)
	tttt = actual type (as per the WKB type):

	enum wkbGeometryType {
		wkbPoint = 1,
		wkbLineString = 2,
		wkbPolygon = 3,
		wkbMultiPoint = 4,
		wkbMultiLineString = 5,
		wkbMultiPolygon = 6,
		wkbGeometryCollection = 7
	};
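
To make the bit layout concrete, here is a minimal sketch of packing and unpacking the type byte.  It assumes 'xSBDtttt' is written most-significant bit first (x is bit 7, tttt the low four bits); the macro and function names are hypothetical and not part of any existing PostGIS code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks for the xSBDtttt layout, assuming x is the
 * most significant bit and tttt occupies the low four bits. */
#define LW_FLAG_SRID 0x40 /* S: 4-byte SRID attached */
#define LW_FLAG_BBOX 0x20 /* B: 32-byte bounding box attached */
#define LW_FLAG_3D   0x10 /* D: 0 = 2d, 1 = 3d */
#define LW_TYPE_MASK 0x0F /* tttt: WKB geometry type */

/* Build a type byte from its parts. */
uint8_t lw_make_type(int has_srid, int has_bbox, int is_3d, int wkb_type)
{
    uint8_t t;

    assert(wkb_type >= 1 && wkb_type <= 7); /* wkbPoint..wkbGeometryCollection */
    t = (uint8_t)wkb_type;
    if (has_srid) t |= LW_FLAG_SRID;
    if (has_bbox) t |= LW_FLAG_BBOX;
    if (is_3d)    t |= LW_FLAG_3D;
    return t;
}

/* Accessors for the individual fields. */
int lw_has_srid(uint8_t t) { return (t & LW_FLAG_SRID) != 0; }
int lw_has_bbox(uint8_t t) { return (t & LW_FLAG_BBOX) != 0; }
int lw_is_3d(uint8_t t)    { return (t & LW_FLAG_3D) != 0; }
int lw_wkb_type(uint8_t t) { return t & LW_TYPE_MASK; }
```

Under that assumption a plain 2D point has type byte 0x01, and a 2D linestring with both SRID and bounding box has 0x62.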

In general, data will be exactly like the 3d-extended WKB representation 
(except there's no endian flag and the WKB type is defined as above).

The bounding box is an optional component intended for large geometries; 
for small geometries (<1000 points) it will not be present.  This keeps 
small geometries compact on disk (their bounding boxes can be quickly 
calculated on-the-fly) while giving better performance for large 
geometries.  The threshold will probably be a compile-time option.
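
As a rough illustration of the on-the-fly case, the sketch below computes a 2D box from a run of coordinates and decides whether a stored box is worthwhile.  The threshold macro, the struct and the function names are all hypothetical; the only thing taken from the text above is the "<1000 points" cut-off.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time threshold; the proposal only says "<1000 points". */
#define LW_BBOX_MIN_POINTS 1000

/* Hypothetical in-memory box: four doubles, 32 bytes as in the B flag. */
typedef struct { double xmin, ymin, xmax, ymax; } lw_box2d;

/* Nonzero when a geometry is large enough to carry a stored box. */
int lw_should_store_bbox(size_t npoints)
{
    return npoints >= LW_BBOX_MIN_POINTS;
}

/* Compute the 2D box of npoints coordinates laid out as x,y[,z],
 * with `stride` doubles per point (2 for 2d, 3 for 3d). */
lw_box2d lw_compute_bbox(const double *coords, size_t npoints, size_t stride)
{
    lw_box2d b;
    size_t i;

    assert(coords != NULL && npoints > 0 && stride >= 2);
    b.xmin = b.xmax = coords[0];
    b.ymin = b.ymax = coords[1];
    for (i = 1; i < npoints; i++) {
        double x = coords[i * stride];
        double y = coords[i * stride + 1];
        if (x < b.xmin) b.xmin = x;
        if (x > b.xmax) b.xmax = x;
        if (y < b.ymin) b.ymin = y;
        if (y > b.ymax) b.ymax = y;
    }
    return b;
}
```

This is a single pass over the points, which is why skipping the stored box is cheap for small geometries.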

Examples (cf. OGC SF SQL definition of WKB (section 3.3.2.6))
-------------------------------------------------------------

A. 2D point w/o bounding box

<int32> size  = 21 bytes
<char> type:  S=0,B=0,D=0, tttt= 1
<double> X
<double> Y

B. 3D point w/o bounding box

<int32> size = 29 bytes
<char> type:  S=0,B=0,D=1, tttt= 1
<double> X
<double> Y
<double> Z

C. 2D point WITH bounding box
     (you would never put one on a point, but this is just an example)

<int32> size = 53 bytes
<char> type:  S=0,B=1,D=0, tttt= 1
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<double> X
<double> Y

D. 2D line String w/o bounding box

<int32> size = npoints*16 + 9
<char> type:  S=0,B=0,D=0, tttt= 2
<uint32> npoints
<double> X1
<double> Y1
<double> X2
<double> Y2
...

E. 2D line String with bounding box

<int32> size = npoints*16 + 9 + 32
<char> type:  S=0,B=1,D=0, tttt= 2
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<uint32> npoints
<double> X1
<double> Y1
<double> X2
<double> Y2
...
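
For examples D and E, a reader has to skip the optional parts to find the points.  A minimal sketch, assuming the flag bits sit at 0x40 (S) and 0x20 (B), that any SRID precedes the bounding box, and that integers are stored in native byte order (the format has no endian flag); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Locate the point data of a serialized linestring.
 * Returns the byte offset of X1 and stores the point count in *npoints. */
size_t lw_line_points_offset(const uint8_t *buf, uint32_t *npoints)
{
    size_t  off = 4;                 /* skip the int32 size */
    uint8_t type = buf[off++];       /* the type byte */

    assert((type & 0x0F) == 2);      /* must be wkbLineString */
    if (type & 0x40) off += 4;       /* S: skip the SRID */
    if (type & 0x20) off += 32;      /* B: skip xmin, ymin, xmax, ymax */

    /* memcpy avoids unaligned reads; native byte order is assumed */
    memcpy(npoints, buf + off, sizeof *npoints);
    off += 4;
    return off;
}
```

The returned offset is 9 without a bounding box and 41 with one, matching the constants in the two size formulas.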

F. 2D polygon w/o bounding box
    NOTE: I haven't explicitly put in the ogcLinearRing

<int32> size =
<char> type:  S=0,B=0,D=0, tttt= 3
<uint32> nrings
<uint32> npoints in ring1
<double> X1
<double> Y1
<double> X2
<double> Y2
...
<uint32> npoints in ring2
<double> X1
<double> Y1
<double> X2
<double> Y2
...
...

G. 2D multilinestring w/o bounding boxes

     NOTE: this is like the OGC spec - we duplicate type info
           in the sub-geometries.  This is arguably not a good idea,
           but it does allow us to treat all the multi* and
           geometrycollection types equivalently.
           It also allows us to represent GeometryCollections of
           GeometryCollections (which PostGIS doesn't support).

<int32> size =
<char> type:  S=0,B=0,D=0, tttt= 5
<uint32> nlines
<char> type:  S=0,B=0,D=0, tttt= 2
<uint32> npoints in line 1
<double> X1
<double> Y1
<double> X2
<double> Y2
....
<char> type:  S=0,B=0,D=0, tttt= 2
<uint32> npoints in line 2
<double> X1
<double> Y1
<double> X2
<double> Y2
....

G2. 2D multilinestring with main bounding box
<int32> size =
<char> type:  S=0,B=1,D=0, tttt= 5
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<uint32> nlines
<char> type:  S=0,B=0,D=0, tttt= 2
<uint32> npoints in line 1
<double> X1
<double> Y1
<double> X2
<double> Y2
....
<char> type:  S=0,B=0,D=0, tttt= 2
<uint32> npoints in line 2
<double> X1
<double> Y1
<double> X2
<double> Y2
....

G3. 2D multilinestring with main and sub bounding boxes
     NOTE: since our types are defined in a recursive manner, this
           type is possible.  I don't think we should construct them in
           general.

<int32> size =
<char> type:  S=0,B=1,D=0, tttt= 5
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<uint32> nlines
<char> type:  S=0,B=1,D=0, tttt= 2
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<uint32> npoints in line 1
<double> X1
<double> Y1
<double> X2
<double> Y2
....
<char> type:  S=0,B=1,D=0, tttt= 2
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<uint32> npoints in line 2
<double> X1
<double> Y1
<double> X2
<double> Y2
....
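
Because the multi* layouts are recursive, one routine can measure any sub-geometry.  The sketch below is only an illustration: it assumes sub-geometries start at their own type byte with no size prefix (as in the multilinestring listings above), the flag bits 0x40/0x20/0x10 for S/B/D, and native byte order.  Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a native-order uint32 without assuming alignment. */
uint32_t lw_read_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Bytes used by one (sub-)geometry that starts at its type byte. */
size_t lw_subgeom_length(const uint8_t *p)
{
    uint8_t  type = p[0];
    size_t   off = 1;
    size_t   ptsize = (type & 0x10) ? 24 : 16; /* 3d or 2d point */
    uint32_t i, n;

    if (type & 0x40) off += 4;   /* SRID */
    if (type & 0x20) off += 32;  /* bounding box */

    switch (type & 0x0F) {
    case 1: /* point */
        off += ptsize;
        break;
    case 2: /* linestring: npoints, then the points */
        n = lw_read_u32(p + off);
        off += 4 + (size_t)n * ptsize;
        break;
    case 3: /* polygon: nrings, then npoints + points per ring */
        n = lw_read_u32(p + off);
        off += 4;
        for (i = 0; i < n; i++) {
            uint32_t np = lw_read_u32(p + off);
            off += 4 + (size_t)np * ptsize;
        }
        break;
    case 4: case 5: case 6: case 7: /* multi* and collections recurse */
        n = lw_read_u32(p + off);
        off += 4;
        for (i = 0; i < n; i++)
            off += lw_subgeom_length(p + off);
        break;
    default:
        assert(0 && "unknown geometry type");
    }
    return off;
}
```

Since every sub-geometry carries its own flags, the same code handles a collection whose members mix boxed and unboxed lines.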

H. 2D point w/o bounding box (with SRID)

<int32> size  = 25 bytes
<char> type:  S=1,B=0,D=0, tttt= 1
<int32> SRID
<double> X
<double> Y

I. 3D point w/o bounding box (with SRID)

<int32> size = 33 bytes
<char> type:  S=1,B=0,D=1, tttt= 1
<int32> SRID
<double> X
<double> Y
<double> Z

J. 2D point WITH bounding box (with SRID)
     (you would never put one on a point, but this is just an example)
     (note: SRID comes before bounding box)

<int32> size = 57 bytes
<char> type:  S=1,B=1,D=0, tttt= 1
<int32> SRID
<double> xmin
<double> ymin
<double> xmax
<double> ymax
<double> X
<double> Y
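
The sizes quoted in the point and linestring examples all follow from one rule: 4 bytes of size, 1 byte of type, 4 more if an SRID is attached, 32 more if a bounding box is attached, then the payload.  A small sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes before the payload: size + type + optional SRID and box. */
uint32_t lw_header_size(int has_srid, int has_bbox)
{
    return 4u + 1u + (has_srid ? 4u : 0u) + (has_bbox ? 32u : 0u);
}

/* Total serialized size of a point. */
uint32_t lw_point_size(int has_srid, int has_bbox, int is_3d)
{
    return lw_header_size(has_srid, has_bbox) + (is_3d ? 24u : 16u);
}

/* Total serialized size of a linestring with npoints points. */
uint32_t lw_line_size(int has_srid, int has_bbox, int is_3d, uint32_t npoints)
{
    return lw_header_size(has_srid, has_bbox) + 4u
         + npoints * (is_3d ? 24u : 16u);
}

/* Cross-check against the sizes quoted in the examples. */
void lw_check_examples(void)
{
    assert(lw_point_size(0, 0, 0) == 21); /* A */
    assert(lw_point_size(0, 0, 1) == 29); /* B */
    assert(lw_point_size(0, 1, 0) == 53); /* C */
    assert(lw_point_size(1, 0, 0) == 25); /* H */
    assert(lw_point_size(1, 0, 1) == 33); /* I */
    assert(lw_point_size(1, 1, 0) == 57); /* J */
}
```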

Other notes:

Canonical form will be very much like the current WKB type (i.e. looks 
like '000000FF001A..').  This means your pg_dumps will look strange, but 
you'll find it faster and there will not be any numeric drift as you 
move to and from WKT.

We'll need to write a WKB to LWGEOM and LWGEOM to WKB.  Since LWGEOM is 
very close to WKB, this should be simple.

To get to WKT, we can do a LWGEOM->WKB->PostGIS GEOMETRY->WKT.  For 
parsing WKT, we can WKT->PostGIS GEOMETRY->WKB->LWGEOM.  The PostGIS 
conversion functions already exist, so this will be very easy.

We'll also need to write something to convert a LWGEOM to a bounding box 
plus all the indexing support functions (based on BOX2DFLOAT4s).

One of the design issues I have with PostGIS is that all the analysis 
functions deal directly with the serialized GEOMETRY form.  This makes 
them more complex and difficult to maintain.  I suggest we have souped-up 
versions of the current PostGIS geometry types (i.e. POLYGON3D, 
LINESTRING3D, POINT3D) for LWGEOM (i.e. LW_POLYGON, LW_LINE, LW_POINT) 
which would hide things like 2d vs 3d and make construction easier.
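
To give a feel for what I mean, here is one possible shape for LW_POINT.  Only the name LW_POINT comes from the paragraph above; the fields, the deserialize function, the flag bits (0x40/0x20/0x10 for S/B/D) and native byte order are assumptions made for the sake of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical unpacked point: callers never look at the flags. */
typedef struct {
    int32_t srid;    /* -1 when no SRID is attached */
    int     is_3d;   /* 0 = 2d, 1 = 3d */
    double  x, y, z; /* z is 0 for 2d points */
} LW_POINT;

/* Unpack a serialized point (starting at its int32 size).
 * Returns 0 on success, -1 if the geometry is not a point. */
int lw_point_deserialize(const uint8_t *buf, LW_POINT *out)
{
    size_t  off = 4;            /* skip the int32 size */
    uint8_t type;

    assert(buf != NULL && out != NULL);
    type = buf[off++];
    if ((type & 0x0F) != 1)     /* wkbPoint */
        return -1;

    out->srid = -1;
    if (type & 0x40) {          /* SRID comes before the bounding box */
        memcpy(&out->srid, buf + off, 4);
        off += 4;
    }
    if (type & 0x20)            /* a point's box adds nothing; skip it */
        off += 32;

    out->is_3d = (type & 0x10) != 0;
    memcpy(&out->x, buf + off, 8); off += 8;
    memcpy(&out->y, buf + off, 8); off += 8;
    out->z = 0.0;
    if (out->is_3d)
        memcpy(&out->z, buf + off, 8);
    return 0;
}
```

An analysis function written against LW_POINT would then work unchanged for 2d and 3d input, with or without SRID or bounding box.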

What think?

dave