[postgis-users] LWGEOM -- initial lwgeom.h file

Tue Mar 2 10:22:39 PST 2004

Ralph,

Thanks for your thoughts and comments.

I like the LWLINE, LWPOINT, and LWPOLY types because they make all the
other functions much easier to read and write.  The current PostGIS has
POINT3D, LINE3D, and POLYGON3D.  You have a point about re-allocating
the points (which I'll address below).

Unfortunately, you cannot just stick a pointer into the serialized
form's points.  The reason for this is memory alignment.  Let's look at a
simple serialized form example:

2D line string

<int32> size = ...
<char> type:  S=0,D=0, tttt= 2
<uint32> npoints (3)
<double> X0
<double> Y0
<double> X1
<double> Y1
<double> X2
<double> Y2
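To make the packed layout concrete, here's a minimal C sketch that writes a 2D line string in exactly that byte order. The function names are hypothetical. I'm also assuming that S=0, D=0, tttt=2 packs into a single byte whose value is just 2; the real bit positions may differ.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Size in bytes of the packed 2D line string layout:
   4 (size) + 1 (type) + 4 (npoints) + 16 per point (X and Y doubles). */
static size_t line2d_serialized_size(uint32_t npoints)
{
    return sizeof(int32_t) + 1 + sizeof(uint32_t)
           + (size_t) npoints * 2 * sizeof(double);
}

/* Writes the packed form into a freshly malloc'd buffer (caller frees).
   Every multi-byte field goes through memcpy, so nothing needs alignment. */
static unsigned char *line2d_serialize(const double *xy, uint32_t npoints)
{
    size_t total = line2d_serialized_size(npoints);
    unsigned char *buf = malloc(total);
    if (buf == NULL)
        return NULL;

    int32_t size = (int32_t) total;
    unsigned char type = 2;  /* ASSUMPTION: S=0, D=0, tttt=2 packs to byte 2 */
    unsigned char *p = buf;

    memcpy(p, &size, sizeof size);       p += sizeof size;     /* bytes 0-3 */
    *p++ = type;                                               /* byte 4    */
    memcpy(p, &npoints, sizeof npoints); p += sizeof npoints;  /* bytes 5-8 */
    memcpy(p, xy, (size_t) npoints * 2 * sizeof(double));      /* X0 at 9   */
    return buf;
}
```

For the three-point example, that comes to 4 + 1 + 4 + 3 * 16 = 57 bytes with no padding anywhere.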

And the equivalent C struct:

typedef struct
{
	int32   size;
	char    type;
	uint32  npoints;
	POINT2D points[3];
} three_point_line;

You'd think you could just cast the serialized form to the
three_point_line type.  Unfortunately, you cannot.  The actual
three_point_line type looks more like this (note: Intel machines align
doubles to 4 bytes and Solaris aligns them to 8 bytes):

typedef struct
{
	int32   size;
	char    type;

	char    junk1;  // padding: intel and solaris
	char    junk2;  // padding: intel and solaris
	char    junk3;  // padding: intel and solaris

	uint32  npoints;  // properly aligned

	char    junk4;  // padding: solaris only
	char    junk5;  // padding: solaris only
	char    junk6;  // padding: solaris only
	char    junk7;  // padding: solaris only

	POINT2D points[3]; // properly aligned

} three_point_line;
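If you want to see the padding on your own box, offsetof will tell you. This is a small sketch with hypothetical helper names. It uses <stdint.h> types standing in for int32/uint32 and assumes POINT2D is two doubles. On most 64-bit targets doubles are 8-byte aligned, so you'd see the Solaris-style layout there too.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { double x, y; } POINT2D;   /* assumed: two doubles */

/* The "natural" struct, written with <stdint.h> types. */
typedef struct
{
    int32_t  size;
    char     type;
    uint32_t npoints;
    POINT2D  points[3];
} three_point_line;

/* Bytes the compiler inserted between 'type' and 'npoints'. */
static size_t pad_after_type(void)
{
    return offsetof(three_point_line, npoints)
           - (offsetof(three_point_line, type) + sizeof(char));
}

/* Bytes inserted between 'npoints' and 'points': 0 where doubles only
   need 4-byte alignment, 4 where they need 8-byte alignment. */
static size_t pad_after_npoints(void)
{
    return offsetof(three_point_line, points)
           - (offsetof(three_point_line, npoints) + sizeof(uint32_t));
}
```

Either way, points never starts at byte 9 the way X0 does in the serialized form.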

In the serialized form, X0 is 9 bytes into the structure (4 bytes of
size, 1 of type, 4 of npoints).  If you try something like this on
Solaris, you'll immediately crash with a bus error due to misalignment:

    *((double *) &serialized_form[9])
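The portable way to read that X0 is to memcpy the bytes into a real double, because memcpy has no alignment requirement. A minimal sketch follows; read_double_at is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Reads a double stored at an arbitrary (possibly misaligned) byte offset.
   Unlike *((double *) &buf[offset]), this is safe on SPARC as well as x86. */
static double read_double_at(const unsigned char *buf, size_t offset)
{
    double d;
    memcpy(&d, buf + offset, sizeof d);
    return d;
}
```

Compilers usually turn a fixed-size memcpy like this into a plain load where the hardware allows it, so it costs little on x86.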

The solution is either to force the structure to be memory aligned
(this is what PostGIS currently does), which wastes space in the
database, or to copy the points into a new, properly aligned structure
(which is what I proposed for LWGEOM), which wastes time copying.
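The copy option looks roughly like this: one memcpy pulls all the points out into an aligned array, after which code can index points[i].x directly. This sketch assumes the 2D layout above (npoints at byte 5, X0 at byte 9) and a POINT2D of two doubles with no padding; line2d_copy_points is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { double x, y; } POINT2D;   /* assumed: two doubles, no padding */

enum { LINE2D_NPOINTS_OFFSET = 5, LINE2D_POINTS_OFFSET = 9 };

/* Returns a malloc'd, properly aligned copy of the points of a packed 2D
   line string and stores the count in *npoints_out. Caller frees. */
static POINT2D *line2d_copy_points(const unsigned char *serialized,
                                   uint32_t *npoints_out)
{
    uint32_t npoints;
    memcpy(&npoints, serialized + LINE2D_NPOINTS_OFFSET, sizeof npoints);

    size_t nbytes = (size_t) npoints * sizeof(POINT2D);
    POINT2D *pts = malloc(nbytes > 0 ? nbytes : 1);  /* avoid malloc(0) */
    if (pts == NULL)
        return NULL;

    /* X0 Y0 X1 Y1 ... maps straight onto an array of {x, y} structs. */
    memcpy(pts, serialized + LINE2D_POINTS_OFFSET, nbytes);
    *npoints_out = npoints;
    return pts;
}
```

The cost is one allocation and one copy per geometry, which is the time/space trade-off described above.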

Using the serialized form's points directly means *all* the functions 
have to be aware of 2D and 3D points, leading to the functions being 
twice as complex as they need to be.

There is another alternative.  We can abstract the point list so it 
handles the 2d/3d distinction and alignment issues.

typedef struct
{
	char  *serialized_pointlist; // probably misaligned; 2d or 3d
	char   is3d;                 // true if these are 3d points
	int32  npoints;
} POINTARRAY;

We can form one of these by pointing directly into a portion of the 
serialized form.  We can easily add functions like:

// copies point n from the point array into *point
// will set point->z = 0 (or NaN) if pa is 2d
// NOTE: point must point at a real, caller-allocated POINT3D,
// not an uninitialized pointer
extern void getPoint(POINTARRAY *pa, int n, POINT3D *point);

Doing this means we don't waste any memory and we abstract all our point
lists behind a single interface.
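Here's a rough sketch of how getPoint could work on top of POINTARRAY. It takes the array and the output point by pointer, so the caller actually sees the copy. It uses <stdint.h> types for int32 and assumes POINT3D is three doubles. Treat it as an illustration, not settled lwgeom code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { double x, y, z; } POINT3D;   /* assumed: three doubles */

typedef struct
{
    char   *serialized_pointlist; /* probably misaligned; 2d or 3d */
    char    is3d;                 /* true if these are 3d points */
    int32_t npoints;
} POINTARRAY;

/* Copies point n into *point; z is left at 0 for 2d arrays.
   memcpy is used because serialized_pointlist may be misaligned. */
static void getPoint(const POINTARRAY *pa, int n, POINT3D *point)
{
    assert(n >= 0 && n < pa->npoints);

    size_t dims = pa->is3d ? 3 : 2;
    const char *src = pa->serialized_pointlist
                      + (size_t) n * dims * sizeof(double);

    double coords[3] = { 0.0, 0.0, 0.0 };
    memcpy(coords, src, dims * sizeof(double));

    point->x = coords[0];
    point->y = coords[1];
    point->z = coords[2];   /* stays 0.0 when the array is 2d */
}
```

To use it on a serialized 2D line, point serialized_pointlist at byte 9 (just past size, type, and npoints), set is3d to 0, and read npoints out of the header. No point data gets copied until someone asks for a point.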

I'm a little confused as to what you mean by having only one type and 
being able to use one bounding box function.  Could you explain a little 
more?
How is the bounding box function going to compute the bounding box of a
multilinestring object and a polygon object without having functions
that work on lines, points, and polygons?
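To make the question concrete, here's how I picture it. A single routine grows a box over one point list, and per-type code decides which lists to feed it (each line of a multilinestring, each ring of a polygon). BOX2D, box_expand, and box_of_lists are hypothetical names, and points are assumed to be plain 2D doubles.

```c
#include <math.h>
#include <stddef.h>

typedef struct { double x, y; } POINT2D;                    /* assumed */
typedef struct { double xmin, ymin, xmax, ymax; } BOX2D;    /* hypothetical */

/* An "empty" box that the first point will overwrite on every side. */
static BOX2D box_empty(void)
{
    BOX2D b = { INFINITY, INFINITY, -INFINITY, -INFINITY };
    return b;
}

/* Grows the box to cover one point list (a line, or one polygon ring). */
static void box_expand(BOX2D *box, const POINT2D *pts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pts[i].x < box->xmin) box->xmin = pts[i].x;
        if (pts[i].y < box->ymin) box->ymin = pts[i].y;
        if (pts[i].x > box->xmax) box->xmax = pts[i].x;
        if (pts[i].y > box->ymax) box->ymax = pts[i].y;
    }
}

/* A multilinestring (or a polygon's rings) is just a loop over point
   lists; the type-specific code is what decides which lists to visit. */
static BOX2D box_of_lists(const POINT2D *const *lists, const size_t *counts,
                          size_t nlists)
{
    BOX2D box = box_empty();
    for (size_t i = 0; i < nlists; i++)
        box_expand(&box, lists[i], counts[i]);
    return box;
}
```

The point-level work is shared, but something still has to know how each geometry type lays out its point lists.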

> I am not sure I understand why bounding box can not be calculated and 
> stored when a geometry goes over a given size? Then the above function 
> can copy when one exists and calculate if not.

I think putting the bounding box inside the geometry isn't all that
helpful.  It's only helpful for geometries with a large number of points.
After 125 2d points (or 80 3d points), the geometry will be TOASTed
anyway (see my message on TOASTing).  This means that the time to take it
off the disk is already quite high (pull the placeholder from the main
table, look up the TOAST info in the TOAST table, pull the TOASTed tuple
from disk), so the computation time is very low in comparison.  The time
it takes to compute the bounding box of a 120,000 point polygon is very
small, especially in comparison to taking 1000 pages off the disk.
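For reference, here are the raw coordinate bytes behind those point counts (8 bytes per double, ignoring the 9-byte header). Both of the first two land right around the roughly 2 KB per row at which PostgreSQL with 8 KB pages starts TOASTing:

```latex
\begin{aligned}
125 \times 2 \times 8 &= 2000 \text{ bytes} \\
80 \times 3 \times 8 &= 1920 \text{ bytes} \\
120\,000 \times 2 \times 8 &= 1\,920\,000 \text{ bytes}
\end{aligned}
```

Computing a bounding box is a single linear pass over those bytes, which is cheap next to the disk reads.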

I had originally thought the bounding box inside the geometry would be
helpful, but I'm skeptical now.  NOTE: the index will contain the
bounding box.

> While we are talking about this, I suggest a standard flex/bison parser
> for WKT, the parser can pretty easily output a LW_GEOM with a bounding
> box when it exceeds a given threshold.  I would be happy to put this
> together.

WOOT!  This would be great!  WOOT!

Have to run,
dave