[postgis-devel] Re: [postgis-users] LWGEOM -- initial lwgeom.h file

Wed Mar 3 17:22:21 PST 2004

David Blasby wrote:

> Ralph,
>
> Thanks for your thoughts and comments.
>
> I like the LWLINE, LWPOINT, and LWPOLY types because they make all the 
> other functions much easier to read and write.  The current postgis 
> has POINT3D, LINE3D, and POLYGON3D.  You have a point about 
> re-allocation (which I'll address below) of the points.
>
> Unfortunately, you cannot just stick a pointer into the serialized 
> form's points.  The reason for this is memory alignment.  Let's look at 
> a simple serialized form example:

I see what you mean; excuse my x86-centric thinking.  I sometimes forget 
that other architectures fault on unaligned access.

> Using the serialized form's points directly means *all* the functions 
> have to be aware of 2D and 3D points, leading to the functions being 
> twice as complex as they need to be.
>

I think you should be able to get 3d points from 2d data, but you should 
still be able to get 2d points as well, so you have the option of writing 
a fast 2d-only function.

> There is another alternative.  We can abstract the point list so it 
> handles the 2d/3d distinction and alignment issues.
>
> typedef struct
> {
>     char  *serialized_pointlist; // probably misaligned. 2d or 3d
>     char  is3d; // true if these are 3d points
>     int32 npoints;
> }  POINTARRAY;
>
We can form one of these by pointing directly into a portion of the 
serialized form.  We can easily add functions like:

>
> // copies a point from the point array into the parameter point
> // will set point's z=0 (or NaN) if pa is 2d
> // NOTE: point is a real POINT3D *not* a pointer
> extern void getPoint(POINTARRAY pa, int n, POINT3D point);
>
> Doing this means we don't waste any memory and we abstract all our 
> point lists behind a single interface.
>
This is really where I was trying to go - no allocations / 
deallocations, less code, you just get a point copied onto the stack.  
And you can get a 2d or 3d point.   The point array can also be 
allocated on the stack. 
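
To make the copy-onto-the-stack idea concrete, here is a rough sketch of 
how such accessors might look.  The names getPoint3d and getPoint2d, the 
pointer-based signatures, and the choice of z=0 for 2d data are my own 
assumptions for illustration, not anything taken from lwgeom.h.  The 
memcpy is what sidesteps the alignment problem: it is defined for any 
byte address, where dereferencing a double* is not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { double x, y; }    POINT2D;
typedef struct { double x, y, z; } POINT3D;

/* Hypothetical layout, modelled on the POINTARRAY quoted above. */
typedef struct
{
    const char *serialized_pointlist; /* possibly misaligned; 2d or 3d doubles */
    char        is3d;                 /* nonzero if the points carry a z */
    int         npoints;
} POINTARRAY;

/* Bytes occupied by one point in the serialized list. */
static size_t point_stride(const POINTARRAY *pa)
{
    return (pa->is3d ? 3 : 2) * sizeof(double);
}

/* Copy point n into *out.  z is set to 0.0 when pa is 2d (assumption). */
void getPoint3d(const POINTARRAY *pa, int n, POINT3D *out)
{
    double buf[3] = { 0.0, 0.0, 0.0 };

    assert(n >= 0 && n < pa->npoints);
    /* memcpy works from any address, aligned or not */
    memcpy(buf, pa->serialized_pointlist + (size_t) n * point_stride(pa),
           point_stride(pa));
    out->x = buf[0];
    out->y = buf[1];
    out->z = buf[2];
}

/* Copy only x and y of point n into *out; any z is skipped. */
void getPoint2d(const POINTARRAY *pa, int n, POINT2D *out)
{
    double buf[2];

    assert(n >= 0 && n < pa->npoints);
    memcpy(buf, pa->serialized_pointlist + (size_t) n * point_stride(pa),
           sizeof buf);
    out->x = buf[0];
    out->y = buf[1];
}
```

A caller declares a POINTARRAY and a POINT2D or POINT3D as local 
variables, points serialized_pointlist into the serialized geometry, and 
never touches the heap.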

>
> I'm a little confused as to what you mean by having only one type and 
> being able to use one bounding box function.  Could you explain a 
> little more?
> How is the bounding box finding function going to compute the bounding 
> box of a multilinestring object and a polygon object without having 
> functions that work on lines, point, and polygons?

If there is only one type (LW_GEOM) then there only needs to be one 
bounding box function - internally it must know how to calculate the box 
for the different geometries, but the programmer only has one function 
to bother with.
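
As a rough illustration of the single-entry-point idea, the sketch below 
uses a made-up LW_GEOM layout (a type tag, a part count, and a flat array 
of x,y pairs); none of these names or fields come from lwgeom.h.  The 
caller sees one function, lw_bbox, and the per-type knowledge lives 
inside it.  The polygon shortcut assumes valid polygons, where holes lie 
inside the shell.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { LW_POINT, LW_LINE, LW_POLY } LW_TYPE;   /* hypothetical tags */

typedef struct { double xmin, ymin, xmax, ymax; } BOX2D;

/* Hypothetical geometry: parts (rings or lines) stored back to back. */
typedef struct
{
    LW_TYPE       type;
    int           nparts;    /* rings for a polygon, lines for a multiline */
    const int    *npoints;   /* npoints[p] = number of points in part p */
    const double *xy;        /* x,y pairs, parts concatenated */
} LW_GEOM;

/* Grow box b so that it covers the first n points of xy. */
static void expand_box(BOX2D *b, const double *xy, int n)
{
    int i;

    for (i = 0; i < n; i++)
    {
        double x = xy[2 * i], y = xy[2 * i + 1];

        if (x < b->xmin) b->xmin = x;
        if (x > b->xmax) b->xmax = x;
        if (y < b->ymin) b->ymin = y;
        if (y > b->ymax) b->ymax = y;
    }
}

/* The one function callers use.  Returns 0 for an empty geometry. */
int lw_bbox(const LW_GEOM *g, BOX2D *box)
{
    int p, total = 0;

    assert(g != NULL && box != NULL);
    if (g->nparts < 1 || g->npoints[0] < 1)
        return 0;

    box->xmin = box->xmax = g->xy[0];
    box->ymin = box->ymax = g->xy[1];

    switch (g->type)
    {
        case LW_POLY:
            /* holes lie inside the shell, so ring 0 is enough */
            expand_box(box, g->xy, g->npoints[0]);
            break;
        default:
            /* points and (multi)lines: every part contributes */
            for (p = 0; p < g->nparts; p++)
                total += g->npoints[p];
            expand_box(box, g->xy, total);
            break;
    }
    return 1;
}
```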

>> I am not sure I understand why bounding box can not be calculated and 
>> stored when a geometry goes over a given size? Then the above 
>> function can copy when one exists and calculate if not.
>
And the penny drops about the earlier email.

> I think putting the bounding box inside the geometry isn't all that 
> helpful.  It's only helpful for geometries with a large number of 
> points.  After 125 2d points (or 80 3d points), the geometry will be 
> TOASTed anyway (see my message on TOASTing).  This means that time to 
> take it off the disk is already quite high (pull placeholder from main 
> table, lookup TOAST info in the toast table, pull TOASTed tuple from 
> disk), so the computation time is very low in comparison.  The time 
> it takes to compute the bounding box on a 120,000 point polygon is 
> very very small - esp in comparison to taking 1000 pages off the disk.
>
> I had originally thought the bounding box inside the geometry would be 
> helpful, but I'm skeptical now.  NOTE: the index will contain the 
> bounding box. 

The main ideas I was trying to convey:

1. Be able to do things without heap allocation and deallocation

Discussed above.

2. That it should be possible to use only one type, LW_GEOM.  Perhaps 
there is some 'context' that can be initialized once for speed and 
then passed around.  This could also store an error state.

example

double line_length2d(LW_GEOM_CONTEXT *line)
{
    int        i;
    int        num_points;
    POINT2D    frm, to;
    double     dist = 0.0;

    // On a mismatch the context can record an error, e.g. "Expected a 2d line"
    // (this assumes VERIFY_DATATYPE is true when the type matches)
    if ( !VERIFY_DATATYPE(line, LINE2D) )
        return 0.0;

    num_points = LINE_NUMPOINTS(line);

    if ( num_points < 2 )
        return 0.0;    // must have >1 point to make sense

    LINE2D_GETPOINT(line, 0, &frm);

    for (i = 1; i < num_points; i++)
    {
        LINE2D_GETPOINT(line, i, &to);

        // frm and to are stack copies, not pointers
        dist += sqrt( ( (frm.x - to.x)*(frm.x - to.x) ) +
                      ( (frm.y - to.y)*(frm.y - to.y) ) );

        frm = to;
    }
    return dist;
}

Finally before the return to postgres there is a macro called something 
like RETURN_ERROR which returns from the function with an error message 
if one is set in the context.

eg

Datum some_func(PG_FUNCTION_ARGS)
{
    LW_GEOM *geom = (LW_GEOM *) PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
    LW_GEOM_CONTEXT context;
    double retval;

    LW_INIT_CONTEXT(geom, &context);

    // Do some processing
    retval = line_length2d(&context);

    // Returns only if there is an error, and returns the message
    RETURN_ERROR(context);

    PG_RETURN_FLOAT8(retval);
}

The context and macros can change and the code should keep working.  
They can also have parts that are conditional on architecture if necessary.
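
For what it is worth, here is one way the context and its macros might 
be defined.  This is only a sketch: the struct fields, the extra 
arguments to LW_INIT_CONTEXT and RETURN_ERROR, the type codes, and the 
helper count_if_line are all invented for illustration, and it is plain 
C with no postgres headers.  Inside a real postgres function 
RETURN_ERROR would presumably report the message through elog rather 
than return a sentinel value.

```c
#include <assert.h>
#include <stddef.h>

enum { POINT2D_TYPE = 1, LINE2D_TYPE = 2 };   /* hypothetical type codes */

/* Hypothetical context: what was decoded, plus an error slot. */
typedef struct
{
    int         type;      /* type code of the geometry */
    int         npoints;   /* cached so callers need not re-parse */
    const char *error;     /* NULL while no error is recorded */
} LW_GEOM_CONTEXT;

#define LW_INIT_CONTEXT(ctx, t, n) \
    do { (ctx)->type = (t); (ctx)->npoints = (n); (ctx)->error = NULL; } while (0)

/* True when the type matches; otherwise records an error and yields 0. */
#define VERIFY_DATATYPE(ctx, t) \
    ((ctx)->type == (t) || ((ctx)->error = "unexpected geometry type", 0))

/* Leave the enclosing function with failval if an error is recorded. */
#define RETURN_ERROR(ctx, failval) \
    do { if ((ctx)->error != NULL) return (failval); } while (0)

/* A worker that only accepts 2d lines. */
static int count_if_line(LW_GEOM_CONTEXT *ctx)
{
    if (!VERIFY_DATATYPE(ctx, LINE2D_TYPE))
        return 0;
    return ctx->npoints;
}

/* Stand-in for the top-level function: returns -1 when an error was set. */
int some_func(int type, int npoints, const char **errmsg)
{
    LW_GEOM_CONTEXT context;
    int retval;

    LW_INIT_CONTEXT(&context, type, npoints);

    retval = count_if_line(&context);
    *errmsg = context.error;

    RETURN_ERROR(&context, -1);
    return retval;
}
```

The worker never needs to know how errors get reported; it only sets 
the state, and the top-level function decides what to do with it.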

Anyway - just another view on it all.

Ralph