Speeding up mapping applications by changing data structures

Tue Oct 26 08:24:08 PDT 1999

Your points are well taken. I chose the more structured approach for a
couple of reasons:

  - readablity. I actually made the jump from the ESRI xyxyxyxy way of
    doing things when implementing new clipping functionality in version
    3.2. Working with the xyxyxyxy list isn't hard, but dealing with parts
    arrays was nasty. The code now is a good bit more readable and is
    much easier to re-visit after a few months.

  - compatibility. I've always envisioned tapping in to OGC or OGDI
    resources as a means to extend the MapServer. It's pretty clear that
    higher level abstractions of a feature would be necessary to make
    that integration simpler.

I've contemplated the switch you outline below a couple of times, but haven't
done it. What really changes is the definition of a point- from a structure to an
2 element array. It could be done, a couple hundred changes, but all straight
forward. Probably could do it with 2 sed commands. I'd be interested in the
performance benefits.

Steve

Stephen Lime
Internet Applications Analyst
MIS Bureau - MN DNR

(651) 297-2937
steve.lime at dnr.state.mn.us

>>> Cameron Shorter <cshorter at optusnet.com.au> 10/24 3:49 AM >>>
I recently heard a speach from one of the silicon graphics guys, who
explained, among other things, where technology is heading, and what that
means for writing programs.

CPU instruction speeds are increasing faster than RAM access speeds.  Hence,
with old machines, an access to RAM took one CPU cycle, now RAM accesses take
lots of CPU cycles.
Also, it is faster to access a block of RAM than it is to access the same
ammount of data from a number of different locations, partly because CPUs now
make use of memory look ahead.

What this means, is that programs should use pointers sparingly, and I belive
this explains Steve's comment that the Shapelib 1.2 algorithms are
significantly slower than Shapelib 1.1.

The shape object in shapefil.h is copied below:

typedef struct
{
    int		nSHPType;

    int		nShapeId; /* -1 is unknown/unassigned */

    int		nParts;
    int		*panPartStart;
    int		*panPartType;

    int		nVertices;
    double	*padfX;
    double	*padfY;
    double	*padfZ;
    double	*padfM;

    double	dfXMin;
    double	dfYMin;
    double	dfZMin;
    double	dfMMin;

    double	dfXMax;
    double	dfYMax;
    double	dfZMax;
    double	dfMMax;
} SHPObject;

Note that the X,Y,Z,M coordinates are in seperate NULL terminated lists.  When
a vertex is accessed, 4 seperate pointers need to be resolved.  The ESRI
shapefile record stucture stores the data much more efficently as
(X,Y,Z,M),(X,Y,Z,M),...

Steve,
Your structures for the next release are better, however I belive they can be
improved also.  As I understand it, an object is of the form:
  (line1, line2, line3, ...)
which expands out to:
  (nPoints,(x,y),(x,y),..) , (nPoints,(x,y),(x,y),..), ...

I suspect you could improve efficiency slightly if the "nPoints" is moved out
into another array similar to the ESRI structure, although improvements would
probably be marginal.

Steve and Frank,
To help the structure of supporting packages, it would be good if coordinates
are writting as an array - ( c[0],c[1],c[2],c3 ) instead of (x,y,z,m).

It would then be easier to write generic procedures which can handle 2, 3, or
4 dimentional coordinates.

Lastly,
I realise that data structures effects everything that depends upon it, and
hence a lot of work, so I'm not expecting you to change everything.