Speeding up mapping applications by changing data structures

Cameron Shorter cshorter at optusnet.com.au
Sun Oct 24 01:49:08 PDT 1999


I recently heard a speach from one of the silicon graphics guys, who
explained, among other things, where technology is heading, and what that
means for writing programs.

CPU instruction speeds are increasing faster than RAM access speeds.  Hence,
with old machines, an access to RAM took one CPU cycle, now RAM accesses take
lots of CPU cycles.
Also, it is faster to access a block of RAM than it is to access the same
ammount of data from a number of different locations, partly because CPUs now
make use of memory look ahead.

What this means, is that programs should use pointers sparingly, and I belive
this explains Steve's comment that the Shapelib 1.2 algorithms are
significantly slower than Shapelib 1.1.

The shape object in shapefil.h is copied below:

typedef struct
{
    int		nSHPType;

    int		nShapeId; /* -1 is unknown/unassigned */

    int		nParts;
    int		*panPartStart;
    int		*panPartType;
    
    int		nVertices;
    double	*padfX;
    double	*padfY;
    double	*padfZ;
    double	*padfM;

    double	dfXMin;
    double	dfYMin;
    double	dfZMin;
    double	dfMMin;

    double	dfXMax;
    double	dfYMax;
    double	dfZMax;
    double	dfMMax;
} SHPObject;

Note that the X,Y,Z,M coordinates are in seperate NULL terminated lists.  When
a vertex is accessed, 4 seperate pointers need to be resolved.  The ESRI
shapefile record stucture stores the data much more efficently as
(X,Y,Z,M),(X,Y,Z,M),...

Steve,
Your structures for the next release are better, however I belive they can be
improved also.  As I understand it, an object is of the form:
  (line1, line2, line3, ...)
which expands out to:
  (nPoints,(x,y),(x,y),..) , (nPoints,(x,y),(x,y),..), ...

I suspect you could improve efficiency slightly if the "nPoints" is moved out
into another array similar to the ESRI structure, although improvements would
probably be marginal.

Steve and Frank,
To help the structure of supporting packages, it would be good if coordinates
are writting as an array - ( c[0],c[1],c[2],c3 ) instead of (x,y,z,m).

It would then be easier to write generic procedures which can handle 2, 3, or
4 dimentional coordinates.

Lastly,
I realise that data structures effects everything that depends upon it, and
hence a lot of work, so I'm not expecting you to change everything.



More information about the MapServer-users mailing list