Performance of Scanning a Quadtree Index

Wed Aug 29 09:15:25 PDT 2007

Brock Anderson wrote:
> Hi List,
> 
> I ran into a curious situation with a quadtree (.qix) index on a 
> shapefile.  Basically the issue is that performance of scanning the 
> index to fetch features is not as good as I expect.  Some details:
> 
> I have a shapefile data set with 4 million polygons fairly normally 
> distributed around British Columbia, Canada.  I used 'shptree' to create 
> a spatial index on that data.  I have a very simple layer in my mapfile 
> pointing at the data.  Minimal styling, no labels, etc.
> 
> I then make a simple WMS request to fetch exactly 1000 features (limited 
> by a bbox) from the layer.  Mapserver take about 500ms. Seems a bit high.
> Geoserver can draw the exact same data, using the *same .qix* index in 
> about 150ms.  Naturally I made every effort to keep the comparison 
> fair.  No reprojection in either case, nearly identical styling, etc.
> 
> As a further comparison I noticed that Mapserver and Geoserver are 
> nearly equal for fetching/drawing 1000 features from a smaller data set 
> of just 10,000.  Response time there is more like 120ms.
> So on large shapefile data sets Geoserver's index scanning seems to be 
> substantially faster.  Are there any map file options that might improve 
> performance?  Could it be that Geoserver simply has a faster 
> implementation for traversing the quadtree?

Brock,

I haven't dug into this, but I think MapServer pulls the features from
the shapefile in the order they are identified traversing the spatial
index.  This *could* mean relatively random fetching depending on the
spatial coherence (or lack thereof) of the records in the shapefile.

If (and I haven't actually done any investigation) GeoServer uses the
spatial index to build a list of feature ids, sorts that and then
fetches them in the order they appear in the shapefile then GeoServer
might pull the data in a much more "in order" form from disk which
can give much better read performance.

I stress this is not something I did any investigation into.  It is
just a wild theory that might explain it.

BTW, OGR also uses .qix files, and does build one bit list which it
sorts exactly to avoid random fetching.

Best regards,
-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | President OSGeo, http://osgeo.org