[postgis-devel] Caching Double-based Boxes

Fri Nov 25 19:50:24 PST 2011

Super Double-Box Search-Off     25Nov11 -dbb
====================================================================

"A quick estimate of performance of PostGIS 2 with
  geometry bounding boxes as doubles vs as floats"

--
All tests performed on Linux 2.6.38 x86_64
PostgreSQL 9.0.5
  shared_buffers = 2400MB
  temp_buffers = 64MB
  work_mem = 512MB
  maintenance_work_mem = 200MB

--
The Test  -  TIGER 2010 California

TIGER provides many possibilities for tests.. The one I chose here
uses PLACE (a large'ish MULTIPOLY) and EDGE (MULTILINE)

A random sampling of 100 EDGES in California shows the
following characteristics:
--------------------------
select st_npoints(the_geom) from ca_edges order by random() limit 100;

min,median,mean,max    stdev
2, 5.15, 16.333, 212   34.23

sorted, the first dozen entries are 2, the last are the following
[ ... 36,37,43,44,51,57,58,68,68,79,206,206,212]

I reasoned that although many EDGES are trivially short, there
exist substantial numbers of non-trivial EDGES, so they would be ok for 
a test. 

Next, pick some large PLACE geometries.. 
 From the Dozen largest (by area) PLACE rows, I picked 4 well-known cities

now, the test (with indexes in place I trust)

----------
select count(*) from ca_edges where st_intersects( the_geom,
(select st_envelope(the_geom) from ca_place where name = 'City_Name') );
----------

Results:

place_name,trunk_cnt,trunk_time,dblbox_cnt,dblbox_time
Bakersfield,39691,24.230,39691,25.275
Fresno,36527,24.217,36527,25.194
Oakland,44404,24.395,44404,25.680
Lancaster,16566,23.983,16566,25.194

-----------------------------------------
preliminary conclusions

the search times were almost the same for each of the 4 cases
and I did not detect change with multiple runs of the same query
(though I could have tested that more, too)

search times were fast enough such that I believe indexes are being used
(I did not check all indexes carefully, but the database setup was the same
for both test contexts, so if they missed something, both tests likely missed
in the same way)

the result counts for both trunk and dblbox match, as a sanity check

performance hit is around 4% in each case

=======

caveat: I could have misjudged and chosen a poor test case for some
obscure-to-me reason.. hopefully, this is useful and informative

best regards from San Francisco, California

Brian Hamlin
GeoCal
OSGeo California Chapter
maplabs at light42.com