[Mapserver-users] PostGIS / Shapefile Performance Question
Thomas, Cord
cthomas at rand.org
Wed Jan 15 10:54:03 PST 2003
Okay,
could someone simply explain the difference between an "index of shapes" as
provided by ArcView indexing of the shape field (I assume this is what you
all mean) and a spatial index (which I understand to be some sort of fixed
or flexible grid structure grouping sets of shapes into grid cells)?
So, I sort of understand, but basically: WHAT IS THE ESRI SHAPE INDEX?
Cord
-----Original Message-----
From: Lowther, David W [mailto:dlowther at ou.edu]
Sent: Wednesday, January 15, 2003 10:45 AM
To: 'Paul Ramsey'; mapserver-users at lists.gis.umn.edu
Subject: RE: [Mapserver-users] PostGIS / Shapefile Performance Question
Paul,
Thanks for the excellent info. This is exactly what I was trying to
understand.
I am using the indexing on the shapefiles as provided by shpindex. I haven't
tested without this index, but I haven't had any reason to doubt that the
people that recommend using the indices KNOW what they are suggesting and
why.
It's interesting that ESRI has provided a means (within Arcview) to index
shapefile shapes and attributes, but not to index shapefiles spatially. I
had never thought about that. I guess I always assumed it was "built in"
because ESRI is such a thoughtful vendor especially when it comes to
performance of their products. (Choke, cough, gasp - I almost couldn't even
write it as a joke...)
And the reasons you've listed for using PostGIS are exactly the reasons I
would prefer it to shapefiles. In some instances, the expression on a
shapefile just won't cut it. And the enterprise-wide database solution is
the holy grail.
We host the data warehouse for Oklahoma. Datasets come in from a variety of
agencies (in shapefile format) via FTP and are automatically uploaded to an
OpenGIS compliant SQLServer database that we developed before there WERE
OpenGIS compliant solutions out there. We've got a CGI that will build
shapefiles from the database on the fly for the area a user desires. The CGI
would also map directly from the database, but mapserver blows our CGI's
performance away so I switched. Now we have a little more piecemeal
solution: the shapefiles for mapserver, the postgis for layers that require
it for filtering, our database / CGI for building shapefiles for download,
etc - but it's all worth it for the performance.
I had (and probably still have) hopes of migrating all of this over to
PostGIS. It wouldn't take much to change our CGI over to PostGIS's way of
storing (or recalling) geometries. It does seem that the open source
community would benefit from this effort. Actually all it would be is a
Win32 based pgsql2shp that would allow you to filter output based on another
shape. We would use this to allow users to download layers based on their
map's visible extent and other nifty things like that.
Anyway, I've gone on a bit unsolicitedly here.
Again, thanks for the info.
Dave Lowther
-----Original Message-----
From: Paul Ramsey [mailto:pramsey at refractions.net]
Sent: Wednesday, January 15, 2003 12:12 PM
To: Lowther, David W; mapserver-users at lists.gis.umn.edu
Subject: Re: [Mapserver-users] PostGIS / Shapefile Performance Question
David,
Early in the development of the PostGIS / Mapserver connector we did
some benchmarking of PostGIS against Shape files.
Shape files will be faster than PostGIS for simple map drawing
applications in almost every case. The rendering step is going to be
the same regardless of data source. That leaves data access, and an
indexed shapefile will always have slightly lower overhead than an
indexed spatial table for a simple spatial bounding rectangle query. We
found that the speed difference was lowest when the number of features
was smallest. I.e., for drawing a map with only 3 features, selected out
of a table of 300000, the PostGIS layer took less than 10% longer (on a
scale measured in 1/100s of a second, mind you :). For drawing maps
with more (several thousand) features, the PostGIS overhead got as high
as 20-30%.
(Note that all the statements above assume you have built an index on
your shape files. It is interesting to note that ESRI has never put out
a means of spatially indexing shape files, and as a result there is a
kind of collective brain melt in our field which says "shape files are
'too slow' for web mapping". This is true with ArcIMS (no spatial index)
but not true with Mapserver.)
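
To illustrate why a spatial index matters for the bounding rectangle query described above, here is a minimal conceptual sketch of a quadtree over shape bounding boxes. It is not the on-disk index format MapServer uses; the names `QuadNode`, `linear_scan` and the `max_depth` default are assumptions made up for this example. The point is only that a query window lets whole branches be skipped, while an unindexed file must test every shape.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two rectangles overlap or touch."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


def contains(outer: BBox, inner: BBox) -> bool:
    """True if inner lies entirely within outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


@dataclass
class QuadNode:
    bounds: BBox
    depth: int = 0
    shapes: List[Tuple[int, BBox]] = field(default_factory=list)
    children: List["QuadNode"] = field(default_factory=list)

    def _quadrants(self) -> List[BBox]:
        minx, miny, maxx, maxy = self.bounds
        cx, cy = (minx + maxx) / 2, (miny + maxy) / 2
        return [(minx, miny, cx, cy), (cx, miny, maxx, cy),
                (minx, cy, cx, maxy), (cx, cy, maxx, maxy)]

    def insert(self, shape_id: int, box: BBox, max_depth: int = 6) -> None:
        """Push the shape down to the smallest quadrant that fully contains it."""
        if self.depth < max_depth:
            if not self.children:
                self.children = [QuadNode(q, self.depth + 1)
                                 for q in self._quadrants()]
            for child in self.children:
                if contains(child.bounds, box):
                    child.insert(shape_id, box, max_depth)
                    return
        # Shape straddles a quadrant boundary (or depth limit hit): keep it here.
        self.shapes.append((shape_id, box))

    def query(self, window: BBox) -> List[int]:
        """Return ids of shapes whose bounding box intersects the window."""
        if not intersects(self.bounds, window):
            return []  # the whole branch is skipped without touching its shapes
        hits = [sid for sid, box in self.shapes if intersects(box, window)]
        for child in self.children:
            hits.extend(child.query(window))
        return hits


def linear_scan(shapes: List[Tuple[int, BBox]], window: BBox) -> List[int]:
    """What an unindexed shapefile forces: test every shape."""
    return [sid for sid, box in shapes if intersects(box, window)]
```

Both functions return the same set of shape ids for a given window; the difference is how many bounding boxes have to be examined to get there.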
So why use PostGIS at all?
Several reasons:
- large shape file archives can be hard to manage if the data changes
regularly
- if you have an interactive site which allows online updates then
concurrent shapefile writing could cause data corruption as well as
indexes going out of sync with the underlying data
- you can do complex multi-table queries much faster with PostGIS than with
shape files
- you can do attribute-based queries much faster with PostGIS than with
shape files (because shape files lack an index on the attributes)
- you can use your PostGIS/PostgreSQL system as a full corporate data
repository, storing your business attributes and spatial objects in the
same data schema, managing the different aspects of the data with many
different tools, using standard access methods like JDBC and ODBC
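
As a rough sketch of the kind of query the list above has in mind (an attribute filter, a join to a business table, and a bounding rectangle test in one statement), the following uses Python's built-in `sqlite3` purely as a stand-in so the example is self-contained. The `owners` and `parcels` tables, their columns, and the plain min/max columns are all assumptions for illustration; in PostGIS the rectangle test would be done against a geometry column with a spatial index instead.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory database (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE parcels (
            parcel_id INTEGER PRIMARY KEY,
            owner_id  INTEGER REFERENCES owners(owner_id),
            zoning    TEXT,
            minx REAL, miny REAL, maxx REAL, maxy REAL
        );
        -- The attribute index that a plain shapefile has no equivalent of.
        CREATE INDEX idx_parcels_zoning ON parcels (zoning);
    """)
    conn.executemany("INSERT INTO owners VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bob")])
    conn.executemany("INSERT INTO parcels VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, 1, "residential", 0.0, 0.0, 1.0, 1.0),
        (2, 2, "commercial",  0.0, 0.0, 1.0, 1.0),
        (3, 2, "residential", 5.0, 5.0, 6.0, 6.0),
        (4, 1, "residential", 0.5, 0.5, 2.0, 2.0),
    ])
    return conn


def find_parcels(conn, zoning, window):
    """Parcels with the given zoning whose box intersects window, with owner name."""
    wminx, wminy, wmaxx, wmaxy = window
    rows = conn.execute("""
        SELECT p.parcel_id, o.name
        FROM parcels p
        JOIN owners o ON o.owner_id = p.owner_id
        WHERE p.zoning = ?
          AND p.minx <= ? AND p.maxx >= ?
          AND p.miny <= ? AND p.maxy >= ?
        ORDER BY p.parcel_id
    """, (zoning, wmaxx, wminx, wmaxy, wminy))
    return rows.fetchall()
```

With shapefiles, the same question means scanning the .dbf for the attribute match and joining to the owner data in application code.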
Lowther, David W wrote:
> Are there certain situations in which the access to PostGIS might be quicker
> than a shapefile, say when zoomed in closely or zoomed way out or when doing
> a point based query?
No. In the very-zoomed-in case the performance will be almost identical,
but never faster.
> Is there a point where the number of features in a layer would cause PostGIS
> or shapefiles to perform better?
The PostGIS r-tree index might end up more balanced than the shape file
quadtree for certain kinds of spatial data. At larger archive sizes it
is possible that this might result in a noticeable performance win. I
cannot give a concrete example however.
> What if I put a monster of a machine in place as the postgres server? Could
> I build a postgres server that would be as fast as shapefiles local to
> mapserver?
Well, if you give your postgis database more oomph to read through the
data, you might make it faster than your poor little mapserver
read-and-render machine, but it hardly seems fair to make the
comparison. If you are buying a monster machine you could just run the
read-and-render mapserver on it, and your shape files would still be faster.
> What happens as the application scales? If I saw traffic like mapquest.com
> or something would shapefiles be faster than PostGIS?
Properly laid out shape files, with a tiling system and spatial indexes,
should always be faster. If your data changes regularly such a layout
might not be manageable however, or might be most easily managed with a
hybrid system (store working data in PostGIS and snap out a copy for
mapserver to read-and-render from on a nightly basis).
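
A minimal sketch of what such a nightly snapshot job could look like, assuming the `pgsql2shp` dumper that ships with PostGIS and the `shptree` index utility that ships with MapServer are both on the PATH. The function names, the database name and the output directory are hypothetical; connection options (host, user, password) are left out and would need to be added for a real setup.

```python
import subprocess
from pathlib import Path
from typing import List


def build_snapshot_commands(database: str, table: str,
                            out_dir: str) -> List[List[str]]:
    """Return the two command lines for one layer: dump, then index."""
    shp_base = str(Path(out_dir) / table)
    # pgsql2shp -f <output file> <database> <table> writes the shapefile set.
    dump = ["pgsql2shp", "-f", shp_base, database, table]
    # shptree <shapefile> builds the spatial index MapServer reads alongside it.
    index = ["shptree", shp_base + ".shp"]
    return [dump, index]


def run_snapshot(database: str, table: str, out_dir: str) -> None:
    """Run the dump and index steps, stopping on the first failure."""
    for cmd in build_snapshot_commands(database, table, out_dir):
        subprocess.run(cmd, check=True)
```

Calling something like `run_snapshot("gisdb", "roads", "/data/maps")` from a nightly scheduler would refresh one layer. Writing into a scratch directory and swapping it in afterwards would avoid MapServer reading a half-written file, though that step is not shown here.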
> Sorry if this seems irrelevant or silly line of questions. I just have a
> conflict between the convenience / queryability of PostGIS and the speed of
> shapefiles.
It is not irrelevant at all. Understanding the tradeoffs (and there
*are* tradeoffs for both options) is the core of good systems design. I
hope I have provided some useful information.
Paul
--
__
/
| Paul Ramsey
| Refractions Research
| Email: pramsey at refractions.net
| Phone: (250) 885-0632
\_