[mapserver-commits] r8053 - trunk/docs/howto

Sun Nov 23 14:19:35 EST 2008

Author: hobu
Date: 2008-11-23 14:19:35 -0500 (Sun, 23 Nov 2008)
New Revision: 8053

Modified:
   trunk/docs/howto/optimizevector.txt
Log:
convert to ReST.  Thanks crschmidt!

Modified: trunk/docs/howto/optimizevector.txt
===================================================================

--- trunk/docs/howto/optimizevector.txt	2008-11-23 18:48:14 UTC (rev 8052)
+++ trunk/docs/howto/optimizevector.txt	2008-11-23 19:19:35 UTC (rev 8053)
@@ -1 +1,92 @@
-<h2>Splitting your data</h2>If you find yourself making several layers, all of them using the same dataset but filtering to only usse some of the records, you could probably do it better. If the criteria are static, one approach is to pre-split the data.<br /><br />The <i>ogr2ogr</i> utility can select on certain features from a datasource, and save them to a new data source. Thus, you can split your dataset into several smaller ones that are already effectively filtered, and remove the FILTER statement.<br /><br /><br /><h2>Shapefiles</h2>Use <i>shptree</i> to generate a spatial index on your shapefile. This is quick and easy ("shptree foo.shp") and generates a .qix file. Mapserver will automagically detect an index and use it.<br /><br />Note: Tileindex shapefiles can be indexed with shptree.<br /><br />Mapserver also comes with the <i>sortshp</i> utility. This reorganizes a shapefile, sorting it according to the values in one of its columns. If you're commonly filtering by criteria and it's almost always by a specific column, this can make the process slightly more efficient.<br /><br />Although shapefiles are a very fast data format, PostGIS is pretty speedy as well, especially if you use indexes well and have memory to throw at caching.<br /><br /><br /><h2>PostGIS</h2><p>The single biggest boost to performance is indexing. Make sure that there's a GIST index on the geometry column, and each record should also have an indexed primary key. If you used shp2pgsql, then these statements should create the necessary indexes:<br /></p><pre>ALTER TABLE table ADD PRIMARY KEY (gid);</pre><pre>CREATE INDEX table_the_geom ON table (the_geom) USING GIST;</pre><br />PostgreSQL also supports reorganizing the data in a table, such that it's physically sorted by the index. This allows PostgreSQL to be much more efficient in reading the indexed data. 
Use the CLUSTER command, e.g.<br /><br /><pre>CLUSTER the_geom ON table;</pre><br />Then there are numerous optimizations one can perform on the database server itself, aside from the geospatial component. The easiest is to increase <i>max_buffers</i> in the <i>postgresql.conf</i> file, which allows PostgreSQL to use more memory for caching. More information can be found at the &lt;a href="http://www.postgresql.org/"&gt;PostgreSQL website&lt;/a&gt;<br /><br /><br /><h2>Databases in General (PostGIS, Oracle, MySQL)</h2>By default, Mapserver opens and closes a new database connection for each database-driven layer in the mapfile. If you have several layers reading from the same database, this doesn't make a lot of sense. And with some databases (Oracle) establishing connections takes enough time that it can become significant.<br /><br /><br />Try adding this line to your database layers:<br /><pre>PROCESSING "CLOSE_CONNECTION=DEFER"</pre><br />This causes Mapserver to not close the database connection for each layer until after it has finished processing the mapfile (Why is this not the default? Who knows?) and this may shave a few seconds off of map generation times.<br /><br />
\ No newline at end of file
+*****************************************************************************
+ Optimizing vector data sources 
+*****************************************************************************
+
+:Author:        HostGIS 
+:Revision: $Revision$
+:Date: $Date$
+:Last Updated: 2008/08/08
+
+.. sectnum::
+
+.. contents:: Table of Contents
+    :depth: 2
+    :backlinks: top
+
+
+Splitting your data
+-------------------
+If you find yourself defining several layers that all use the same dataset but
+each filter it down to a subset of the records, there is probably a more
+efficient approach. If the filter criteria are static, one option is to
+pre-split the data.
+
+The *ogr2ogr* utility can select certain features from a data source and
+save them to a new data source. Thus, you can split your dataset into several
+smaller ones that are already effectively filtered, and remove the FILTER
+statement.
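+
+A minimal sketch of such a split, assuming a hypothetical shapefile
+``roads.shp`` with a ``CLASS`` column (the file names, column name, and values
+are illustrative only):
+
+::
+
+    ogr2ogr -where "CLASS = 'highway'" highways.shp roads.shp
+    ogr2ogr -where "CLASS = 'street'" streets.shp roads.shp
+
+Note that *ogr2ogr* takes the destination before the source. Each output file
+can then back its own layer without a FILTER.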
+
+Shapefiles
+----------
+Use *shptree* to generate a spatial index on your shapefile. This is quick and
+easy (``shptree foo.shp``) and generates a .qix file. MapServer will
+automatically detect the index and use it.
+
+Note: Tileindex shapefiles can be indexed with shptree.
+
+MapServer also comes with the *sortshp* utility. This reorganizes a shapefile,
+sorting it according to the values in one of its columns. If you're commonly
+filtering by criteria and it's almost always by a specific column, this can
+make the process slightly more efficient.
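+
+As a rough sketch, assuming a hypothetical shapefile ``roads.shp`` that is
+usually filtered on a ``CLASS`` column (both names are illustrative), the
+invocation might look like the following. The file names are given without
+the .shp extension here; check the usage message of your *sortshp* build if
+in doubt:
+
+::
+
+    sortshp roads roads_sorted CLASS ascending
+    shptree roads_sorted.shp
+
+The second command builds a spatial index for the sorted output, since it is
+a new shapefile.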
+
+Although shapefiles are a very fast data format, PostGIS is pretty speedy as
+well, especially if you use indexes well and have memory to throw at caching.
+
+PostGIS
+-------
+
+The single biggest boost to performance is indexing. Make sure that there is a
+GIST index on the geometry column, and that each record has an indexed primary
+key. If you used shp2pgsql, then statements like these (with your own table
+name in place of ``table``) should create the necessary indexes:
+
+:: 
+
+  ALTER TABLE table ADD PRIMARY KEY (gid);
+  CREATE INDEX table_the_geom ON table USING GIST (the_geom);
+
+PostgreSQL also supports reorganizing the data in a table so that it is
+physically sorted by an index. This allows PostgreSQL to read the indexed data
+much more efficiently. Use the CLUSTER command with the name of the spatial
+index, e.g.
+
+:: 
+
+    CLUSTER table_the_geom ON table;
+
+There are also numerous optimizations one can perform on the database server
+itself, aside from the geospatial component. The easiest is to increase
+*shared_buffers* in the *postgresql.conf* file, which allows PostgreSQL to use
+more memory for caching. More information can be found at the `PostgreSQL
+website`_.
+
+Databases in General (PostGIS, Oracle, MySQL)
+---------------------------------------------
+
+By default, MapServer opens and closes a new database connection for each
+database-driven layer in the mapfile. If you have several layers reading from
+the same database, this is wasteful. With some databases (Oracle, for
+example), establishing a connection takes long enough that the overhead can
+become significant.
+
+Try adding this line to your database layers:
+
+:: 
+
+    PROCESSING "CLOSE_CONNECTION=DEFER"
+
+This causes MapServer to keep the database connection for each layer open
+until it has finished processing the mapfile, so that later layers with the
+same connection settings can reuse it. This may shave a few seconds off map
+generation times.
+
+.. #### rST Link Section ####
+
+.. _`PostgreSQL website`: http://www.postgresql.org/