[postgis-tickets] [SCM] PostGIS branch master updated. 3.1.0alpha3-43-ga8534df

git at osgeo.org git at osgeo.org
Tue Dec 8 10:08:21 PST 2020


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "PostGIS".

The branch, master has been updated
       via  a8534dfd6a05a507bd948d6e25f1680f36230c2a (commit)
      from  b5b5c303b1fba5f146982e152b971798df443e7f (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit a8534dfd6a05a507bd948d6e25f1680f36230c2a
Author: Martin Davis <mtnclimb at gmail.com>
Date:   Tue Dec 8 10:08:14 2020 -0800

    Improve Doc Building Indexes

diff --git a/doc/using_postgis_dataman.xml b/doc/using_postgis_dataman.xml
index b1085b8..8e6e935 100644
--- a/doc/using_postgis_dataman.xml
+++ b/doc/using_postgis_dataman.xml
@@ -2037,7 +2037,7 @@ WHERE
 	<sect3>
 	  <title>Using the Shapefile Dumper</title>
 
-	  <para>The <filename>pgsql2shp</filename> table dumper connects directly
+	  <para>The <filename>pgsql2shp</filename> table dumper connects
 	  to the database and converts a table (possibly defined by a query) into
 	  a shape file. The basic syntax is:</para>
 
@@ -2135,27 +2135,47 @@ WHERE
 	<title>Building Spatial Indexes</title>
 
 	<para>Indexes make using a spatial database for large data sets
-	possible. Without indexing, any search for a feature would require a
+	possible. Without indexing, a search for features would require a
 	sequential scan of every record in the database. Indexing speeds up
-	searching by organizing the data into a search tree which can be quickly
-	traversed to find a particular record. PostgreSQL supports sevaral kinds of
-	spatial indexes: B-Tree indexes, GiST, BRIN and SP-GiST indexes.</para>
+	searching by organizing the data into a structure which can be quickly
+	traversed to find records.
+    </para>
+    <para>The B-tree index method commonly used for attribute data
+    is not very useful for spatial data, since it only supports storing and querying
+    data in a single dimension.
+    Data such as geometry which has 2 or more dimensions)
+    requires an index method that supports range query across all the data dimensions.
+    (That said, it is possible to effectively index so-called XY data using a B-tree
+    and explict range searches.)
+    One of the main advantages of PostgreSQL for spatial data handling is that it offers several kinds of
+	indexes which work well for multi-dimensional data: GiST, BRIN and SP-GiST indexes.</para>
 
 	<itemizedlist>
 	  <listitem>
-		<para>B-Trees are used for data which can be sorted along one axis;
-		for example, numbers, letters, dates. Spatial data can be sorted along
-		a space-filling curve, Z-order curve or Hilbert curve.
-        However, this kind of index does not allow speeding up spatial operations
-        which involve comparisons of spatial extent. </para>
-	  </listitem>
-
-	  <listitem>
-		<para>GiST (Generalized Search Trees) indexes break up data into
+		<para><emphasis role="bold">GiST (Generalized Search Tree)</emphasis> indexes break up data into
 		"things to one side", "things which overlap", "things which are
 		inside" and can be used on a wide range of data-types, including GIS
 		data. PostGIS uses an R-Tree index implemented on top of GiST to index
-		GIS data.</para>
+		spatial data. GiST is the most commonly-used and versatile spatial index method,
+        and offers very good query performance.
+        </para>
+	  </listitem>
+
+	  <listitem>
+		<para><emphasis role="bold">BRIN (Block Range Index)</emphasis> indexes operate by summarizing
+        the spatial extent of ranges of table records.
+        Search is done via a scan of the ranges.
+        BRIN is only appropriate for use for some kinds of data
+        (spatially sorted, with infrequent or no update).
+        But it provides much faster index create time, and much smaller index size.
+        </para>
+	  </listitem>
+
+	  <listitem>
+		<para><emphasis role="bold">SP-GiST (Space-Partitioned Generalized Search Tree)</emphasis>
+        is a generic index method that supports partitioned search trees
+        such as quad-trees, k-d trees, and radix trees (tries).
+        </para>
 	  </listitem>
 	</itemizedlist>
 
@@ -2192,74 +2212,142 @@ WHERE
 	</sect3>
 
 	<sect3 id="brin_indexes">
-	  <title>BRIN Indexes</title>
-
-	    <para>BRIN stands for "Block Range Index" and is a generic form of
-	    indexing that has been introduced in PostgreSQL 9.5. BRIN is a lossy kind
-	    of index, and its main usage is to provide a compromise for both read and
-	    write performance. Its primary goal is to handle very large tables for
-	    which some of the columns have some natural correlation with their
-	    physical location within the table. In addition to GIS indexing, BRIN is
-	    used to speed up searches on various kinds of regular or irregular data
-	    structures (integer, arrays etc).</para>
-
-	    <para>Once a GIS data table exceeds a few thousand rows, you will want
-	    to build an index to speed up spatial searches of the data (unless all
-	    your searches are based on attributes, in which case you'll want to
-      build a normal index on the attribute fields). GiST indexes are really
-      performant as long as their size doesn't exceed the amount of RAM
-      available for the database, and as long as you can afford the storage
-      size, and the penalty in write workload. Otherwise, BRIN index can be
-      considered as an alternative. </para>
-
-      <para>The idea of a BRIN index is to store only the bounding box enclosing
-      all the geometries contained in all the rows in a set of table blocks,
-      called a range.  Obviously, this indexing method will only be efficient
-      if the data is physically ordered in a way where the resulting bounding
-      boxes for block ranges will be mutually exclusive. The resulting index
-      will be really small, but will be less efficient than a GiST index in
-      many cases.</para>
-
-	  <para>Building a BRIN index is way less intensive than building a GiST
-	  index. It's quite common to build a BRIN index ten times faster
-	  than a GiST index would have required. As a BRIN index only stores one
-	  bounding box for one to many table blocks, it's pretty common to consume
-	  up to a thousand times less disk space for this kind of index.</para>
-
-      <para>You can choose the number of blocks to summarize in a range. If you
-      decrease this number, the index will be bigger but will probably help to
-      get better performance.</para>
-
-	  <para>The syntax for building a BRIN index on a "geometry" column is as
-	  follows:</para>
+	<title>BRIN Indexes</title>
+
+    <para>BRIN stands for "Block Range Index". It is an general-purpose
+    <ulink url="https://www.postgresql.org/docs/current/brin.html">index method</ulink> introduced in PostgreSQL 9.5.
+    BRIN is a <emphasis>lossy</emphasis>
+    index method, meaning that a a secondary check is required to confirm
+    that a record matches a given search condition
+    (which is the case for all provided spatial indexes).
+    It provides much faster index creation and much smaller index size,
+    with reasonable read performance.
+    Its primary purpose is to support indexing very large tables
+    on columns which have a correlation with their
+    physical location within the table. In addition to spatial indexing,
+    BRIN can speed up searches on various kinds of attribute data
+    structures (integer, arrays etc).</para>
+
+    <para>Once a spatial table exceeds a few thousand rows, you will want
+    to build an index to speed up spatial searches of the data.
+    GiST indexes are very performant as long as their size doesn't exceed the amount of RAM
+    available for the database, and as long as you can afford the index storage
+    size, and the cost of index update on write. Otherwise, for very large tables BRIN index can be
+    considered as an alternative.</para>
+
+    <para>A BRIN index stores the bounding box enclosing
+    all the geometries contained in the rows in a contiguous set of table blocks,
+    called a <emphasis>block range</emphasis>.
+    When executing a query using the index the block ranges are scanned to
+    find the ones that intersect the query extent.
+    This is efficient only if the data is physically ordered so that the bounding
+    boxes for block ranges have minimal overlap (and ideally are mutually exclusive).
+    The resulting index is very small in size,
+    but is typically less performant for read than a GiST index over the same data.</para>
+
+    <para>Building a BRIN index is much less CPU-intensive than building a GiST index.
+    It's common to find that a BRIN index is ten times faster to build
+    than a GiST index over the same data. And because a BRIN index stores only one
+    bounding box for each range of table blocks, it's common to use
+    up to a thousand times less disk space than a GiST index.</para>
+
+    <para>You can choose the number of blocks to summarize in a range. If you
+    decrease this number, the index will be bigger but will probably provide
+    better performance.</para>
+
+    <para>For BRIN to be effective, the table data should be stored in
+    a physical order which minimizes the amount of block extent overlap.
+    It may be that the data is already sorted appropriately
+    (for instance, if it is loaded from another dataset that is already sorted in spatial order).
+    Otherwise, this can be accomplished by sorting the data by a one-dimensional spatial key.
+    One way to do this is to create a new table sorted by the geometry values
+    (which in recent PostGIS versions uses an efficient Hilbert curve ordering):
+    </para>
+
+    <para><programlisting>
+CREATE TABLE table_sorted AS
+   SELECT * FROM table  ORDER BY geom;
+</programlisting></para>
+
+    <para>Alternatively, data can be sorted in-place by using a GeoHash as a (temporary) index,
+    and clustering on that index:
+    </para>
+
+    <para><programlisting>
+CREATE INDEX idx_temp_geohash ON table
+    USING btree (ST_GeoHash( ST_Transform( geom, 4326 ), 20));
+CLUSTER table USING idx_temp_geohash;
+</programlisting></para>
+
+
+    <para>The syntax for building a BRIN index on a <code>geometry</code> column is:</para>
+
+    <para><programlisting>CREATE INDEX [indexname] ON [tablename] USING BRIN ( [geome_col] ); </programlisting></para>
+
+    <para>The above syntax builds a 2D index.  To build a 3D-dimensional index, use this syntax:</para>
+
+    <programlisting>
+CREATE INDEX [indexname] ON [tablename]
+    USING BRIN ([geome_col] brin_geometry_inclusion_ops_3d);</programlisting>
+
+    <para>You can also get a 4D-dimensional index using the 4D operator class:</para>
+
+    <programlisting>
+CREATE INDEX [indexname] ON [tablename]
+    USING BRIN ([geome_col] brin_geometry_inclusion_ops_4d);</programlisting>
+
+    <para>The above commands use the default number of blocks in a range, which is 128.
+    To specify the number of blocks to summarise in a range, use this syntax</para>
+
+    <para><programlisting>
+CREATE INDEX [indexname] ON [tablename]
+    USING BRIN ( [geome_col] ) WITH (pages_per_range = [number]); </programlisting></para>
+
+    <para>Keep in mind that a BRIN index only stores one index
+    entry for a large number of rows.  If your table stores geometries with
+    a mixed number of dimensions, it's likely that the resulting index will
+    have poor performance.  You can avoid this performance penalty by
+    choosing the operator class with the least number of dimensions of the
+    stored geometries
+    </para>
+
+    <para>The <code>geography</code> datatype is supported for BRIN indexing. The
+    syntax for building a BRIN index on a geography column is:</para>
+
+    <para><programlisting>CREATE INDEX [indexname] ON [tablename] USING BRIN ( [geog_col] ); </programlisting></para>
+
+    <para>The above syntax builds a 2D-index for geospatial objects on the spheroid. </para>
+
+    <para>Currently, only "inclusion support" is provided, meaning
+    that just the <varname>&&</varname>, <varname>~</varname> and
+    <varname>@</varname> operators can be used for the 2D cases (for both
+    <code>geometry</code> and <code>geography</code>), and just the <varname>&&&</varname>
+    operator for 3D geometries.
+    There is currently no support for kNN searches.</para>
+
+    <para>An important difference between BRIN and other index types is that the database does not
+    maintain the index dynamically.  Changes to spatial data in the table
+    are simply appended to the end of the index.  This will cause index search performance to
+    degrade over time.  The index can be updated by performing a <code>VACUUM</code>,
+    or by using a special function <code>brin_summarize_new_values(regclass)</code>.
+    For this reason BRIN may be most appropriate for use with data that is read-only,
+    or only rarely changing. For more information refer to the
+    <ulink url="https://www.postgresql.org/docs/current/brin-intro.html#BRIN-OPERATION">manual</ulink>.
+    </para>
+
+    <para>To summarize using BRIN for spatial data:
+    </para>
+
+    <itemizedlist>
+    <listitem><para>Index build time is very fast, and index size is very small.</para></listitem>
+    <listitem><para>Index query time is slower than GiST, but can still be very acceptable.</para></listitem>
+    <listitem><para>Requires table data to be sorted in a spatial ordering.</para></listitem>
+    <listitem><para>Requires manual index maintenance.</para></listitem>
+    <listitem><para>Most appropriate for very large tables,
+    with low or no overlap (e.g. points),
+    and which are static or change infrequently.</para></listitem>
+   </itemizedlist>
 
-	  <para><programlisting>CREATE INDEX [indexname] ON [tablename] USING BRIN ( [geometryfield] ); </programlisting></para>
-	  <para>The above syntax will always build a 2D-index.  To get a 3D-dimensional index, you can create one using this syntax</para>
-	  <programlisting>CREATE INDEX [indexname] ON [tablename] USING BRIN ([geometryfield] brin_geometry_inclusion_ops_3d);</programlisting>
-      <para>You can also get a 4D-dimensional index using the 4D operator class</para>
-	  <programlisting>CREATE INDEX [indexname] ON [tablename] USING BRIN ([geometryfield] brin_geometry_inclusion_ops_4d);</programlisting>
-      <para>These above syntaxes will use the default number or block in a range, which is 128. To specify the number of blocks you want to summarise in a range, you can create one using this syntax</para>
-      <para><programlisting>CREATE INDEX [indexname] ON [tablename] USING BRIN ( [geometryfield] ) WITH (pages_per_range = [number]); </programlisting></para>
-      <para>Also, keep in mind that a BRIN index will only store one index
-        value for a large number of rows.  If your table stores geometries with
-        a mixed number of dimensions, it's likely that the resulting index will
-        have poor performance.  You can avoid this drop of performance by
-        choosing the operator class whith the least number of dimensions of the
-        stored geometries
-      </para>
-
-          <para>Also the "geography" datatype is supported for BRIN indexing. The
-          syntax for building a BRIN index on a "geography" column is as follows:</para>
-
-          <para><programlisting>CREATE INDEX [indexname] ON [tablename] USING BRIN ( [geographyfield] ); </programlisting></para>
-          <para>The above syntax will always build a 2D-index for geospatial objects on the spheroid. </para>
-
-          <para>Currently, just the "inclusion support" is considered here, meaning
-          that just <varname>&&</varname>, <varname>~</varname> and
-          <varname>@</varname> operators can be used for the 2D cases (both for
-          "geometry" and for "geography"), and just the <varname>&&&</varname>
-          operator can be used for the 3D geometries. There is no support
-          for kNN searches at the moment.</para>
 	</sect3>
 
 	<sect3 id="spgist_indexes">
@@ -2274,7 +2362,7 @@ WHERE
 		routing, ip routing, substring search, etc. </para>
 
     <para>As it is the case for GiST indexes, SP-GiST indexes are lossy, in the
-		sense that they store the bounding box englobing the spatial objects.
+		sense that they store the bounding box enclosing spatial objects.
 		SP-GiST indexes can be considered as an alternative to GiST indexes. The
 		performance tests reveal that SP-GiST indexes are especially beneficial
 		when there are many overlapping objects, that is, with so-called

-----------------------------------------------------------------------

Summary of changes:
 doc/using_postgis_dataman.xml | 254 ++++++++++++++++++++++++++++--------------
 1 file changed, 171 insertions(+), 83 deletions(-)


hooks/post-receive
-- 
PostGIS


More information about the postgis-tickets mailing list