Pushing bbox and filter constraints into PostGIS subquery
Stephen Woodbridge
woodbri at SWOODBRIDGE.COM
Fri Sep 21 20:50:16 EDT 2007
Steve and Steve,
I already did something like this for Oracle spatial but
fsimon at univali.br has not applied the patch to svn yet.
http://trac.osgeo.org/mapserver/ticket/2129
It is critical the we allow additional control over these things in out
database connection. In the Oracle case (check the query the DBA came up
with) we got a very significant performance improvement after I applied
the patch attached to the ticket.
THis is a generic implementation that also could be applied to PostGIS.
And I would like to see the filter extension applied to both database
providers.
-Steve
Steve Lime wrote:
> Hi Steve: We love suggestions where folks offer to roll up their sleeves!
>
> Personally I think it's a necessary next step (pushing filter conditions to the RDBMS)
> for OGC filter support. I've not thought about implementation but would like to see
> support for all supported databases (PostGIS, Oracle and SDE). Getting it working
> for one would likely serve for a model for others. I've been kicking the idea around
> to move CGI attribute queries to use OGC filters so a modification like this would have
> benefits beyond just WFS (MapScript would benefit too). Assefa should really have to
> chime in though since the filter support was his.
>
> Perhaps the next step is to start putting an RFC together?
>
> Steve
>
>>>> Steve Marshall <smarshall at WSI.COM> 09/21/07 1:14 PM >>>
> I have a suggestion for improving performance of PostGIS queries for layers
> that use subselect statements. I don't see anything in the archives related
> to this issue, so I thought I'd post the idea and see if anyone has any
> thoughts about it.
>
> The PostGIS DATA parameter allows users to specify data sources as either a
> single table or as a sub-query that may join several tables. Queries are
> then built from this DATA statement into SQL of the following form
> (simplified from mappostgis.c code to avoid some minor flavor differences):
>
> DECLARE mycursor BINARY CURSOR FOR
> SELECT <columns_wanted>
> FROM <data_source>
> WHERE (<filter_constraints>) AND (<geom_column> && <bounding_box>)
>
> In the case where <data_source> is a simple table, the filter constraints
> and bounding box constraints are used in the initial query planning.
> However, if the <data_source> is a subquery, the query ends up with this form:
>
> DECLARE mycursor BINARY CURSOR FOR
> SELECT <columns_wanted>
> FROM (<sub_query>) AS subquery
> WHERE (<filter_constraints>) AND (<geom_column> && <bounding_box>)
>
> The result is that the subselect is executed without the filter_constraints
> or the bounding box constraints. These constraints are applied to the
> results of the subquery as a secondary step.
>
> While this process gives the correct results, it can lead to suboptimal
> performance. For example, if one is querying for data in a small region
> from a global dataset, the bounding box constraint could greatly reduce the
> number of rows that need to be inspected. In general, the PostgreSQL query
> engine will nearly always do a better job of optimization if more
> constraints can be given to the initial query.
>
> I was wondering if anyone had considered trying to push the filter and
> bounding box constraints inside of the subquery? This would require
> spliting the subquery into pieces and merging in the filter and bounding box
> constraints, resulting in something like this:
>
> DECLARE mycursor BINARY CURSOR FOR
> SELECT <columns_wanted>
> FROM (<subquery_select_columns_from_tables>
> WHERE (<filter_constraints>) AND (<geom_column> && <bounding_box>)
> AND <subquery_constraints_order_and_limit>) as subquery
>
> Here <subquery_select_columns_from_tables> would have the beginning part of
> the subquery (i.e. SELECT, FROM, and all the table joins), and
> <subquery_constraints_order_and limit> would the end of the subquery (e.g.
> any WHERE-clause constraints, ORDER BY, LIMIT, and OFFSET clauses)
>
> I could see that this might not work properly if the subquery had a "GROUP
> BY" statement. In such cases, the original filter and bounding box would be
> operating on the aggregated rows, while this new statement would operated on
> pre-aggegated rows. However, we could circumvent this optimization if we
> detect the presence of a GROUP BY clause in the subquery. We could do
> similar checks to avoid other troublesome SQL, to ensure optmized and
> unoptimized queries always return the same results.
>
> Is there some reason that pushing the filtering into the subquery would be a
> bad thing to do? Looking at the code in mappostgis.c (prepare_database
> function), it does not look like it would be too hard to implement. Are
> there other insertion points that would need to be supported?
>
> If the community is interested in this enhancement, I'd be willing to take a
> crack at it.
>
> Steve Marshall
More information about the mapserver-dev
mailing list