[postgis-users] Query runtime comparison for two datasets using '&&' geometric operator

Paul Ramsey pramsey at cleverelephant.ca
Tue Sep 29 13:51:08 PDT 2009


Your rows in the second query do look a lot wider.

(a) if you can, upgrade to the latest pgsql 8.4 and postgis 1.4,
there's big speed benefits to both
(b) if you have even a moderately recent postgis, try using

st_intersects(a.the_geom, b.the_geom)

instead of && and distance() = 0

again, there's more power to be had there.

Paul

On Tue, Sep 29, 2009 at 1:35 PM, Ted Spradley <tspradley at snoogems.com> wrote:
>
> Hi All!
>
> An elementary question from a new user.  Please be patient with my
> understanding!
>
> I am running the well known query to find adjacent polygons that first
> compares
> polygon bounding boxes, then uses the distance function to (I think) compare
> the
> actual polygon vertices of the result of the bounding box portion of the
> query.
>
> Query_1 and Query_2 below are, I believe, identical queries. Query_1 has a
> runtime
> of 71.423 ms, and Query_2 a runtime of 36236.674 ms.
>
> What is the difference between the two queries?  Is the resolution of
> Query_1.data,
> 0.000278, a resolution producing roughly twice as many points as the
> resolution of
> Query_2.data, 0.000458 or do I have that backward?
>
> I have found references in the archives to the width of a row.  Is this
> likely referring
> to the width of the geometry column, meaning number of coordinate pairs?
> How does
> one do a count query on a column to find the number of pairs that a row
> contains?
>
> Thank you!
> Ted S.
>
> Thank you
>
>
> ##################      Query 1   ####################################
>
> Data:
> Spatial_Reference_Information:
>  Horizontal_Coordinate_System_Definition:
>    Geographic:
>      Latitude_Resolution: 0.000278
>      Longitude_Resolution: 0.000278
>      Geographic_Coordinate_Units: Decimal degrees
>    Geodetic_Model:
>      Horizontal_Datum_Name: North American Datum of 1983
>      Ellipsoid_Name: GRS1980
>      Semi-major_Axis: 6378137
>      Denominator_of_Flattening_Ratio: 298.257222
>
> Query:
> project1=# SELECT DISTINCT a.county, a.state
> FROM countyp020 AS a, countyp020 AS b
> WHERE b.county='Bexar County'
> AND b.state='TX'
> AND a.wkb_geometry && b.wkb_geometry
> AND distance(a.wkb_geometry, b.wkb_geometry) = 0;
>                       county                       | state
> ----------------------------------------------------+-------
>  Comal County                                       | TX
>  Bandera County                                     | TX
>  Guadalupe County                                   | TX
>  Atascosa County                                    | TX
>  Wilson County                                      | TX
>  Kendall County                                     | TX
>  Medina County                                      | TX
>  Bexar County                                       | TX
> (8 rows)
>
> Query Analyse Explain:
> project1=# EXPLAIN ANALYSE SELECT DISTINCT a.county, a.state
> FROM countyp020 AS a, countyp020 AS b
> WHERE b.county='Bexar County'
> AND b.state='TX'
> AND a.wkb_geometry && b.wkb_geometry
> AND distance(a.wkb_geometry, b.wkb_geometry) = 0;
>                                                                    QUERY
> PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>  HashAggregate  (cost=20.23..20.24 rows=1 width=45) (actual
> time=70.894..70.902 rows=8 loops=1)
>   ->  Nested Loop  (cost=4.27..20.22 rows=1 width=45) (actual
> time=17.118..70.697 rows=8 loops=1)
>         Join Filter: (distance(a.wkb_geometry, b.wkb_geometry) = 0::double
> precision)
>         ->  Bitmap Heap Scan on countyp020 b  (cost=4.27..11.94 rows=1
> width=8285) (actual time=0.080..0.083 rows=1 loops=1)
>               Recheck Cond: (county = 'Bexar County'::bpchar)
>               Filter: (state = 'TX'::bpchar)
>               ->  Bitmap Index Scan on idx_countyp020_county
> (cost=0.00..4.27 rows=2 width=0) (actual time=0.062..0.062 rows=1 loops=1)
>                     Index Cond: (county = 'Bexar County'::bpchar)
>         ->  Index Scan using countyp020_geom_idx on countyp020 a
> (cost=0.00..8.27 rows=1 width=8330) (actual time=9.529..10.248 rows=10
> loops=1)
>               Index Cond: (a.wkb_geometry && b.wkb_geometry)
>  Total runtime: 71.423 ms
> (11 rows)
> ##################      End Query 1   #################################
>
> ##################      Query 2   ####################################
>
> Data:
>        <spref>
>                <horizsys>
>                        <geograph>
>                                <latres>0.000458</latres>
>                                <longres>0.000458</longres>
>                                <geogunit>Decimal degrees</geogunit>
>                        </geograph>
>                <geodetic>
>                                <horizdn>North American Datum of 1983 in the 48 contiguous states, the
> District of Columbia, Alaska, Hawaii, Puerto Rico, the Virgin Islands of the
> United States, and the Pacific Island Areas.</horizdn>
>                                <ellips>Geodetic Reference System 80</ellips>
>                                <semiaxis>6378137</semiaxis>
>                                <denflat>298257</denflat>
>                        </geodetic>
>                </horizsys>
>        </spref>
>
> Query:
> tiger=# SELECT DISTINCT a.name, a.statefp
> FROM tl_2008_us_county AS a, tl_2008_us_county AS b
> WHERE b.name='Bexar'
> AND b.statefp='48'
> AND a.the_geom && b.the_geom
> AND distance(a.the_geom, b.the_geom) = 0;
>   name    | statefp
> -----------+---------
>  Bandera   | 48
>  Guadalupe | 48
>  Wilson    | 48
>  Atascosa  | 48
>  Kendall   | 48
>  Bexar     | 48
>  Comal     | 48
>  Medina    | 48
> (8 rows)
>
> Query Explain Analyse:
> tiger=# EXPLAIN ANALYSE SELECT DISTINCT a.name, a.statefp
> FROM tl_2008_us_county AS a, tl_2008_us_county AS b
> WHERE b.name='Bexar'
> AND b.statefp='48'
> AND a.the_geom && b.the_geom
> AND distance(a.the_geom, b.the_geom) = 0;
>
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>  HashAggregate  (cost=16.56..16.57 rows=1 width=11) (actual
> time=36236.168..36236.176 rows=8 loops=1)
>   ->  Nested Loop  (cost=0.00..16.56 rows=1 width=11) (actual
> time=3.990..36236.075 rows=8 loops=1)
>         Join Filter: (distance(a.the_geom, b.the_geom) = 0::double
> precision)
>         ->  Index Scan using indx_tl_county_name on tl_2008_us_county b
> (cost=0.00..8.27 rows=1 width=154180) (actual time=0.060..0.064 rows=1
> loops=1)
>               Index Cond: ((name)::text = 'Bexar'::text)
>               Filter: ((statefp)::text = '48'::text)
>         ->  Index Scan using indx_tl_county_geo on tl_2008_us_county a
> (cost=0.00..8.27 rows=1 width=154191) (actual time=3.259..5.932 rows=10
> loops=1)
>               Index Cond: (a.the_geom && b.the_geom)
>  Total runtime: 36236.674 ms
> (9 rows)
>
> tiger=# SET enable_seqscan TO off;              ... didn't help
>
> ##################      End Query 2   #################################
>
> -----
> PGSQL 8.2.4 / (PostGIS Version 1.3 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 )
> Proj4 -4.6.1 / gdal 1.6.1 / GD 2.0.34
> Linux/Apache/2.2.8 (Fedora)
> --
> View this message in context: http://www.nabble.com/Query-runtime-comparison-for-two-datasets-using-%27--%27-geometric-operator-tp25670401p25670401.html
> Sent from the PostGIS - User mailing list archive at Nabble.com.
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>



More information about the postgis-users mailing list