[postgis-users] difference in query performance due to the inclusion of a polygon geometry field

Mon Dec 3 01:50:49 PST 2012

Dear users,
I apologize for cross-posting since I already sent this request to the
PostgreSQL Mailing List. But given the answers probably this is the
right mailing list for our problem.
 We are facing with a strange efficiency problem.

We have two tables:
1) the first largest (30000 rows) contains a timestamp field (named
"tempo_calcolo") and other fields.
2) the second contains only few rows (150), a PostGIS geometric POLYGON
field (named "the_geom") and other fields.

The query we want to execute is a join between the two table based on a
common field, filtered by the timestamp field and ordered by the same
field.

Including and excluding the PostGIS geometric POLYGON field (that is not
used neither in the joining or in the filtering) the execution times are
really different.

Without the geometric field it takes around 86 ms !!
With the geometric field it takes around 14000 ms !! 

These are the EXPLAIN ANALYSE results obtained when we perform the query
without the geometric field:
____________________________
"Sort  (cost=1217.29..1240.40 rows=9246 width=133) (actual
time=2.964..3.095 rows=1474 loops=1)"
"  Sort Key: storico_zone_allertamento.tempo_calcolo"
"  Sort Method:  quicksort  Memory: 440kB"
"  InitPlan 1 (returns $0)"
"    ->  Result  (cost=0.00..0.02 rows=1 width=0) (actual
time=0.022..0.022 rows=1 loops=1)"
"  ->  Hash Join  (cost=6.01..608.21 rows=9246 width=133) (actual
time=0.143..1.916 rows=1474 loops=1)"
"        Hash Cond: (storico_zone_allertamento.gid_fk =
zone_allertamento.gid)"
"        ->  Index Scan using
index_storico_zone_allertamento_tempo_calcolo on
storico_zone_allertamento  (cost=0.00..475.06 rows=9246 width=112)
(actual time=0.042..0.440 rows=1474 loops=1)"
"              Index Cond: (tempo_calcolo > $0)"
"        ->  Hash  (cost=4.34..4.34 rows=134 width=25) (actual
time=0.090..0.090 rows=134 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 8kB"
"              ->  Seq Scan on zone_allertamento  (cost=0.00..4.34
rows=134 width=25) (actual time=0.004..0.035 rows=134 loops=1)"
"Total runtime: 3.285 ms"
__________________________

These are the EXPLAIN ANALYSE results obtained when we perform the query
with the geometric field:
___________________________
"Sort  (cost=776019.29..776042.40 rows=9246 width=196111) (actual
time=3.039..3.163 rows=1474 loops=1)"
"  Sort Key: storico_zone_allertamento.tempo_calcolo"
"  Sort Method:  quicksort  Memory: 567kB"
"  InitPlan 1 (returns $0)"
"    ->  Result  (cost=0.00..0.02 rows=1 width=0) (actual
time=0.009..0.009 rows=1 loops=1)"
"  ->  Hash Join  (cost=6.01..608.21 rows=9246 width=196111) (actual
time=0.139..1.957 rows=1474 loops=1)"
"        Hash Cond: (storico_zone_allertamento.gid_fk =
zone_allertamento.gid)"
"        ->  Index Scan using
index_storico_zone_allertamento_tempo_calcolo on
storico_zone_allertamento  (cost=0.00..475.06 rows=9246 width=112)
(actual time=0.029..0.431 rows=1474 loops=1)"
"              Index Cond: (tempo_calcolo > $0)"
"        ->  Hash  (cost=4.34..4.34 rows=134 width=196003) (actual
time=0.091..0.091 rows=134 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 19kB"
"              ->  Seq Scan on zone_allertamento  (cost=0.00..4.34
rows=134 width=196003) (actual time=0.006..0.038 rows=134 loops=1)"
"Total runtime: 3.355 ms"
__________________________

It seems that the difference is primary in the "width" and then in the
"cost" of the Sort.

A similar task performed on two similar tables (both larger than the
previous ones!!) but including and excluding a PostGIS geometric POINT
(!!) field instead of POLYGON one, doesn't result in significant query
execution time difference.

Do you have please any suggestion to improve our query efficiency??

In the PostgreSQL mailing Tom Lane suggest that the problem could be due
to:
_____________________
        (1) time to fetch the wide geometry values from the table's
        TOAST table
        (2) time to convert the geometry values to text form
        (3) time to transmit the larger volume of data to the client
        (4) client-side processing time
        None of these costs are expended in an EXPLAIN ANALYZE, which is
        why the time reported for that doesn't change materially.
____________________

We think that the first is the most probable cause of the high query
execution time, because the query is a simple join( without, we think,
conversion from geometry to text), the delay is confirmed also using
psql on the server, the client side processing time should be really
small since data are really few.

Can you suggest a way to solve this problem?
Any suggestion is really appreciated !!!!

Thanks in advance

Ivan and Mauro

-- 
Ti prego di cercare di non inviarmi files .dwg, .doc, .xls, .ppt.
Preferisco formati liberi.
Please try to avoid to send me .dwg, .doc, .xls, .ppt files.
I prefer free formats.
http://it.wikipedia.org/wiki/Formato_aperto
http://en.wikipedia.org/wiki/Open_format

Ivan Marchesini
Perugia (Italy)
Socio fondatore GFOSS "Geospatial Free and Open Source Software" http://www.gfoss.it
e-mail: ivan.marchesini at irpi.cnr.it
        ivan.marchesini at gmail.com
fax (mailfax): +39 1782092534
jabber: geoivan73 at jabber.org
skype: geoivan73
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20121203/196fa2da/attachment.pgp>