[postgis-users] Spatial Join Index Use

Thu Jan 7 11:45:45 PST 2021

Hi everyone,

I have GPS trajectory data in LineStringM format (GeoLife data formatted as
described in Anita Graser's blog post). I am trying to find trajectories
that are close to each other via ST_DWithin and ST_CPAWithin.

I have pre-filtered the trajectories with respect to time-ranges
(tstzrange) and saved results in another table named `tf`. This table has
two fields id1 and id2 corresponding to the pairs that satisfy the temporal
filtering (overlap via &&) requirements.

Then I apply ST_DWithin to find trajectories within 0.1 degrees of each
other. Ideally this would be the second pre-filtering step before finally
refining with the spatiotemporal function CPAWithin. However, I cannot seem
to get postgis to use the available indexes. I ran the following query:

EXPLAIN ANALYZE
SELECT t1.id, t2.id
FROM tf
JOIN traj2 t1 ON t1.id = tf.id1
JOIN traj2 t2 ON t2.id = tf.id2
WHERE ST_DWithin(t1.track, t2.track,0.1)

I have the following indexes on the GeoLife trajectory table called `traj2`:
Indexes:
    "idx_traj2_ep_geog" gist (geography(end_point))
    "idx_traj2_ep_geom" gist (end_point)
    "idx_traj2_id" btree (id)
    "idx_traj2_sp_geog" gist (geography(start_point))
    "idx_traj2_sp_geom" gist (start_point)
    "idx_traj2_time_range" gist (time_range)
    "idx_traj2_time_track2d" gist (track)
    "idx_traj2_time_track2d_geog" gist (geography(track))
    "idx_traj2_time_track3d" gist (track gist_geometry_ops_nd)
    "idx_traj2_tracker" btree (tracker)
Access method: heap
Options: parallel_workers=8

and the following on `tf`:
Indexes:
    "idx_tf_id1" btree (id1)
    "idx_tf_id2" btree (id2)
Access method: heap
Options: parallel_workers=8

I have run VACUUM ANALYZE before running EXPLAIN ANALYZE and the results
are as follows:

"Gather  (cost=38404.10..557039.81 rows=23555 width=8) (actual
time=182.494..362202.326 rows=48253 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  ->  Parallel Hash Join  (cost=37404.10..553684.31 rows=9815 width=8)
(actual time=217.140..361763.181 rows=16084 loops=3)"
"        Hash Cond: (tf.id1 = t1.id)"
"        Join Filter: st_dwithin(t1.track, t2.track, '0.1'::double
precision)"
"        Rows Removed by Join Filter: 4931"
"        ->  Parallel Hash Join  (cost=870.07..7265.31 rows=26270
width=32026) (actual time=3.649..10.452 rows=21016 loops=3)"
"              Hash Cond: (t2.id = tf.id2)"
"              ->  Parallel Seq Scan on traj2 t2  (cost=0.00..6003.79
rows=7779 width=32022) (actual time=0.003..2.695 rows=6223 loops=3)"
"              ->  Parallel Hash  (cost=541.70..541.70 rows=26270 width=8)
(actual time=3.452..3.452 rows=21016 loops=3)"
"                    Buckets: 65536  Batches: 1  Memory Usage: 3008kB"
"                    ->  Parallel Seq Scan on tf  (cost=0.00..541.70
rows=26270 width=8) (actual time=0.008..0.930 rows=21016 loops=3)"
"        ->  Parallel Hash  (cost=6003.79..6003.79 rows=7779 width=32022)
(actual time=132.408..132.408 rows=6223 loops=3)"
"              Buckets: 4096  Batches: 16  Memory Usage: 2688kB"
"              ->  Parallel Seq Scan on traj2 t1  (cost=0.00..6003.79
rows=7779 width=32022) (actual time=121.500..123.018 rows=6223 loops=3)"
"Planning Time: 3.335 ms"
"JIT:"
"  Functions: 60"
"  Options: Inlining true, Optimization true, Expressions true, Deforming
true"
"  Timing: Generation 3.102 ms, Inlining 79.410 ms, Optimization 173.707
ms, Emission 110.965 ms, Total 367.184 ms"
"Execution Time: 362206.853 ms"

My current settings are
shared_buffers: 2GB, work_mem: 64MB, maintenance_work_mem: 512MB,
effective_cache_size: 4GB.
(I am on a Ubuntu 18.04 laptop machine with 16 GB RAM and i7 2.30 GHz 16
core CPU) (I have the following postgis installation: POSTGIS="3.1.0
5e2af69" [EXTENSION] PGSQL="120" GEOS="3.7.1-CAPI-1.11.1 27a5e771"
PROJ="Rel. 4.9.3, 15 August 2016" LIBXML="2.9.4" LIBJSON="0.12.1"
LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" )

I have also tried setting STORAGE EXTERNAL as per Paul Ramsay's blog post
on spatial joins to no avail.

As a final note, I have SET parallel_workers=8 but query planner seems to
be stuck with 2 workers launched/planned.

I am fairly new to GIS and postgis altogether, so I might be missing
something very simple and straightforward. I would be grateful if you guys
can point me in the right direction.

Best regards,
Temir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20210107/ace0841a/attachment.html>