[postgis-users] Spatial Join Index Use

Paul Ramsey pramsey at cleverelephant.ca
Thu Jan 7 11:53:03 PST 2021


Without more analysis if your data, it's hard to say, but an initial guess is that 0.1 degrees is actually quite a large distance so the st_dwithin join condition just isn't selective enough to run indexed in one of the inner filters. If you're *sure* you want it run first, you could try hauling it out into a CTE.

P

> On Jan 7, 2021, at 11:45 AM, Temir Karakurum <temirkarakurum at gmail.com> wrote:
> 
> Hi everyone,
> 
> I have GPS trajectory data in LineStringM format (GeoLife data formatted as described in Anita Graser's blog post). I am trying to find trajectories that are close to each other via ST_DWithin and ST_CPAWithin.
> 
> I have pre-filtered the trajectories with respect to time-ranges (tstzrange) and saved results in another table named `tf`. This table has two fields id1 and id2 corresponding to the pairs that satisfy the temporal filtering (overlap via &&) requirements.
> 
> Then I apply ST_DWithin to find trajectories within 0.1 degrees of each other. Ideally this would be the second pre-filtering step before finally refining with the spatiotemporal function CPAWithin. However, I cannot seem to get postgis to use the available indexes. I ran the following query:
> 
> EXPLAIN ANALYZE
> SELECT t1.id, t2.id
> FROM tf
> JOIN traj2 t1 ON t1.id = tf.id1
> JOIN traj2 t2 ON t2.id = tf.id2
> WHERE ST_DWithin(t1.track, t2.track,0.1)
> 
> I have the following indexes on the GeoLife trajectory table called `traj2`:
> Indexes:
>     "idx_traj2_ep_geog" gist (geography(end_point))
>     "idx_traj2_ep_geom" gist (end_point)
>     "idx_traj2_id" btree (id)
>     "idx_traj2_sp_geog" gist (geography(start_point))
>     "idx_traj2_sp_geom" gist (start_point)
>     "idx_traj2_time_range" gist (time_range)
>     "idx_traj2_time_track2d" gist (track)
>     "idx_traj2_time_track2d_geog" gist (geography(track))
>     "idx_traj2_time_track3d" gist (track gist_geometry_ops_nd)
>     "idx_traj2_tracker" btree (tracker)
> Access method: heap
> Options: parallel_workers=8
> 
> and the following on `tf`:
> Indexes:
>     "idx_tf_id1" btree (id1)
>     "idx_tf_id2" btree (id2)
> Access method: heap
> Options: parallel_workers=8
> 
> I have run VACUUM ANALYZE before running EXPLAIN ANALYZE and the results are as follows:
> 
> "Gather  (cost=38404.10..557039.81 rows=23555 width=8) (actual time=182.494..362202.326 rows=48253 loops=1)"
> "  Workers Planned: 2"
> "  Workers Launched: 2"
> "  ->  Parallel Hash Join  (cost=37404.10..553684.31 rows=9815 width=8) (actual time=217.140..361763.181 rows=16084 loops=3)"
> "        Hash Cond: (tf.id1 = t1.id)"
> "        Join Filter: st_dwithin(t1.track, t2.track, '0.1'::double precision)"
> "        Rows Removed by Join Filter: 4931"
> "        ->  Parallel Hash Join  (cost=870.07..7265.31 rows=26270 width=32026) (actual time=3.649..10.452 rows=21016 loops=3)"
> "              Hash Cond: (t2.id = tf.id2)"
> "              ->  Parallel Seq Scan on traj2 t2  (cost=0.00..6003.79 rows=7779 width=32022) (actual time=0.003..2.695 rows=6223 loops=3)"
> "              ->  Parallel Hash  (cost=541.70..541.70 rows=26270 width=8) (actual time=3.452..3.452 rows=21016 loops=3)"
> "                    Buckets: 65536  Batches: 1  Memory Usage: 3008kB"
> "                    ->  Parallel Seq Scan on tf  (cost=0.00..541.70 rows=26270 width=8) (actual time=0.008..0.930 rows=21016 loops=3)"
> "        ->  Parallel Hash  (cost=6003.79..6003.79 rows=7779 width=32022) (actual time=132.408..132.408 rows=6223 loops=3)"
> "              Buckets: 4096  Batches: 16  Memory Usage: 2688kB"
> "              ->  Parallel Seq Scan on traj2 t1  (cost=0.00..6003.79 rows=7779 width=32022) (actual time=121.500..123.018 rows=6223 loops=3)"
> "Planning Time: 3.335 ms"
> "JIT:"
> "  Functions: 60"
> "  Options: Inlining true, Optimization true, Expressions true, Deforming true"
> "  Timing: Generation 3.102 ms, Inlining 79.410 ms, Optimization 173.707 ms, Emission 110.965 ms, Total 367.184 ms"
> "Execution Time: 362206.853 ms"
> 
> My current settings are 
> shared_buffers: 2GB, work_mem: 64MB, maintenance_work_mem: 512MB, effective_cache_size: 4GB.
> (I am on a Ubuntu 18.04 laptop machine with 16 GB RAM and i7 2.30 GHz 16 core CPU) (I have the following postgis installation: POSTGIS="3.1.0 5e2af69" [EXTENSION] PGSQL="120" GEOS="3.7.1-CAPI-1.11.1 27a5e771" PROJ="Rel. 4.9.3, 15 August 2016" LIBXML="2.9.4" LIBJSON="0.12.1" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" )
> 
> I have also tried setting STORAGE EXTERNAL as per Paul Ramsay's blog post on spatial joins to no avail.
> 
> As a final note, I have SET parallel_workers=8 but query planner seems to be stuck with 2 workers launched/planned.
> 
> I am fairly new to GIS and postgis altogether, so I might be missing something very simple and straightforward. I would be grateful if you guys can point me in the right direction.
> 
> Best regards,
> Temir
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users



More information about the postgis-users mailing list