[postgis-users] Nearest Neighbor on Large Datasets

Spencer Gardner spgardner at HNTB.com
Mon Apr 8 14:03:56 PDT 2013


It took me about 10 minutes (I didn't note the exact time but I can re-run if you'd like a more exact figure). I'd rather not wait the 21 hours to run EXPLAIN ANALYZE on my old query but I would guess that it was having table scan issues despite the presence of the spatial indices. This would explain the vast difference in query time.

Spencer

-----Original Message-----
From: postgis-users-bounces at lists.osgeo.org [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Pierre Racine
Sent: Monday, April 08, 2013 3:48 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Nearest Neighbor on Large Datasets

How long does it take now?

> -----Original Message-----
> From: postgis-users-bounces at lists.osgeo.org [mailto:postgis-users- 
> bounces at lists.osgeo.org] On Behalf Of Spencer Gardner
> Sent: Monday, April 08, 2013 4:41 PM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Nearest Neighbor on Large Datasets
> 
> Thanks! That did the trick. I had previously tried a slightly 
> different version of your solution but it was far less efficient.
> 
> 
> 
> From: postgis-users-bounces at lists.osgeo.org [mailto:postgis-users- 
> bounces at lists.osgeo.org] On Behalf Of Pierre Racine
> Sent: Monday, April 08, 2013 10:41 AM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Nearest Neighbor on Large Datasets
> 
> 
> 
> Try this:
> 
> 
> 
> SELECT grid.pk_uid, point.pk_uid AS el_id, point.elevation
> 
> FROM grid_rail_lines grid, elev_rail_combined point
> 
> WHERE point.pk_uid  = (SELECT point.pk_uid id2
> 
>         FROM grid_rail_lines point
> 
>         ORDER BY point.geom <-> grid.geom
> 
>         LIMIT 1)
> 
> ORDER BY grid.pk_uid
> 
> 
> 
> From: postgis-users-bounces at lists.osgeo.org [mailto:postgis-users- 
> bounces at lists.osgeo.org] On Behalf Of Spencer Gardner
> Sent: Monday, April 08, 2013 11:08 AM
> To: 'PostGIS Users Discussion'
> Subject: [postgis-users] Nearest Neighbor on Large Datasets
> 
> 
> 
> I have a layer of grid cells and a layer of discreet points 
> representing elevation samples. My grid layer has a total of about 
> 430,000 cells, the elevation data contains about 320,000 points, and both datasets have spatial indexes on them.
> I need to join each grid cell with the nearest elevation point (within 
> at most 30 feet of the cell's center point). To accomplish this, I created the following query:
> 
> 
> 
> SELECT DISTINCT ON (grid_rail_lines.pk_uid)
> 
>   grid_rail_lines.pk_uid
> 
> , elev_rail_combined.pk_uid AS el_id
> 
> , elev_rail_combined.elevation
> 
> FROM
> 
>   grid_rail_lines JOIN elev_rail_combined
> 
>     ON grid_rail_lines.the_geom<->elev_rail_combined.the_geom < 30
> 
> ORDER BY
> 
>   grid_rail_lines.pk_uid ASC
> 
> , grid_rail_lines.the_geom<->elev_rail_combined.the_geom ASC
> 
> 
> 
> 
> 
> EXPLAIN provides the following information (also available at depesz 
> <http://explain.depesz.com/s/Vsm> ):
> 
> 
> 
> Unique  (cost=141973943356.82..142206589733.23 rows=434170 width=624)
> 
>   ->  Sort  (cost=141973943356.82..142090266545.03 rows=46529275283
> width=624)
> 
>         Sort Key: grid_rail_lines.pk_uid, ((grid_rail_lines.the_geom 
> <->
> elev_rail_combined.the_geom))
> 
>         ->  Nested Loop  (cost=0.00..4910712887.16 rows=46529275283
> width=624)
> 
>               Join Filter: ((grid_rail_lines.the_geom <-> 
> elev_rail_combined.the_geom) < 30::double precision)
> 
>               ->  Seq Scan on grid_rail_lines  (cost=0.00..13202.70 
> rows=434170
> width=484)
> 
>               ->  Seq Scan on elev_rail_combined  (cost=0.00..6220.05 
> rows=321505
> width=140)
> 
> 
> 
> 
> 
> I let this query run over the weekend and it took a total of 21 hours. 
> These datasets are rather large so I expect it to take a long time, 
> but I wonder if there is a more efficient way to conduct the join. As 
> far as I can tell, I've structured the query as recommended in the 
> documentation. Does anyone have ideas for how to improve performance?
> 
> 
> 
> Thanks,
> 
> Spencer
> 
> ________________________________
> 
> This e-mail and any files transmitted with it are confidential and are 
> intended solely for the use of the individual or entity to whom they 
> are addressed. If you are NOT the intended recipient and receive this 
> communication, please delete this message and any attachments. Thank you.
> 
> ________________________________
> 
> This e-mail and any files transmitted with it are confidential and are 
> intended solely for the use of the individual or entity to whom they 
> are addressed. If you are NOT the intended recipient and receive this 
> communication, please delete this message and any attachments. Thank you.
_______________________________________________
postgis-users mailing list
postgis-users at lists.osgeo.org
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
------------------------------------------------------------------------------

This e-mail and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are NOT the intended recipient and receive this communication, please delete this message and any attachments. Thank you.



More information about the postgis-users mailing list