[postgis-users] Nearest Neighbor on Large Datasets

Pierre Racine Pierre.Racine at sbf.ulaval.ca
Mon Apr 8 13:48:01 PDT 2013


How long does it take now?

> -----Original Message-----
> From: postgis-users-bounces at lists.osgeo.org [mailto:postgis-users-
> bounces at lists.osgeo.org] On Behalf Of Spencer Gardner
> Sent: Monday, April 08, 2013 4:41 PM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Nearest Neighbor on Large Datasets
> 
> Thanks! That did the trick. I had previously tried a slightly different version of
> your solution but it was far less efficient.
> 
> 
> 
> From: postgis-users-bounces at lists.osgeo.org [mailto:postgis-users-
> bounces at lists.osgeo.org] On Behalf Of Pierre Racine
> Sent: Monday, April 08, 2013 10:41 AM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Nearest Neighbor on Large Datasets
> 
> 
> 
> Try this:
> 
> 
> 
> SELECT grid.pk_uid, point.pk_uid AS el_id, point.elevation
> 
> FROM grid_rail_lines grid, elev_rail_combined point
> 
> WHERE point.pk_uid  = (SELECT point.pk_uid id2
> 
>         FROM grid_rail_lines point
> 
>         ORDER BY point.geom <-> grid.geom
> 
>         LIMIT 1)
> 
> ORDER BY grid.pk_uid
> 
> 
> 
> From: postgis-users-bounces at lists.osgeo.org [mailto:postgis-users-
> bounces at lists.osgeo.org] On Behalf Of Spencer Gardner
> Sent: Monday, April 08, 2013 11:08 AM
> To: 'PostGIS Users Discussion'
> Subject: [postgis-users] Nearest Neighbor on Large Datasets
> 
> 
> 
> I have a layer of grid cells and a layer of discreet points representing elevation
> samples. My grid layer has a total of about 430,000 cells, the elevation data
> contains about 320,000 points, and both datasets have spatial indexes on them.
> I need to join each grid cell with the nearest elevation point (within at most 30
> feet of the cell's center point). To accomplish this, I created the following query:
> 
> 
> 
> SELECT DISTINCT ON (grid_rail_lines.pk_uid)
> 
>   grid_rail_lines.pk_uid
> 
> , elev_rail_combined.pk_uid AS el_id
> 
> , elev_rail_combined.elevation
> 
> FROM
> 
>   grid_rail_lines JOIN elev_rail_combined
> 
>     ON grid_rail_lines.the_geom<->elev_rail_combined.the_geom < 30
> 
> ORDER BY
> 
>   grid_rail_lines.pk_uid ASC
> 
> , grid_rail_lines.the_geom<->elev_rail_combined.the_geom ASC
> 
> 
> 
> 
> 
> EXPLAIN provides the following information (also available at depesz
> <http://explain.depesz.com/s/Vsm> ):
> 
> 
> 
> Unique  (cost=141973943356.82..142206589733.23 rows=434170 width=624)
> 
>   ->  Sort  (cost=141973943356.82..142090266545.03 rows=46529275283
> width=624)
> 
>         Sort Key: grid_rail_lines.pk_uid, ((grid_rail_lines.the_geom <->
> elev_rail_combined.the_geom))
> 
>         ->  Nested Loop  (cost=0.00..4910712887.16 rows=46529275283
> width=624)
> 
>               Join Filter: ((grid_rail_lines.the_geom <-> elev_rail_combined.the_geom)
> < 30::double precision)
> 
>               ->  Seq Scan on grid_rail_lines  (cost=0.00..13202.70 rows=434170
> width=484)
> 
>               ->  Seq Scan on elev_rail_combined  (cost=0.00..6220.05 rows=321505
> width=140)
> 
> 
> 
> 
> 
> I let this query run over the weekend and it took a total of 21 hours. These
> datasets are rather large so I expect it to take a long time, but I wonder if there
> is a more efficient way to conduct the join. As far as I can tell, I've structured the
> query as recommended in the documentation. Does anyone have ideas for how
> to improve performance?
> 
> 
> 
> Thanks,
> 
> Spencer
> 
> ________________________________
> 
> This e-mail and any files transmitted with it are confidential and are intended
> solely for the use of the individual or entity to whom they are addressed. If you
> are NOT the intended recipient and receive this communication, please delete
> this message and any attachments. Thank you.
> 
> ________________________________
> 
> This e-mail and any files transmitted with it are confidential and are intended
> solely for the use of the individual or entity to whom they are addressed. If you
> are NOT the intended recipient and receive this communication, please delete
> this message and any attachments. Thank you.


More information about the postgis-users mailing list