[postgis-users] Large (and slow) Intersection

Paul Ramsey pramsey at refractions.net
Sat Aug 28 10:05:36 PDT 2004


Bzzt. Never use a buffer when you want to test proximity. Distance() is 
a much much cheaper function, so if you are only concerned about 
boolean tests of proximity you should be building your queries around 
that. Buffer() is very expensive, it has to build a whole new geometry.

How wedded are you to your 1km? Do you have to find all things 
*exactly* within 1km, or is "nearby" good enough? If "nearby" is the 
ticket I would suggest a naive transformation of 1km into degrees and 
pretend you are in a cartesian coordinate system. That would be the 
highest performance approach.

With reference to your actual example, your approach to generating a 
buffer is correct -- buffer makes no sense in a geographic coordinate 
system, so the double transform approach is required.

You need to think more like a database, if you are going to write 
efficient queries.

- Some functions are expensive, some are cheap.
- Indexes are (usually) cheapest of all.
- Indexes work against bounding boxes.

> select points.the_geom
> from points, polys
> where
>    points.the_geom && TRANSFORM(BUFFER(TRANSFORM(polys.the_geom, 
> 26711),
> 1000), 4326)
> and
>    polys.keyfield = 'KEYVALUE'
> and
>    within(points.the_geom, TRANSFORM(BUFFER(TRANSFORM(polys.the_geom,
> 26711), 1000), 4326))

So, in your index clause, you have a buffer()! Ouch! You are buffering 
*every* polygon in your polygon table as part of your index search. Not 
efficient. For your index search you want to leave your geometries 
alone for the most part. This is where working in lat/lon can be more 
trouble than it is worth. If you do not *need* to be working in lat/lon 
(too much global data to fit nicely in a planar system) you should pick 
a planar system. Otherwise you will spend a great deal of time working 
with degrees-to-meters and back again issues.

Here is an example of a query that efficiently does a "give me 
everything within X meters" query:

select * from foo where the_geom && expand(<geometry>,X) and 
distance(the_geom,<geometry>) < X

The expand() function is very important for this, because it is a very 
cheap function that does nothing but return a slightly larger bounding 
box of a geometry.

On Friday, August 27, 2004, at 11:31 AM, James McEachern wrote:

> Hi Paul,
>
> Exactly. Why isn't he running that.  Doh!
>
> That works perfectly. Your right. Extremely fast!
>
> Of course we wandered off the map when I starting trying to
> put a 1 km buffer around  each land polygon, to find the points close 
> by as
> well.
>
> And my points and land are in the database are in NAD27 lat/lon format,
> so I though just do a TRANSFORM(BUFFER(TRANSFORM(polys.the_geom, 
> 26711),
> 1000), 4326)
> to put a 1000 m buffer on each polygon, so I could compare this to the
> lat/lon points.
>
> And my new SQL is
>
> select points.the_geom
> from points, polys
> where
>    points.the_geom && TRANSFORM(BUFFER(TRANSFORM(polys.the_geom, 
> 26711),
> 1000), 4326)
> and
>    polys.keyfield = 'KEYVALUE'
> and
>    within(points.the_geom, TRANSFORM(BUFFER(TRANSFORM(polys.the_geom,
> 26711), 1000), 4326))
>
> Well, even for small polygon lists this never returns.
> But if I do this:
>
> select points.the_geom
> from points, polys
> WHERE
>   points.the_geom && (
>     SELECT
>       TRANSFORM(BUFFER(TRANSFORM(GEOMUNION(the_geom), 26711), 100), 
> 4326)
>     FROM
>       polys
>     WHERE
>       keyfield = 'KEYVALUE')
>   AND
>     keyfield = 'KEYVALUE'
>   AND
>   WITHIN(points.the_geom, 
> TRANSFORM(BUFFER(TRANSFORM(GEOMUNION(the_geom),
> 26711), 100), 4326))
>
> THis will return results for small and closely related buffers 
> polygons.
> But if the polygons are very far apart, then the point search is tooo 
> slow.
>
> Is this the proper method of adding a 1km buffer to a lat/lon polygon?
>
> thanks,
>
> jamesm
>
>
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net
> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of 
> Paul
> Ramsey
> Sent: Friday, August 27, 2004 11:06 AM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Large (and slow) Intersection
>
> I don't understand why you are not using the simple SQL for this
> problem:
>
> select points.the_geom
> from points, polys
> where
>    points.the_geom && polys.the_geom
> and
>    polys.keyfield = 'KEYVALUE'
> and
>    within(points.the_geom,polys.the_geom)
>
> Make sure you have USE_STATS turned on and have run select
> update_geometry_stats(), (or if you are in 8.0 land, have simply run 
> vacuum
> analyze (i love that)).
>
> On Friday, August 27, 2004, at 08:39 AM, James McEachern wrote:
>
>> We have the GIST index, and use a statement just like this for each
>> individual polygon.
>>
>> Our problem is that there are hundreds of polygons, and we would like
>> to get a result with one statement. These polygons could be in the
>> four outer corners (but still very small in area), which causes the
>> the envelope to be large.
>>
>> An example of the statement
>>
>>
>> SELECT points.the_geom
>> 	FROM points
>> 	WHERE points.the_geom &&
>>        (SELECT geomunion(polys.the_geom) from polys
>>        somekeyfield = "KEYVALUE')
>> 	AND withing(points.the_geom,
>>        (SELECT geomunion(polys.the_geom) from polys
>>        somekeyfield = "KEYVALUE'));
>>
>>
>> Now this works when the polygons are close together geographically,
>> but when they are wildly spaced over the whole area, the && condition
>> really returns all of the points.
>>
>> The obvious solution is to first to first select all of the individual
>> polys (no union), and then iteratively call your SELECT for each
>> polygon returned.
>>
>> SELECT polys.the_geom from polys WHERE somekeyfield = "KEYVALUE';
>>
>> foreach (poly in polyresult)
>> {
>>   SELECT points.the_geom
>>     FROM points
>>     WHERE points.the_geom && poly
>>     AND withing(points.the_geom, poly);
>>   // keep track of the results of each iteration }
>>
>> This gives the result quickly, but is more complex and more hits
>> against the database.
>>
>> thanks,
>>
>> jamesm
>>
>> -----Original Message-----
>> From: postgis-users-bounces at postgis.refractions.net
>> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
>> strk at refractions.net
>> Sent: Friday, August 27, 2004 2:17 AM
>> To: PostGIS Users Discussion
>> Subject: Re: [postgis-users] Large (and slow) Intersection
>>
>> I did not understand your problem, does this query do what you need ?
>>
>> SELECT points.the_geom
>> 	FROM points, polys
>> 	WHERE points.the_geom && polys.the_geom
>> 	AND withing(points.the_geom, polys.the_geom);
>>
>> --strk;
>>
>> On Fri, Aug 27, 2004 at 01:20:29AM -0600, James McEachern wrote:
>>> Hi All,
>>>
>>> We are trying to locate point objects that fall within polygon
>>> objects.
>>>
>>> Our data points are spread out over a large area, and the polygons
>>> used to locate may be spread over an equally large area, but only
>>> would have a very small coverage (< 1% of total area)
>>>
>>> There are ~500K point objects, and upwards of 1000 polygons.
>>>
>>> Of course if we union the polygons together, the resulting envelope
>>> used for the && operation basically gets all of the points.
>>> And this would run for ever.
>>>
>>> By doing the query on any single polygon with the && and
>>> intersection, we get the correct result very fast, therefore running
>>> it for 1000 polygons, it would take around ~1000 seconds. Quite
> manageable.
>>>
>>> I have seen previous posts regarding this problem/solution, and they
>>> used an iterative solution, to walk over the polygon result set (in
>>> code) doing each && intersection independently and summing (actually
>>> unioning) the results. Creating a stored proc looks like one 
>>> solution.
>>>
>>> I am looking for a correlated sub-query (or something) to get the
>>> result.
>>>
>>> Does anyone know if a single (correlated) query is possible?
>>>
>>> Thanks,
>>>
>>> James S McEachern
>>> Geo Webworks Inc.
>>> 2020, 801 6 Avenue SW
>>> Calgary, Alberta
>>> T2P 3W2
>>>
>>> 403-301-4001
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>       Paul Ramsey
>       Refractions Research
>       Email: pramsey at refractions.net
>       Phone: (250) 885-0632
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
      Paul Ramsey
      Refractions Research
      Email: pramsey at refractions.net
      Phone: (250) 885-0632




More information about the postgis-users mailing list