[postgis-users] Query performance issue

Wed Oct 12 20:59:21 PDT 2005

Michael,

Thank you for your explanation and example. It is very instructional. I 
am still new to Postgres, having used MySQL for many years. I wish I had 
started using it much sooner. I am amazed at all things it can do that 
were so hard, if not impossible with MySQL.

Yes, you interrupted my needs correctly, I just didn't understand how 
the query worked but your example cleared that up.

I will give both queries a try and see what I get for performance. I'm 
running the update without the sub select now 1.5 hrs into it. I ran a 
similar query on a different polygon table and it too about 3 hours, so 
I'll know in the morning how this went.

Many thanks for your time and patience,
   -Stephen Woodbridge

Michael Fuhr wrote:
> On Wed, Oct 12, 2005 at 04:40:57PM -0400, Stephen Woodbridge wrote:
> 
>>Michael Fuhr wrote:
>>
>>>UPDATE streets SET prov = (
>>> SELECT prov_ab
>>> FROM province b
>>> WHERE streets.the_geom && b.the_geom AND intersects(streets.the_geom, 
>>> b.the_geom)
>>> ORDER BY prov_ab
>>> LIMIT 1
>>>);
>>
>>Ah, this is very interesting also. I had thought that it would be nice 
>>to store an array() of provinces if there were more than one, but I was 
>>not sure how to define the column and do that.
>>
>>The UPDATE example above does not seem correct, as it looks like it will 
>>update prov on ALL rows because there is not where in the UPDATE only in 
>>the sub-select. Is Postgres smart enough to apply the the where in the 
>>sub-select also to the UPDATE? The example in the doc page above implies 
>>this is not the case.
> 
> 
> Are you referring to this example from the documentation?
> 
>   UPDATE employees SET sales_count = sales_count + 1 WHERE id =
>     (SELECT sales_person FROM accounts WHERE name = 'Acme Corporation');
> 
> This example does something different: it uses a subselect to
> restrict the set of rows being updated, whereas the example I posted
> updates every row in the table with the value returned by a subselect
> that's run once for each row.  That is, my example essentially does
> the following:
> 
>   for every row in the table
>       execute a query with this row's column values
>       assign the result to a column in this row
>   end
> 
> Isn't that what you're trying to do?  Update every row in the streets
> table to have the correct province, as in the following example?
> 
> CREATE TABLE province (
>     prov_ab char(2) PRIMARY KEY
> );
> SELECT AddGeometryColumn('province', 'the_geom', -1, 'POLYGON', 2);
> 
> CREATE TABLE streets (
>     name  text PRIMARY KEY,
>     prov  char(2)
> );
> SELECT AddGeometryColumn('streets', 'the_geom', -1, 'LINESTRING', 2);
> 
> COPY province (prov_ab, the_geom) FROM stdin DELIMITER '|';
> BC|POLYGON((0 0,20 0,20 50,0 50,0 0))
> AB|POLYGON((20 0,40 0,40 50,20 50,20 0))
> SK|POLYGON((40 0,60 0,60 50,40 50,40 0))
> MB|POLYGON((60 0,80 0,80 50,60 50,60 0))
> \.
> 
> COPY streets (name, the_geom) FROM stdin DELIMITER '|';
> Street 1|LINESTRING(10 20,30 20)
> Street 2|LINESTRING(65 10,75 10)
> Street 3|LINESTRING(45 10,55 10)
> Street 4|LINESTRING(25 10,35 10)
> Street 5|LINESTRING(5 10,10 10)
> \.
> 
> SELECT s.name, AsText(s.the_geom), p.prov_ab, AsText(p.the_geom)
> FROM streets AS s, province AS p
> WHERE s.the_geom && p.the_geom AND intersects(s.the_geom, p.the_geom)
> ORDER BY s.name, p.prov_ab;
>    name   |         astext          | prov_ab |                astext                 
> ----------+-------------------------+---------+---------------------------------------
>  Street 1 | LINESTRING(10 20,30 20) | AB      | POLYGON((20 0,40 0,40 50,20 50,20 0))
>  Street 1 | LINESTRING(10 20,30 20) | BC      | POLYGON((0 0,20 0,20 50,0 50,0 0))
>  Street 2 | LINESTRING(65 10,75 10) | MB      | POLYGON((60 0,80 0,80 50,60 50,60 0))
>  Street 3 | LINESTRING(45 10,55 10) | SK      | POLYGON((40 0,60 0,60 50,40 50,40 0))
>  Street 4 | LINESTRING(25 10,35 10) | AB      | POLYGON((20 0,40 0,40 50,20 50,20 0))
>  Street 5 | LINESTRING(5 10,10 10)  | BC      | POLYGON((0 0,20 0,20 50,0 50,0 0))
> (6 rows)
> 
> UPDATE streets SET prov = (
>   SELECT prov_ab
>   FROM province AS p
>   WHERE streets.the_geom && p.the_geom
>     AND intersects(streets.the_geom, p.the_geom)
>   ORDER BY prov_ab
>   LIMIT 1
> );
> 
> SELECT name, prov, AsText(the_geom) FROM streets ORDER BY name;
>    name   | prov |         astext          
> ----------+------+-------------------------
>  Street 1 | AB   | LINESTRING(10 20,30 20)
>  Street 2 | MB   | LINESTRING(65 10,75 10)
>  Street 3 | SK   | LINESTRING(45 10,55 10)
>  Street 4 | AB   | LINESTRING(25 10,35 10)
>  Street 5 | BC   | LINESTRING(5 10,10 10)
> (5 rows)
> 
> Notice that Street 1 matches two provinces so the update picked the
> first one alphabetically (ORDER BY prov_ab LIMIT 1); the other
> streets have all been assigned the province in which they're located.
> Is that what you're after?
> 
> I don't know how the subselect method will perform versus the FROM
> method on a large data set, but performance aside it should work.
>