[postgis-users] Query performance issue

Wed Oct 12 20:27:54 PDT 2005

On Wed, Oct 12, 2005 at 04:40:57PM -0400, Stephen Woodbridge wrote:
> Michael Fuhr wrote:
> >UPDATE streets SET prov = (
> >  SELECT prov_ab
> >  FROM province b
> >  WHERE streets.the_geom && b.the_geom AND intersects(streets.the_geom, 
> >  b.the_geom)
> >  ORDER BY prov_ab
> >  LIMIT 1
> >);
> 
> Ah, this is very interesting also. I had thought that it would be nice 
> to store an array() of provinces if there were more than one, but I was 
> not sure how to define the column and do that.
> 
> The UPDATE example above does not seem correct, as it looks like it will 
> update prov on ALL rows because there is not where in the UPDATE only in 
> the sub-select. Is Postgres smart enough to apply the the where in the 
> sub-select also to the UPDATE? The example in the doc page above implies 
> this is not the case.

Are you referring to this example from the documentation?

  UPDATE employees SET sales_count = sales_count + 1 WHERE id =
    (SELECT sales_person FROM accounts WHERE name = 'Acme Corporation');

This example does something different: it uses a subselect to
restrict the set of rows being updated, whereas the example I posted
updates every row in the table with the value returned by a subselect
that's run once for each row.  That is, my example essentially does
the following:

  for every row in the table
      execute a query with this row's column values
      assign the result to a column in this row
  end

Isn't that what you're trying to do?  Update every row in the streets
table to have the correct province, as in the following example?

CREATE TABLE province (
    prov_ab char(2) PRIMARY KEY
);
SELECT AddGeometryColumn('province', 'the_geom', -1, 'POLYGON', 2);

CREATE TABLE streets (
    name  text PRIMARY KEY,
    prov  char(2)
);
SELECT AddGeometryColumn('streets', 'the_geom', -1, 'LINESTRING', 2);

COPY province (prov_ab, the_geom) FROM stdin DELIMITER '|';
BC|POLYGON((0 0,20 0,20 50,0 50,0 0))
AB|POLYGON((20 0,40 0,40 50,20 50,20 0))
SK|POLYGON((40 0,60 0,60 50,40 50,40 0))
MB|POLYGON((60 0,80 0,80 50,60 50,60 0))
\.

COPY streets (name, the_geom) FROM stdin DELIMITER '|';
Street 1|LINESTRING(10 20,30 20)
Street 2|LINESTRING(65 10,75 10)
Street 3|LINESTRING(45 10,55 10)
Street 4|LINESTRING(25 10,35 10)
Street 5|LINESTRING(5 10,10 10)
\.

SELECT s.name, AsText(s.the_geom), p.prov_ab, AsText(p.the_geom)
FROM streets AS s, province AS p
WHERE s.the_geom && p.the_geom AND intersects(s.the_geom, p.the_geom)
ORDER BY s.name, p.prov_ab;
   name   |         astext          | prov_ab |                astext                 
----------+-------------------------+---------+---------------------------------------
 Street 1 | LINESTRING(10 20,30 20) | AB      | POLYGON((20 0,40 0,40 50,20 50,20 0))
 Street 1 | LINESTRING(10 20,30 20) | BC      | POLYGON((0 0,20 0,20 50,0 50,0 0))
 Street 2 | LINESTRING(65 10,75 10) | MB      | POLYGON((60 0,80 0,80 50,60 50,60 0))
 Street 3 | LINESTRING(45 10,55 10) | SK      | POLYGON((40 0,60 0,60 50,40 50,40 0))
 Street 4 | LINESTRING(25 10,35 10) | AB      | POLYGON((20 0,40 0,40 50,20 50,20 0))
 Street 5 | LINESTRING(5 10,10 10)  | BC      | POLYGON((0 0,20 0,20 50,0 50,0 0))
(6 rows)

UPDATE streets SET prov = (
  SELECT prov_ab
  FROM province AS p
  WHERE streets.the_geom && p.the_geom
    AND intersects(streets.the_geom, p.the_geom)
  ORDER BY prov_ab
  LIMIT 1
);

SELECT name, prov, AsText(the_geom) FROM streets ORDER BY name;
   name   | prov |         astext          
----------+------+-------------------------
 Street 1 | AB   | LINESTRING(10 20,30 20)
 Street 2 | MB   | LINESTRING(65 10,75 10)
 Street 3 | SK   | LINESTRING(45 10,55 10)
 Street 4 | AB   | LINESTRING(25 10,35 10)
 Street 5 | BC   | LINESTRING(5 10,10 10)
(5 rows)

Notice that Street 1 matches two provinces so the update picked the
first one alphabetically (ORDER BY prov_ab LIMIT 1); the other
streets have all been assigned the province in which they're located.
Is that what you're after?

I don't know how the subselect method will perform versus the FROM
method on a large data set, but performance aside it should work.

-- 
Michael Fuhr