[postgis-users] Query performance issue
Stephen Woodbridge
woodbri at swoodbridge.com
Wed Oct 12 20:59:21 PDT 2005
Michael,
Thank you for your explanation and example. It is very instructional. I
am still new to Postgres, having used MySQL for many years. I wish I had
started using it much sooner. I am amazed at all things it can do that
were so hard, if not impossible with MySQL.
Yes, you interrupted my needs correctly, I just didn't understand how
the query worked but your example cleared that up.
I will give both queries a try and see what I get for performance. I'm
running the update without the sub select now 1.5 hrs into it. I ran a
similar query on a different polygon table and it too about 3 hours, so
I'll know in the morning how this went.
Many thanks for your time and patience,
-Stephen Woodbridge
Michael Fuhr wrote:
> On Wed, Oct 12, 2005 at 04:40:57PM -0400, Stephen Woodbridge wrote:
>
>>Michael Fuhr wrote:
>>
>>>UPDATE streets SET prov = (
>>> SELECT prov_ab
>>> FROM province b
>>> WHERE streets.the_geom && b.the_geom AND intersects(streets.the_geom,
>>> b.the_geom)
>>> ORDER BY prov_ab
>>> LIMIT 1
>>>);
>>
>>Ah, this is very interesting also. I had thought that it would be nice
>>to store an array() of provinces if there were more than one, but I was
>>not sure how to define the column and do that.
>>
>>The UPDATE example above does not seem correct, as it looks like it will
>>update prov on ALL rows because there is not where in the UPDATE only in
>>the sub-select. Is Postgres smart enough to apply the the where in the
>>sub-select also to the UPDATE? The example in the doc page above implies
>>this is not the case.
>
>
> Are you referring to this example from the documentation?
>
> UPDATE employees SET sales_count = sales_count + 1 WHERE id =
> (SELECT sales_person FROM accounts WHERE name = 'Acme Corporation');
>
> This example does something different: it uses a subselect to
> restrict the set of rows being updated, whereas the example I posted
> updates every row in the table with the value returned by a subselect
> that's run once for each row. That is, my example essentially does
> the following:
>
> for every row in the table
> execute a query with this row's column values
> assign the result to a column in this row
> end
>
> Isn't that what you're trying to do? Update every row in the streets
> table to have the correct province, as in the following example?
>
> CREATE TABLE province (
> prov_ab char(2) PRIMARY KEY
> );
> SELECT AddGeometryColumn('province', 'the_geom', -1, 'POLYGON', 2);
>
> CREATE TABLE streets (
> name text PRIMARY KEY,
> prov char(2)
> );
> SELECT AddGeometryColumn('streets', 'the_geom', -1, 'LINESTRING', 2);
>
> COPY province (prov_ab, the_geom) FROM stdin DELIMITER '|';
> BC|POLYGON((0 0,20 0,20 50,0 50,0 0))
> AB|POLYGON((20 0,40 0,40 50,20 50,20 0))
> SK|POLYGON((40 0,60 0,60 50,40 50,40 0))
> MB|POLYGON((60 0,80 0,80 50,60 50,60 0))
> \.
>
> COPY streets (name, the_geom) FROM stdin DELIMITER '|';
> Street 1|LINESTRING(10 20,30 20)
> Street 2|LINESTRING(65 10,75 10)
> Street 3|LINESTRING(45 10,55 10)
> Street 4|LINESTRING(25 10,35 10)
> Street 5|LINESTRING(5 10,10 10)
> \.
>
> SELECT s.name, AsText(s.the_geom), p.prov_ab, AsText(p.the_geom)
> FROM streets AS s, province AS p
> WHERE s.the_geom && p.the_geom AND intersects(s.the_geom, p.the_geom)
> ORDER BY s.name, p.prov_ab;
> name | astext | prov_ab | astext
> ----------+-------------------------+---------+---------------------------------------
> Street 1 | LINESTRING(10 20,30 20) | AB | POLYGON((20 0,40 0,40 50,20 50,20 0))
> Street 1 | LINESTRING(10 20,30 20) | BC | POLYGON((0 0,20 0,20 50,0 50,0 0))
> Street 2 | LINESTRING(65 10,75 10) | MB | POLYGON((60 0,80 0,80 50,60 50,60 0))
> Street 3 | LINESTRING(45 10,55 10) | SK | POLYGON((40 0,60 0,60 50,40 50,40 0))
> Street 4 | LINESTRING(25 10,35 10) | AB | POLYGON((20 0,40 0,40 50,20 50,20 0))
> Street 5 | LINESTRING(5 10,10 10) | BC | POLYGON((0 0,20 0,20 50,0 50,0 0))
> (6 rows)
>
> UPDATE streets SET prov = (
> SELECT prov_ab
> FROM province AS p
> WHERE streets.the_geom && p.the_geom
> AND intersects(streets.the_geom, p.the_geom)
> ORDER BY prov_ab
> LIMIT 1
> );
>
> SELECT name, prov, AsText(the_geom) FROM streets ORDER BY name;
> name | prov | astext
> ----------+------+-------------------------
> Street 1 | AB | LINESTRING(10 20,30 20)
> Street 2 | MB | LINESTRING(65 10,75 10)
> Street 3 | SK | LINESTRING(45 10,55 10)
> Street 4 | AB | LINESTRING(25 10,35 10)
> Street 5 | BC | LINESTRING(5 10,10 10)
> (5 rows)
>
> Notice that Street 1 matches two provinces so the update picked the
> first one alphabetically (ORDER BY prov_ab LIMIT 1); the other
> streets have all been assigned the province in which they're located.
> Is that what you're after?
>
> I don't know how the subselect method will perform versus the FROM
> method on a large data set, but performance aside it should work.
>
More information about the postgis-users
mailing list