[postgis-users] Neighborhood Function using PL/PGSQL waaay to slow?
collin
collin at socrates.Berkeley.EDU
Wed Apr 21 08:53:39 PDT 2004
I'll try making a separate table and joining it with original. The
NULL comment meant that most of the points had no values placed into the
zmin field. The current test table a square 100m to a side, from Napa
County, US. The points are extremely dense: 0.25m to 1m apart. So,
there is no possibility of not picking up neighbors, even at the edges
and corners.
If I pick a random point and perform a select using the select below, I
always get points. Could there be an issue with bounding boxes?
-- Collin Bode
Paul Ramsey wrote:
> I am not sure what you mean with your NULL comment. Could it be that
> at 10m a large number of your points are not picking up any
> neighbors within 10m?
>
> Regarding a "batch update", you could do the update by joining your
> original table and your new "inserted" table into a new table using a
> standard SQL join. With the appropriate indexes this will be much
> much faster than the "death of a thousand updates" you had before.
>
> On Tuesday, April 20, 2004, at 05:36 PM, collin wrote:
>
>> Paul, Thanks for responding. The 100m turned out to be absurdly
>> large. I scaled it back to 10m box, which returns roughly 300
>> points on average. However, I found that only 3% of the points
>> returned any points at all! These ranged from 124 - 350 points
>> returned. The rest of the 10,000 points returned NULL. I do not
>> understand this result.
>>
>> I did combine the two statements into one, which now reads:
>> EXECUTE ''UPDATE test1 SET zmin = '' ||
>> '' (SELECT min(z) FROM test1 WHERE the_geom && '' ||
>> '' SetSRID(
>> EXPAND(
>> GeometryFromText(''''POINT('' || thisrow.x || ''
>> '' || thisrow.y || '' '' || thisrow.z ||'')'''',32610)
>> ,10 )
>> ,32610))
>> WHERE x = '' || thisrow.x || '' AND y = '' || thisrow.y ;
>>
>> If you could give a suggestion/pseudo code on how to do a batch
>> transfer, I would appreciate it.
>>
>> Collin Bode
>> GIS Informatics Researcher, UC Berkeley
>>
>> Paul Ramsey wrote:
>>
>>> First, how many results is your 100m box returning? If the database
>>> has to repeatedly sequence scan a pile of values for each
>>> iteration, yes, it will be slow. Not much to do there, it is just a
>>> consequence of the fact that the "local minimum" problem is hard to
>>> optimize.
>>> Second, you are doing an insert *and* an update for each iteration.
>>> Why not either (a) save the update for a batch transfer at the end
>>> or (b) combine the SQL into one huge update.
>>>
>>>
>>> collin wrote:
>>>
>>>> Folks,
>>>>
>>>> I posted earlier regarding getting statistics for every point on
>>>> their neighboring points. Currently I am trying to find the
>>>> lowest alititude within a ~100 meter radius of each point. To do
>>>> so, I wrote a PL/PGSQL function (see below) which iterates through
>>>> every row in a 10,000 row table and updates each row with the
>>>> lowest altitude within a bounding box of 100meters.
>>>> ---------------------------------------------------------------------
>>>> --------------------------------
>>>> CREATE OR REPLACE FUNCTION collintest2() RETURNS integer AS '
>>>> DECLARE
>>>> thisrow RECORD;
>>>> zzmin double precision;
>>>> ztot integer := 0;
>>>> BEGIN
>>>> FOR thisrow IN SELECT * FROM test1 LOOP
>>>> SELECT INTO zzmin min(z) FROM test1 WHERE
>>>> the_geom &&
>>>> EXPAND(GeometryFromText(''POINT('' || thisrow.x || '' '' ||
>>>> thisrow.y || '' '' || thisrow.z ||'')'',32610),100);
>>>> EXECUTE ''UPDATE test1 SET zmin = '' || zzmin || '' WHERE x =
>>>> '' || thisrow.x || '' AND y = '' || thisrow.y ;
>>>> ztot := ztot + 1;
>>>> END LOOP;
>>>> RETURN ztot;
>>>> END;
>>>> ' LANGUAGE 'plpgsql';
>>>> ---------------------------------------------------------------------
>>>> -----------------------------------
>>>>
>>>> EXPLAIN ANALYZE SELECT collintest2();
>>>> QUERY PLAN
>>>>
>>>> ---------------------------------------------------------------------
>>>> ---------------------------
>>>> Result (cost=0.00..0.01 rows=1 width=0) (actual
>>>> time=3318131.024..3318131.027 rows=1 loops=1)
>>>> Total runtime: 3318131.075 ms
>>>>
>>>> This runs painfully slow, as you can see (331ms per row). Is there
>>>> another way to do this? I am new to pl/pgsql, though very familiar
>>>> with sql. How do I optimize? Use cursors? I built a gist index
>>>> on the_geom, but it doesn't make much of a difference.
>>>>
>>>> Thanks in advance,
>>>>
>>>> Collin Bode
>>>> GIS Informatics Researcher, UC Berkeley
>>>>
>>>>
>>>> The here is the schema for the table being operated on:
>>>> CREATE TABLE test1 (
>>>> the_geom geometry,
>>>> zmin double precision,
>>>> "time" double precision,
>>>> x double precision,
>>>> y double precision,
>>>> z double precision,
>>>> i double precision,
>>>> xf double precision,
>>>> yf double precision,
>>>> zf double precision,
>>>> if double precision
>>>> CONSTRAINT "$1" CHECK ((srid(the_geom) = 32610)),
>>>> CONSTRAINT "$2" CHECK (((geometrytype(the_geom) =
>>>> 'POINT'::text) OR (the_geom IS NULL)))
>>>> );
>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>>
>>>
>>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
> Paul Ramsey
> Refractions Research
> Email: pramsey at refractions.net
> Phone: (250) 885-0632
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
More information about the postgis-users
mailing list