[postgis-users] How long does iteration over 4-5 million rows usually take?

Giuseppe Broccolo g.broccolo.7 at gmail.com
Sun Apr 3 07:29:16 PDT 2022


Hi David,

That blog article is from 12 years ago, and was presenting the best
solution available at that time. Several new features have been included in
the meantime. For example, the usage of LATERAL JOIN would allow in your
case to avoid the loop over the records in your pl/pgsql procedure, which
is what is overloading your DB in the execution.

Giuseppe.

On Sat, 2 Apr 2022, 08:43 Shaozhong SHI, <shishaozhong at gmail.com> wrote:

> This failed with an error message saying invalid allocation of memory.
>
> Network Walking in PostGIS · Paul Ramsey (cleverelephant.ca)
> <http://blog.cleverelephant.ca/2010/07/network-walking-in-postgis.html>
>
> The following works well on small data sets (9000 rows) and do not know
> how long it would run for 4-5 million rows.
>
> DO
> $$DECLARE
> this_id bigint;
> this_geom geometry;
> cluster_id_match integer;
>
> id_a bigint;
> id_b bigint;
> BEGINDROP TABLE IF EXISTS clusters;CREATE TABLE clusters (cluster_id serial, ids bigint[], geom geometry);CREATE INDEX ON clusters USING GIST(geom);
> -- Iterate through linestrings, assigning each to a cluster (if there is an intersection)-- or creating a new cluster (if there is not)FOR this_id, this_geom IN SELECT id, geom FROM lines LOOP
>   -- Look for an intersecting cluster.  (There may be more than one.)
>   SELECT cluster_id FROM clusters WHERE ST_Intersects(this_geom, clusters.geom)
>      LIMIT 1 INTO cluster_id_match;
>
>   IF cluster_id_match IS NULL THEN
>      -- Create a new cluster
>      INSERT INTO clusters (ids, geom) VALUES (ARRAY[this_id], this_geom);
>   ELSE
>      -- Append line to existing cluster
>      UPDATE clusters SET geom = ST_Union(this_geom, geom),
>                           ids = array_prepend(this_id, ids)
>       WHERE clusters.cluster_id = cluster_id_match;
>   END IF;END LOOP;
> -- Iterate through the clusters, combining clusters that intersect each other
> LOOP
>     SELECT a.cluster_id, b.cluster_id FROM clusters a, clusters b
>      WHERE ST_Intersects(a.geom, b.geom)
>        AND a.cluster_id < b.cluster_id
>       INTO id_a, id_b;
>
>     EXIT WHEN id_a IS NULL;
>     -- Merge cluster A into cluster B
>     UPDATE clusters a SET geom = ST_Union(a.geom, b.geom), ids = array_cat(a.ids, b.ids)
>       FROM clusters b
>      WHERE a.cluster_id = id_a AND b.cluster_id = id_b;
>
>     -- Remove cluster B
>     DELETE FROM clusters WHERE cluster_id = id_b;END LOOP;END;
> $$ language plpgsql;
>
>
> Regards,
>
>
> David
>
>
> On Sat, 2 Apr 2022 at 08:26, Alexandre Neto <senhor.neto at gmail.com> wrote:
>
>> It really depends on many factors, but mainly of what you are trying to
>> do. Can you explain what you are trying to do, can you share your query and
>> data structure?
>>
>> On Sat Apr 2, 2022, 01:33 AM GMT, Shaozhong SHI <shishaozhong at gmail.com>
>> wrote:
>>
>> I have a script running to iterate over 4-5 million rows.  It keeps
>> showing up in red in PgAdmin.  It remains active.
>>
>> How long does iteration over 4-5 million rows usually take?
>>
>> Regards,
>>
>> David
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20220403/0b58427f/attachment.html>


More information about the postgis-users mailing list