# [postgis-users] How long does iteration over 4-5 million rows usually take?

Shaozhong SHI shishaozhong at gmail.com
Sat Apr 2 00:43:15 PDT 2022

```This failed with an error message saying invalid allocation of memory.

Network Walking in PostGIS · Paul Ramsey (cleverelephant.ca)
<http://blog.cleverelephant.ca/2010/07/network-walking-in-postgis.html>

The following works well on small data sets (9000 rows) and do not know how
long it would run for 4-5 million rows.

DO
\$\$DECLARE
this_id bigint;
this_geom geometry;
cluster_id_match integer;

id_a bigint;
id_b bigint;
BEGINDROP TABLE IF EXISTS clusters;CREATE TABLE clusters (cluster_id
serial, ids bigint[], geom geometry);CREATE INDEX ON clusters USING
GIST(geom);
-- Iterate through linestrings, assigning each to a cluster (if there
is an intersection)-- or creating a new cluster (if there is not)FOR
this_id, this_geom IN SELECT id, geom FROM lines LOOP
-- Look for an intersecting cluster.  (There may be more than one.)
SELECT cluster_id FROM clusters WHERE ST_Intersects(this_geom, clusters.geom)
LIMIT 1 INTO cluster_id_match;

IF cluster_id_match IS NULL THEN
-- Create a new cluster
INSERT INTO clusters (ids, geom) VALUES (ARRAY[this_id], this_geom);
ELSE
-- Append line to existing cluster
UPDATE clusters SET geom = ST_Union(this_geom, geom),
ids = array_prepend(this_id, ids)
WHERE clusters.cluster_id = cluster_id_match;
END IF;END LOOP;
-- Iterate through the clusters, combining clusters that intersect each other
LOOP
SELECT a.cluster_id, b.cluster_id FROM clusters a, clusters b
WHERE ST_Intersects(a.geom, b.geom)
AND a.cluster_id < b.cluster_id
INTO id_a, id_b;

EXIT WHEN id_a IS NULL;
-- Merge cluster A into cluster B
UPDATE clusters a SET geom = ST_Union(a.geom, b.geom), ids =
array_cat(a.ids, b.ids)
FROM clusters b
WHERE a.cluster_id = id_a AND b.cluster_id = id_b;

-- Remove cluster B
DELETE FROM clusters WHERE cluster_id = id_b;END LOOP;END;
\$\$ language plpgsql;

Regards,

David

On Sat, 2 Apr 2022 at 08:26, Alexandre Neto <senhor.neto at gmail.com> wrote:

> It really depends on many factors, but mainly of what you are trying to
> do. Can you explain what you are trying to do, can you share your query and
> data structure?
>
> On Sat Apr 2, 2022, 01:33 AM GMT, Shaozhong SHI <shishaozhong at gmail.com>
> wrote:
>
> I have a script running to iterate over 4-5 million rows.  It keeps
> showing up in red in PgAdmin.  It remains active.
>
> How long does iteration over 4-5 million rows usually take?
>
> Regards,
>
> David
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20220402/9b4d815b/attachment.html>
```