[postgis-users] How long does iteration over 4-5 million rows usually take?

Shaozhong SHI shishaozhong at gmail.com
Sat Apr 2 00:43:15 PDT 2022

This failed with an error message saying invalid allocation of memory.

Network Walking in PostGIS · Paul Ramsey (cleverelephant.ca)

The following works well on small data sets (9000 rows) and do not know how
long it would run for 4-5 million rows.

this_id bigint;
this_geom geometry;
cluster_id_match integer;

id_a bigint;
id_b bigint;
BEGINDROP TABLE IF EXISTS clusters;CREATE TABLE clusters (cluster_id
serial, ids bigint[], geom geometry);CREATE INDEX ON clusters USING
-- Iterate through linestrings, assigning each to a cluster (if there
is an intersection)-- or creating a new cluster (if there is not)FOR
this_id, this_geom IN SELECT id, geom FROM lines LOOP
  -- Look for an intersecting cluster.  (There may be more than one.)
  SELECT cluster_id FROM clusters WHERE ST_Intersects(this_geom, clusters.geom)
     LIMIT 1 INTO cluster_id_match;

  IF cluster_id_match IS NULL THEN
     -- Create a new cluster
     INSERT INTO clusters (ids, geom) VALUES (ARRAY[this_id], this_geom);
     -- Append line to existing cluster
     UPDATE clusters SET geom = ST_Union(this_geom, geom),
                          ids = array_prepend(this_id, ids)
      WHERE clusters.cluster_id = cluster_id_match;
-- Iterate through the clusters, combining clusters that intersect each other
    SELECT a.cluster_id, b.cluster_id FROM clusters a, clusters b
     WHERE ST_Intersects(a.geom, b.geom)
       AND a.cluster_id < b.cluster_id
      INTO id_a, id_b;

    -- Merge cluster A into cluster B
    UPDATE clusters a SET geom = ST_Union(a.geom, b.geom), ids =
array_cat(a.ids, b.ids)
      FROM clusters b
     WHERE a.cluster_id = id_a AND b.cluster_id = id_b;

    -- Remove cluster B
    DELETE FROM clusters WHERE cluster_id = id_b;END LOOP;END;
$$ language plpgsql;



On Sat, 2 Apr 2022 at 08:26, Alexandre Neto <senhor.neto at gmail.com> wrote:

> It really depends on many factors, but mainly of what you are trying to
> do. Can you explain what you are trying to do, can you share your query and
> data structure?
> On Sat Apr 2, 2022, 01:33 AM GMT, Shaozhong SHI <shishaozhong at gmail.com>
> wrote:
> I have a script running to iterate over 4-5 million rows.  It keeps
> showing up in red in PgAdmin.  It remains active.
> How long does iteration over 4-5 million rows usually take?
> Regards,
> David
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20220402/9b4d815b/attachment.html>

More information about the postgis-users mailing list