[postgis-users] How long does iteration over 4-5 million rows usually take?

Shaozhong SHI shishaozhong at gmail.com
Sat Apr 2 00:43:15 PDT 2022


This failed with an error message saying invalid allocation of memory.

Network Walking in PostGIS · Paul Ramsey (cleverelephant.ca)
<http://blog.cleverelephant.ca/2010/07/network-walking-in-postgis.html>

The following works well on small data sets (9000 rows) and do not know how
long it would run for 4-5 million rows.

DO
$$DECLARE
this_id bigint;
this_geom geometry;
cluster_id_match integer;

id_a bigint;
id_b bigint;
BEGINDROP TABLE IF EXISTS clusters;CREATE TABLE clusters (cluster_id
serial, ids bigint[], geom geometry);CREATE INDEX ON clusters USING
GIST(geom);
-- Iterate through linestrings, assigning each to a cluster (if there
is an intersection)-- or creating a new cluster (if there is not)FOR
this_id, this_geom IN SELECT id, geom FROM lines LOOP
  -- Look for an intersecting cluster.  (There may be more than one.)
  SELECT cluster_id FROM clusters WHERE ST_Intersects(this_geom, clusters.geom)
     LIMIT 1 INTO cluster_id_match;

  IF cluster_id_match IS NULL THEN
     -- Create a new cluster
     INSERT INTO clusters (ids, geom) VALUES (ARRAY[this_id], this_geom);
  ELSE
     -- Append line to existing cluster
     UPDATE clusters SET geom = ST_Union(this_geom, geom),
                          ids = array_prepend(this_id, ids)
      WHERE clusters.cluster_id = cluster_id_match;
  END IF;END LOOP;
-- Iterate through the clusters, combining clusters that intersect each other
LOOP
    SELECT a.cluster_id, b.cluster_id FROM clusters a, clusters b
     WHERE ST_Intersects(a.geom, b.geom)
       AND a.cluster_id < b.cluster_id
      INTO id_a, id_b;

    EXIT WHEN id_a IS NULL;
    -- Merge cluster A into cluster B
    UPDATE clusters a SET geom = ST_Union(a.geom, b.geom), ids =
array_cat(a.ids, b.ids)
      FROM clusters b
     WHERE a.cluster_id = id_a AND b.cluster_id = id_b;

    -- Remove cluster B
    DELETE FROM clusters WHERE cluster_id = id_b;END LOOP;END;
$$ language plpgsql;


Regards,


David


On Sat, 2 Apr 2022 at 08:26, Alexandre Neto <senhor.neto at gmail.com> wrote:

> It really depends on many factors, but mainly of what you are trying to
> do. Can you explain what you are trying to do, can you share your query and
> data structure?
>
> On Sat Apr 2, 2022, 01:33 AM GMT, Shaozhong SHI <shishaozhong at gmail.com>
> wrote:
>
> I have a script running to iterate over 4-5 million rows.  It keeps
> showing up in red in PgAdmin.  It remains active.
>
> How long does iteration over 4-5 million rows usually take?
>
> Regards,
>
> David
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20220402/9b4d815b/attachment.html>


More information about the postgis-users mailing list