[pgrouting-users] "invalid memory alloc request size" on large graphs
Regina Obe
lr at pcorp.us
Sat Jan 21 15:29:15 PST 2023
I think 1GB is a hard limit in PostgreSQL .
See for example this note
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/utils/memutils.h;h=21640d62a647d23a4bfd3f12922b615d9cc03e97;hb=HEAD#l31
and discussed here - https://postgrespro.com/list/thread-id/1549721
Instead of using pgr_createTopology, have you tried using pgr_extractVertices which will eventually replace pgr_createTopology? https://docs.pgrouting.org/latest/en/pgr_extractVertices.html
That said I think internally (the pgRouting project) we'll need to rethink how we implement some functions like (pgr_components) to avoid these issues.
In meantime, I think your best bet is to segmentize your data into groups and reconnect them by overlapping node ids.
For this I'm using this function for breaking apart https://postgis.net/docs/ST_SquareGrid.html
(alternatively - https://postgis.net/docs/ST_HexagonGrid.html ) or
(if you have areas - https://postgis.net/docs/ST_Subdivide.html --
Or some sort of known boundaries like country, city, states, provinces etc.
-- I'm going to assume your data is in 4326 (you'll need to change size if it's in another special ref)
I haven't thoroughly tested this so might need changing and you will need to change to accommodate your field names. Note the 2 in the ST_SquareGrid is 2 degrees.
DO language plpgsql $do$
DECLARE var_sql text; i integer; r record; last_component bigint;
BEGIN
-- create table of divisions that breaks into square grid regions
DROP TABLE IF EXISTS tmp_divisions;
CREATE TEMP TABLE tmp_divisions AS
SELECT row_number() OVER(ORDER BY sg.i, sg.j) AS rn, sg.geom
FROM ST_SquareGrid(2, (SELECT ST_SetSRID(ST_Extent(the_geom), 4326) FROM edges) ) AS sg
WHERE EXISTS (SELECT 1 FROM edges WHERE edges.the_geom && sg.geom) ;
i = 0; last_component = 0;
-- create subtables which will hold components of each sub region and will set components of overlapping nodes to the component with lowest number
FOR r IN(SELECT rn, geom FROM tmp_divisions ORDER BY rn) LOOP
i = i + 1;
var_sql = 'DROP TABLE IF EXISTS tmp_connectedComponents_' || r.rn::text || '; CREATE TEMP TABLE tmp_connectedComponents_' || r.rn::text || ' AS
SELECT (component + ' || last_component::text || ') AS component, node, NULL::bigint AS new_component FROM pgr_connectedComponents( ' ||
quote_literal('SELECT id, source, target, cost, cost AS reverse_cost FROM edges
WHERE ST_Intersects(the_geom,' || quote_literal(r.geom::text) || '::geometry' || ') ' ) || ')';
RAISE NOTICE '%', var_sql;
EXECUTE var_sql;
var_sql = 'ALTER TABLE tmp_connectedComponents_' || r.rn::text || ' ADD CONSTRAINT pk_tmp_connectedComponents_' || r.rn::text || ' PRIMARY KEY(node);';
EXECUTE var_sql;
var_sql = 'CREATE INDEX ix_tmp_connectedComponents_' || r.rn::text || '_component ON tmp_connectedComponents_' || r.rn::text || '(component);';
EXECUTE var_sql;
RAISE NOTICE '%', var_sql;
COMMIT;
EXECUTE 'SELECT MAX(component) FROM tmp_connectedComponents_' || r.rn::text INTO last_component;
IF i > 1 THEN
var_sql = 'UPDATE tmp_connectedComponents_' || r.rn::text || ' AS c
SET new_component = COALESCE(p.new_component, p.component)
FROM tmp_connectedComponents_' || (i - 1) || ' AS p WHERE c.node = p.node;';
RAISE NOTICE '%', var_sql;
EXECUTE var_sql;
COMMIT;
var_sql = 'UPDATE tmp_connectedComponents_' || r.rn::text || ' AS c
SET new_component = pp.new_component
FROM (SELECT min(COALESCE(p.new_component, p.component)) AS new_component, component
FROM tmp_connectedComponents_' || r.rn::text || ' AS p GROUP BY component) AS pp
WHERE c.new_component IS NULL AND pp.component = c.component;';
RAISE NOTICE '%', var_sql;
EXECUTE var_sql;
COMMIT;
END IF;
END LOOP;
-- after we are all done, we consolidate all into a singled final_connectedComponents table
-- and insure a node is represented only once.
IF i > 0 THEN
DROP TABLE IF EXISTS final_connectedComponents;
var_sql = 'CREATE TABLE final_connectedComponents AS ' ||
string_agg('SELECT new_component AS component, component AS old_component, node
FROM tmp_connectedComponents_' || gi::text, ' UNION ') FROM generate_series(1, i) AS gi;
RAISE NOTICE '%' , var_sql;
EXECUTE var_sql;
var_sql = 'UPDATE final_connectedComponents SET component = old_component
WHERE component IS NULL AND
old_component IN(SELECT o.component FROM final_connectedComponents AS o WHERE o.component IS NOT NULL)';
RAISE NOTICE '%', var_sql;
EXECUTE var_sql;
-- delete dupes keeping the one that already has a component assigned
var_sql = 'DELETE FROM final_connectedComponents WHERE component IS NULL
AND node IN(SELECT o.node FROM final_connectedComponents AS o WHERE o.component IS NOT NULL)';
RAISE NOTICE '%, %', var_sql, clock_timestamp();
EXECUTE var_sql;
-- fill in still missing components
var_sql = 'UPDATE final_connectedComponents SET component = old_component WHERE component IS NULL;';
RAISE NOTICE '%, %', var_sql, clock_timestamp();
EXECUTE var_sql;
CREATE INDEX ix_final_connectedComponents_component ON final_connectedComponents(component);
COMMIT;
UPDATE final_connectedComponents AS c
SET component = pp.new_component
FROM (SELECT min(least(p.component, p.old_component)) AS new_component, old_component
FROM final_connectedComponents AS p GROUP BY old_component ) AS pp
WHERE c.old_component = pp.old_component AND c.component <> pp.new_component;
-- delete dupes
var_sql = 'DELETE FROM final_connectedComponents AS n
USING (SELECT o.node, MIN(ctid) AS min_ctid FROM final_connectedComponents AS o GROUP BY o.node HAVING COUNT(*) > 1) AS o
WHERE n.node = o.node AND n.ctid <> o.min_ctid;';
RAISE NOTICE '%, %', var_sql, clock_timestamp();
EXECUTE var_sql;
ALTER TABLE final_connectedComponents ADD CONSTRAINT pk_final_connetectedComponents PRIMARY KEY(node);
END IF;
END;
$do$;
Hope that helps,
Regina
> -----Original Message-----
> From: Pgrouting-users [mailto:pgrouting-users-bounces at lists.osgeo.org] On
> Behalf Of Phyks
> Sent: Saturday, January 21, 2023 4:13 PM
> To: pgrouting-users at lists.osgeo.org
> Subject: Re: [pgrouting-users] "invalid memory alloc request size" on large
> graphs
>
> Hi,
>
> > 1.) set the highest "work_mem" for the session and close all other
> > running programs ..
> >
> > SET work_mem = '8GB';
> > SELECT pgr_createTopology( ... )
> > -- set back for the default value;
> > SET work_mem = '1GB';
>
> Same issue :
>
> =# SET work_mem = '8GB';
> SET
> =# (SELECT component, node FROM pgr_connectedComponents(
> 'SELECT id, source, target, cost, cost AS reverse_cost FROM edges'
> ));
> ERREUR: invalid memory alloc request size 1080000000 CONTEXTE : fonction
> SQL « pgr_connectedcomponents », instruction 1
>
>
> > 2.) Linux: Enable Swap space ( ~ 20GB or 2x of your RAM )
> > https://www.digitalocean.com/community/tutorials/how-to-add-swap-
> space
> > -on-ubuntu-20-04
> >
> > 3.) optimize table variables storage spaces ( to less space )
> > - change your bigint variables to integer
> > for (about 40M edges) an integer [ -2147483648 to +2147483647 ]
> > should enought
> > https://www.postgresql.org/docs/current/datatype-numeric.html
>
> As far as I understand, `pgr_connectedComponents` as run in the example
> above should only query the `edges` table. I already have `source` and `target`
> set as regular `integer` there.
>
> I have the impression the malloc is failing for another reason that OOM:
> 1/ Malloc size should be about 1GB as per the error message. This is far less
> than my free RAM and seems to be a fixed buffer size somewhere.
> 2/ I have only 2GB RAM used on my system (hence 10GB free RAM) and do not
> see any increase of RAM usage between the start of the
> `pgr_connectedComponents` query and the error output.
>
> For these reasons, I do not expect swap / RAM changes to have any effect
> here, but instead a configuration option in PostgreSQL / pgRouting to allow
> allocations of more than 1GB?
>
> It seems I'm hitting https://postgrespro.com/list/thread-id/1549721
> which references https://github.com/pgRouting/pgrouting/issues/291 but I
> don't see much solutions there apart from using bounding boxes (which is
> easily done for routing, much less for contraction / connected components
> analysis).
>
> Thanks!
> Phyks
>
> > Phyks <phyks+pgrouting at phyks.me> ezt írta (időpont: 2023. jan. 21.,
> > Szo,
> > 14:06):
> >
> >> Hi,
> >>
> >> I'm trying to use pgRouting on a quite large graph (about 40M edges).
> >> My graph structure is expected to be quite simple and not very
> >> connected (being mostly a simple tree structure).
> >>
> >> I came across the discussion on speeding up `pgr_createTopology` for
> >> large networks (
> >> https://github.com/pgRouting/pgrouting/discussions/2255#discussioncom
> >> ment-4739376)
> >>
> >> and am trying to put it in practice for my use case. Moving from
> >> `pgr_createTopology` to `pgr_extractVertices` gives a huge increase
> >> in processing time and memory requirements.
> >>
> >> However, when trying to run `pgr_*` (`pgr_dijkstra`,
> >> `pgr_contraction` or `pgr_connectedComponents` for example) on the
> >> subsequent graph, I am facing a malloc issue : invalid memory alloc request
> size 1080000000.
> >>
> >> This is easily worked around with `pgr_dijkstra` since routing is
> >> mainly local and the edges table can easily be filtered with a bounding box.
> >> This is not as easy for `pgr_contraction` or
> >> `pgr_connectedComponents` where tiling the computation (through bbox)
> >> requires some extra post-processing.
> >>
> >> I'm not sure about the units of the `malloc` error message, but it
> >> seems to me this is about 1GB allocation which should be no issue on
> >> the machine and setup I am running. For the record, my postgresql is
> >> having either a basic configuration (out of the box install) or a
> >> pgtune-d one, same issue in both cases. My machine has a bit more than
> 10GB of free RAM.
> >>
> >> Would anyone have more insights on this issue which could help me
> >> better understand and work around it? This is particularly
> >> frustrating since computations on a quarter of the table (about 10M
> >> edges) work perfectly fine and in a very reasonable amount of time.
> >>
> >> Thank you in advance,
> >> Best,
> >> _______________________________________________
> >> Pgrouting-users mailing list
> >> Pgrouting-users at lists.osgeo.org
> >> https://lists.osgeo.org/mailman/listinfo/pgrouting-users
> >>
> >
> >
> > _______________________________________________
> > Pgrouting-users mailing list
> > Pgrouting-users at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/pgrouting-users
> _______________________________________________
> Pgrouting-users mailing list
> Pgrouting-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/pgrouting-users
More information about the Pgrouting-users
mailing list