[pgrouting-users] "invalid memory alloc request size" on large graphs

Sat Jan 21 15:29:15 PST 2023

I think 1GB is a hard limit in PostgreSQL .

See for example this note

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/utils/memutils.h;h=21640d62a647d23a4bfd3f12922b615d9cc03e97;hb=HEAD#l31

and discussed here - https://postgrespro.com/list/thread-id/1549721

Instead of using pgr_createTopology, have you tried using pgr_extractVertices which will eventually replace pgr_createTopology? https://docs.pgrouting.org/latest/en/pgr_extractVertices.html

That said I think internally (the pgRouting project) we'll need to rethink how we implement some functions like (pgr_components) to avoid these issues.

In meantime, I think your best bet is to segmentize your data into groups and reconnect them by overlapping node ids.

For this I'm using this function for breaking apart https://postgis.net/docs/ST_SquareGrid.html
(alternatively - https://postgis.net/docs/ST_HexagonGrid.html ) or
(if you have areas - https://postgis.net/docs/ST_Subdivide.html -- 

Or  some sort of known boundaries like country, city, states, provinces etc.

-- I'm going to assume your data is in 4326 (you'll need to change size if it's in another special ref)
I haven't thoroughly tested this so might need changing and you will need to change to accommodate your field names.  Note the 2 in the ST_SquareGrid is 2 degrees.

DO language plpgsql $do$
DECLARE var_sql text; i integer; r record; last_component bigint;
BEGIN
    -- create table of divisions that breaks into square grid regions
    DROP TABLE IF EXISTS tmp_divisions;
    CREATE TEMP TABLE tmp_divisions AS 
    SELECT row_number() OVER(ORDER BY sg.i, sg.j) AS rn, sg.geom
    FROM ST_SquareGrid(2,  (SELECT ST_SetSRID(ST_Extent(the_geom), 4326) FROM edges)  ) AS sg
    WHERE EXISTS (SELECT 1 FROM  edges WHERE edges.the_geom && sg.geom) ;
    i = 0; last_component = 0;
    -- create subtables which will hold components of each sub region and will set components of overlapping nodes to the component with lowest number
    FOR r IN(SELECT rn, geom FROM tmp_divisions ORDER BY rn) LOOP
        i = i + 1;
        var_sql = 'DROP TABLE IF EXISTS tmp_connectedComponents_' || r.rn::text || '; CREATE TEMP TABLE tmp_connectedComponents_' || r.rn::text || ' AS 
    SELECT (component + ' || last_component::text || ') AS component, node, NULL::bigint AS new_component FROM pgr_connectedComponents( ' ||
            quote_literal('SELECT id, source, target, cost, cost AS reverse_cost FROM edges
            WHERE ST_Intersects(the_geom,' || quote_literal(r.geom::text) || '::geometry' || ') ' ) || ')';
        RAISE NOTICE '%', var_sql;
        EXECUTE var_sql;
        var_sql = 'ALTER TABLE tmp_connectedComponents_' || r.rn::text || ' ADD CONSTRAINT pk_tmp_connectedComponents_' || r.rn::text || ' PRIMARY KEY(node);';
        EXECUTE var_sql;
        var_sql = 'CREATE INDEX ix_tmp_connectedComponents_' || r.rn::text || '_component ON tmp_connectedComponents_' || r.rn::text || '(component);';
        EXECUTE var_sql;
        RAISE NOTICE '%', var_sql;
        COMMIT;
        EXECUTE 'SELECT MAX(component) FROM tmp_connectedComponents_' || r.rn::text INTO last_component;
        IF i > 1 THEN 
            var_sql =  'UPDATE tmp_connectedComponents_' || r.rn::text || ' AS c 
                SET new_component =  COALESCE(p.new_component, p.component) 
                FROM tmp_connectedComponents_' || (i - 1) || ' AS p WHERE c.node = p.node;';
            RAISE NOTICE '%', var_sql;
            EXECUTE var_sql;
            COMMIT;
            var_sql =  'UPDATE tmp_connectedComponents_' || r.rn::text || ' AS c 
                SET new_component = pp.new_component
         FROM (SELECT min(COALESCE(p.new_component, p.component)) AS new_component, component 
                FROM tmp_connectedComponents_' || r.rn::text || ' AS p GROUP BY component) AS pp
                WHERE c.new_component IS NULL AND pp.component = c.component;';
            RAISE NOTICE '%', var_sql;
            EXECUTE var_sql;
            COMMIT;
        END IF;
    END LOOP;

    -- after we are all done, we consolidate all into a singled final_connectedComponents table
    -- and insure a node is represented only once.
    IF i > 0 THEN
        DROP TABLE IF EXISTS final_connectedComponents;
        var_sql = 'CREATE TABLE final_connectedComponents AS ' || 
                string_agg('SELECT new_component AS component, component AS old_component, node 
                             FROM tmp_connectedComponents_' || gi::text, ' UNION ') FROM  generate_series(1, i) AS gi;

        RAISE NOTICE '%' , var_sql;
        EXECUTE var_sql;

        var_sql = 'UPDATE final_connectedComponents SET component = old_component 
            WHERE component IS NULL AND 
                old_component IN(SELECT o.component FROM final_connectedComponents AS o WHERE o.component IS NOT NULL)';
        RAISE NOTICE '%', var_sql;
        EXECUTE var_sql;

         -- delete dupes keeping the one that already has a component assigned
        var_sql = 'DELETE FROM final_connectedComponents WHERE component IS NULL 
            AND node IN(SELECT o.node FROM final_connectedComponents AS o WHERE o.component IS NOT NULL)';
        RAISE NOTICE '%, %', var_sql, clock_timestamp();
        EXECUTE var_sql;

        -- fill in still missing components
        var_sql = 'UPDATE final_connectedComponents SET component = old_component WHERE component IS NULL;';
        RAISE NOTICE '%, %', var_sql, clock_timestamp();
        EXECUTE var_sql;

        CREATE INDEX ix_final_connectedComponents_component ON final_connectedComponents(component);
        COMMIT;
        UPDATE final_connectedComponents AS c 
                SET component = pp.new_component
         FROM (SELECT min(least(p.component, p.old_component)) AS new_component, old_component 
                FROM final_connectedComponents AS p GROUP BY old_component ) AS pp
                WHERE c.old_component = pp.old_component  AND c.component <> pp.new_component;

        -- delete dupes
        var_sql = 'DELETE FROM final_connectedComponents AS n
            USING (SELECT o.node, MIN(ctid) AS min_ctid FROM final_connectedComponents AS o GROUP BY o.node HAVING COUNT(*) > 1) AS o
            WHERE n.node = o.node AND n.ctid <> o.min_ctid;';
        RAISE NOTICE '%, %', var_sql, clock_timestamp();
        EXECUTE var_sql;
        ALTER TABLE final_connectedComponents ADD CONSTRAINT pk_final_connetectedComponents PRIMARY KEY(node);
    END IF;
END;
$do$;

Hope that helps,
Regina

> -----Original Message-----
> From: Pgrouting-users [mailto:pgrouting-users-bounces at lists.osgeo.org] On
> Behalf Of Phyks
> Sent: Saturday, January 21, 2023 4:13 PM
> To: pgrouting-users at lists.osgeo.org
> Subject: Re: [pgrouting-users] "invalid memory alloc request size" on large
> graphs
> 
> Hi,
> 
> > 1.) set the highest "work_mem" for the session and close all other
> > running programs ..
> >
> > SET work_mem = '8GB';
> > SELECT pgr_createTopology( ... )
> > -- set back for the default value;
> > SET work_mem = '1GB';
> 
> Same issue :
> 
> =# SET work_mem = '8GB';
> SET
> =# (SELECT component, node FROM pgr_connectedComponents(
>          'SELECT id, source, target, cost, cost AS reverse_cost FROM edges'
> ));
> ERREUR:  invalid memory alloc request size 1080000000 CONTEXTE : fonction
> SQL « pgr_connectedcomponents », instruction 1
> 
> 
> > 2.) Linux:  Enable Swap space (  ~ 20GB  or  2x of your RAM )
> > https://www.digitalocean.com/community/tutorials/how-to-add-swap-
> space
> > -on-ubuntu-20-04
> >
> > 3.) optimize table variables storage spaces ( to less space )
> >      - change your bigint variables to integer
> >      for (about 40M edges) an integer  [  -2147483648 to +2147483647 ]
> > should enought
> >      https://www.postgresql.org/docs/current/datatype-numeric.html
> 
> As far as I understand, `pgr_connectedComponents` as run in the example
> above should only query the `edges` table. I already have `source` and `target`
> set as regular `integer` there.
> 
> I have the impression the malloc is failing for another reason that OOM:
> 1/ Malloc size should be about 1GB as per the error message. This is far less
> than my free RAM and seems to be a fixed buffer size somewhere.
> 2/ I have only 2GB RAM used on my system (hence 10GB free RAM) and do not
> see any increase of RAM usage between the start of the
> `pgr_connectedComponents` query and the error output.
> 
> For these reasons, I do not expect swap / RAM changes to have any effect
> here, but instead a configuration option in PostgreSQL / pgRouting to allow
> allocations of more than 1GB?
> 
> It seems I'm hitting https://postgrespro.com/list/thread-id/1549721
> which references https://github.com/pgRouting/pgrouting/issues/291 but I
> don't see much solutions there apart from using bounding boxes (which is
> easily done for routing, much less for contraction / connected components
> analysis).
> 
> Thanks!
> Phyks
> 
> > Phyks <phyks+pgrouting at phyks.me> ezt írta (időpont: 2023. jan. 21.,
> > Szo,
> > 14:06):
> >
> >> Hi,
> >>
> >> I'm trying to use pgRouting on a quite large graph (about 40M edges).
> >> My graph structure is expected to be quite simple and not very
> >> connected (being mostly a simple tree structure).
> >>
> >> I came across the discussion on speeding up `pgr_createTopology` for
> >> large networks (
> >> https://github.com/pgRouting/pgrouting/discussions/2255#discussioncom
> >> ment-4739376)
> >>
> >> and am trying to put it in practice for my use case. Moving from
> >> `pgr_createTopology` to `pgr_extractVertices` gives a huge increase
> >> in processing time and memory requirements.
> >>
> >> However, when trying to run `pgr_*` (`pgr_dijkstra`,
> >> `pgr_contraction` or `pgr_connectedComponents` for example) on the
> >> subsequent graph, I am facing a malloc issue : invalid memory alloc request
> size 1080000000.
> >>
> >> This is easily worked around with `pgr_dijkstra` since routing is
> >> mainly local and the edges table can easily be filtered with a bounding box.
> >> This is not as easy for `pgr_contraction` or
> >> `pgr_connectedComponents` where tiling the computation (through bbox)
> >> requires some extra post-processing.
> >>
> >> I'm not sure about the units of the `malloc` error message, but it
> >> seems to me this is about 1GB allocation which should be no issue on
> >> the machine and setup I am running. For the record, my postgresql is
> >> having either a basic configuration (out of the box install) or a
> >> pgtune-d one, same issue in both cases. My machine has a bit more than
> 10GB of free RAM.
> >>
> >> Would anyone have more insights on this issue which could help me
> >> better understand and work around it? This is particularly
> >> frustrating since computations on a quarter of the table (about 10M
> >> edges) work perfectly fine and in a very reasonable amount of time.
> >>
> >> Thank you in advance,
> >> Best,
> >> _______________________________________________
> >> Pgrouting-users mailing list
> >> Pgrouting-users at lists.osgeo.org
> >> https://lists.osgeo.org/mailman/listinfo/pgrouting-users
> >>
> >
> >
> > _______________________________________________
> > Pgrouting-users mailing list
> > Pgrouting-users at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/pgrouting-users
> _______________________________________________
> Pgrouting-users mailing list
> Pgrouting-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/pgrouting-users