[postgis-users] CPU tuning

Mon Jan 9 01:51:53 PST 2017

Hey,
I
'm afraid you may not use the most efficient approach.

Assuming you want to find for each node of table x the closest node of
table y,
it takes less than 1 second on my computer.

DROP TABLE  IF EXISTS  test_1;
CREATE TABLE test_1 AS
SELECT s AS gid, ST_makePoint(random()*1000,random()*1000) AS geom
FROM generate_series(1,12000) AS s ;
CREATE INDEX ON test_1 USING GIST(geom) ;
ALTER TABLE test_1 ADD PRIMARY KEY (gid) ;

DROP TABLE  IF EXISTS  test_2;
CREATE TABLE test_2 AS
SELECT s AS gid, ST_makePoint( random()*1000,random()*1000) AS geom
FROM generate_series(1,50000) AS s ;
CREATE INDEX ON test_2 USING GIST(geom) ;
ALTER TABLE test_2 ADD PRIMARY KEY (gid) ;

-- option 1 : you have lots of ram : brute force
DROP TABLE IF EXISTS test_results ;
CREATE TABLE test_results AS
SELECT DISTINCT ON (test_1.gid ) test_1.gid AS gid1, test_2.gid AS gid2-- ,
ST_MakeLine(test_1.geom,test_2.geom) AS sline
FROM test_1, test_2
ORDER BY test_1.gid ,ST_Distance(test_1.geom,test_2.geom) ASC ;

--option2  : you have limited ram : snipping : force index use

DROP FUNCTION IF EXISTS test_sdist(geom1 geometry) ;
CREATE OR REPLACE FUNCTION test_sdist(geom1 geometry, OUT gid2 bigint, OUT
geom2 geometry) AS
 $$
    --given a point, find the closes one in test_2
    DECLARE
    BEGIN
        SELECT gid, geom INTO gid2, geom2
        FROM test_2
        ORDER BY ST_Distance(geom2,test_2.geom) ASC
        LIMIT 1  ;
        RETURN ;
    END;
$$ LANGUAGE plpgsql;

DROP TABLE IF EXISTS test_result_2 ;
CREATE TABLE IF NOT EXISTS test_result_2 AS
SELECT test_1.gid AS gid1, f.gid2 AS gid2
FROM test_1, test_sdist(test_1.geom) AS f

--option 3 : you could use a recursive CTE to avoid creating a dedicated
dummy function,
--           this would be a pain to write though

Cheers,
Rémi-C

2017-01-08 16:36 GMT+01:00 Olivier Leprêtre <o.lepretre at gmail.com>:

> Hi,
>
>
>
> I have a pgsql postgis function that last about an hour on an Hp Envy
> (W10, i7-6500 CPU 2,5 Ghz 8go). This function is calculating minimum
> distance between each nodes of table x (12000 nodes) and table y (42000
> nodes)
>
>
>
> FOR row IN
>
>                    SELECT code,ST_AsEwkt(ST_StartPoint(geom)) as geom1
>  FROM t_x
>
> LOOP
>
>                    FOR row1 IN
>
>                             SELECT code, ST_Distance(ST_AsEwkt(geom)::text,row.geom1)
> FROM t_y order by ST_Distance(ST_AsEwkt(geom)::text,row.geom1) LIMIT 1
>
>
>
>                    LOOP
>
>                    END LOOP;
>
> ....
>
>
>
>
>
> What I don't understand is that postgresql server process CPU is stable
> during all that time at 30/31% with no other process running. Details of
> each processor 1 to 3 shows that all 4 uses 30%. Disk and network usage are
> 0% (database is local) Ram usage is 47% and Postgresql server uses process
> only 7,4 Mo.
>
>
>
> My question is "Why postgres process uses such a precise and constant cpu
> and doesn't grow up to more than 30% ?" I didn't find any parameter forcing
> such a limit in order to reduce duration of this calculation.
>
>
>
> Thanks for any answer,
>
>
>
> Olivier
>
>
> ------------------------------
> [image: Avast logo]
> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient>
>
> L'absence de virus dans ce courrier électronique a été vérifiée par le
> logiciel antivirus Avast.
> www.avast.com
> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient>
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20170109/4ff0d67b/attachment.html>