[postgis-users] Query uses too much memory

Wed Jan 21 15:15:21 PST 2009

Dear all,

I have a fairly simple (or so I thought) PostGIS query which eats my 
machine: it consumes more and more memory until the machine begins to 
thrash and becomes unresponsive. I have 1GB memory, and AFAICT Postgres 
is configured to use a fairly modest portion of that. The query consumes 
at least 1200GB. It is OK if the query takes a while, but I can't have 
it crashing my box. ;)

Here is the query:

> insert into wh_viewport_familiarity (username, viewport_id, score)
> select
>    username,
>    wv.id as viewport_id,
>    sum(score) as score
> from
>    wh_familiarity wf
>    join wh_viewport wv on ST_Intersects(wf.geometry, wv.geometry)
> group by username, viewport_id;

Note that wh_familiarity is a view, defined as:

>  SELECT pf.username, pf.score, p.geometry
>    FROM wh_point_familiarity pf
>    JOIN point p ON pf.point_id = p.id AND pf.point_version = p.version
> UNION ALL 
>  SELECT bf.username, bf.score, bs.geometry
>    FROM wh_byway_familiarity bf
>    JOIN byway_segment bs ON bf.byway_id = bs.id AND bf.byway_version = bs.version;

Here is the EXPLAIN:

>                                                       QUERY PLAN                                                      
> ----------------------------------------------------------------------------------------------------------------------
>  Subquery Scan "*SELECT*"  (cost=969612.43..970964.38 rows=45065 width=44)
>    ->  HashAggregate  (cost=969612.43..970175.74 rows=45065 width=40)
>          ->  Nested Loop  (cost=81.60..969274.44 rows=45065 width=40)
>                Join Filter: _st_intersects(wf.geometry, wv.geometry)
>                ->  Append  (cost=81.60..397329.54 rows=1997133 width=68)
>                      ->  Hash Join  (cost=81.60..5507.26 rows=71222 width=40)
>                            Hash Cond: ((pf.point_id = p.id) AND (pf.point_version = p.version))
>                            ->  Seq Scan on wh_point_familiarity pf  (cost=0.00..1197.24 rows=70324 width=23)
>                            ->  Hash  (cost=53.04..53.04 rows=1904 width=33)
>                                  ->  Seq Scan on point p  (cost=0.00..53.04 rows=1904 width=33)
>                      ->  Merge Join  (cost=335494.26..371850.95 rows=1925911 width=112)
>                            Merge Cond: ((bs.version = bf.byway_version) AND (bs.id = bf.byway_id))
>                            ->  Sort  (cost=31806.55..32247.51 rows=176382 width=105)
>                                  Sort Key: bs.version, bs.id
>                                  ->  Seq Scan on byway_segment bs  (cost=0.00..6184.82 rows=176382 width=105)
>                            ->  Sort  (cost=303687.71..308950.44 rows=2105094 width=23)
>                                  Sort Key: bf.byway_version, bf.byway_id
>                                  ->  Seq Scan on wh_byway_familiarity bf  (cost=0.00..35824.94 rows=2105094 width=23)
>                ->  Index Scan using wh_viewport_gist on wh_viewport wv  (cost=0.00..0.27 rows=1 width=117)
>                      Index Cond: (wf.geometry && wv.geometry)
>                      Filter: (wf.geometry && wv.geometry)

Adding "where wv.id % 10 = 0" barely affects the plan/predicted cost, 
and the query finishes in ~15 min with 468M memory consumed at peak.

Full schema of these tables at http://www.cs.umn.edu/~reid/schema.txt

Row counts:

wh_viewport              13,539
wh_point_familiarity     70,324
point                     1,904
wh_byway_familiarity  2,105,094
byway_segment           176,382

We are on Postgres 8.2.7 on Ubuntu Hardy. postgis_full_version() is:

>  POSTGIS="1.3.3" GEOS="3.0.0-CAPI-1.4.1" PROJ="Rel. 4.6.0, 21 Dec 2007" USE_STATS

postgresql.conf is at http://www.cs.umn.edu/~reid/postgresql.conf

Any ideas? Is this something upgrading to GEOS 3.0.3 might help with? 
(And where's the release notes for GEOS? I downloaded the 3.0.3 tarball 
and ChangeLog stops with 3.0.0.)

Thanks so much for your help,

Reid