[postgis-users] Query uses too much memory
Reid Priedhorsky
reid at umn.edu
Wed Jan 21 15:15:21 PST 2009
Dear all,
I have a fairly simple (or so I thought) PostGIS query which eats my
machine: it consumes more and more memory until the machine begins to
thrash and becomes unresponsive. I have 1GB memory, and AFAICT Postgres
is configured to use a fairly modest portion of that. The query consumes
at least 1200GB. It is OK if the query takes a while, but I can't have
it crashing my box. ;)
Here is the query:
> insert into wh_viewport_familiarity (username, viewport_id, score)
> select
> username,
> wv.id as viewport_id,
> sum(score) as score
> from
> wh_familiarity wf
> join wh_viewport wv on ST_Intersects(wf.geometry, wv.geometry)
> group by username, viewport_id;
Note that wh_familiarity is a view, defined as:
> SELECT pf.username, pf.score, p.geometry
> FROM wh_point_familiarity pf
> JOIN point p ON pf.point_id = p.id AND pf.point_version = p.version
> UNION ALL
> SELECT bf.username, bf.score, bs.geometry
> FROM wh_byway_familiarity bf
> JOIN byway_segment bs ON bf.byway_id = bs.id AND bf.byway_version = bs.version;
Here is the EXPLAIN:
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
> Subquery Scan "*SELECT*" (cost=969612.43..970964.38 rows=45065 width=44)
> -> HashAggregate (cost=969612.43..970175.74 rows=45065 width=40)
> -> Nested Loop (cost=81.60..969274.44 rows=45065 width=40)
> Join Filter: _st_intersects(wf.geometry, wv.geometry)
> -> Append (cost=81.60..397329.54 rows=1997133 width=68)
> -> Hash Join (cost=81.60..5507.26 rows=71222 width=40)
> Hash Cond: ((pf.point_id = p.id) AND (pf.point_version = p.version))
> -> Seq Scan on wh_point_familiarity pf (cost=0.00..1197.24 rows=70324 width=23)
> -> Hash (cost=53.04..53.04 rows=1904 width=33)
> -> Seq Scan on point p (cost=0.00..53.04 rows=1904 width=33)
> -> Merge Join (cost=335494.26..371850.95 rows=1925911 width=112)
> Merge Cond: ((bs.version = bf.byway_version) AND (bs.id = bf.byway_id))
> -> Sort (cost=31806.55..32247.51 rows=176382 width=105)
> Sort Key: bs.version, bs.id
> -> Seq Scan on byway_segment bs (cost=0.00..6184.82 rows=176382 width=105)
> -> Sort (cost=303687.71..308950.44 rows=2105094 width=23)
> Sort Key: bf.byway_version, bf.byway_id
> -> Seq Scan on wh_byway_familiarity bf (cost=0.00..35824.94 rows=2105094 width=23)
> -> Index Scan using wh_viewport_gist on wh_viewport wv (cost=0.00..0.27 rows=1 width=117)
> Index Cond: (wf.geometry && wv.geometry)
> Filter: (wf.geometry && wv.geometry)
Adding "where wv.id % 10 = 0" barely affects the plan/predicted cost,
and the query finishes in ~15 min with 468M memory consumed at peak.
Full schema of these tables at http://www.cs.umn.edu/~reid/schema.txt
Row counts:
wh_viewport 13,539
wh_point_familiarity 70,324
point 1,904
wh_byway_familiarity 2,105,094
byway_segment 176,382
We are on Postgres 8.2.7 on Ubuntu Hardy. postgis_full_version() is:
> POSTGIS="1.3.3" GEOS="3.0.0-CAPI-1.4.1" PROJ="Rel. 4.6.0, 21 Dec 2007" USE_STATS
postgresql.conf is at http://www.cs.umn.edu/~reid/postgresql.conf
Any ideas? Is this something upgrading to GEOS 3.0.3 might help with?
(And where's the release notes for GEOS? I downloaded the 3.0.3 tarball
and ChangeLog stops with 3.0.0.)
Thanks so much for your help,
Reid
More information about the postgis-users
mailing list