[postgis-users] speeding up "not in" query
davideps at umich.edu
Sun Jul 25 21:24:24 PDT 2010
Thank you all for your suggestions.
My original query not only took 4.6 hours to run, but also returned the
wrong answer. The "not exists" version below returned the correct answer
(as far as I can tell) in 2 minutes. I thought these were essentially
the same query but apparently they are not. I have not yet tried the
left join version.
WHERE NOT EXISTS
FROM survey WHERE parcels_esri.parcelnumb = survey.parcelnum)
"Seq Scan on parcels_esri (cost=0.00..3320785.37 rows=193444
width=4537) (actual time=17.793..4154.274 rows=84489 loops=1)"
" Filter: (NOT (subplan))"
" -> Index Scan using parcelnum_idx on survey (cost=0.00..8.35
rows=1 width=26) (actual time=0.009..0.009 rows=1 loops=386887)"
" Index Cond: ($0 = parcelnum)"
"Total runtime: 4200.203 ms"
Running EXPLAIN ANALYZE on my original query takes more time than I am
willing to wait. This seems like a very problematic formulation:
WHERE parcels_esri.parcelnumb NOT IN
Thanks again for your help. I now know the huge improvements that can
come from such variations.
More information about the postgis-users