[postgis-users] Severe performance problems with prepared statements

Sun Jan 5 10:12:19 PST 2014

On 01/05/2014 11:12 AM, William Becker wrote:
> I just tried changing my code base from using hard coded queries to prepared statements. As part of this I pass a parameter as the argument to the ST_GeographyFromText() function. You can see the code here (1).
>
> After doing this, it took ~20 seconds to run, when previously it took ~2 seconds. I thought this might be because it wasn't using an index anymore, but I checked this out - when it was previously taking 2 seconds, it also ran without using the index. For some reason it's using it on my dev machine, but this has a smaller dataset - I'll try tuning this but it does not seem to be the cause of the issue.
>
> I "improved" this by templatising the query, so that a new query is always generated with a different hard coded geography, and this speeds things up - code is here (2). However, it now needs to create a separate query for each execution, thus it also needs to generate a new plan every time, which is not ideal.
>
> I have attached query plans for the slow run (3) and the sped up run (4). (I apologise that the queries aren't exactly the same in each case, but the timings are pretty much the same regardless of what the other parameters are).
>
> You can also see the create statement for the database here: (5)
>
> I'm not entirely sure if the problem lies with postgis or postgres in general, but I thought I'd present it to you nice people first, since you are probably in a better position to decide! This is on postgis 2.0 with postgres 9.2. I just upgraded to 9.3 and 2.1 but I need to re-process all my data (which will take overnight) before I can do anything.
>
> My guess is that it is evalutating the parameter each time it applies the where in the query and there is something slow about this. I'd poke around the relavant code if someone could give me some pointers about where to look!
>
> I've seem some other threads that seem to be related:
> http://postgresql.1045698.n5.nabble.com/avoid-prepared-statements-on-complex-queries-td4996363.html
> http://grokbase.com/t/postgresql/pgsql-performance/127rp189dw/geoserver-postgis-performance-problems
>
> but it seems that I have isolated it just down to that scan taking a long time with the prepared statement, but quick without it?
>
> Cheers,
> Will
>
> (1) https://github.com/twistedvisions/anaximander/blob/c0a7c89e438bcf2504ed8ae3d5f75828e11c2ec9/db_templates/get_event_attendees_in_area.sql
>
> (2) https://github.com/twistedvisions/anaximander/blob/12e65537f1d3887677b592d1f0ac0203f2da7d39/db_templates/get_event_attendees_in_area.sql
>
> (3) http://explain.depesz.com/s/kb1A
>
> (4) http://explain.depesz.com/s/G9Wj
>
> (5) https://github.com/twistedvisions/anaximander/blob/f6ab21ebff4cf3541bfb53715addb4b3181d5d2e/db/create.sql
>

It looks like a bunch of row estimates are off.  Have you run analyze on the db?

> a new query is always generated with a different hard coded geography.  However, it now needs to create a separate query for each execution, thus it also needs to generate a new plan every time, which is not ideal.

Actually, creating a plan can use information about the data to make a better plan.  That's why using a $1 parameter may not give optimal plans.  When you say "where id < 5", the planer knows the distribution of data, so it knows about the probability's of 5.  But when you say "where id < $1", it has to just use a generic plan.

As long as you dont have your stats target too high, plan time is very fast, and compared to a bad plan, could easily be worth paying the price to re-plan on every execute.

Any reason you are using HASH indexes?  They usually aren't better.  Have you tried it with btree?

Hum?  As a simple test, does this use an index:

select count(*) from place
where st_covers('0103000020E610000001000000050000000000D33CB60D9C3FB0941EB661BD49400000D33CB60D9C3FFF56F03723BA494000002DC349F290BFFF56F03723BA494000002DC349F290BFB0941EB661BD49400000D33CB60D9C3FB0941EB661BD4940'::geography, location)

After you analyze does it?

Does this:

prepare x as select count(*) from place where st_covers($1::geography, location);
explain analyze execute x('01030000....');
deallocate x;

-Andy