[postgis-users] Query performance issue

Michael Fuhr mike at fuhr.org
Thu Oct 13 10:18:12 PDT 2005


On Thu, Oct 13, 2005 at 12:11:43PM -0400, Stephen Woodbridge wrote:
> Well, the query plan looks better, but it has been running for 13+ hrs.
> Also, in the query I changed intersects() to distance() = 0.0, which I
> think is supposed to be a little quicker.
>
> Is there any way to know how far it has proceeded? I would hate to abort
> the update only to find out later that it was 90% done.

You could install contrib/pgstattuple and watch dead_tuple_count
in another session.  If you know the table had no dead tuples prior
to the update (e.g., you had just vacuumed it), then dead_tuple_count /
tuple_count should tell you how much of the table has been updated.
If you don't know how many dead tuples there were, you could watch
dead_tuple_count's rate of increase and extrapolate an estimate.
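
For example, something like this (just a sketch: it assumes the table
from your UPDATE is named "streets", and note that pgstattuple itself
scans the whole table to gather its numbers):

-- fraction_updated approximates how much of the table the UPDATE
-- has rewritten so far (assuming the table was freshly vacuumed).
SELECT tuple_count, dead_tuple_count,
       dead_tuple_count::float8 / tuple_count AS fraction_updated
FROM pgstattuple('streets');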

Another possibility would be to look at the rows' xmax system column.
As each row is updated, it should receive a new xmax value that's
visible to other transactions, even while the update is still in
progress.  Count how many rows have the new value to see how many
rows have been updated (the count will keep changing, but at least
you'll get an idea).  Something like the following might do the trick:

SELECT count(*) FROM streets WHERE xmax = (SELECT xmax FROM streets LIMIT 1);

This assumes that "SELECT xmax FROM streets LIMIT 1" returns the
transaction ID of the update.  You could query pg_locks and
pg_stat_activity (if you have stats_command_string enabled) to see
if the process using that transaction ID is indeed the one doing
the update.
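
Something along these lines might work (again just a sketch: the
column names follow the 8.0/8.1-era catalogs, i.e. pg_locks.transaction,
pg_stat_activity.procpid and current_query, and 12345 stands for the
xmax value returned by the query above; adjust for your server version):

-- Show the backend holding locks under that transaction ID and the
-- query it is currently running.
SELECT l.transaction, a.procpid, a.usename, a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
WHERE l.transaction = 12345;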

-- 
Michael Fuhr


