[postgis-users] Indexes

Paul Ramsey pramsey at cleverelephant.ca
Thu May 29 09:59:34 PDT 2014


Running queries does not build up an index, you CREATE INDEX, and
that's it, it's built.

The time differences you are seeing are almost certainly the result of
data being transferred up into the operating system's virtual file
system (VFS) from the disk. Reading from disk is slow; reading from
memory is fast; to try and take advantage of RAM, the operating system
caches frequently read disk pages in RAM. So, your first slow query is
going all the way to disk, the second (and subsequent) query is able
to take advantage of the cached data.

I'm pretty sure that a VACUUM command doesn't get transferred to the
slave replicas, but I could be wrong. In any event the slaves should
have their own autovacuum settings in their own configs and be
gathering stats and updating dead pages on their own.

VACUUM and ANALYZE are only going to make differences performance-wise
for data that is changing enough to (a) create a lot of dead tuples
and (b) to change the distribution of values of data in the tables.

All the forgoing to say: you're just seeing the difference between a
"cold" (uncached) and "hot" (cached) query and then making a lot of
(incorrect) inferences about indexes from that observation. If your
data is large enough that it doesn't fit all in RAM, you'll find that
as you bounce around the data set with queries some queries will be
"hot" (they'll have been cached by a previous query) and some will be
"cold", but that each query will cause different bits of data to be
swapped in and out of memory as the operating system does its best to
keep the "most useful" bits of data in RAM.

P.

On Wed, May 28, 2014 at 10:29 AM, Graham Glanfield
<graham.glanfield at geocento.com> wrote:
> Hi folks,
>
> I have streaming replication running between one master and a few slaves
> (Fedorax_86_64 - PostgreSQL 9.2.6 - POSTGIS 2.1.2).
> My application only ever reads from a slave, and I have a couple of
> questions with regards to indexes and vacuuming.
> It is my understanding (probably incorrect) that any vacuuming executed on
> the master gets propagated to the slaves as part of the streaming
> replication process.
>
> My main question is if I run a spatial query against a slave the first time,
> it may take 3 seconds, then the 2nd time it is near instant (if I query the
> same slave), so I'm assuming that the slave index/stats must be being
> updated? Vacuum is set to run once per night on the master, so does this
> mean that if I ran the query on that slave the next morning, it would take 3
> seconds again?
>
> On a similar point, I have a background process that creates dynamic spatial
> queries that I hope will help build up the GIST index, and increase the
> speed of users spatial queries, but I don't know whether to run these
> queries against the master or on each of the slaves?
>
> Best regards,
>
> Graham
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users


More information about the postgis-users mailing list