[postgis-users] Insert performance issues

Daryl Herzmann akrherz at iastate.edu
Tue Oct 28 08:22:59 PST 2003


Hi Gerry and Charlton!

Just to throw my two cents in here, since I am doing something similar to 
what you are.  Here is my methodology with PostGIS/MapServer & weather 
data.  I am currently using this to manage nearly 1000 stations in 
'real-time'.  Some sites update every minute.

 * Put your database on the fastest SCSI you can find!  U320 here; I 
   have a RAID 10 system on U320 coming online tonight :)

 * Run VACUUM very often, perhaps every 5 minutes.  Consider running
   'VACUUM FULL' once per day, at 3-4 in the morning.  The next version
   of PostgreSQL is supposed to have an autovacuum daemon that you can 
   just leave running all of the time.

 * Have you fully tuned your database?  This page has lots of helpful
   information.
   http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

 * Create one table for just current observations; you only do updates
   on it, so table locking is not necessary.

 * Create another table called 'summary', which contains daily summary
   values.  All of my database updates are handled by a Python class
   that simultaneously updates the 'summary' table if warranted.  Here
   is pseudo code:

    iem = iemob('AMW')
    iem.setTS( mx.DateTime.DateTime(2003, 10, 28, 10, 0) )
    iem.setTemperatureF(85)
    iem.updatedatabase()

    Then the python class does a SQL update on the current and summary 
    table using CASE statements.
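    A minimal sketch of that update pattern, using sqlite3 as a stand-in
    for PostgreSQL; the table and column names (current, summary, tmpf)
    are assumptions, not the actual IEM schema:

    ```python
    import sqlite3

    # Stand-in schema: one row per station in 'current',
    # one row per station+day in 'summary'.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE current (station TEXT PRIMARY KEY, valid TEXT, tmpf REAL);
    CREATE TABLE summary (station TEXT, day TEXT, max_tmpf REAL, min_tmpf REAL,
                          PRIMARY KEY (station, day));
    INSERT INTO current VALUES ('AMW', '2003-10-28 09:00', 60);
    INSERT INTO summary VALUES ('AMW', '2003-10-28', 60, 60);
    """)

    def update_observation(station, valid, tmpf):
        """UPDATE the current row, then fold the new value into the daily
        summary with CASE expressions -- updates only, no inserts."""
        conn.execute("UPDATE current SET valid = ?, tmpf = ? WHERE station = ?",
                     (valid, tmpf, station))
        conn.execute("""
            UPDATE summary SET
              max_tmpf = CASE WHEN ? > max_tmpf THEN ? ELSE max_tmpf END,
              min_tmpf = CASE WHEN ? < min_tmpf THEN ? ELSE min_tmpf END
            WHERE station = ? AND day = ?""",
            (tmpf, tmpf, tmpf, tmpf, station, valid[:10]))
        conn.commit()

    update_observation('AMW', '2003-10-28 10:00', 85)
    print(conn.execute("SELECT max_tmpf, min_tmpf FROM summary").fetchone())
    # (85.0, 60.0)
    ```

    The same two UPDATE statements run against PostgreSQL unchanged; only
    the connection setup differs.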

 * Create another table called 'current_log' which contains a 3 day
   (in my case) log of observations from the current table.  So when
   a current table update is done, a copy of the old row is sent to 
   the current_log table.  I then move the current_log to archive tables
   or to backup files.  
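   The copy-on-update step might look like this (again sqlite3 as a
   stand-in, with assumed table and column names; in PostgreSQL you
   could also do this with a trigger):

   ```python
   import sqlite3

   conn = sqlite3.connect(":memory:")
   conn.executescript("""
   CREATE TABLE current     (station TEXT PRIMARY KEY, valid TEXT, tmpf REAL);
   CREATE TABLE current_log (station TEXT, valid TEXT, tmpf REAL);
   INSERT INTO current VALUES ('AMW', '2003-10-28 09:00', 60);
   """)

   def update_with_log(station, valid, tmpf):
       # Archive the soon-to-be-overwritten row, then apply the new one.
       conn.execute("""INSERT INTO current_log
                       SELECT station, valid, tmpf
                       FROM current WHERE station = ?""", (station,))
       conn.execute("UPDATE current SET valid = ?, tmpf = ? WHERE station = ?",
                    (valid, tmpf, station))
       conn.commit()

   update_with_log('AMW', '2003-10-28 10:00', 85)
   ```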

 * You will want to create indexes on the tables based on how you 
   query them.  In my case, I always index the timestamp and the station
   identifier.
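   For example, a composite index matching the station-plus-timestamp
   query pattern (sqlite3 stand-in; the index and column names are
   illustrative only):

   ```python
   import sqlite3

   conn = sqlite3.connect(":memory:")
   conn.execute("CREATE TABLE current_log (station TEXT, valid TEXT, tmpf REAL)")
   # One composite index covers WHERE station = ? AND valid BETWEEN ? AND ?
   conn.execute("CREATE INDEX current_log_idx ON current_log (station, valid)")
   ```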

Now you can get fancy with your SQL calls and get daily hi/lo temps out 
via a simple join.  You can then go crazy with triggers and manage it all 
cleanly!  
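A sketch of such a join, pulling the latest observation alongside the daily
extremes (sqlite3 stand-in, same assumed schema as above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE current (station TEXT PRIMARY KEY, valid TEXT, tmpf REAL);
CREATE TABLE summary (station TEXT, day TEXT, max_tmpf REAL, min_tmpf REAL);
INSERT INTO current VALUES ('AMW', '2003-10-28 10:00', 85);
INSERT INTO summary VALUES ('AMW', '2003-10-28', 85, 60);
""")

# Join each station's current row to its summary row for the same day.
row = conn.execute("""
    SELECT c.station, c.tmpf, s.max_tmpf, s.min_tmpf
    FROM current c
    JOIN summary s ON s.station = c.station
                  AND s.day = substr(c.valid, 1, 10)
""").fetchone()
print(row)  # ('AMW', 85.0, 85.0, 60.0)
```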

If you are having to REINDEX, then something is not configured correctly, 
or your database schema needs improvement.

HTH,
  Daryl

On Tue, 28 Oct 2003, Charlton Purvis wrote:

>Cool.  One more thing: you might want to consider Dave's advice in
>response to an earlier email of mine dated 10/25/2003 having to do w/
>slow AddGeometryColumn execution.  Once I removed it, things flew.
>
>[from Dave]
>If you want you can remove that portion from AddGeometryColumns() - just
>stick null in instead. 
>
>The only part that depends on those columns being present is the advanced
>STATS package (makefile USE_STATS=1).
>
>Charlton
>
>> -----Original Message-----
>> From: Gerry Creager N5JXS [mailto:gerry.creager at tamu.edu]
>> Sent: Tuesday, October 28, 2003 9:39 AM
>> To: PostGIS Users Discussion
>> Subject: Re: [postgis-users] Insert performance issues
>> 
>> Excellent idea.  Thanks for the paradigm shift.  The use of hourly, or
>> daily, tables is one thing I'd briefly considered but not put
>sufficient
>> time into.  You've just crystallized its use.
>> 
>> I can do a much shorter, 48 hour history and dump a day's worth of
>data
>> into the archive db.  That'd give me real fast queries and allow me to
>> keep solely unique historic data (archival info _is_ important).
>> 
>> We're branching into models, especially EPA-based ozone mapping and
>> modelling, and looking at sea-breeze influence.  Hence, we're going to
>> have to become more efficient.
>> 
>> Thanks!
>> gerry
>
>

/**
 * Daryl Herzmann (akrherz at iastate.edu)
 * Program Assistant -- Iowa Environmental Mesonet
 * http://mesonet.agron.iastate.edu
 */



