[postgis-users] Insert performance issues
Charlton Purvis
cpurvis at asg.sc.edu
Tue Oct 28 06:26:45 PST 2003
Hey, Gerry:
Good. So I wasn't too far off base. A rare occurrence, these days.
I'm working w/ two type of data: (1) near real-time oceanographic and
meteorological data, and (2) model hindcasting and forecasting. So you
can bet your bottom dollar that real-time snags is important. It's
farther along than "experimental", but it can stand to endure some
revisions. However, I think our solution works well for our beast.
It's likely that my (1) matches your situation more readily. I update
my stuff on an hourly basis, namely because the buoys and weather
stations across the SE all have hourly updates as their lowest common
denominator. The good news for me is that these hourly inserts are
relatively speedy and don't require for me to lock the table. Yes, just
one table. And I don't use the copy command, just good old inserts.
This is where it differs from my (2).
The modeling stuff (2) is a daily update, and it is mucho data that uses
the solution I presented to you below. But w/ a twist. Since I know
that my stuff is visible in hourly increments, e.g. show me the hourly
forecast for the next 24 hours across the SE Atlantic, when I am
inserting the newest forecast in the evening, I create a separate table
for each hour. Each hourly table consists of about 25k records. Not
too bad, but when you multiply that by several days, several models, and
several variables, it's a really big deal. Creating these hourly tables
results in very, very speedy query returns since I take care of the
logic up front about which table to choose based on which hour the user
wants to see. It's likely that I could have gotten around this w/
multiple column indexing, but I couldn't get a geom to gel w/ other
column types.
Anyway, just my food for thought. I'd say that if you could find a
reliable way to divide your data time-wise, consider separating the data
out into different time-based tables. It's great when you start
considering putting different time chunks on different disks or
databases or boxes, and creating cached images is a snap and doesn't
muck up performance if you've got everything placed and tuned correctly.
Just my thoughts.
Charlton
> -----Original Message-----
> From: Gerry Creager N5JXS [mailto:gerry.creager at tamu.edu]
> Sent: Tuesday, October 28, 2003 8:38 AM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Insert performance issues
>
> So: I'm doing a whole slew of inserts (OK, COPY's) on a frequent
basis,
> say, every 6 minutes. I can lock the table, take down the index, do
the
> COPY, unlock the table, reindex. Doesn't take _too_ long 'til I get
to
> the new index. I've got to be able to snag those data in almost real
> (read: "as inserted") time for display on my Mesonet site.
>
> I'm concerned that your solution, while we've found it to be the
> fastest, too, will adversely affect our SELECT performance and
plotting.
>
> Thanks,
> Gerry
More information about the postgis-users
mailing list