[postgis-users] design problem

pcreso at pcreso.com
Fri Mar 1 14:24:34 PST 2013


Hi Denise,

You also have the option of using a metadata catalogue, such as GeoNetwork or CKAN, to catalogue the incoming files. I have a colleague who has developed a system that searches through a folder of incoming files containing text, spreadsheets, Word docs, etc. These are parsed to extract meaningful metadata, and a GeoNetwork record is automatically created for each new file.

With a PostGIS solution (we have a time-series database of 350,000,000 records), the table is partitioned on the year of the timestamp, with a clustered index on the timestamp in all but the current partition. This approach could work well for your application.
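
For example, a minimal sketch using the inheritance-style partitioning in current Postgres releases (table and column names are invented for illustration):

    CREATE TABLE obs (
        obs_time timestamptz NOT NULL,
        geom     geometry(Point, 4326),
        value    double precision
    );

    -- one child table per year; the CHECK constraint lets the planner
    -- skip partitions that cannot match a query's time range
    CREATE TABLE obs_2012 (
        CHECK (obs_time >= '2012-01-01' AND obs_time < '2013-01-01')
    ) INHERITS (obs);

    CREATE INDEX obs_2012_time_idx ON obs_2012 (obs_time);

    -- once the year is closed, physically order the partition by time
    CLUSTER obs_2012 USING obs_2012_time_idx;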

Keep the files on the filesystem, referenced/catalogued by path (or URI) in a timestamped database record. Create a new partition whenever you need to keep the partitions a manageable size.
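
Something like this for the catalogue record (again, names are only illustrative):

    CREATE TABLE uploaded_files (
        file_id   serial PRIMARY KEY,
        uri       text NOT NULL,                      -- path/URI of the file on disk
        received  timestamptz NOT NULL DEFAULT now()  -- drives the partitioning
    );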

If you have a standard query that returns multiple "neighbouring" records based on a field (timestamp, for example), then add a clustered index on that field so that you maximise the number of hits per disk read.
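
Using the hypothetical obs_2012 partition above, the kind of query that benefits looks like:

    -- after CLUSTER, rows in this range sit on consecutive disk pages,
    -- so every page read returns many matching records
    SELECT *
    FROM   obs
    WHERE  obs_time >= '2012-06-01'
    AND    obs_time <  '2012-06-08';

Note that CLUSTER is a one-off physical sort, which is why it only gets applied to partitions that are no longer receiving inserts.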


Brent 

--- On Sat, 3/2/13, Andy Colson <andy at squeakycode.net> wrote:

From: Andy Colson <andy at squeakycode.net>
Subject: Re: [postgis-users] design problem
To: "PostGIS Users Discussion" <postgis-users at lists.osgeo.org>
Date: Saturday, March 2, 2013, 10:44 AM

On 3/1/2013 3:11 PM, Denise Janson wrote:
> Hi,
> 
> I have an application that is going to receive lots of georeferenced files
> every day. Each file has information for several points. In a few years my
> application will probably have a terabyte of point information stored.
> 
> I think I can do this design in two ways:
> 
> 1. Two tables: one of “uploaded_files” and another of “points” (one
> uploadedFile to N points). And I'll have to partition the points table,
> maybe by month …
> 2. Or I can create one table per file, ending up with thousands of tables
> in a few years.
> 
> Which case is better for my application? Is there any better way to do
> this?
> 

If performance is a concern, and the files are of any meaningful size, you might consider leaving them on the filesystem and having the table point to them (full path name, sort of thing).

Storing the file in PG is possible, and it's nice because everything is kept together, but if you have to get to and read the files fast, then leave them on the filesystem.

The lots-of-tables approach is problematic if you ever want to write queries that look back in time. It's much harder to say, "give me every record from the beginning of time at this point."
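
For example (hypothetical table names), a look-back query over one-table-per-file turns into a UNION ALL naming every table, versus a single indexed scan on one table:

    -- one table per file: the query must enumerate every table
    SELECT * FROM points_20130101 WHERE ts >= '2013-01-01'
    UNION ALL
    SELECT * FROM points_20130102 WHERE ts >= '2013-01-01';
    -- ...and so on for every file ever loaded

    -- single table: one statement, one index
    SELECT * FROM points WHERE ts >= '2013-01-01';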

With a good index, PG won't have a problem with a single table containing billions of rows. Just try to avoid doing bulk operations (like UPDATE and DELETE) on the entire table.
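
As a rough sketch of the two-table design (names invented, reusing the hypothetical uploaded_files catalogue from earlier in the thread):

    CREATE TABLE points (
        file_id integer NOT NULL REFERENCES uploaded_files (file_id),
        ts      timestamptz NOT NULL,
        geom    geometry(Point, 4326)
    );

    -- btree for time-range lookups, GiST for spatial queries
    CREATE INDEX points_ts_idx   ON points (ts);
    CREATE INDEX points_geom_idx ON points USING gist (geom);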


> uploadedFile to N points). And I'll have to partition the points table,


Why will you have to partition it?

-Andy