[postgis-users] Large dataset help
Paul Scott
pscott at uwc.ac.za
Wed Mar 23 06:24:12 PST 2005
> A search of the archives of this mailing list will provide some previous dialogue on this subject ... storing images in a database has some advantages -- the invaluable benefits of ACID for one, and uniform backups for a somewhat related issue. The problem seems to be more in the operational end -- databases are very good at certain things, and streaming large amounts of image data is not necessarily something they excel at.
>
Thank you. I have been looking through some earlier posts, and I am
beginning to formulate a plan!
> Let me give a crude example. I worked on applications at TRW that stored image data in Informix BLOBs; Informix uses a fixed page size (generally 2k, depending on the operating system), but with BLOB objects you can define the size of the page. We had TIFF images that were all 10-15k in size, so we sized our BLOBs at 15k, which led to very streamlined retrieval -- a single disk read (well, Informix using raw disk storage) would get an entire image in one operation. If we had stored them all in normal pages it would have taken ~7 operations to get the same BLOB.
>
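The page-size arithmetic in the quoted example can be checked with a few lines of Python. This is only a rough sketch: it assumes every page is fully usable for image data (no per-page header overhead) and counts one read per page, which is a simplification rather than a description of how Informix actually schedules I/O. The function name page_reads is made up for illustration.

```python
import math

KB = 1024

def page_reads(blob_bytes: int, page_bytes: int) -> int:
    """Rough number of page-sized reads needed to fetch one BLOB.

    Assumes the whole page holds payload (no header overhead).
    """
    return math.ceil(blob_bytes / page_bytes)

if __name__ == "__main__":
    for size_kb in (10, 15):
        default = page_reads(size_kb * KB, 2 * KB)   # normal 2k pages
        tuned = page_reads(size_kb * KB, 15 * KB)    # BLOB pages sized at 15k
        print(f"{size_kb}k image: {default} reads at 2k, {tuned} at 15k")
```

Under these assumptions a 10-15k image needs between 5 and 8 reads from 2k pages, in line with the "~7" figure quoted, against a single read from a 15k BLOB page.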
> Postgres uses a very different method of disk access, which changes the picture, more so if you have a database with more transient data.
>
> Our (i.e. my current employer and by extension, me) current system puts the images onto servers which are superb at handling lots of simultaneous requests for large amounts of data; the database stores metadata (acquisition date, owner, etc.) and spatial data about the image, and we let the disk cache and filers do the bulk of the work in the access; the database just tells us what to retrieve.
>
> The precise life cycle of your data might make an enormous difference in the "best" solution. We have lots of relatively static data (some of which is really nonchanging, some of which is a replacement of older data); if this data were changing more frequently the required careful synchronization between disk/image and disk/database might be more problematic and I'd lean towards an "all-in-the-db" solution. System security and access rights might also play a part in your analysis.
>
At the moment I have no idea what the expected lifecycle of the
images, especially, will be. I obviously need to do a lot more talking
and red-tape cutting before I really hunker down on this one.
I was just trying to get a feel of what may be possible, keeping massive
scalability in mind at all times for now.
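The split described in the quoted text (image files on a file server, metadata and spatial data in the database, plus a check that the two stay synchronized) might look roughly like the sketch below. Everything in it is an assumption made for illustration: sqlite3 stands in for PostgreSQL so the example runs on its own, the footprint is kept as WKT text where PostGIS would use a geometry column, and the table and function names (image_catalog, register_image, find_out_of_sync) are hypothetical.

```python
import os
import sqlite3

def create_catalog(conn):
    """Create the metadata table; the image bytes themselves stay on disk."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS image_catalog (
               id INTEGER PRIMARY KEY,
               path TEXT NOT NULL UNIQUE,   -- relative to the image root
               acquired TEXT,               -- acquisition date
               owner TEXT,
               footprint_wkt TEXT           -- stand-in for a PostGIS geometry
           )"""
    )

def register_image(conn, path, acquired, owner, footprint_wkt):
    """Record one image file in the catalog."""
    conn.execute(
        "INSERT INTO image_catalog (path, acquired, owner, footprint_wkt) "
        "VALUES (?, ?, ?, ?)",
        (path, acquired, owner, footprint_wkt),
    )

def find_out_of_sync(conn, image_root):
    """Return (missing_on_disk, not_in_catalog) as sets of relative paths."""
    cataloged = {row[0] for row in conn.execute("SELECT path FROM image_catalog")}
    on_disk = set()
    for dirpath, _dirs, files in os.walk(image_root):
        for name in files:
            full = os.path.join(dirpath, name)
            on_disk.add(os.path.relpath(full, image_root))
    return cataloged - on_disk, on_disk - cataloged
```

Running something like find_out_of_sync periodically is one way to notice when files and catalog rows drift apart. The more often the images change, the more this bookkeeping costs, which is the trade-off against an "all-in-the-db" design mentioned above.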
> The # of simultaneous connections is also an issue to consider. Some standard disk storage systems will collapse if you hit them with 50-100 requests for different data (obviously, using a database doesn't make this problem go away but it does change the ground rules).
>
I should think that the traffic aspect will play a large role. There
should be, at any one time, day or night, approximately 50-80
simultaneous requests. This is the only project of this nature in SA.
> Backup strategies might also come into play here -- what are the impacts on the whole system if you have to replace some 20% of your data?
>
Backups will be taken care of off site, but rapid restoration and
backups will be a priority.
> Vector data is a thing that so far we put exclusively into PostGIS, but we don't have to play with DEM data or other such datasets much; some solutions other than a database might be worth considering if there are such large point data sets.
>
Yeah, I have handled large vector datasets before in PostGIS - smiling
all the way, may I add! As I said in my previous post, this is the first
time that I get to play with raster datasets of this size, so I am a
little intimidated...
> Sorry for such a meandering post, but this is not a clear-cut issue.
>
I appreciate every little bit of help I can get - Thank you!
> My gut level feeling is towards putting imagery outside of the database and dealing with synchronizing it and data about it separately.
>
I was tending towards the same, but I do need to do a little more
research!
> HTH,
>
Thanks, it does!