[OSGeo-Discuss] Raster data on RDBMS

Sylvan Ascent Inc. sylvanascent at mail2web.net
Wed Oct 29 07:32:47 PDT 2008


Mike and Ivan,
 
I'd like to see them also compared to a caching solution, like GeoWebCache or TileCache. These effectively create a file-based "database" of small tiles at certain resolutions, kind of like a tile pyramid that is built up gradually as the image data is accessed.
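 
If memory serves, a TileCache setup is really just a small config file that points a layer at a WMS source and a disk directory where the tiles accumulate, something along these lines (the path and URL below are only placeholders):
 
[cache]
type=Disk
base=/var/cache/tilecache

[barcelona]
type=WMS
url=http://example.com/cgi-bin/mapserv?map=barcelona.map
extension=png
layers=barcelona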
 
One would think the file-based cache would be faster than a comparable database solution, with the database giving no real benefits that I can see.
 
>>> I basically agreed with that, but after seeing how fragile and complicated
>>> even a well-defined structure of folders and files can be, I would vote in
>>> favor of the good old relational model.
 
Since the cache is maintained by the software in a completely defined way, and never messed with by humans, I wonder what could go wrong?
 
Roger Bedell, Sylvan Ascent Inc.
800-362-8971
+34 626 855 662
roger at sylvanascent.com

________________________________

From: discuss-bounces at lists.osgeo.org on behalf of Smith, Michael ERDC-CRREL-NH
Sent: Wed 10/29/2008 10:13 AM
To: Lucena, Ivan; OSGeo Discussions; Paul Ramsey
Subject: Re: [OSGeo-Discuss] Raster data on RDBMS



Ivan,

Those numbers look impressive. We are just starting to set up some new
hardware here and I plan to do some testing also. Perhaps we can collaborate
and come up with a test suite in order to track these numbers across builds.

Mike


--
Michael Smith
RSGIS Center
ERDC - CRREL
US Army Corps of Engineers




On 10/29/08  1:35 AM, "Lucena, Ivan" <ivan.lucena at pmldnet.com> wrote:

> Paul,
>
> Good thought.
>
> Let's see. The default blocking used by the GeoRaster driver is (256, 256, 1).
> That is a good match, because GeoTIFF doesn't tile in the band dimension. So I
> would imagine that if I tile the GeoTIFF this way:
>
> ilucena at think:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF
> Barcelona_2007_R2C2_tiled.TIF -co BLOCKXSIZE=256 -co BLOCKYSIZE=256
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real   0m42.991s
> user 0m20.289s
> sys   0m2.516s
>
> Then the comparison would be fair:
>
> ilucena at think:~/Data> time gdal_translate Barcelona_2007_R2C2_tiled.TIF
> out2.tif -srcwin 0 0 2000 2000
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real   0m1.604s
> user 0m1.156s
> sys  0m0.444s
>
> What do you think?
>
> I would imagine that if I ran gdaladdo to add pyramids to the GeoRaster, an
> application could take advantage of it by telling Oracle to cache the BLOBs in
> memory, so the next time a user zooms in the performance would be even better.
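>
> Something like this should do the trick (untested; the resampling method and
> overview levels are only a guess):
>
> ilucena at think:~/Data> gdaladdo -r average georaster:scott,tiger,orcl,RDT_2$,2 2 4 8 16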
>
> I am trying to set up a MapServer experiment around that, but for now I would
> like to keep my analysis to the very simple process of extracting a subset.
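>
> For that MapServer test I have in mind a plain raster layer pointing straight
> at the GeoRaster dataset through GDAL, roughly like this (untested; the layer
> name is just an example):
>
> LAYER
>   NAME "barcelona"
>   TYPE RASTER
>   STATUS ON
>   DATA "georaster:scott,tiger,orcl,RDT_2$,2"
> END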
>
> Best regards,
>
> Ivan
>
>
>>  -------Original Message-------
>>  From: Paul Ramsey <pramsey at cleverelephant.ca>
>>  Subject: Re: [OSGeo-Discuss] Raster data on RDBMS
>>  Sent: Oct 29 '08 05:00
>> 
>>  The data is chunked in Oracle into tiles, so unless you tile the TIFF
>>  as well you aren't really doing a direct comparison. Even if you end
>>  up with the same numbers for both processes, I'll still be impressed,
>>  since I assumed Oracle would have a higher overhead.
>> 
>>  P.
>> 
>>  On Tue, Oct 28, 2008 at 9:54 PM, Lucena, Ivan <ivan.lucena at pmldnet.com>
>> wrote:
>>> Hi There,
>>>
>>> I would like to return to a discussion that we had months ago about raster
>>> on RDBMS, but this time I would like to present some numbers.
>>>
>>> As far as I can recall, there were basically two major arguments against
>>> storing raster on an RDBMS. One is very pragmatic: "Why waste precious
>>> processing time on the overhead of queries, tables, and client-server round
>>> trips just to get the data from BLOB fields in a database, when you can get
>>> it directly from the file system?" The other argument is semantic: "Why
>>> store raster on an RDBMS if, in general, we are not expecting to have
>>> transactions on that data?"
>>>
>>> I cannot argue against the second one. I basically agreed with that, but
>>> after seeing how fragile and complicated even a well-defined structure of
>>> folders and files can be, I would vote in favor of the good old relational
>>> model.
>>>
>>> Here is my experiment. I downloaded two free data samples from the Navteq
>>> website: two GeoTIFF files with the same size and number of bands (14336,
>>> 14336, 3):
>>>
>>> ilucena at think:~/Data> du -k Barcelona_2007_R2C2.TIF
>>> 602828  Barcelona_2007_R2C2.TIF
>>> ilucena at think:~/Data> du -k San_Francisco_2006_R1C2.TIF
>>> 602828  San_Francisco_2006_R1C2.TIF
>>>
>>> Then I loaded those images into Oracle Spatial GeoRaster using GDAL. The
>>> loading performance is comparable to that of some commercial ETL products on
>>> the market; it took about 2 minutes to load each image.
>>>
>>> ilucena at think:~/Data> time gdal_translate -of georaster
>>> Barcelona_2007_R2C2.TIF georaster:scott,tiger,orcl,RDT_2$,2
>>> Input file size is 14336, 14336
>>> 0...10...20...30...40...50...60...70...80...90...100 - done.
>>> Ouput dataset: (georaster:scott,tiger,orcl,RDT_2$,2) on GDAL_IMPORT,RASTER
>>> real  1m54.973s
>>> user 0m4.368s
>>> sys   0m1.936s
>>>
>>> If you are an Oracle GeoRaster user you might be excited about those numbers
>>> already, but those are not the numbers I want to show. What I would like to
>>> do is compare the time it takes to extract a subset from the original
>>> GeoTIFF with the time it takes to extract the same subset from the RDBMS.
>>> Here are the numbers:
>>>
>>> ilucena at think:~/Data> time gdal_translate
>>> georaster:scott,tiger,orcl,RDT_2$,2 out.tif -srcwin 0 0 2000 2000
>>> Input file size is 14336, 14336
>>> 0...10...20...30...40...50...60...70...80...90...100 - done.
>>> real      0m0.720s
>>> user 0m0.408s
>>> sys   0m0.108s
>>>
>>> ilucena at think:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF out2.tif
>>> -srcwin 0 0 2000 2000
>>> Input file size is 14336, 14336
>>> 0...10...20...30...40...50...60...70...80...90...100 - done.
>>> real      0m1.177s
>>> user 0m0.976s
>>> sys       0m0.188s
>>>
>>> And I also checked the integrity of the results, to verify that both
>>> extractions return the same data:
>>>
>>> ilucena at think:~/Data> gdalinfo -checksum out.tif
>>> ...
>>> Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
>>>   Checksum=58248
>>> Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
>>>   Checksum=21226
>>> Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
>>>   Checksum=8002
>>>
>>> ilucena at think:~/Data> gdalinfo -checksum out2.tif
>>> ...
>>> Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
>>>   Checksum=58248
>>> Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
>>>   Checksum=21226
>>> Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
>>>   Checksum=8002
>>>
>>> What other tests would be interesting to perform?
>>>
>>> Best regards,
>>>
>>> Ivan

_______________________________________________
Discuss mailing list
Discuss at lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/discuss

