Re: [OSGeo-Discuss] Raster data on RDBMS

Lucena, Ivan ivan.lucena at pmldnet.com
Tue Oct 28 22:35:19 PDT 2008


Paul,

Good thought. 

Let's see. The default blocking used by the GeoRaster driver is (256, 256, 1), i.e. 256 x 256 pixels by one band. That is convenient, because GeoTIFF does not tile along the band dimension. So I imagine that if I tiled the GeoTIFF this way:

ilucena@think:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF Barcelona_2007_R2C2_tiled.TIF -co BLOCKXSIZE=256 -co BLOCKYSIZE=256
Input file size is 14336, 14336
0...10...20...30...40...50...60...70...80...90...100 - done.
real    0m42.991s
user    0m20.289s
sys     0m2.516s

Then the comparison would be fair:

ilucena@think:~/Data> time gdal_translate Barcelona_2007_R2C2_tiled.TIF out2.tif -srcwin 0 0 2000 2000
Input file size is 14336, 14336
0...10...20...30...40...50...60...70...80...90...100 - done.
real    0m1.604s
user    0m1.156s
sys     0m0.444s
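
A back-of-envelope sketch of what the block layout means for this test: it counts how many blocks the `-srcwin 0 0 2000 2000` window intersects, assuming every intersected block is read and decoded whole (the `blocks_touched` helper is hypothetical, not a GDAL function). It also models full-width strips, because as far as I know GDAL's GTiff driver only writes 256 x 256 tiles when `-co TILED=YES` is given as well; with `BLOCKYSIZE` alone it writes strips that many rows high. Running `gdalinfo` on the copy and looking at its `Block=` field would settle which layout was actually produced.

```python
# Sketch: count the blocks a gdal_translate -srcwin window intersects,
# assuming each intersected block is read and decoded in full.

def blocks_touched(xoff, yoff, xsize, ysize, block_x, block_y):
    """Number of block_x x block_y blocks that overlap the window."""
    first_col, last_col = xoff // block_x, (xoff + xsize - 1) // block_x
    first_row, last_row = yoff // block_y, (yoff + ysize - 1) // block_y
    return (last_col - first_col + 1) * (last_row - first_row + 1)

IMAGE = 14336                 # width and height, per "Input file size is ..."
WINDOW = (0, 0, 2000, 2000)   # the -srcwin used in the tests
BANDS = 3

# 256 x 256 tiles (GeoRaster default blocking, or a TILED=YES GeoTIFF).
tiles = blocks_touched(*WINDOW, 256, 256)
tile_pixels = tiles * 256 * 256            # pixels decoded per band

# Full-width strips, 256 rows high (assumed result of BLOCKYSIZE=256 alone).
strips = blocks_touched(*WINDOW, IMAGE, 256)
strip_pixels = strips * IMAGE * 256        # pixels decoded per band

print(f"256x256 tiles: {tiles} per band, {tiles * BANDS} GeoRaster blocks "
      f"at (256, 256, 1), {tile_pixels:,} pixels per band")
print(f"256-row strips: {strips} per band, {strip_pixels:,} pixels per band")
print(f"requested: {WINDOW[2] * WINDOW[3]:,} pixels per band")
```

Under these assumptions, 256 x 256 tiles decode about 5% more pixels than requested (4,194,304 vs 4,000,000 per band), while 256-row strips decode about 7.3 times as many.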

What do you think?

I imagine that if I ran gdaladdo to add pyramids to the GeoRaster, an application could take advantage of them by telling Oracle to cache the BLOBs in memory, so the next time a user zooms in, performance would be even better.
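
To get a feel for what pyramids would add, here is a small sketch of the overview sizes for this 14336 x 14336 image. The levels 2, 4, 8 and 16 are a hypothetical choice (just numbers one might pass to gdaladdo), and rounding partial pixels up is an assumption about how the overview dimensions are computed.

```python
# Sketch: overview (pyramid) dimensions and the extra pixels they add.
# The decimation levels are a hypothetical choice, not a recommendation.

def overview_sizes(width, height, levels):
    """Dimensions of each overview, rounding partial pixels up."""
    return [(-(-width // f), -(-height // f)) for f in levels]

def extra_storage_fraction(width, height, levels):
    """Overview pixels as a fraction of the full-resolution pixel count."""
    extra = sum(w * h for w, h in overview_sizes(width, height, levels))
    return extra / (width * height)

LEVELS = [2, 4, 8, 16]
for level, (w, h) in zip(LEVELS, overview_sizes(14336, 14336, LEVELS)):
    print(f"level {level:>2}: {w} x {h}")
print(f"extra pixels: {extra_storage_fraction(14336, 14336, LEVELS):.1%}")
```

Each level holds a quarter of the pixels of the one before it, so the overhead can never exceed one third of the base image (1/4 + 1/16 + ... approaches 1/3); with these four levels it is 33.2%.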

I am trying to set up a MapServer experiment on that, but for now I would like to keep my analysis focused on the very simple process of extracting a subset.

Best regards,

Ivan


>  -------Original Message-------
>  From: Paul Ramsey <pramsey at cleverelephant.ca>
>  Subject: Re: [OSGeo-Discuss] Raster data on RDBMS
>  Sent: Oct 29 '08 05:00
>  
>  The data is chunked in Oracle into tiles, so unless you tile the TIFF
>  as well you aren't really doing a direct comparison. Even if you end
>  up with the same numbers for both processes, I'll still be impressed,
>  since I assumed Oracle would have a higher overhead.
>  
>  P.
>  
>  On Tue, Oct 28, 2008 at 9:54 PM, Lucena, Ivan <ivan.lucena at pmldnet.com> wrote:
>  > Hi There,
>  >
>  > I would like to return to a discussion we had months ago about raster in an RDBMS, but this time I would like to present some numbers.
>  >
>  > As far as I can recall, there were basically two major arguments against storing raster in an RDBMS. One was very pragmatic: "Why waste precious processing time on the overhead of queries, tables, and client-server round trips just to get the data out of BLOB fields in a database when you can read it directly from the file system?" The other was semantic: "Why store raster in an RDBMS if, in general, we do not expect to run transactions on that data?"
>  >
>  > I cannot argue against the second one; I basically agree with it. But after seeing how fragile and complicated even a well-defined structure of folders and files can be, I would vote in favor of the good old relational model.
>  >
>  > Here is my experiment. I downloaded two free data samples from the Navteq website: two GeoTIFF files with the same size and number of bands (14336, 14336, 3):
>  >
>  > ilucena@think:~/Data> du -k Barcelona_2007_R2C2.TIF
>  > 602828  Barcelona_2007_R2C2.TIF
>  > ilucena@think:~/Data> du -k San_Francisco_2006_R1C2.TIF
>  > 602828  San_Francisco_2006_R1C2.TIF
>  >
>  > Then I loaded those images into Oracle Spatial GeoRaster using GDAL. The loading performance is comparable to some commercial ETL products on the market: it took about 2 minutes to load each image.
>  >
>  > ilucena@think:~/Data> time gdal_translate -of georaster Barcelona_2007_R2C2.TIF georaster:scott,tiger,orcl,RDT_2$,2
>  > Input file size is 14336, 14336
>  > 0...10...20...30...40...50...60...70...80...90...100 - done.
>  > Ouput dataset: (georaster:scott,tiger,orcl,RDT_2$,2) on GDAL_IMPORT,RASTER
>  > real    1m54.973s
>  > user    0m4.368s
>  > sys     0m1.936s
>  >
>  > If you are an Oracle GeoRaster user, you might already be excited about those numbers, but those are not the numbers I want to show. What I would like to do is compare the time it takes to extract a subset from the original GeoTIFF with the time it takes to extract the same subset from the RDBMS. Here are the numbers:
>  >
>  > ilucena@think:~/Data> time gdal_translate georaster:scott,tiger,orcl,RDT_2$,2 out.tif -srcwin 0 0 2000 2000
>  > Input file size is 14336, 14336
>  > 0...10...20...30...40...50...60...70...80...90...100 - done.
>  > real    0m0.720s
>  > user    0m0.408s
>  > sys     0m0.108s
>  >
>  > ilucena@think:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF out2.tif -srcwin 0 0 2000 2000
>  > Input file size is 14336, 14336
>  > 0...10...20...30...40...50...60...70...80...90...100 - done.
>  > real    0m1.177s
>  > user    0m0.976s
>  > sys     0m0.188s
>  >
>  > I also checked the integrity of the results to see whether both extractions produce the same output:
>  >
>  > ilucena@think:~/Data> gdalinfo -checksum out.tif
>  > ...
>  > Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
>  >  Checksum=58248
>  > Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
>  >  Checksum=21226
>  > Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
>  >  Checksum=8002
>  >
>  > ilucena@think:~/Data> gdalinfo -checksum out2.tif
>  > ...
>  > Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
>  >  Checksum=58248
>  > Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
>  >  Checksum=21226
>  > Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
>  >  Checksum=8002
>  >
>  > What other tests would be interesting to perform?
>  >
>  > Best regards,
>  >
>  > Ivan
>  >
>  > _______________________________________________
>  > Discuss mailing list
>  > Discuss at lists.osgeo.org
>  > http://lists.osgeo.org/mailman/listinfo/discuss
>  >
>  
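
The figures in the quoted message can be cross-checked with a little arithmetic. This sketch assumes 8-bit samples (gdalinfo reports Type=Byte) and takes the bash `time` fields exactly as printed; `to_seconds` is a hypothetical helper. It compares the raw pixel volume with the `du -k` size, turns the roughly two-minute load into a throughput figure, and computes the ratio between the two extraction times.

```python
# Sketch: sanity-check the sizes and timings reported in the thread.

def to_seconds(field):
    """Convert a bash `time` field such as '1m54.973s' to seconds."""
    minutes, seconds = field.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

WIDTH = HEIGHT = 14336
BANDS = 3
BYTES_PER_SAMPLE = 1          # Type=Byte in the gdalinfo output

raw_bytes = WIDTH * HEIGHT * BANDS * BYTES_PER_SAMPLE
raw_kib = raw_bytes / 1024
du_kib = 602828               # from `du -k` on either TIFF
print(f"raw pixels: {raw_kib:,.0f} KiB, on disk: {du_kib:,} KiB, "
      f"difference: {du_kib - raw_kib:,.0f} KiB")

load = to_seconds("1m54.973s")
client_cpu = to_seconds("0m4.368s") + to_seconds("0m1.936s")   # user + sys
print(f"load: {raw_bytes / load / 1e6:.2f} MB/s, "
      f"client CPU {client_cpu / load:.1%} of wall-clock time")

georaster = to_seconds("0m0.720s")
geotiff = to_seconds("0m1.177s")
print(f"GeoTIFF/GeoRaster extraction time ratio: {geotiff / georaster:.2f}")
```

The files are only about 716 KiB larger than the raw pixels, which suggests they are stored uncompressed (du also counts block rounding). The load moved about 5.36 MB/s, and since the gdal_translate process itself used only about 5.5% of the wall-clock time in CPU, most of those two minutes were presumably spent in the database and I/O. The original GeoTIFF extraction took about 1.63 times as long as the GeoRaster one.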

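Comparing the `gdalinfo -checksum` reports by eye works for three bands; the hypothetical helpers below do the same comparison programmatically by pulling the per-band checksums out of the report text. Keep in mind that GDAL's checksum is a 16-bit value (0 to 65535), so matching checksums are strong evidence, not proof, that the pixels are identical.

```python
import re

BAND_RE = re.compile(r"^\s*Band (\d+)\b")
CHECKSUM_RE = re.compile(r"^\s*Checksum=(\d+)")

def band_checksums(gdalinfo_text):
    """Map band number -> checksum parsed from `gdalinfo -checksum` output."""
    checksums, band = {}, None
    for line in gdalinfo_text.splitlines():
        if m := BAND_RE.match(line):
            band = int(m.group(1))
        elif (m := CHECKSUM_RE.match(line)) and band is not None:
            checksums[band] = int(m.group(1))
    return checksums

def checksums_match(report_a, report_b):
    """True when both reports list the same bands with the same checksums."""
    a, b = band_checksums(report_a), band_checksums(report_b)
    return bool(a) and a == b

# Band section shared by both reports in the thread (other lines elided).
REPORT = """Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
 Checksum=58248
Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
 Checksum=21226
Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
 Checksum=8002"""

print(band_checksums(REPORT))
print(checksums_match(REPORT, REPORT.replace("8002", "8003")))
```

In practice the two inputs would be the captured output of `gdalinfo -checksum out.tif` and `gdalinfo -checksum out2.tif`.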

