[OSGeo-Discuss] Raster data on RDBMS

Smith, Michael ERDC-CRREL-NH Michael.Smith at usace.army.mil
Wed Oct 29 07:13:10 PDT 2008


Ivan,

Those numbers look impressive. We are just starting to set up some new
hardware here and I plan to do some testing also. Perhaps we can collaborate
and come up with a test suite in order to track these numbers across builds.
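A harness for tracking these numbers across builds could start as small as the sketch below. It assumes Python 3 and that gdal_translate is on the PATH. The function names, the suite contents and the repeat count are hypothetical. It reports the median of several runs because a single `time` measurement can be dominated by whether the file or the BLOBs are already cached.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=5):
    """Run cmd `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError on failure, so a broken
        # build cannot masquerade as a fast one.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Min, median and max; the median resists a single cold-cache outlier."""
    return {"min": min(durations),
            "median": statistics.median(durations),
            "max": max(durations)}


# Hypothetical suite built from the commands quoted in this thread.
SUITE = {
    "georaster_subset": ["gdal_translate", "georaster:scott,tiger,orcl,RDT_2$,2",
                         "out.tif", "-srcwin", "0", "0", "2000", "2000"],
    "geotiff_subset": ["gdal_translate", "Barcelona_2007_R2C2.TIF",
                       "out2.tif", "-srcwin", "0", "0", "2000", "2000"],
}


def run_suite(suite, repeats=5, out=sys.stdout):
    """Time every command in the suite and print one summary line each."""
    for name, cmd in suite.items():
        stats = summarize(time_command(cmd, repeats))
        print(f"{name}: median {stats['median']:.3f}s "
              f"(min {stats['min']:.3f}s, max {stats['max']:.3f}s)", file=out)
```

Calling `run_suite(SUITE)` on each new build and keeping the output would give repeated, comparable numbers rather than one-off timings.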

Mike


-- 
Michael Smith
RSGIS Center
ERDC - CRREL
US Army Corps of Engineers




On 10/29/08  1:35 AM, "Lucena, Ivan" <ivan.lucena at pmldnet.com> wrote:

> Paul,
> 
> Good thought. 
> 
> Let's see. The default blocking used by the GeoRaster driver is (256, 256, 1).
> That is good, because GeoTIFFs don't tile along the band dimension. So I would
> imagine that if I tiled the GeoTIFF this way:
> 
> ilucena at think:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF Barcelona_2007_R2C2_tiled.TIF -co BLOCKXSIZE=256 -co BLOCKYSIZE=256
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real    0m42.991s
> user    0m20.289s
> sys     0m2.516s
> 
> The comparison would be fair:
> 
> ilucena at think:~/Data> time gdal_translate Barcelona_2007_R2C2_tiled.TIF out2.tif -srcwin 0 0 2000 2000
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real    0m1.604s
> user    0m1.156s
> sys     0m0.444s
> 
> What do you think?
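To make the tiling argument concrete, the arithmetic below counts how many 256 x 256 blocks the 2000 x 2000 window of `-srcwin 0 0 2000 2000` overlaps. It is a sketch, and the function names are hypothetical. One thing worth checking: GDAL's GTiff driver documents TILED=YES as the creation option that switches to a tiled layout, with BLOCKYSIZE on its own setting the strip height. Running gdalinfo on Barcelona_2007_R2C2_tiled.TIF to confirm that it reports Block=256x256 may therefore be worthwhile before treating the comparison as like for like.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def blocks_touched(xoff, yoff, xsize, ysize, block_x=256, block_y=256):
    """Number of block_x x block_y blocks a pixel window overlaps (per band)."""
    first_col, last_col = xoff // block_x, (xoff + xsize - 1) // block_x
    first_row, last_row = yoff // block_y, (yoff + ysize - 1) // block_y
    return (last_col - first_col + 1) * (last_row - first_row + 1)


def blocks_per_band(width, height, block_x=256, block_y=256):
    """Blocks in a full band; partial edge blocks count as whole blocks."""
    return ceil_div(width, block_x) * ceil_div(height, block_y)


window = blocks_touched(0, 0, 2000, 2000)
print(window)                         # 64 blocks per band
print(window * 3)                     # 192 when each block holds one band, as in (256, 256, 1)
print(blocks_per_band(14336, 14336))  # 3136
```

So the subset touches 64 of the 3136 blocks in each band, about 2% of the image.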
> 
> I would imagine that if I run gdaladdo to add pyramids to the GeoRaster, an
> application could take advantage of them by telling Oracle to cache the BLOBs
> in memory, so the next time a user zooms in, performance would be even better.
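For scale, the sketch below estimates what a pyramid adds to one of these 14336 x 14336 images. The decimation factors 2, 4, 8 and 16 are an assumption (gdaladdo takes the levels as arguments), and the helper names are hypothetical. Because each level halves both dimensions, the whole pyramid adds only about a third to the pixel count.

```python
def overview_sizes(width, height, factors=(2, 4, 8, 16)):
    """Approximate (width, height) of each overview level, rounding up."""
    return [(-(-width // f), -(-height // f)) for f in factors]


def pyramid_overhead(width, height, factors=(2, 4, 8, 16)):
    """Extra pixels in all overview levels, as a fraction of the base level."""
    extra = sum(w * h for w, h in overview_sizes(width, height, factors))
    return extra / (width * height)


print(overview_sizes(14336, 14336))
# [(7168, 7168), (3584, 3584), (1792, 1792), (896, 896)]
print(round(pyramid_overhead(14336, 14336), 3))  # 0.332
```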
> 
> I am trying to set up a MapServer experiment on that, but for now I would
> like to keep my analysis to the very simple process of extracting a subset.
> 
> Best regards,
> 
> Ivan
> 
> 
>>  -------Original Message-------
>>  From: Paul Ramsey <pramsey at cleverelephant.ca>
>>  Subject: Re: [OSGeo-Discuss] Raster data on RDBMS
>>  Sent: Oct 29 '08 05:00
>>  
>>  The data is chunked in Oracle into tiles, so unless you tile the TIFF
>>  as well you aren't really doing a direct comparison. Even if you end
>>  up with the same numbers for both processes, I'll still be impressed,
>>  since I assumed Oracle would have a higher overhead.
>>  
>>  P.
>>  
>>  On Tue, Oct 28, 2008 at 9:54 PM, Lucena, Ivan <ivan.lucena at pmldnet.com> wrote:
>>> Hi There,
>>> 
>>> I would like to return to a discussion we had months ago about storing raster
>>> in an RDBMS. This time, though, I would like to present some numbers.
>>> 
>>> As far as I recall, there were basically two major arguments against storing
>>> raster in an RDBMS. One was very pragmatic: "Why waste precious processing time
>>> on the overhead of queries, tables, and client-server round trips just to get
>>> the data out of BLOB fields in a database when you can read it directly from
>>> the file system?" The other argument was semantic: "Why store raster in an
>>> RDBMS if, in general, we do not expect transactions on that data?"
>>> 
>>> I cannot argue against the second one; I basically agree with it. But after
>>> seeing how fragile and complicated even a well-defined structure of folders
>>> and files can be, I would vote in favor of the good old relational model.
>>> 
>>> Here is my experiment. I downloaded two free data samples from the Navteq
>>> website: two GeoTIFF files with the same size and number of bands (14336,
>>> 14336, 3):
>>> 
>>> ilucena at think:~/Data> du -k Barcelona_2007_R2C2.TIF
>>> 602828  Barcelona_2007_R2C2.TIF
>>> ilucena at think:~/Data> du -k San_Francisco_2006_R1C2.TIF
>>> 602828  San_Francisco_2006_R1C2.TIF
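The du figure is close to what uncompressed 8-bit pixels would occupy, which suggests these GeoTIFFs are stored without compression. A quick check, assuming one byte per sample (gdalinfo later in this message reports Type=Byte); attributing the remaining ~716 KiB to TIFF headers, tag data and filesystem block rounding is a guess:

```python
width, height, bands = 14336, 14336, 3
bytes_per_sample = 1              # Type=Byte
raw_bytes = width * height * bands * bytes_per_sample
raw_kib = raw_bytes // 1024       # du -k reports 1024-byte units
du_kib = 602828                   # du -k output above

print(raw_bytes)          # 616562688
print(raw_kib)            # 602112
print(du_kib - raw_kib)   # 716
```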
>>> 
>>> Then I loaded those images into Oracle Spatial GeoRaster using GDAL. The
>>> loading speed is comparable to that of some commercial ETL products on the
>>> market: it took about 2 minutes to load each image.
>>> 
>>> ilucena at think:~/Data> time gdal_translate -of georaster Barcelona_2007_R2C2.TIF georaster:scott,tiger,orcl,RDT_2$,2
>>> Input file size is 14336, 14336
>>> 0...10...20...30...40...50...60...70...80...90...100 - done.
>>> Ouput dataset: (georaster:scott,tiger,orcl,RDT_2$,2) on GDAL_IMPORT,RASTER
>>> real    1m54.973s
>>> user    0m4.368s
>>> sys     0m1.936s
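Two things can be read off that timing. The sketch below assumes the bash-style `XmY.ZZZs` fields printed by `time`; `parse_time` is a hypothetical helper. The load averaged roughly 5 MiB/s, and the gdal_translate process spent only about 4% of the wall-clock time in user CPU. That suggests, though it does not prove, that most of the two minutes went to waiting on the database and I/O rather than to client-side work.

```python
import re


def parse_time(field):
    """Convert a `time` field such as '1m54.973s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", field)
    if match is None:
        raise ValueError(f"unrecognized time field: {field!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


real = parse_time("1m54.973s")
user = parse_time("0m4.368s")
source_kib = 602828  # du -k of the source GeoTIFF

print(f"{source_kib / 1024 / real:.2f} MiB/s")        # 5.12 MiB/s
print(f"{user / real:.1%} of wall time in user CPU")  # 3.8% of wall time in user CPU
```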
>>> 
>>> If you are an Oracle GeoRaster user, you might already be excited about those
>>> numbers, but they are not the numbers I want to show. What I would like to do
>>> is compare the time it takes to extract a subset from the original GeoTIFF
>>> with the time it takes to extract the same subset from the RDBMS. Here are the
>>> numbers:
>>> 
>>> ilucena at think:~/Data> time gdal_translate georaster:scott,tiger,orcl,RDT_2$,2 out.tif -srcwin 0 0 2000 2000
>>> Input file size is 14336, 14336
>>> 0...10...20...30...40...50...60...70...80...90...100 - done.
>>> real    0m0.720s
>>> user    0m0.408s
>>> sys     0m0.108s
>>> 
>>> ilucena at think:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF out2.tif -srcwin 0 0 2000 2000
>>> Input file size is 14336, 14336
>>> 0...10...20...30...40...50...60...70...80...90...100 - done.
>>> real    0m1.177s
>>> user    0m0.976s
>>> sys     0m0.188s
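Expressed as throughput, the 2000 x 2000 x 3 window is 12,000,000 bytes. The sketch below uses only the single-run real times quoted above, which cover both reading the window and writing the output file. The ratio is therefore indicative end-to-end timing rather than a measured speedup; repeated runs would be needed to separate cache effects from format differences.

```python
# Single-run wall-clock ("real") times for -srcwin 0 0 2000 2000, from above
timings = {"GeoRaster": 0.720, "GeoTIFF": 1.177}
window_bytes = 2000 * 2000 * 3  # three Byte bands

for source, seconds in timings.items():
    print(f"{source}: {window_bytes / seconds / 1e6:.1f} MB/s")
# GeoRaster: 16.7 MB/s
# GeoTIFF: 10.2 MB/s

ratio = timings["GeoTIFF"] / timings["GeoRaster"]
print(f"GeoTIFF took {ratio:.2f}x as long")  # GeoTIFF took 1.63x as long
```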
>>> 
>>> I also checked the integrity of the results to see whether both extractions
>>> give the same result:
>>> 
>>> ilucena at think:~/Data> gdalinfo -checksum out.tif
>>> ...
>>> Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
>>>   Checksum=58248
>>> Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
>>>   Checksum=21226
>>> Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
>>>   Checksum=8002
>>> 
>>> ilucena at think:~/Data> gdalinfo -checksum out2.tif
>>> ...
>>> Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
>>>   Checksum=58248
>>> Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
>>>   Checksum=21226
>>> Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
>>>   Checksum=8002
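For an automated suite, this comparison can be scripted rather than eyeballed. The following is a minimal sketch that assumes gdalinfo is on the PATH for the helper that shells out to it. The regular expression relies only on the `Checksum=N` lines shown above, assuming those are the only such lines, and the function names are hypothetical. The values are small (all below 65536 here), so matching checksums are a quick sanity check rather than proof of identical pixels.

```python
import re
import subprocess


def band_checksums(gdalinfo_text):
    """Per-band checksums, in band order, from `gdalinfo -checksum` output."""
    return [int(value) for value in re.findall(r"Checksum=(\d+)", gdalinfo_text)]


def checksums_for(path):
    """Run `gdalinfo -checksum` on path and parse the result."""
    result = subprocess.run(["gdalinfo", "-checksum", path],
                            capture_output=True, text=True, check=True)
    return band_checksums(result.stdout)


# Excerpt of the out.tif listing above; out2.tif printed the same values.
OUT_TIF = """Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
  Checksum=58248
Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
  Checksum=21226
Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
  Checksum=8002"""

print(band_checksums(OUT_TIF))  # [58248, 21226, 8002]
```

In a suite, `checksums_for("out.tif") == checksums_for("out2.tif")` would then serve as the pass/fail check.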
>>> 
>>> What other tests would be interesting to perform?
>>> 
>>> Best regards,
>>> 
>>> Ivan
>>> 
>>> _______________________________________________
>>> Discuss mailing list
>>> Discuss at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/discuss
>>> 
>>  