[gdal-dev] Gdalinfo slow with big Rasterlite tables

Even Rouault even.rouault at mines-paris.org
Tue Aug 21 14:12:31 PDT 2012


Le mardi 21 août 2012 09:53:50, Rahkonen Jukka a écrit :
> Even Rouault wrote:
> > Selon Rahkonen Jukka <Jukka.Rahkonen at mmmtike.fi>:
> > > Even Rouault wrote:
> > > >> Better after doing these. 4 minutes and 10 seconds for the biggest
> > > >> layer, record before this was 6 minutes. Big enough difference for
> > > >> being meaningful, I think.
> > > > 
> > > > Meaningful, but I would have expected it to be much better and get
> > > > that
> > > 
> > > back
> > > 
> > > > to a few seconds...
> > > > 
> > > > Could you try : ogrinfo your_rasterlite.db -sql "VACUUM"
> > > 
> > > There were only 1% of unused pages but vacuuming had still an effect.
> > > The first run after vacuum: 50 seconds, next warm runs: 20 seconds.
> > 
> > That's what I call a significant improvement ! I think (but without any
> > strong evidence) that the main effect of vacuuming in that context is to
> > avoid the fragmentation of the data belonging to various tables. For
> > each XXX raster in a rasterlite dataset, you have a XXX_rasters table
> > that contain the binary blobs of the raster tiles, a XXX_metadata table
> > with the extent of each tile, and various utility tables used by the
> > spatial index on XXX_metadata. When generating the rasterlite dataset,
> > all of them are updated each time a tile is inserted in. The
> > fragmentation due to spatial indexes can be solved with building them
> > right at the end, but avoiding the interleaving of XXX_metadata and
> > XXX_rasters would be much harder.
> 
> That does make sense. I have noticed even before that Spatialite gains much
> from vacuuming but never before as much as now.  In this case
> SQLite/Spatialite tools cannot give a hint when vacuuming is needed
> because they only check if there is empty lines in the db and we do not
> have such after ogr2ogr conversion.
> 
> > In fact, you should be able to reduce the time again by running "SELECT
> > UpdateLayerStatistics()" after VACUUM, since currently VACUUM is wrongly
> > invalidating them (I'll fix that). So I think that you are still paying
> > the cost of fetching the extent of the raster (if when running gdalinfo
> > --debug on your_raster.db, you don't see debug traces mentionning the
> > use of layer statistics, that is the case).
> 
> That perhaps cut another couple of seconds. However, going through 700000
> rows in the raster_metadata is quite fast. Of course it is good to take
> care that direct access to statistics work and it would be more important
> with bigger tables.
> 
> > So I think that you can forgot the previous workaround I gave that
> > disables/drop/recreates spatial index, and try the following minimal
> > steps once you have used gdal_translate / gdaladdo to generate the
> > dataset :
> > 
> > ogrinfo your_rasterlite.db -sql "VACUUM"
> > ogrinfo your_rasterlite.db -sql "SELECT UpdateLayerStatistics()" (that
> > one should be unnecesserary once I've fixed the sqlite driver)
> 
> I guess I created a pretty good (= very messy) database (3.5 gigabytes) for
> testing with my workflow: I created first seven Rasterlite layers with
> decreasing pixel size/increasing table size and after that I created the
> overviews. I believe it guarantees that spatial indexes and metadata
> tables for each layer is written to non-continuos parts into the db file.
> And raster data tables, too. Vacuum is the only way to cure the database.
> Rasterlite tools seem to run vacuum always as the last step of the
> process. I think that ogr2ogr should not do it or there should be a switch
> to turn it off. Vacuuming is an awfully slow process and if one wants to
> insert many layers it is waste of time to run vacuum after each layer. But
> driver page http://www.gdal.org/frmt_rasterlite.html might have a
> sub-title about the importance of vacuuming and that it is a good
> investment for the time even it can be slow.

Jukka,

as you've suggested, I've added a "Performance hints" section to 
frmt_rasterlite.html to mention the VACUUM trick.

> 
> -Jukka-


More information about the gdal-dev mailing list