[mapguide-internals] MapGuide RFC 112 - sqlite based tile cache
Trevor Wekel
trevor_wekel at otxsystems.com
Fri May 13 12:06:31 EDT 2011
Hmm... how many times have I heard "Repository Corruption" on the mailing lists? Moving to files on disk would make repository hacking easier and guarantee that you would not lose the entire library at once. I do have a couple of concerns about moving to a file-based approach, though. We could run out of file handles if we keep the files open, and there may be some overhead in calling fopen/fclose on every resource access.
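One common way to address both concerns is a bounded cache of open handles. It keeps at most N files open and closes the least recently used one when the limit is reached, so repeated reads of hot resources skip a fresh open/close. The sketch below is a minimal Python illustration. The class `FileHandleCache` and its capacity are hypothetical and not part of MapGuide:

```python
from collections import OrderedDict
from typing import BinaryIO


class FileHandleCache:
    """Hypothetical helper: keep at most `capacity` files open and close the
    least recently used handle when the limit is reached."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._handles: "OrderedDict[str, BinaryIO]" = OrderedDict()
        self.opens = 0  # counts real open() calls, i.e. the overhead we avoid

    def read(self, path: str) -> bytes:
        handle = self._handles.get(path)
        if handle is not None:
            self._handles.move_to_end(path)  # mark as most recently used
        else:
            if len(self._handles) >= self.capacity:
                _, oldest = self._handles.popitem(last=False)  # evict LRU entry
                oldest.close()
            handle = open(path, "rb")
            self.opens += 1
            self._handles[path] = handle
        handle.seek(0)  # always read the whole file from the start
        return handle.read()

    def open_count(self) -> int:
        return len(self._handles)

    def close_all(self) -> None:
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()
```

One design caveat applies on POSIX systems. A cached handle keeps reading the old file if a writer replaces the file by rename, so a real implementation would need to invalidate entries on write.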
Regards,
Trevor
-----Original Message-----
From: mapguide-internals-bounces at lists.osgeo.org [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of Traian Stanev
Sent: May 13, 2011 9:44 AM
To: MapGuide Internals Mail List
Subject: RE: [mapguide-internals] MapGuide RFC 112 - sqlite based tile cache
For the ultimate serving speed of static tile caches, it would be best to bypass MapGuide entirely and serve the tile cache directly from a directory exposed via Apache. This way you would automatically get browser caching as well. This is also why file storage is best for optimal tile-serving speed.
As for the resource repository, IMO it's small enough to store the XML documents directly on the file system (for example, by mapping what are currently XMLdb paths to file system paths). Storing blobs in a database would complicate things more than necessary and would also add an unnecessary dependency to what could be a really simple piece of code (reading and writing files).
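The path mapping Traian describes could look something like the following sketch. It assumes a hypothetical layout in which `Library://Samples/Sheboygan/Layers/Parcels.LayerDefinition` is stored as `<root>/Library/Samples/Sheboygan/Layers/Parcels.LayerDefinition`. Only `Library://` ids are handled, and the helper names are illustrative, not MapGuide APIs. Writes go to a temporary file that is then renamed over the target, so a crash mid-write does not leave a truncated document:

```python
import os
from pathlib import Path, PurePosixPath

PREFIX = "Library://"


def resource_path(resource_id: str, root: Path) -> Path:
    """Map a Library:// resource id to a file under `root` (assumed layout)."""
    if not resource_id.startswith(PREFIX) or "\\" in resource_id:
        raise ValueError(f"unsupported resource id: {resource_id!r}")
    parts = PurePosixPath(resource_id[len(PREFIX):]).parts
    # Reject empty ids, absolute paths and '..' so ids cannot escape root.
    if not parts or parts[0] == "/" or ".." in parts:
        raise ValueError(f"invalid resource id: {resource_id!r}")
    return root.joinpath("Library", *parts)


def write_resource(resource_id: str, xml: str, root: Path) -> None:
    path = resource_path(resource_id, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(xml, encoding="utf-8")
    # Rename over the target so readers never see a half-written document
    # (os.replace is atomic on POSIX when both paths are on one file system).
    os.replace(tmp, path)


def read_resource(resource_id: str, root: Path) -> str:
    return resource_path(resource_id, root).read_text(encoding="utf-8")
```

Session repository ids use a different form and would need their own mapping.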
Traian
-----Original Message-----
From: mapguide-internals-bounces at lists.osgeo.org [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of Trevor Wekel
Sent: Friday, May 13, 2011 11:33 AM
To: MapGuide Internals Mail List
Subject: RE: [mapguide-internals] MapGuide RFC 112 - sqlite based tile cache
Since MapGuide targets both Windows and Linux, I think SQLite is the only choice. If we are going to introduce another local database into MapGuide, perhaps we should consider other use cases for it. Here are a few just off the top of my head:
Response caching for mapagent and possibly web extensions
- Operations like getting tiles and dynamic map overlays (for initial views) may be cacheable if we remove SESSION from the HTTP GET/POST parameters and put it in a cookie. We would have to implement time-to-live logic for this to be truly effective.
Move log files to database storage
- This could make query and analysis of log files easier
Serving tiles directly from the mapagent
- Copying/propagating the SQLite database files to the web tier would eliminate the agent/server hop
Move to a SQLite backend for MgResourceService
- Maintaining multiple database technologies in one product could be additional overhead
- BerkeleyDB doesn't seem to be a great fit for the Session repository; after six years, we are still working on it. Write-ahead logging in SQLite could be effective for the Session repository: http://www.sqlite.org/wal.html
A new service "MgStorageService" implemented in MapGuideCommon could wrap the SQLite database and make it accessible to the server, agent, and web extensions. The API to MgStorageService would have to be considered carefully based on expected use cases.
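The MgStorageService API was not designed in this thread. As a hedged illustration of the response-caching use case only, the sketch below stores cached responses in SQLite with a time-to-live and switches the database to write-ahead logging. `ResponseCache` and the `response_cache` table are hypothetical names. The clock is injectable so expiry can be tested:

```python
import sqlite3
import time
from typing import Callable, Optional


class ResponseCache:
    """Hypothetical key/value response cache with time-to-live, in SQLite."""

    def __init__(self, path: str, clock: Callable[[], float] = time.time) -> None:
        self.conn = sqlite3.connect(path)
        self.clock = clock
        # WAL lets readers proceed while one writer commits; for a file
        # database the setting persists (an in-memory db ignores it).
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS response_cache ("
            " key TEXT PRIMARY KEY, body BLOB NOT NULL, expires REAL NOT NULL)")

    def put(self, key: str, body: bytes, ttl_seconds: float) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO response_cache (key, body, expires)"
                " VALUES (?, ?, ?)",
                (key, body, self.clock() + ttl_seconds))

    def get(self, key: str) -> Optional[bytes]:
        # Expired rows are treated as misses even before they are purged.
        row = self.conn.execute(
            "SELECT body FROM response_cache WHERE key = ? AND expires > ?",
            (key, self.clock())).fetchone()
        return row[0] if row else None

    def purge_expired(self) -> int:
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM response_cache WHERE expires <= ?", (self.clock(),))
        return cur.rowcount
```

A cache key for a tile request would presumably be built from the request parameters once SESSION moves into a cookie, as Trevor suggests.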
Regards,
Trevor
-----Original Message-----
From: mapguide-internals-bounces at lists.osgeo.org [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of Traian Stanev
Sent: May 13, 2011 8:49 AM
To: 'MapGuide Internals Mail List'
Subject: RE: [mapguide-internals] MapGuide RFC 112 - sqlite based tile cache
It's very strange that you are only getting <20% space efficiency out of the file system -- the tiles must be really tiny or the block size really huge. Perhaps this is another thing to look into (i.e. pick a better file system, though I guess in this case SQLite would just be used as a file system in a file).
Yes, copying one file once is faster than copying 300K files, because less seeking is involved. But if you want to do incremental backups, where only a few tiles have changed, the result will likely reverse.
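Incremental backups are cheap with loose files because a copy tool can skip every file whose size and modification time match the previous copy. By default, rsync's quick check and robocopy both compare these attributes. The Python sketch below shows only that idea. It is not how either tool is implemented. It does not delete extra files (as robocopy /MIR would), and it does not do rsync's partial-file delta transfer. `incremental_copy` is a hypothetical name:

```python
import shutil
from pathlib import Path


def incremental_copy(src: Path, dst: Path) -> int:
    """Copy files whose size or modification time differ from dst; return count."""
    copied = 0
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        s = path.stat()
        if target.exists():
            t = target.stat()
            # Compare whole seconds to tolerate timestamp precision differences.
            if t.st_size == s.st_size and int(t.st_mtime) == int(s.st_mtime):
                continue  # unchanged tile: skip it
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also copies the modification time
        copied += 1
    return copied
```

After a full first pass, a re-run only touches the tiles that changed. A single SQLite or zip file, by contrast, has to be copied whole (or diffed block by block) after any change.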
Anyway, if you are looking for a file system in a file, on Windows sqlite is probably the only choice. On Linux, there would be more options (ext3 in a file with B-tree indexing and tail-packing enabled, for example).
Traian
-----Original Message-----
From: mapguide-internals-bounces at lists.osgeo.org [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of Zac Spitzer
Sent: Friday, May 13, 2011 3:49 AM
To: MapGuide Internals Mail List
Subject: Re: [mapguide-internals] MapGuide RFC 112 - sqlite based tile cache
Let's take a small tiled map as an example: Win7 x64, default configuration, quad-core machine, fully seeded tile cache.
Samples_Sheboygan_MapsTiled_Sheboygan
Size: 267 MB (280,474,999 bytes)
Size on disk: 1.50 GB (1,613,549,568 bytes)
Contains: 375,084 Files, 527 Folders
As a single zip file (store mode, i.e. zero compression):
Samples_Sheboygan_MapsTiled_Sheboygan_2.zip
361 MB (379,239,309 bytes)
Block size is the issue, but you can't really optimise it, as raster tiles require a different block size than vector tiles.
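As a quick check of the Sheboygan figures above, the following computes the space efficiency and per-file averages from the sizes Explorer reports. The efficiency comes out at about 17%, which is consistent with the "<20%" remark in the thread. The average tile holds about 750 bytes of data but takes up about 4.3 KB on disk. The arithmetic uses only the numbers quoted in the message:

```python
def storage_stats(logical_bytes: int, on_disk_bytes: int, file_count: int) -> dict:
    """Ratios for a folder as reported by Windows Explorer's Properties dialog."""
    return {
        # fraction of allocated space that actually holds tile data
        "efficiency": logical_bytes / on_disk_bytes,
        # mean tile size versus mean space allocated per tile file
        "avg_tile_bytes": logical_bytes / file_count,
        "avg_allocated_bytes": on_disk_bytes / file_count,
    }


stats = storage_stats(280_474_999, 1_613_549_568, 375_084)
for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```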
Some simple Robocopy backup tests (same disk, non-RAID, default options):
Copying a tile cache in a zip file:
               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :         1         0         1         0         0         0
   Files :         1         1         0         0         0         0
   Bytes :  361.67 m  361.67 m         0         0         0         0
   Times :   0:00:06   0:00:06                       0:00:00   0:00:00
   Speed : 62632420 Bytes/sec.
   Speed : 3583.855 MegaBytes/min.
Copying the raw tile cache using /MIR (at 65% CPU utilisation):
               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :       528       527         1         0         0         0
   Files :    375084    375084         0         0         0         0
   Bytes :  267.48 m  267.48 m         0         0         0         0
   Times :   0:27:30   0:19:54                       0:00:00   0:07:35
   Speed : 234717 Bytes/sec.
   Speed : 13.430 MegaBytes/min.
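Putting the two Robocopy runs side by side gives the following. Measured in bytes per second, the zip copy was roughly 267 times faster. The loose-file copy moved about 227 files per second over its 27.5-minute total. The figures come straight from the output above:

```python
def hms_to_seconds(text: str) -> int:
    """Convert a robocopy 'H:MM:SS' time to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


zip_rate = 62_632_420   # Bytes/sec, one 361.67 MB zip file
raw_rate = 234_717      # Bytes/sec, 375,084 loose tile files with /MIR
zip_seconds = hms_to_seconds("0:00:06")
raw_seconds = hms_to_seconds("0:27:30")

speedup = zip_rate / raw_rate             # throughput ratio
files_per_second = 375_084 / raw_seconds  # per-file rate of the raw copy
print(f"zip is {speedup:.0f}x faster; raw copy moved {files_per_second:.0f} files/s")
```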
z
On Fri, May 13, 2011 at 9:28 AM, Trevor Wekel <trevor_wekel at otxsystems.com> wrote:
> I agree with Traian. There are alternative solutions for replication and backup.
>
> If we are considering replication and backup for the tile sets, we should also consider replication for the XML definitions (layer, feature, map) used to generate those tiles. In other words, I would like to consider tile replication and repository replication together.
>
> The replication and backup functionality in MapGuide is certainly lacking. MGP files do not propagate user/group/role information. The only "easy" way to back up or replicate an entire server is to stop the MapGuide Server and copy all the files around. I doubt that rsync or robocopy could replicate live BerkeleyDB files.
>
> I also took a quick look at SQLite replication. Google didn't turn up anything that was LGPL and actively maintained. SQLite does have an internal hook (http://www.sqlite.org/capi3ref.html#sqlite3_update_hook) that we could use to replicate stuff stored in SQLite. We could roll our own.
>
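Python's standard `sqlite3` module does not expose `sqlite3_update_hook`. This sketch therefore swaps in a different technique: triggers that append every insert, update and delete to a change-log table, which a replica could poll and replay. Unlike an update hook, which is a callback registered on one connection, the log is stored in the database file itself and so survives restarts. The `resources` and `change_log` tables and the helper functions are hypothetical:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS resources (id TEXT PRIMARY KEY, body BLOB NOT NULL);
-- Append-only log that a replica could pull and replay in seq order.
CREATE TABLE IF NOT EXISTS change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT, op TEXT NOT NULL, id TEXT NOT NULL);
CREATE TRIGGER IF NOT EXISTS resources_ins AFTER INSERT ON resources
BEGIN INSERT INTO change_log (op, id) VALUES ('insert', NEW.id); END;
CREATE TRIGGER IF NOT EXISTS resources_upd AFTER UPDATE ON resources
BEGIN INSERT INTO change_log (op, id) VALUES ('update', NEW.id); END;
CREATE TRIGGER IF NOT EXISTS resources_del AFTER DELETE ON resources
BEGIN INSERT INTO change_log (op, id) VALUES ('delete', OLD.id); END;
"""


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def put(conn: sqlite3.Connection, rid: str, body: bytes) -> None:
    # UPDATE first, INSERT if nothing matched. INSERT OR REPLACE would delete
    # the old row without firing the delete trigger (recursive_triggers is off).
    with conn:
        cur = conn.execute("UPDATE resources SET body = ? WHERE id = ?", (body, rid))
        if cur.rowcount == 0:
            conn.execute("INSERT INTO resources (id, body) VALUES (?, ?)", (rid, body))


def delete(conn: sqlite3.Connection, rid: str) -> None:
    with conn:
        conn.execute("DELETE FROM resources WHERE id = ?", (rid,))


def changes_since(conn: sqlite3.Connection, last_seq: int) -> list:
    return conn.execute(
        "SELECT seq, op, id FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,)).fetchall()
```

A slave would remember the last `seq` it applied and ask for everything after it.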
> Since we allow access to external data sources (SHP files, ECW files, etc.), replication of file-based data from server to server would have to be considered as part of the solution. And replication to a UNC path would be an easy way to implement backup.
>
> Master/Slave replication based on files and SQLite could be implemented in phases. Here's a very rough outline:
>
> Phase 1 - Tile and external data replication
> - Reintroduce master/slave concept for MapGuide Server
> - Implement server to server TCP/IP communication logic to transfer files
> - Implement local "file copy (UNC backup)" logic
>
> Phase 2 - Switch to SQLite for tiles
> - Add SQLite to the MapGuide Server. Recode MgTileService to populate the database
> - Implement a SQLite update hook
> - Implement server to server TCP/IP communication logic for propagating SQLite updates
>
> Phase 3 - Full repository replication
> - Rip out BerkeleyDB and replace it with SQLite
> - Use existing mechanism from Phase 2 to implement full replication of repository
>
> Regards,
> Trevor
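For Phase 2, one possible shape for the tile table is sketched below. It has one row per tile image, keyed by the same values a tile request identifies: map definition, base layer group, scale index, row and column. The class and column names are hypothetical and are not MgTileService's interface. The columns avoid SQL keywords such as GROUP and ROW:

```python
import sqlite3
from typing import Optional


class SqliteTileStore:
    """Hypothetical Phase 2 tile store: one SQLite row per tile image."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tiles ("
            " map_id TEXT NOT NULL, base_group TEXT NOT NULL,"
            " scale_index INTEGER NOT NULL, tile_row INTEGER NOT NULL,"
            " tile_col INTEGER NOT NULL, image BLOB NOT NULL,"
            " PRIMARY KEY (map_id, base_group, scale_index, tile_row, tile_col))")

    def put(self, map_id: str, group: str, scale: int, row: int, col: int,
            image: bytes) -> None:
        with self.conn:  # one transaction per tile; batch for bulk seeding
            self.conn.execute(
                "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?, ?, ?)",
                (map_id, group, scale, row, col, image))

    def get(self, map_id: str, group: str, scale: int, row: int,
            col: int) -> Optional[bytes]:
        found = self.conn.execute(
            "SELECT image FROM tiles WHERE map_id = ? AND base_group = ?"
            " AND scale_index = ? AND tile_row = ? AND tile_col = ?",
            (map_id, group, scale, row, col)).fetchone()
        return found[0] if found else None

    def clear_map(self, map_id: str) -> int:
        # Equivalent of purging a map's tile cache directory.
        with self.conn:
            cur = self.conn.execute("DELETE FROM tiles WHERE map_id = ?", (map_id,))
        return cur.rowcount
```

Putting all tiles in one table means the entire cache lives in one file, which is the backup property Zac measured. The primary key gives indexed lookups without relying on file-system block sizes.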
>
> -----Original Message-----
> From: mapguide-internals-bounces at lists.osgeo.org [mailto:mapguide-internals-bounces at lists.osgeo.org] On Behalf Of Traian Stanev
> Sent: May 12, 2011 11:59 AM
> To: 'MapGuide Internals Mail List'
> Subject: RE: [mapguide-internals] MapGuide RFC 112 - sqlite based tile cache
>
>
> Hi Tom,
>
> It would depend on what exactly the problem is -- sure, if one is using Windows Explorer drag and drop to back up the files, it would be faster to have one file. But if one is using rsync or robocopy (or similar), it still makes sense to use files, since those programs know how to copy only the changed files (or even changed parts of files).
>
> Traian
>
>
> _______________________________________________
> mapguide-internals mailing list
> mapguide-internals at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/mapguide-internals
>
>
--
Zac Spitzer
Solution Architect / Director
Ennoble Consultancy Australia
http://www.ennoble.com.au
http://zacster.blogspot.com
+61 405 847 168