[MapProxy] Most performant storage option?

Travis Kirstine traviskirstine at gmail.com
Tue Mar 10 08:54:41 PDT 2020


 We have been using a riak cache for a few years with good results.  Without
going into too much detail, here is our basic setup and some comments:

- we use mapproxy and riak to store fully seeded imagery tiles
- 6 node riak cluster with a ring size of 64 (i.e. 64 partitions) and an
n_val of 3, meaning that 3 copies of each partition are distributed across
the 6 nodes
- haproxy for load distribution across all nodes
- riak is configured with the leveldb backend with anti-entropy turned on
- since we are using only 64 partitions, each partition can be fairly large.
riak attempts to balance the partitions across the nodes, so in our case 4
nodes each hold about 16% of the partitions and 2 nodes each hold about 19%.
This can be significant depending on the size of each partition (number of
objects / tiles): some nodes may require significantly more storage.  To get
around this issue you could use a higher number of partitions, resulting in
a more equal distribution of partitions; however, once the cluster is set up
it is almost impossible to change this value.
- very easy to add another node, riak will redistribute the partitions
automatically
- riak stores each tile as a key / object pair, like couchdb; mapproxy can
use a secondary index to help with queries
- riak uses the concept of a bucket, kind of a container to hold your
keys.  Mapproxy uses the bucket to hold a cache, 1 bucket=1 cache
- very fast reads / writes compared to other cache types, we see 100+ tiles /
second when caching without pushing it hard (transfer of caches from sqlite
to riak).
- can be problematic deleting / updating your cache depending on your use
case.  You cannot simply delete a bucket/cache but need to delete each
object.  If you try to use the mapproxy cleanup it may take forever to
complete as it needs to cycle through each possible key and query riak.
We've had to write python scripts that utilize the secondary index to
retrieve lists of keys and then delete each key / object in the list.
- riak logging is terrible, so it can be very difficult to troubleshoot
issues.  There is no logging of requests / errors for client applications,
so once in a while you'll see a timeout and you'll have no idea if it is a
riak or network issue
- been very reliable so far
- basho, the company that developed riak, went bankrupt a few years ago but
there is still some active development; it was taken over by bet365
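
To illustrate the partition balance point above, here is a small Python
sketch of the arithmetic.  The 10 / 12 split is an assumption inferred from
the 16% / 19% figures (4 x 10 + 2 x 12 = 64), not something read off our
cluster, and even_split() is a hypothetical best case, not riak's actual
claim algorithm.

```python
def partition_shares(partitions_per_node, ring_size):
    """Return each node's share of the ring as a rounded percentage."""
    if sum(partitions_per_node) != ring_size:
        raise ValueError("partition counts must add up to the ring size")
    return [round(100 * n / ring_size) for n in partitions_per_node]


def even_split(ring_size, nodes):
    """Hypothetical best case: spread partitions as evenly as possible."""
    base, extra = divmod(ring_size, nodes)
    return [base + 1] * extra + [base] * (nodes - extra)


def spread_points(partitions_per_node, ring_size):
    """Gap between the fullest and emptiest node, in percentage points."""
    gap = max(partitions_per_node) - min(partitions_per_node)
    return 100 * gap / ring_size


# Assumed split on our cluster: 4 nodes with 10 partitions, 2 nodes with 12.
assumed = [10, 10, 10, 10, 12, 12]
print(partition_shares(assumed, 64))      # [16, 16, 16, 16, 19, 19]
print(spread_points(assumed, 64))         # 3.125

# Best-case gap for a small and a larger ring (ring sizes are powers of 2).
for ring_size in (64, 512):
    print(ring_size, spread_points(even_split(ring_size, 6), ring_size))
```

With a larger ring a one-partition difference between nodes is a much
smaller slice of the data, which is why a higher partition count evens
things out.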

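The cleanup approach mentioned above (use the secondary index to list keys,
then delete each object) looks roughly like the sketch below.  This is not
our actual script.  The index name "tile_coord_bin" and the "zz-xxxxxxx-yyyyyyy"
value format are what I recall mapproxy writing when its secondary index
option is enabled, so treat both as assumptions and check them against your
mapproxy version.  FakeBucket is a made-up in-memory stand-in so the logic
can be dry-run without a cluster; a real bucket from the riak python client
offers get_index() and delete() as well.

```python
def delete_level(bucket, level, index="tile_coord_bin"):
    """Delete every tile of one zoom level via a secondary-index range query."""
    # Assumed index value format: zero-padded "zz-xxxxxxx-yyyyyyy".
    start = "%02d-0000000-0000000" % level
    end = "%02d-9999999-9999999" % level
    deleted = 0
    for key in bucket.get_index(index, start, end):
        bucket.delete(key)
        deleted += 1
    return deleted


class FakeBucket:
    """In-memory stand-in for a riak bucket (hypothetical, dry runs only)."""

    def __init__(self):
        self.index = {}  # object key -> secondary index value

    def store(self, z, x, y):
        self.index["%d_%d_%d" % (z, x, y)] = "%02d-%07d-%07d" % (z, x, y)

    def get_index(self, index, startkey, endkey):
        # Build the full key list first so deleting while looping is safe.
        return [k for k, v in sorted(self.index.items())
                if startkey <= v <= endkey]

    def delete(self, key):
        del self.index[key]


bucket = FakeBucket()
bucket.store(3, 1, 2)
bucket.store(3, 4, 5)
bucket.store(4, 0, 0)
print(delete_level(bucket, 3))   # 2
print(sorted(bucket.index))      # ['4_0_0']

# Against a real cluster (host and bucket name are placeholders):
# import riak
# client = riak.RiakClient(protocol="pbc",
#                          nodes=[{"host": "riak1.example.org", "pb_port": 8087}])
# delete_level(client.bucket("my_cache"), 3)
```

For large levels you would want to page through the index results rather
than fetch them in one go.
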
On Tue, 10 Mar 2020 at 03:53, Jeff Konnen <jaykayone at gmail.com> wrote:

> Hi all,
>
> we host several hundred caches in a file system that resides on an
> NFS disk which is mounted on different hosts with mapproxy containers
> accessing these files.
>
> It works well but we are hitting some I/O problems on the NFS server.
> That's why we are considering using a distributed system right now.
>
> Does anyone have experience with mapproxy on riak, s3 (minio?) ?
>
> What would your recommendation be?
>
> We don't have any of these components running today, so given that we
> would have to create a new infrastructure from scratch, which one would you
> recommend?
>
> Best regards
> Jeff
>
> --
> Jeff Konnen
> _______________________________________________
> MapProxy mailing list
> MapProxy at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/mapproxy