[mapserver-dev] Trying to improve perfs of MapServer with PostGIS
Even Rouault
even.rouault at spatialys.com
Fri Dec 11 08:06:47 PST 2015
On Friday, December 11, 2015 at 13:38:56, Daniel Morissette wrote:
> Hi Patrick,
>
> We ran into similar behavior with a quite similar setup on Amazon a
> little while ago. In our case we had a dedicated DB server with SSD
> (like yours) and multiple rendering nodes connecting to it, and adding
> more nodes or more powerful ones still resulted in a constant throughput,
> with the additional rendering nodes' CPUs running mostly idle. After much
> hair pulling, we finally found out that we were hitting the 1 Gbps limit
> of the network connection between the DB and the rendering nodes.
>
> I suspect you are running into the same thing here: if you could verify
> that the size of the original vs. binary response is about double, then
> that would confirm this possibility. We used PgBouncer to look at the
> volume of data transferred by the DB server; it produces output like
> this:
>
> 2014-06-06 18:49:44.597 27891 LOG Stats: 852 req/s, in 435114 b/s, out
> 83424883 b/s,query 7133 us
> 2014-06-06 18:50:44.597 27891 LOG Stats: 818 req/s, in 418046 b/s, out
> 79177139 b/s,query 5761 us
> 2014-06-06 18:52:44.598 27891 LOG Stats: 898 req/s, in 458041 b/s, out
> 88441708 b/s,query 7949 us
>
>
> Then to really confirm it, look at the number of hits per second in each
> test case (instead of the processing time per request). You will likely
> notice that it is constant for a given zoom level once you hit the
> network limit, no matter how many parallel requests you have. This is
> because the number of hits per second that can be served is limited by
> the volume of data per second that the DB server can deliver to
> MapServer (~800 Mbits/sec of usable throughput on a 1 Gbps connection).
>
> In our case, if I remember correctly, we used to hit a limit of about 40
> maps/second, where each map was averaging 2 MB of data... so we were
> capping at 40 hits x 2 MB ~= 80 MBytes per second from the DB (or
> ~640 Mbits/sec)... which, as I wrote above, is in line with a 1 Gbps limit.
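The back-of-the-envelope arithmetic above can be checked with a quick sketch (pure Python, using the numbers quoted in the message):

```python
# Observed throughput vs. link capacity, using the figures quoted above.
maps_per_second = 40   # observed cap on served maps
map_size_mbytes = 2    # average payload pulled from the DB per map
link_gbps = 1.0        # capacity of the DB <-> renderer link

throughput_mbits = maps_per_second * map_size_mbytes * 8
print(throughput_mbits)                       # -> 640 (Mbits/sec)
print(throughput_mbits / (link_gbps * 1000))  # -> 0.64 (fraction of raw link)
```

Once the observed volume approaches the practical ceiling of the link, adding rendering nodes cannot raise the hit rate.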
If bandwidth is the bottleneck, an even more efficient solution than the binary
WKB representation would be to use
http://postgis.net/docs/manual-dev/ST_AsTWKB.html . This would require a TWKB
decoder to be developed in MapServer, and it would likely only be appropriate
for WMS-like scenarios where you can afford losing some precision in the
coordinate values.
Even
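To illustrate why TWKB is so much smaller on the wire, here is a minimal sketch of its core idea: coordinates are scaled to integers at a chosen precision, delta-encoded between consecutive vertices, zigzag-mapped, and written as variable-length integers. This is an illustration of the principle only, not a conformant TWKB codec:

```python
def zigzag(n: int) -> int:
    """Map signed ints to unsigned so small magnitudes stay small."""
    return (n << 1) ^ (n >> 63)

def varint(u: int) -> bytes:
    """Encode an unsigned int in 7-bit groups (protobuf-style varint)."""
    out = bytearray()
    while True:
        b = u & 0x7F
        u >>= 7
        if u:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_coords(coords, precision=2):
    """Delta-encode scaled coordinates, TWKB-style (coordinates only)."""
    scale = 10 ** precision
    out = bytearray()
    prev_x = prev_y = 0
    for x, y in coords:
        ix, iy = round(x * scale), round(y * scale)
        out += varint(zigzag(ix - prev_x))
        out += varint(zigzag(iy - prev_y))
        prev_x, prev_y = ix, iy
    return bytes(out)

# Nearby vertices compress well: after the first point, each delta fits
# in a byte or two, versus 16 bytes per vertex for the two raw doubles
# in WKB coordinates.
ring = [(600000.0, 200000.0), (600010.5, 200003.25), (600012.0, 200001.0)]
twkb_like = encode_coords(ring)
wkb_coords_size = len(ring) * 16
print(len(twkb_like), wkb_coords_size)  # -> 16 48
```

The precision parameter is where the lossiness comes in: two decimal digits here, which is the kind of trade-off that is acceptable for rendering but not for data exchange.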
>
> Good luck!
>
> Daniel
>
> On 2015-12-11 4:13 AM, Patrick Valsecchi wrote:
> > Hi,
> >
> > I was a bit disappointed by the performance of reading features from
> > PostGIS compared to reading them from a local shapefile. So I poked my
> > nose around the code and saw that geometries were fetched from the DB
> > as EWKB encoded in hex.
> >
> > So I went ahead and changed the code to use libpq's binary mode when we
> > are fetching features. To avoid conversion nightmares when fetching
> > attributes, I'm tricking Postgres into sending them in their textual
> > form. That way, the behavior of MapServer should not change regarding
> > the other columns.
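The hex encoding is what roughly doubles the bytes on the wire: text-mode results carry two ASCII characters per geometry byte, while binary mode ships the bytes as-is. A minimal sketch of the size difference (the payload here is a placeholder standing in for real EWKB output):

```python
import binascii

# Placeholder byte string standing in for an EWKB geometry payload.
ewkb = bytes(range(256)) * 4  # 1024 "binary" bytes

# A text-mode result represents each byte as two hex characters,
# so the transferred payload doubles in size.
hex_form = binascii.hexlify(ewkb)
print(len(ewkb), len(hex_form))  # -> 1024 2048
```

That factor of two is exactly the saving the binary-mode patch targets for the geometry column, while leaving the other columns in text form.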
> >
> > The patch can be found in github [1].
> >
> > General testing algorithm:
> >   for each zoom level:
> >     start N threads at the same time, each doing:
> >       for each iteration:
> >         compute a random BBOX
> >         get the 1000x1000 image (throw the content away, but read it)
> >     wait for all the threads to stop
> >     collect the stats from the threads (throwing out the slowest and the
> >     fastest results)
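The loop above can be sketched in Python; `render_tile` is a stand-in stub (a real harness would issue the WMS GetMap request and read, then discard, the response body):

```python
import random
import statistics
import threading
import time

def render_tile(bbox):
    """Stand-in for fetching a 1000x1000 image over HTTP."""
    time.sleep(0.001)

def worker(zoom, iterations, times):
    for _ in range(iterations):
        # Random BBOX whose extent shrinks as the zoom level grows.
        x, y = random.uniform(0, 100), random.uniform(0, 100)
        bbox = (x, y, x + 100 / zoom, y + 100 / zoom)
        start = time.perf_counter()
        render_tile(bbox)
        times.append((time.perf_counter() - start) * 1000)  # milliseconds

def run(zoom, n_threads=5, iterations=20):
    times = []
    threads = [threading.Thread(target=worker, args=(zoom, iterations, times))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    trimmed = sorted(times)[1:-1]  # throw out the fastest and slowest results
    return statistics.mean(trimmed), statistics.stdev(trimmed)

mean_ms, stdev_ms = run(zoom=4)
print(f"{mean_ms:.0f}±{stdev_ms:.0f} ms")
```

Appending to a shared list from several threads is safe in CPython, which keeps the sketch simple; per-thread lists merged at the end would be the more portable choice.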
> >
> > The DB is on an Amazon db.m4.large with an SSD disk. MapServer runs on
> > Apache with fcgid (up to 20 processes) on an m4.xlarge. The measurements
> > are taken from another Amazon machine to avoid hitting the bandwidth
> > limit too fast.
> >
> > Times are for one query from one thread and are given in milliseconds,
> > along with their standard deviation. Each time, I did a full warm-up run
> > before taking measurements.
> >
> > The data used is the Swiss villages dataset (2479 features in total) with
> > polygon contours in a 1.7 MB shapefile [2], imported as is into PostGIS.
> >
> > nbThreads=1 nbIterations=20
> > zoom level 1.00 4.00 16.00 32.00
> > original 964± 14 222± 126 66± 18 68± 19
> > binary 807± 13 194± 111 87± 28 79± 24
> > shapefile 554± 94 187± 107 72± 26 56± 3
> >
> >
> > nbThreads=5 nbIterations=20
> > zoom level 1.00 4.00 16.00 32.00
> > original 3686± 946 403± 264 84± 37 70± 22
> > binary 1710± 486 340± 242 105± 59 89± 32
> > shapefile 519± 225 278± 166 91± 58 80± 34
> >
> > nbThreads=10 nbIterations=20
> > zoom level 1.00 4.00 16.00 32.00
> > original 7287±1936 800± 575 119± 79 110± 81
> > binary 3737± 647 471± 294 123± 70 110± 54
> > shapefile 884± 241 412± 269 111± 119 98± 57
> >
> > nbThreads=20 nbIterations=20
> > zoom level 1.00 4.00 16.00 32.00
> > original 14969±2507 1643±1221 239± 231 166± 103
> > binary 7649± 730 857± 576 210± 121 181± 77
> > shapefile 1455± 438 483± 326 143± 97 126± 75
> >
> > What is what:
> > * original: MapServer 7.0 (git 157fa47)
> > * binary: Same as above with a small patch to configure libpq to use
> >   binary transfer.
> > * shapefile: Same as original, but using a shapefile on the local disk
> >   (just here for comparison).
> >
> > We can see that when the machine gets parallel queries, we quickly gain a
> > factor of 2 in performance when there are a lot of features (low zoom
> > levels). There is no measurable negative impact at higher zoom levels and
> > lower loads.
> >
> > Now, what do you guys think? Do you see any risk? Or should I open a pull
> > request?
> >
> > Thanks.
> >
> >
> > [1]
> > https://github.com/pvalsecc/mapserver/commit/45bd3d5795c9108618c37cc8c747
> > 2809cff54d16 [2]
> > http://www.swisstopo.admin.ch/internet/swisstopo/en/home/products/landsca
> > pe/swissBOUNDARIES3D.html
> >
> >
> >
> > _______________________________________________
> > mapserver-dev mailing list
> > mapserver-dev at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/mapserver-dev
--
Spatialys - Geospatial professional services
http://www.spatialys.com