[mapserver-dev] Trying to improve perfs of MapServer with PostGIS

Fri Dec 11 01:13:09 PST 2015

Hi,

I was a bit disappointed by the performance of reading features from PostGIS
compared to reading them from a local shapefile. So I poked my nose
around the code and saw that geometries were fetched from the DB as
hex-encoded EWKB.

So I went ahead and changed the code to use libpq's binary mode when we are
fetching features. To avoid conversion nightmares when fetching attributes,
I'm tricking Postgres into sending them in their textual form. That way,
the behavior of MapServer should not change regarding the other columns.
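
To give a feel for why the hex encoding matters, here is a minimal sketch (not
taken from the patch) that builds a small polygon in plain WKB and compares its
size with its hex form. It assumes plain little-endian WKB without the SRID that
EWKB may carry; the helper name polygon_wkb and the coordinates are made up for
illustration. Whatever the geometry, the hex text is twice the size of the
binary, and the client also has to decode it back.

```python
import binascii
import struct


def polygon_wkb(ring):
    """Encode a single-ring polygon as little-endian WKB (no SRID)."""
    # byte order (1 = little endian), geometry type (3 = Polygon), ring count
    data = struct.pack("<BII", 1, 3, 1)
    # number of points in the ring, then the points as pairs of doubles
    data += struct.pack("<I", len(ring))
    for x, y in ring:
        data += struct.pack("<dd", x, y)
    return data


# Hypothetical square, closed by repeating the first point.
ring = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.0, 0.0)]
raw = polygon_wkb(ring)
hex_text = binascii.hexlify(raw)

print("binary bytes:", len(raw))
print("hex bytes:   ", len(hex_text))
```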

The patch can be found in github [1].

General testing algorithm:

    for each zoom level:
        start N threads at the same time, each doing:
            for each iteration:
                compute a random BBOX
                get the 1000x1000 image (read the content, then throw it away)
        wait for all the threads to stop
        collect the stats from the threads (throwing out the slowest and
        the fastest result)
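
The testing procedure above could be sketched roughly as follows. This is not
the script used for the measurements, only an illustration under a few
assumptions: the fetch callable stands in for the real HTTP request to
MapServer, the way a zoom level maps to a BBOX size (1/zoom of the extent in
each dimension) is a guess, and the slowest and fastest results are dropped
across all threads rather than per thread. All function names are hypothetical.

```python
import random
import statistics
import threading
import time


def random_bbox(extent, zoom, rng):
    """Pick a random box covering 1/zoom of the extent in each dimension."""
    minx, miny, maxx, maxy = extent
    width = (maxx - minx) / zoom
    height = (maxy - miny) / zoom
    x = rng.uniform(minx, maxx - width)
    y = rng.uniform(miny, maxy - height)
    return (x, y, x + width, y + height)


def worker(fetch, extent, zoom, nb_iterations, seed, timings):
    """Run nb_iterations requests and record each duration in milliseconds."""
    rng = random.Random(seed)
    for _ in range(nb_iterations):
        bbox = random_bbox(extent, zoom, rng)
        start = time.perf_counter()
        fetch(bbox)  # should read the whole response, then discard it
        timings.append((time.perf_counter() - start) * 1000.0)


def run_level(fetch, extent, zoom, nb_threads, nb_iterations):
    """Return (mean, standard deviation) in ms for one zoom level.

    Needs at least three timings in total, since two are thrown away.
    """
    timings = []  # list.append is thread-safe in CPython
    threads = [
        threading.Thread(
            target=worker,
            args=(fetch, extent, zoom, nb_iterations, seed, timings),
        )
        for seed in range(nb_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    trimmed = sorted(timings)[1:-1]  # drop the fastest and the slowest
    return statistics.mean(trimmed), statistics.pstdev(trimmed)


def fake_fetch(bbox):
    """Stand-in for the real image request; just sleeps for about 1 ms."""
    time.sleep(0.001)


# Hypothetical extent; a real run would use the extent of the data set.
mean_ms, std_ms = run_level(fake_fetch, (0, 0, 100, 100), zoom=4,
                            nb_threads=2, nb_iterations=5)
print(f"{mean_ms:.0f}±{std_ms:.0f}")
```

Replacing fake_fetch with a function that requests the 1000x1000 image for the
given BBOX and reads the full body would turn this into an actual load test.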

The DB is on a db.m4.large from Amazon with SSD disk. MapServer runs under
Apache with fcgid (up to 20 processes) on an m4.xlarge. The
measurements are taken from another Amazon machine to avoid hitting the
bandwidth limit too fast.

Times are for one query from one thread and are given in milliseconds along
with their standard deviation. Each time, I did a full run before taking
measurements.

The data used is the Swiss villages (2479 features in total) with polygon
contours in a 1.7MB shapefile [2], imported as is into PostGIS.

nbThreads=1 nbIterations=20
zoom level   1.00          4.00          16.00        32.00
original     964±  14      222± 126      66±  18      68±  19
binary       807±  13      194± 111      87±  28      79±  24
shapefile    554±  94      187± 107      72±  26      56±   3

nbThreads=5 nbIterations=20
zoom level  1.00           4.00          16.00        32.00
original    3686± 946      403± 264      84±  37      70±  22
binary      1710± 486      340± 242     105±  59      89±  32
shapefile    519± 225      278± 166      91±  58      80±  34

nbThreads=10 nbIterations=20
zoom level  1.00           4.00          16.00        32.00
original    7287±1936      800± 575     119±  79     110±  81
binary      3737± 647      471± 294     123±  70     110±  54
shapefile    884± 241      412± 269     111± 119      98±  57

nbThreads=20 nbIterations=20
zoom level  1.00           4.00          16.00        32.00
original   14969±2507     1643±1221    239± 231      166± 103
binary      7649± 730      857± 576    210± 121      181±  77
shapefile   1455± 438      483± 326    143±  97      126±  75

What is what:

   - original: MapServer 7.0 (git 157fa47)
   - binary: Same as above with a small patch to configure libpq to use
   binary transfer.
   - shapefile: Same as original, but using a shapefile on the local disk
   (just here for comparison).

We can see that when the machine gets parallel queries, we quickly gain a
factor of 2 in performance when there are a lot of features (low zoom level).
There is no measurable negative impact at higher zoom levels and lower loads.
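
For reference, the ratio can be computed directly from the zoom level 1.00
column of the tables above. The snippet below only redoes that arithmetic on
the mean times and ignores the standard deviations.

```python
# Mean times in ms at zoom level 1.00, copied from the tables above:
# nbThreads -> (original, binary)
zoom1_ms = {
    1: (964, 807),
    5: (3686, 1710),
    10: (7287, 3737),
    20: (14969, 7649),
}

speedups = {n: orig / binary for n, (orig, binary) in zoom1_ms.items()}

for n, ratio in speedups.items():
    print(f"nbThreads={n:2d}  original/binary = {ratio:.2f}")
```

This gives roughly 1.2 for a single thread and between 1.9 and 2.2 for 5, 10
and 20 threads.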

Now, what do you guys think? Do you see any risk? Or should I do a pull
request?

Thanks.

[1]
https://github.com/pvalsecc/mapserver/commit/45bd3d5795c9108618c37cc8c7472809cff54d16
[2]
http://www.swisstopo.admin.ch/internet/swisstopo/en/home/products/landscape/swissBOUNDARIES3D.html