[GRASS-dev] Benchmark the overhead of calling GRASS modules

Sun Apr 28 23:49:03 PDT 2019

Hello all

You might find it easier to read the following text i
<https://gist.github.com/pmav99/8f4546fe15940b3cb7db0cfb65e18d33>n a gist
<https://gist.github.com/pmav99/8f4546fe15940b3cb7db0cfb65e18d33>

## Introduction

I was trying to write a decorator/contextmanager that would temporary
change the
computational region, but while using it I noticed that there was some
overhead the root
of which seemed to be the usage of the GRASS modules. So in order to
quantify this
I wrote a small benchmark that tries to measure the overhead of calling a
GRASS Module.
This is what I found.

## tl;dr

Calling a GRASS module incurs a constant but measurable overhead which, in
certain
cases, e.g. when writing a module that uses a lot of the other modules, can
quickly
add up to a significant quantity.

## Disclaimer

If you try to run the benchmark on your own PC, the actual timings you will
get will
probably be different. The differences might be caused by having:

- a stronger/weaker CPU
- faster/slower hard disk.
- using different Python version
- using different compilation flags

Still, I think that some of the findings are reproducible.

For reference, I used:

- OS: Linux 5.0.9
- CPU: Intel i7 6700HQ
- Disk type: SSD
- Python: 3.7
- `CFLAGS='-O2 -fPIC -march=native -std=gnu99'`

## Demonstration

The easiest way to demonstrate the performance difference between using a
GRASS module
vs using the GRASS API is to run the following snippets.

Both of them do exactly the same thing, i.e. retrieve the current region
settings 10
times in a row.  The performance difference is huge though. On my laptop,
the first one
needs 0.36 seconds while the second one needs just 0.00038 seconds. That's
almost
a 1000x difference...

``` python
import time
import grass.script as gscript

start = time.time()
for i in range(10):
    region = gscript.parse_command("g.region", flags="g")
end = time.time()
total = end - start

print("Total time: %s" % total)
```

vs

``` python
import time
from grass.pygrass.gis.region import Region

start = time.time()
for i in range(10):
    region = Region()
end = time.time()
total = end - start

print("Total time: %s" % total)
```

## How much is the overhead exactly?

In order to measure the actual overhead of calling a GRASS module, I
created two new
GRASS modules that all they do is parse the command line arguments and
measured how much
time is needed for their execution. The first module is [`r.simple`]() and
is
implemented in Python while the other one is [`r.simple.c`]() and is
implemented in C.
The timings are in msec and the benchmark was executed using Python 3.7

| call method                    | r.simple | r.simple.c |
|--------------------------------|:--------:|:----------:|
| pygrass.module.Module          |   85.9   |    66.5    |
| pygrass.module.Module.shortcut |   85.5   |    66.9    |
| grass.script.run_command       |   41.3   |    30.5    |
| subprocess.check_call          |   41.8   |    30.3    |

As we can see, `gsrcipt.run_command` and `subprocess` give more or less a
identical
results, which is to be expected since `run_command` + friends are just a
thin wrapper
around `subprocess`.  Similarly `shortcuts` has the same overhead as
"vanila"
`pygrass.Module`.  Nevertheless, it is obvious that `pygrass` is roughly 2x
times slower
than `grass.script` (but more about that later).

As far as C vs Python goes, on my computer modules implemented in C seem to
be 25% faster than their
Python counterparts.  Nevertheless, a 40 msec startup time doesn't seem
extraordinary
for a Python script, while 30 msec feels rather large for a CLI application
implemented
in C.

## Where is all that time being spent?

### C Modules

Unfortunately, I am not familiar enough with the GRASS internals to easily
check what is
going on and I didn't have the time to try to profile the code. I suspect
that
`G_gisinit` or something similar is causing the overhead, but someone more
familiar with
the C API should be able to enlighten us.

### Python Modules

In order to gain a better understanding of the overhead we have when
calling python
modules, we also need to measure the following quantities:

1. The time that python needs to spawn a new process
2. The startup time for the python interpreter
3. The time that python needs to `import grass`

These are the results:

|                         | msec |
|-------------------------|:----:|
| subprocess spawn        |  1.2 |
| python 2 startup        |  9.0 |
| python 2 + import grass | 24.5 |
| python 3 startup        | 18.2 |
| python 3 + import grass | 39.3 |

As we can see:

- the overhead of spawning a new process from within python is more or less
negligible
  (at least compared to the other quantities).
- The overhead of spawning a python 3 interpreter is 2x bigger than Python
2; i.e.  the
  transition to Python 3 will have some performance impact, no matter what.
- The overhead of spawning a python 3 interpreter accounts for roughly 50%
of the total
  overhead (18 msec out of 41 msec).
- The other 50% of the overhead is pretty much caused just by importing the
`grass`
  library.

## Why is Pygrass 2x times slower?

I haven't carefully looked into this (i.e. I haven't profiled the code),
but it seems
that the culprit is [this
line](
https://github.com/GRASS-GIS/grass-ci/blob/fe814a4fb73937ce96a36c11b6876db42acf28fa/lib/python/pygrass/modules/interface/module.py#L528
).
In other words, `pygrass` calls a module twice, once to get the interface's
description
and once to actually run the module. That's why the overhead is double both
for
C modules and Python ones.

## Is this truly a problem?

Well, it depends :)

The most important factor is probably the size of the computational
region.  If you are
dealing with really large regions, then the overhead is probably miniscule.
If you are
dealing with much smaller ones though, then it can be significant.

That being said, there are at least two cases where I think that this
overhead is
important:

- interactive sessions
- running tests

Just to give an example, `i.pansharpen` calls ~50 other GRASS modules. On
my laptop, the
overhead for these calls is almost 2 seconds. Is this a lot? Well, if you
are
pansharpening Landsat Tiles, it probably is not that much, but you will
have this
overhead even if you are pansharpening a 4 pixel map (e.g. when running
tests).

I am pretty sure that there are numerous other modules that are just like
that, too.

## What is to be done?

I think that there are two different areas where work can be done:

1. Reduce the actual overhead; i.e. make calling GRASS modules faster.
2. Convert calls to GRASS modules with API calls which, as shown earlier,
are orders of
   magnitude faster.

### Reduce the overhead

The overhead of Python and C modules has different causes.

As I said, I haven't looked into what causes C modules to take so long so i
can't make
any suggestions. This should be further looked into though. The reason is
that modules
like e.g. `g.region` are being used practically everywhere, so speeding
these up should
give a measureable performance improvement.

As far as Python modules go, unfortunately, the overhead seems to be
relatively
inelastic.  Speeding up the startup time of the Python interpreter is not
something that
GRASS can do or rely upon. So, the only thing that can be actually be done
is to try to
reduce the time needed for importing the `grass` library. That being said,
this should
probably not skim more than a few ms at best, but at least the gains should
be there for
all python modules.

Luckily there is at least one relatively low hanging fruit. Speeding
pygrass should not
be that difficult. I haven't looked into this but pre-generating the
modules' XML
descriptions seems feasible and it should remove the need to call each
module twice,
thus making pygrass performance comparable to `grass.script.run_command`

### Convert module calls to API calls

Converting the module calls to API calls is what can potentially give the
bigger
benefits, but at the same time is what needs the most work.

There are some low hanging fruits here too. E.g. functions like
`use_temp_region()` and
`raster_info()` can probably be refactored to use the pygrass API, while
calls to e.g.
`g.region` can probably be replaced with `pygrass.gis.Region` objects etc.

Nevertheless, this does not really touch the root of the problem which IMHV
is the tight
coupling between the GRASS modules functionality and the CLI. In layman
terms, at the
moment, if you want to use module A from module B you are forced to spawn a
new process
and suffer the overhead that this entails.

TBH, I am not sure if this a problem that can even be tackled at this
stage, but if each
module had one or more functions that could be imported/called by other
modules,
everything would be much easier and performance would be significantly
better.

## How to run the benchmark?

If you want to run this benchmark on your own you need to:

1. `git remote add panos https://github.com/pmav99/grass-ci.git`
1. `git fetch --all`
1. `git checkout overhead`
1. `cd scripts/r.simple && make && cd ../../`
1. `cd raster/r.simple.c && make && cd ../../`
1. Start a grass session
1. `python benchmark.py`

I haven't run the benchmark under Python 2, but even if there are
incompatibilities,
they should be trivial to fix.

I will be happy to hear any remarks :)

with kind regards,
Panos
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/grass-dev/attachments/20190429/066be9ff/attachment-0001.html>