[GRASS-dev] Benchmark the overhead of calling GRASS modules

Thu May 23 06:28:20 PDT 2019

hello Markus and Vaclav,

thank you for your feedback. My answers are inline.

On Wed, May 1, 2019 at 6:03 PM Markus Metz <markus.metz.giswork at gmail.com>
wrote:

> IMHO the overhead of calling GRASS modules is insignificant because it is
> in the range of milliseconds. I am much more concerned whether executing a
> GRASS module takes days or hours or minutes.
>
> And the overhead is insignificant (not measurable) compared to actual
> execution time for larger datasets/regions.

I would argue that this depends on what you are doing. For a single GRASS
session using a really big computational region, the overhead is obviously
negligible; I wrote that in the initial post
<https://gist.github.com/pmav99/8f4546fe15940b3cb7db0cfb65e18d33#is-this-truly-a-problem>
too. But if you need to massively parallelize GRASS, the overhead of
setting up the GRASS session and/or calling GRASS modules might become
measurable as well.

Regardless, the overhead

   - can be noticeable while doing exploratory analysis
   - can be significant while **developing** GRASS (e.g. when running
   tests).

BTW, let us also keep in mind that the majority of the tests should be
using really small maps/computational regions (note: they currently don't,
but that's a different issue), which means that the relative impact of this
overhead would be even larger.
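The "relative impact" argument is simple arithmetic: with a fixed per-call overhead o and an actual computation time c, the overhead takes up o / (o + c) of the wall-clock time, so the smaller the region, the larger its share. A minimal sketch; the 50 ms overhead below is a hypothetical value for illustration only, not a measurement.

```python
def overhead_share(overhead_s: float, compute_s: float) -> float:
    """Fraction of the total wall-clock time spent on fixed per-call overhead."""
    return overhead_s / (overhead_s + compute_s)


# Hypothetical numbers, for illustration only: a fixed 50 ms overhead per
# module call, compared against a range of actual computation times
# (from a tiny test region up to an hour-long run on a large region).
OVERHEAD = 0.050
for compute in (0.01, 1.0, 60.0, 3600.0):
    share = overhead_share(OVERHEAD, compute)
    print(f"compute {compute:>8.2f} s -> overhead share {share:6.1%}")
```

The same fixed cost is paid once per call, so a test suite or a tiled workflow that makes many small calls pays it many times over.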

On Thu, May 2, 2019 at 4:49 AM Vaclav Petras <wenzeslaus at gmail.com> wrote:

> Hi Panos and Markus,
>
> I actually touched on this in my master's thesis [1, p. 54-58],
> specifically on the subprocess call overhead (i.e. not import or
> initialization overheads). I compared the speed of a subprocess call in
> Python to that of a Python function call. The reason was that I was calling
> GRASS modules many times for small portions of my computational region,
> i.e. I was always changing the region to the area/object of interest within
> the actual (user-set) computational region. So, depending on the size of
> the data, the overall process actually involved many subprocess calls.
> Unfortunately, I don't have a comparison there of how the two cases
> (functions versus subprocesses) compare in terms of time spent on the
> whole process.
>

Again, I would argue that the answer depends on what you are doing.
Pansharpening a 100-pixel map has a (comparatively) huge overhead;
pansharpening a Landsat tile, not so much. Regardless of that, I think we
can all agree that directly calling a function implementing algorithm Foo
is always going to be faster than calling a script that calls the same
function. Unfortunately, and as you pointed out, perhaps most of the GRASS
functionality is only accessible from the CLI and not through an API.
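To make the "function vs. script" comparison concrete, here is a minimal, stdlib-only sketch. `foo` is a hypothetical stand-in for "algorithm Foo", and the "script" is just a fresh Python interpreter started via `subprocess.run`, not an actual GRASS module. It therefore measures only process start-up plus I/O, not GRASS session initialization or module import costs. Absolute timings depend entirely on the machine, so none are quoted here.

```python
import subprocess
import sys
import timeit


def foo(n: int) -> int:
    """Hypothetical stand-in for 'algorithm Foo': a tiny, cheap computation."""
    return sum(range(n))


# The same computation as a standalone script, run in a separate process,
# mimicking how a module is launched from Python via a subprocess.
SCRIPT = "print(sum(range(100)))"


def call_function() -> int:
    # Direct, in-process call: no interpreter start-up, no stdout parsing.
    return foo(100)


def call_subprocess() -> int:
    # Start a new interpreter, run the script, and parse its stdout.
    out = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return int(out)


def per_call_seconds(func, number: int) -> float:
    """Average wall-clock seconds per call, taking the best of 3 repeats."""
    return min(timeit.repeat(func, number=number, repeat=3)) / number


if __name__ == "__main__":
    assert call_function() == call_subprocess()  # same result either way
    t_func = per_call_seconds(call_function, number=10_000)
    t_proc = per_call_seconds(call_subprocess, number=10)
    print(f"function call:   {t_func * 1e6:12.2f} us")
    print(f"subprocess call: {t_proc * 1e6:12.2f} us")
    print(f"ratio:           {t_proc / t_func:12.0f}x")
```

Taking the minimum over repeats is the usual `timeit` advice, since higher values are mostly noise from other processes. Wrapping the function in a real GRASS module would add further costs on top of this.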

> And speaking more generally, it seems to me that the functionality-CLI
> coupling issue is what might be partially fueling Facundo's GSoC
> proposal (Python package for topology tools). There, access to the
> functionality does not seem direct enough to the user-programmer, with the
> overhead of the subprocess call, as well as the related I/O cost, whether
> real or perceived, playing a role.
>

I can't speak for Facundo. Nevertheless, whenever I try to work with the
API, I find it limited, and it feels like it still has rough edges (e.g.
#3833 <https://trac.osgeo.org/grass/ticket/3833> and #3845
<https://trac.osgeo.org/grass/ticket/3845>). It very soon becomes clear
that in order to get work done you need to use the Command Line Interface.
As a programmer, I do find this annoying :P

> Best,
> Vaclav
>
> [1] Petras V. 2013. Building detection from aerial images in GRASS GIS
> environment. Master’s thesis. Czech Technical University in Prague.
> http://geo.fsv.cvut.cz/proj/dp/2013/vaclav-petras-dp-2013.pdf
>

Thank you for the link :)

all the best,
Panos