[GRASS-dev] Adding an expert mode to the parser

Markus Metz markus.metz.giswork at gmail.com
Wed Sep 28 13:51:16 PDT 2016


On Sun, Sep 25, 2016 at 9:49 PM, Markus Neteler <neteler at osgeo.org> wrote:
> On Fri, Sep 23, 2016 at 11:30 PM, Markus Metz
> <markus.metz.giswork at gmail.com> wrote:
>> On Fri, Sep 23, 2016 at 11:22 PM, Markus Neteler <neteler at osgeo.org> wrote:
>>> On Fri, Sep 23, 2016 at 11:05 PM, Markus Metz
>>> <markus.metz.giswork at gmail.com> wrote:
>>>> On Fri, Sep 23, 2016 at 1:11 PM, Markus Neteler <neteler at osgeo.org> wrote:
>>> ...
>>>> Your motivation is to provide a specialized CLI interface for HPC
>>>> processing?
>>>
>>> No, not the case.
>>>
>>>> We have used GRASS for HPC processing for years, and the
>>>> problems we faced were caused by the HPC hardware and software
>>>> infrastructure, not by GRASS. What exactly is the problem with using
>>>> GRASS for HPC processing?
>>>
>>> There is no problem. There is just the issue of an increasing
>>> number of additions (e.g., possibly the need to pass region and
>>> resolution to individual modules for independent parallel processing
>>> without the overhead of always opening a new mapset).
>>
>> Getting closer, it seems. Can you quantify "the overhead of always
>> opening a new mapset"?
>
> As an example, when aiming at processing all Sentinel-2 tiles
> globally, we are currently speaking about 73000 scenes * up to 16
> tiles each, along with global data. Analysis on top of other global
> data is more complex when each job runs in its own mapset and the
> results are reintegrated into a single target mapset than it would be
> if the jobs could be processed in parallel in one mapset by simply
> specifying the respective region to the command of interest. Yes,
> this is different from the current paradigm and not for G7.

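As a side note, setting the region per process within a single mapset
is already possible without touching the mapset's saved region, by
using a temporary region (the WIND_OVERRIDE mechanism behind
grass.script.use_temp_region()). A minimal Python sketch, to be run
inside a GRASS session; the map names "b4", "b8" and "ndvi" are only
placeholders for illustration:

    import grass.script as gs

    # use_temp_region() saves a temporary region and sets WIND_OVERRIDE,
    # so the region change is visible to this process only
    gs.use_temp_region()
    # set region and resolution for this job's tile
    gs.run_command("g.region", raster="b4")
    gs.mapcalc("ndvi = float(b8 - b4) / (b8 + b4)")
    # remove the temporary region; the mapset's saved region is untouched
    gs.del_temp_region()

Several such jobs can run in parallel in the same mapset, each with
its own region, as long as they write to different output maps.
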
From our common experience, I would say that creating separate mapsets
is a safety feature. If anything goes wrong with a particular
processing chain, cleaning up is easy: simply delete that mapset and
run the job again, if possible on a different host/node (assuming that
failed jobs are logged). In any case, I would be surprised if the
overhead of opening a separate mapset were measurable when processing
all Sentinel-2 tiles globally. Reintegration into a single target
mapset could cause problems with regard to I/O saturation, but in an
HPC environment temporary data always need to be copied to a final
target location at some stage. The HPC system you are using now is
most probably quite different from the one we used previously, so this
is a lot of guessing, particularly about the storage location of
temporary data (no matter whether it is in the same mapset or a
separate mapset).
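
For illustration, a per-job wrapper along these lines could look as
follows. This is a minimal sketch: the database path, location name,
job script process_tile.py and log file are hypothetical, and it
assumes the GRASS 7.2 startup syntax
"grass72 -c <new mapset path> --exec <command>":

    #!/usr/bin/env python
    import shutil
    import subprocess
    import sys

    gisdbase = "/data/grassdata"   # hypothetical GRASS database path
    location = "s2_global"         # hypothetical location name
    job = sys.argv[1]              # e.g. a Sentinel-2 scene/tile id
    mapset = "%s/%s/%s" % (gisdbase, location, job)

    # create a fresh mapset for this job and run the job script in it
    ret = subprocess.call(["grass72", "-c", mapset,
                           "--exec", "python", "process_tile.py", job])
    if ret != 0:
        # cleaning up is easy: delete this particular mapset and log
        # the job so it can be re-run, if possible on another node
        shutil.rmtree(mapset, ignore_errors=True)
        with open("failed_jobs.log", "a") as log:
            log.write(job + "\n")

On success, the results can later be reintegrated from the job mapsets
into a target mapset, e.g. with g.copy.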

To be continued in a GRASS+HPC thread?

Markus M

>
> But my original comment was targeted at the increasing number of
> module parameters and how to handle that (with some new HPC-related
> idea as an example).
>
> I'm fine with archiving this question for now; it will likely come up
> again in the future.
>
> markusN
