[GRASS-user] Global overview of GRASS HPC resources/installations ?
Laura Poggio
laura.poggio at gmail.com
Thu May 24 08:40:19 PDT 2018
Hi Massi,
I can see we live in a quite different "computational" world :-)
I will try to answer your questions further below. I agree completely
that if you have access to a good HPC system that fits (and even goes
beyond) your needs and is well maintained, then cloud providers are
probably not needed. I also think that, from the point of view of setting
up GRASS, the two approaches are not so different. It will be interesting
to see how the different set-ups develop and compare.
Laura
On 24 May 2018 at 12:12, Massi Alvioli <nocharge at gmail.com> wrote:
> even if you can tile up your problem - which probably covers 95% of
> the parallelization one can do in GRASS - you still have
> the problem to instantiate the cloud machines,
Scriptable: once the instance template is ready it takes a few seconds to
launch the machines (even hundreds of them).
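As a rough illustration, the per-machine launch boils down to one templated CLI call; a minimal sketch that only builds the commands (the gcloud-style CLI and the template name "grass-worker" are assumptions, adapt to your provider):

```python
# Sketch: build the launch commands for N worker instances from one
# prepared instance template. The "gcloud" CLI and the template name
# are assumptions for illustration; dispatch (e.g. subprocess) is left out.
def launch_commands(n, template="grass-worker"):
    return [
        f"gcloud compute instances create {template}-{i:03d} "
        f"--source-instance-template={template}"
        for i in range(1, n + 1)
    ]

for cmd in launch_commands(3):
    print(cmd)
```

Running the commands in the background (or through the provider's API) is what makes launching hundreds of machines take seconds rather than minutes.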
> copying data to the multiple instances, gathering back the results and
> patching them into your final results - the order of last two steps is
> your choice - and
>
This is where I am still exploring the most efficient solution. There are
storage options that avoid, or at least reduce, the need to copy data to
and from the instances.
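One such option is a shared NFS export mounted on every instance, so input tiles and results never need to be copied explicitly; a minimal /etc/fstab sketch (server name and paths are placeholders, not our actual set-up):

```
# /etc/fstab on each worker instance (hypothetical server and paths)
nfs-server:/export/grassdata  /grassdata  nfs  defaults,noatime,nofail  0  0
```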
> I expect all of these operations to be much slower going through the cloud
> than in any other
> architecture.
> The overall processing time is what matters, from importing initial
> data to having the final result available.
You are right. However, I think it depends on whether the bulk of the data
is on your own premises or already online (e.g. remote sensing images). For
example, for me it is much faster to download an image to an online
instance, especially when the files are already stored on the provider's
servers.
> Of course, if the cloud is the only viable possibility of having
> multiple cores, there is no way out. It is also true that everybody
> owns a couple of desktop machines with a few tens of computing cores
> overall ..
>
>
To set up a cluster with spare desktops you need to follow IT policies, and
sometimes those are not so easy to adapt. In my opinion it is also not
trivial to set up a proper cluster, with shared storage, backup, fast
network connections, etc.
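As a rough illustration of the tiling discussed below in this thread (each tile in its own mapset, one instance per tile), here is a minimal sketch; the region bounds, mapset names and grid size are made-up examples, and the actual dispatch to instances (ssh, job queue) is left out:

```python
# Sketch: split a rectangular region into a rows x cols grid of tiles,
# one mapset per tile. Bounds follow the GRASS n/s/e/w convention.
def make_tiles(n, s, e, w, rows, cols):
    """Return per-tile bounding boxes for a rows x cols grid."""
    dy = (n - s) / rows
    dx = (e - w) / cols
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append({
                "mapset": f"tile_{r}_{c}",   # one mapset per tile
                "n": n - r * dy, "s": n - (r + 1) * dy,
                "w": w + c * dx, "e": w + (c + 1) * dx,
            })
    return tiles

for t in make_tiles(60.0, 50.0, 10.0, 0.0, rows=2, cols=2):
    print(t["mapset"], t["n"], t["s"], t["w"], t["e"])
```

Each tile's bounds could then be passed to g.region inside its own mapset before running the raster chain, and the per-tile results patched together at the end.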
> M
>
>
> 2018-05-24 9:11 GMT+02:00 Laura Poggio <laura.poggio at gmail.com>:
> > Hi Massi,
> > using multiple single instances of GRASS had advantages (in our workflow)
> > when tiling: each tile was sent in its own mapset to a different instance
> > for processing.
> > I am aware that this can be done on HPC locally. However, doing this on
> > the cloud had the advantage (for us) of being able to use many more
> > instances than the cores available locally.
> >
> > I think you are right: I/O operations and concurrent database operations
> > will probably be slower, but our workflow focuses mainly on raster
> > operations and integrated GRASS / R models. If these operations can be
> > tiled, then there are advantages in doing so on different instances, when
> > one does not have access to enough local cores.
> >
> > I am trying to tidy up the workflow we used so that I can share it. And
> > I am looking forward to seeing other workflows.
> >
> > Thanks
> >
> > Laura
> >
> > On 23 May 2018 at 21:08, Massi Alvioli <nocharge at gmail.com> wrote:
> >>
> >> Hi Laura,
> >>
> >> well, not actually - it does not answer my question. I mean, I am
> >> pretty sure one can have GRASS up and running on some cloud instance,
> >> but the point is: when it comes to performance, is that convenient? I
> >> mean multi-process performance, of course. There is not much point in
> >> running single GRASS instances, if not for very peculiar applications,
> >> right? I bet it is not convenient, on any level, either if we look at
> >> I/O operations, or mapcalc operations, not to talk about concurrent
> >> database operations ... I might be wrong, of course. But my experience
> >> with cloud environments and parallel processing was rather
> >> disappointing. On an unrelated problem (I mean, not GRASS-related),
> >> I tried something here https://doi.org/10.30437/ogrs2016_paper_08,
> >> with little success. I can't imagine a reason why it should be
> >> different using GRASS modules, while I found undoubtedly good
> >> performance on HPC machines.
> >>
> >> M
> >>
> >> 2018-05-23 16:35 GMT+02:00 Laura Poggio <laura.poggio at gmail.com>:
> >> > Hi Massi,
> >> > we managed to run GRASS on different single-core instances on a cloud
> >> > provider. It was a bit tricky (initially) to set up the NFS mount
> >> > points. I am still exploring the different types of storage possible
> >> > and what would be cheaper and more efficient.
> >> >
> >> > I hope this answers your question.
> >> >
> >> > Once the workflow is more stable I hope I will be able to share it
> >> > more widely.
> >> >
> >> > Thanks
> >> >
> >> > Laura
> >> >
> >> > On 23 May 2018 at 14:37, Massi Alvioli <nocharge at gmail.com> wrote:
> >> >>
> >> >> Hi Laura,
> >> >>
> >> >> the effort on cloud providers is probably useless. Was it different
> >> >> in your case?
> >> >>
> >> >>
> >> >> M
> >> >>
> >> >> 2018-05-22 10:12 GMT+02:00 Laura Poggio <laura.poggio at gmail.com>:
> >> >> > I am really interested in this. I am experimenting with different
> >> >> > settings to use GRASS on HPC, more specifically on multi-core local
> >> >> > machines and on single-core multiple instances on a cloud provider.
> >> >> > It would be great to share experiences with other people fighting
> >> >> > the same problems.
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >> > Laura
> >> >> >
> >> >> > On 20 May 2018 at 12:32, Moritz Lennert
> >> >> > <mlennert at club.worldonline.be>
> >> >> > wrote:
> >> >> >>
> >> >> >> Le Sun, 20 May 2018 09:30:53 +0200,
> >> >> >> Nikos Alexandris <nik at nikosalexandris.net> a écrit :
> >> >> >>
> >> >> >> > * Massi Alvioli <nocharge at gmail.com> [2018-05-17 15:01:39 +0200]:
> >> >> >> >
> >> >> >> > >2018-05-17 10:09 GMT+02:00 Moritz Lennert
> >> >> >> > ><mlennert at club.worldonline.be>:
> >> >> >> > >
> >> >> >> > >Hi,
> >> >> >> > >
> >> >> >> > >> [I imagine your mail was supposed to go onto the mailing list
> >> >> >> > >> and
> >> >> >> > >> not just to me...]
> >> >> >> > >
> >> >> >> > >sure my answer was for everyone to read, I believe I tried to
> >> >> >> > > send
> >> >> >> > > it
> >> >> >> > >again afterwards..
> >> >> >> > >something must have gone wrong.
> >> >> >> > >
> >> >> >> > >> I just presented GRASS and a short overview over GRASS on HPC
> >> >> >> > >> yesterday at the FOSS4G-FR and there was a lot of interest for
> >> >> >> > >> this. Several people asked me about specific documentation on
> >> >> >> > >> the
> >> >> >> > >> subject.
> >> >> >> > >
> >> >> >> > >What we did about GRASS + HPC was for specific production
> >> >> >> > >purposes and no documentation whatsoever was created, basically
> >> >> >> > >due to lack of time.. so I find it hard to say whether this is
> >> >> >> > >going to change in the near future :). Surely the topic is of
> >> >> >> > >wide interest and worth being discussed in several contexts.
> >> >> >> > >
> >> >> >> > >> Currently, I'm aware of the following wiki pages which each
> >> >> >> > >> potentially touches on some aspects of HPC:
> >> >> >> > >
> >> >> >> > >I must admit that existing documentation/papers did not help
> >> >> >> > > much.
> >> >> >> > >Well, did not help at all, actually.
> >> >> >> > >One major problem in my opinion/experience is that
> >> >> >> > >multi-core/multi-node machines can be really
> >> >> >> > >different from each other, and parallelization strategies very
> >> >> >> > >purpose-specific, so that creating
> >> >> >> > >general-purpose documents/papers, or even software, *may* be a
> >> >> >> > >hopeless effort. Smart ideas
> >> >> >> > >are most welcome, of course:)
> >> >> >> >
> >> >> >> > Dear Massimo and all,
> >> >> >> >
> >> >> >> > Being a beginner in massively processing Landsat 8 images using
> >> >> >> > JRC's
> >> >> >> > JEODPP system (which is designed for High-Throughput,
> >> >> >> > https://doi.org/10.1016/j.future.2017.11.007), I found useful
> >> >> >> > notes
> >> >> >> > in
> >> >> >> > the Wiki (notably Veronica's excellent tutorials) and elsewhere,
> >> >> >> > got
> >> >> >> > specific answers through the mailing lists and learned a lot in
> >> >> >> > on-site discussions during the last OSGeo sprint, for example.
> >> >> >> >
> >> >> >> > Nonetheless, I think I have learned quite a few things the hard
> >> >> >> > way.
> >> >> >> > In this regard, some answers to even "non-sense" questions are
> >> >> >> > worth
> >> >> >> > documenting.
> >> >> >> >
> >> >> >> > My aim is to transfer notes of practical value. Having HPC- and
> >> >> >> > HTC-related notes in a wiki will help to get started, promote best
> >> >> >> > practices, learn through common mistakes and give an overview for
> >> >> >> > the points Peter put in this thread's first message.
> >> >> >>
> >> >> >> +1
> >> >> >>
> >> >> >> >
> >> >> >> > I hope it's fine to name the page "High Performance Computing".
> >> >> >> > Please advise or create a page with another name if you think
> >> >> >> > otherwise.
> >> >> >>
> >> >> >>
> >> >> >> +1
> >> >> >>
> >> >> >> Moritz
> >> >> >> _______________________________________________
> >> >> >> grass-user mailing list
> >> >> >> grass-user at lists.osgeo.org
> >> >> >> https://lists.osgeo.org/mailman/listinfo/grass-user
> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >
> >
>