<div dir="ltr">Hi Massi,<div>I can see we live in a quite different "computational" world :-)</div><div>I will try to further answer your questions below. I do agree completely that if you have access to a good HPC than cloud providers are probably not so needed. If you have a good infrastructure that fits (and even goes beyond) your needs and it is well maintained. I also think that from the point of view of setting up GRASS the two approaches are not so different. And it will be interesting to see the development and comparison of the different set-ups. </div><div><br></div><div>Laura<br><div class="gmail_extra"><br><div class="gmail_quote">On 24 May 2018 at 12:12, Massi Alvioli <span dir="ltr"><<a href="mailto:nocharge@gmail.com" target="_blank">nocharge@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">even if you can tile-up your problem - which probably covers 95% of<br>
the parallelization one can do in GRASS - you still have</blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">the problem of instantiating the cloud machines, </blockquote><div><br></div><div>Scriptable: once the instance template is ready, it takes a few seconds to launch the machines (even hundreds of them).</div>
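<div><br></div><div>As a rough sketch of what I mean (assuming the Google Cloud command line here; the template name, the count and the zone are just placeholders, and other providers offer equivalent calls):</div><pre>
# launch 100 identical workers from a pre-built instance template;
# gcloud accepts a list of instance names in a single call
gcloud compute instances create $(seq -f "grass-worker-%g" 1 100) \
    --source-instance-template=grass-worker-template \
    --zone=europe-west1-b
</pre>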
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">copying data to the multiple instances, gather back the results and<br>patching them into your final results - the order of the last two steps is your choice - and<br></blockquote><div><br></div><div>This is where I am still exploring the most efficient solution. There are storage options that avoid or reduce the need to copy data to/from the instances.</div>
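<div><br></div><div>For the gather-and-patch step itself, once the per-tile results are back in mapsets of a single location, something along these lines does the job (a sketch only; mapset and raster names are invented):</div><pre>
# make the tile mapsets visible from the current mapset
g.mapsets operation=add mapset=tile_01,tile_02,tile_03
# set the region to cover all tiles, then patch them into one map
g.region raster=result@tile_01,result@tile_02,result@tile_03
r.patch input=result@tile_01,result@tile_02,result@tile_03 output=result_full
</pre>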
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I expect all of these operations to be much slower going through the cloud than in any other<br>
architecture. </blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The overall processing time is what matters, from importing initial<br>
data to having the final result available. </blockquote><div><br></div><div>You are right. However, I think it depends on whether the bulk of the data is on your own premises or already online (e.g. remote sensing images). For me, for example, it is much faster to download an image onto an online instance, especially when the files are already stored on the provider's servers.</div>
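<div><br></div><div>To illustrate that case (the bucket and file names below are invented; I am assuming a scene that already sits in the provider's object storage):</div><pre>
# copy a band from the provider's object storage onto the instance ...
gsutil cp gs://some-bucket/scene_B04.tif /tmp/scene_B04.tif
# ... and import it into the current GRASS mapset (reprojecting on the fly if needed)
r.import input=/tmp/scene_B04.tif output=scene_B04
</pre>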
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Of course, if the cloud is the only viable possibility of having<br>multiple cores, there is no way out. It is also true that everybody<br>
owns a couple of desktop machines with a few tens of computing cores overall ..<br><span class="HOEnZb"><font color="#888888"><br></font></span></blockquote><div><br></div><div>To set up a cluster with spare desktops you need to follow IT policies, and sometimes they are not so easy to adapt. In my opinion, it is also not so easy to set up a proper cluster, with shared storage, backups, fast network connections, etc.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="HOEnZb"><font color="#888888">
M<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
2018-05-24 9:11 GMT+02:00 Laura Poggio <<a href="mailto:laura.poggio@gmail.com">laura.poggio@gmail.com</a>>:<br>
> Hi Massi,<br>
> using multiple single instances of GRASS had advantages (in our workflow)<br>
> when tiling: each tile was sent in its own mapset to a different instance<br>
> for processing.<br>
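<br>To make that per-tile pattern concrete (a sketch only; the database path, location, tile and map names are placeholders), each instance essentially gets its own mapset and runs something like:<br><pre>
TILE=tile_042
# create a fresh mapset for this tile and set its region to the tile's raster;
# the per-tile input raster is assumed to sit in the PERMANENT mapset
grass74 -c /data/grassdb/utm33n/$TILE --exec g.region raster=elev_$TILE
# then run the actual processing inside that mapset
grass74 /data/grassdb/utm33n/$TILE --exec r.slope.aspect elevation=elev_$TILE slope=slope_$TILE
</pre>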
> I am aware that this can be done on HPC locally. However, doing this on the<br>
> cloud had the advantage (for us) of being able to use many more instances than<br>
> the cores available locally.<br>
><br>
> I think you are right and I/O operations and concurrent database operations<br>
> will probably be slower, but our workflow focuses mainly on raster operations<br>
> and integrated GRASS / R models. If these operations can be tiled, then<br>
> there are advantages in doing so on different instances, when one does not<br>
> have access to enough local cores.<br>
><br>
> I am trying to tidy up the workflow we used so that I can share it. And I am<br>
> looking forward to seeing other workflows.<br>
><br>
> Thanks<br>
><br>
> Laura<br>
><br>
> On 23 May 2018 at 21:08, Massi Alvioli <<a href="mailto:nocharge@gmail.com">nocharge@gmail.com</a>> wrote:<br>
>><br>
>> Hi Laura,<br>
>><br>
>> well, not actually - it does not answer my question. I mean, I am<br>
>> pretty sure one can have GRASS up and running on some cloud instance,<br>
>> but the point is: when it comes to performance, is that convenient? I<br>
>> mean multi-process performance, of course. There is not much point in<br>
>> running single GRASS instances, if not for very peculiar applications,<br>
>> right? I bet it is not convenient, on any level, whether we look at<br>
>> I/O operations or mapcalc operations, not to mention concurrent<br>
>> database operations ... I might be wrong, of course. But my experience<br>
>> with cloud environments and parallel processing was rather<br>
>> disappointing. On an unrelated problem (I mean, not GRASS-related),<br>
>> I tried something here <a href="https://doi.org/10.30437/ogrs2016_paper_08" rel="noreferrer" target="_blank">https://doi.org/10.30437/ogrs2016_paper_08</a>,<br>
>> with little success. I can't imagine a reason why it should be<br>
>> different using GRASS modules, while I found undoubtedly good<br>
>> performance on HPC machines.<br>
>><br>
>> M<br>
>><br>
>> 2018-05-23 16:35 GMT+02:00 Laura Poggio <<a href="mailto:laura.poggio@gmail.com">laura.poggio@gmail.com</a>>:<br>
>> > Hi Massi,<br>
>> > we managed to run GRASS on different single-core instances on a cloud<br>
>> > provider. It was a bit tricky (initially) to set up the NFS mount<br>
>> > points. I<br>
>> > am still exploring the different types of storage possible and what<br>
>> > would be<br>
>> > cheaper and more efficient.<br>
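<br>For reference, the NFS part boils down to something like this (host names and paths are invented; the export shares one GRASS database with all instances):<br><pre>
# on the storage node, in /etc/exports:
/export/grassdata  10.0.0.0/24(rw,sync,no_subtree_check)

# on each compute instance:
sudo mkdir -p /grassdata
sudo mount -t nfs storage-node:/export/grassdata /grassdata
</pre>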
>> ><br>
>> > I hope this answers your question.<br>
>> ><br>
>> > Once the workflow is more stable I hope I will be able to share it more<br>
>> > widely.<br>
>> ><br>
>> > Thanks<br>
>> ><br>
>> > Laura<br>
>> ><br>
>> > On 23 May 2018 at 14:37, Massi Alvioli <<a href="mailto:nocharge@gmail.com">nocharge@gmail.com</a>> wrote:<br>
>> >><br>
>> >> Hi Laura,<br>
>> >><br>
>> >> the effort on cloud providers is probably useless. Was it different in<br>
>> >> your case?<br>
>> >><br>
>> >><br>
>> >> M<br>
>> >><br>
>> >> 2018-05-22 10:12 GMT+02:00 Laura Poggio <<a href="mailto:laura.poggio@gmail.com">laura.poggio@gmail.com</a>>:<br>
>> >> > I am really interested in this. I am experimenting with different<br>
>> >> > settings<br>
>> >> > to use GRASS on HPC, more specifically on multi-core local machines<br>
>> >> > and<br>
>> >> > on<br>
>> >> > single-core multiple instances on a cloud provider. It would be great<br>
>> >> > to<br>
>> >> > share experiences with other people fighting the same problems.<br>
>> >> ><br>
>> >> > Thanks<br>
>> >> ><br>
>> >> > Laura<br>
>> >> ><br>
>> >> > On 20 May 2018 at 12:32, Moritz Lennert<br>
>> >> > <<a href="mailto:mlennert@club.worldonline.be">mlennert@club.worldonline.be</a>><br>
>> >> > wrote:<br>
>> >> >><br>
>> >> >> On Sun, 20 May 2018 09:30:53 +0200,<br>
>> >> >> Nikos Alexandris <<a href="mailto:nik@nikosalexandris.net">nik@nikosalexandris.net</a>> wrote:<br>
>> >> >><br>
>> >> >> > * Massi Alvioli <<a href="mailto:nocharge@gmail.com">nocharge@gmail.com</a>> [2018-05-17 15:01:39 +0200]:<br>
>> >> >> ><br>
>> >> >> > >2018-05-17 10:09 GMT+02:00 Moritz Lennert<br>
>> >> >> > ><<a href="mailto:mlennert@club.worldonline.be">mlennert@club.worldonline.be</a>>:<br>
>> >> >> > ><br>
>> >> >> > >Hi,<br>
>> >> >> > ><br>
>> >> >> > >> [I imagine your mail was supposed to go onto the mailing list<br>
>> >> >> > >> and<br>
>> >> >> > >> not just to me...]<br>
>> >> >> > ><br>
>> >> >> > >sure my answer was for everyone to read, I believe I tried to<br>
>> >> >> > > send<br>
>> >> >> > > it<br>
>> >> >> > >again afterwards..<br>
>> >> >> > >something must have gone wrong.<br>
>> >> >> > ><br>
>> >> >> > >> I just presented GRASS and a short overview of GRASS on HPC<br>
>> >> >> > >> yesterday at FOSS4G-FR and there was a lot of interest in<br>
>> >> >> > >> this. Several people asked me about specific documentation on<br>
>> >> >> > >> the<br>
>> >> >> > >> subject.<br>
>> >> >> > ><br>
>> >> >> > >What we did about GRASS + HPC was for specific production<br>
>> >> >> > > purposes<br>
>> >> >> > >and no documentation<br>
>> >> >> > >whatsoever was created, basically due to lack of time.. so I find<br>
>> >> >> > > it<br>
>> >> >> > >hard to say whether this is going<br>
>> >> >> > >to change in the near future:). Surely the topic is of wide<br>
>> >> >> > > interest<br>
>> >> >> > >and worth being discussed in<br>
>> >> >> > >several contexts.<br>
>> >> >> > ><br>
>> >> >> > >> Currently, I'm aware of the following wiki pages, each of which<br>
>> >> >> > >> potentially touches on some aspects of HPC:<br>
>> >> >> > ><br>
>> >> >> > >I must admit that existing documentation/papers did not help<br>
>> >> >> > > much.<br>
>> >> >> > >Well, did not help at all, actually.<br>
>> >> >> > >One major problem in my opinion/experience is that<br>
>> >> >> > >multi-core/multi-node machines can be really<br>
>> >> >> > >different from each other, and parallelization strategies very<br>
>> >> >> > >purpose-specific, so that creating<br>
>> >> >> > >general-purpose documents/papers, or even software, *may* be a<br>
>> >> >> > >hopeless effort. Smart ideas<br>
>> >> >> > >are most welcome, of course:)<br>
>> >> >> ><br>
>> >> >> > Dear Massimo and all,<br>
>> >> >> ><br>
>> >> >> > Being a beginner in massively processing Landsat 8 images using<br>
>> >> >> > JRC's<br>
>> >> >> > JEODPP system (which is designed for High-Throughput,<br>
>> >> >> > <a href="https://doi.org/10.1016/j.future.2017.11.007" rel="noreferrer" target="_blank">https://doi.org/10.1016/j.<wbr>future.2017.11.007</a>), I found useful<br>
>> >> >> > notes<br>
>> >> >> > in<br>
>> >> >> > the Wiki (notably Veronica's excellent tutorials) and elsewhere,<br>
>> >> >> > got<br>
>> >> >> > specific answers through the mailing lists and learned a lot in<br>
>> >> >> > on-site discussions during the last OSGeo sprint, for example.<br>
>> >> >> ><br>
>> >> >> > Nonetheless, I think I have learned quite a few things the hard<br>
>> >> >> > way.<br>
>> >> >> > In this regard, some answers to even "non-sense" questions are<br>
>> >> >> > worth<br>
>> >> >> > documenting.<br>
>> >> >> ><br>
>> >> >> > My aim is to transfer notes of practical value. Having HPC and HTC<br>
>> >> >> > related notes in a wiki will help to get started, promote best<br>
>> >> >> > practices, learn through common mistakes and give an overview of<br>
>> >> >> > the<br>
>> >> >> > points Peter put in this thread's first message.<br>
>> >> >><br>
>> >> >> +1<br>
>> >> >><br>
>> >> >> ><br>
>> >> >> > I hope it's fine to name the page "High Performance Computing".<br>
>> >> >> > Please<br>
>> >> >> > advise or create a page with another name if you think otherwise.<br>
>> >> >><br>
>> >> >><br>
>> >> >> +1<br>
>> >> >><br>
>> >> >> Moritz<br>
>> >> ><br>
>> >> ><br>
>> >> ><br>
>> ><br>
>> ><br>
><br>
><br>
</div></div></blockquote></div><br></div></div></div>