[Pywps-dev] Containerization PyWPS processes
Carsten Ehbrecht
ehbrecht at dkrz.de
Thu Sep 21 02:20:38 PDT 2017
I find it useful to have the option of launching processes via Docker
containers. I would see this as an *optional extension* to PyWPS ... it
shouldn't be the default, just as I have done for the scheduler
extension. We would then have three ways to launch a processing job:
1. running locally on the PyWPS server (default).
2. launching a docker container.
3. using a batch scheduler system such as Slurm or GridEngine.
Options 2) and 3) might need additional Python dependencies ... and of
course a lot more surrounding infrastructure, which needs to be installed
separately.
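For illustration, the launch mode could be read from the PyWPS
configuration; a minimal sketch in Python (the "[processing]" section and
"mode" option are assumptions, not an established configuration schema):

    # sketch only: the option name and its values are hypothetical
    from pywps import configuration

    configuration.load_configuration()  # load defaults plus any pywps.cfg
    # expected values: "default" (local), "docker", "scheduler"
    mode = configuration.get_config_value("processing", "mode") or "default"
    print("processing mode:", mode)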
The "cancel" function necessary for WPS 2.0.0 needs to be implemented
differently for each of these "job delegation" mechanisms. I have
started this for the "batch scheduler" extension ... just the interface,
no implementation yet:
https://github.com/bird-house/pywps/blob/issue-277_scheduler-extension-v2/pywps/processing/basic.py#L21
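Roughly, I have in mind a small job-delegation interface of this shape (a
sketch only, not the actual code in that branch):

    # illustrative sketch of the interface; names may differ in the branch
    class Processing(object):
        """Base class for launching and cancelling a processing job."""

        def __init__(self, process, wps_request, wps_response):
            self.process = process
            self.wps_request = wps_request
            self.wps_response = wps_response

        def start(self):
            raise NotImplementedError("implemented by each backend")

        def cancel(self):
            raise NotImplementedError("implemented by each backend")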
I haven't looked into it yet ... but I guess the Docker extension might
look similar to the scheduler extension ... in the way it is handled
by the PyWPS code:
https://github.com/bird-house/pywps/blob/issue-277_scheduler-extension-v2/docs/extensions.rst
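A Docker backend could then provide the same start/cancel pair by
delegating to a container. A minimal sketch, assuming the docker-py SDK
(image and command are placeholders):

    # sketch only, assuming the docker-py SDK ("pip install docker")
    import docker

    class DockerProcessing(object):
        """Run a job in a container; cancel by killing that container."""

        def __init__(self, image, command):
            self.client = docker.from_env()
            self.image = image
            self.command = command
            self.container = None

        def start(self):
            # detach=True returns immediately with a container handle
            self.container = self.client.containers.run(
                self.image, self.command, detach=True)

        def cancel(self):
            if self.container is not None:
                self.container.kill()
                self.container.remove()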
Cheers,
Carsten
On 09/21/2017 08:45 AM, Jorge Mendes de Jesus wrote:
>
> Hi to all
>
> With new systems you get new problems but also new possibilities. Another
> possibility would be accountability: a (mainly scientific) process can be
> run, all the logs stay inside the Docker instance, and the instance can
> then be committed (frozen) and used by another person to check the logs
> and/or data.
>
> Disk space in Docker is a funny thing: an image can go from 6 MB to 600 MB
> in the blink of an eye by changing the base OS or not cleaning up packages,
> etc., so a lot of effort has to go into optimizing it. Another advantage is
> that you can set CPU and other resource limits on a Docker container, so we
> would have very fine-grained control over job resources.
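>
> For example, with the docker-py SDK something like this should cap a
> job container's memory and CPU share (a sketch; the exact keyword
> arguments depend on the SDK version):
>
>     import docker
>
>     client = docker.from_env()
>     # limit the job container to 512 MB RAM and one CPU
>     container = client.containers.run(
>         "alpine", "sleep 60", detach=True,
>         mem_limit="512m", nano_cpus=1000000000)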
>
> With the new support for batch jobs we could extend things to run
> in Docker swarms.
>
> A bit from experience... Docker systems need a bit of "love and
> attention" in the beginning, and then things run without problems. Another
> issue is the extremely fast pace of Docker development: you prepare
> things, the Docker community makes some changes, and everything
> breaks. We once had a situation where docker-machine was internally
> calling some scripts for a package update; during the night someone in
> the Docker community made a small change, and for a couple of hours you
> couldn't run docker-machine (until we tracked the problem down).
>
> If this project goes ahead I would ask if Geocat could sponsor it with
> working hours.
>
> Cheers
> Jorge
>
> On Wed, Sep 20, 2017 at 6:38 PM, Jachym Cepicky
> <jachym.cepicky at gmail.com> wrote:
>
> Hi,
>
> I'm in touch with Adam.
>
> It has a big impact on disk space, I agree - but AFAIK it opens
> new possibilities (imagine being able to deploy a running job to e.g.
> OpenShift instances ...), and the impact on other system resources
> should not be that big?
>
> What can I say, all the points raised by Jorge are valid - so let's
> give it a try?
>
> J
>
> On Wed, 20 Sep 2017 at 18:07, jorge.dejesus
> <jorge.dejesus at geocat.net> wrote:
>
> Hi to all
>
> Interesting research topic, but there is a problem with that
> approach: starting the process will have a massive overhead
> (compared with a thread) and will consume a lot of disk space and
> resources!
>
> You would have to create the Docker image and process when you
> install PyWPS, and then start the container when the user calls
> the process. I am a bit against using such a big system in
> PyWPS unless someone tries to implement it, runs it, and shows
> whether or not it is feasible ... *we don't know until we try it*
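>
> Roughly, with the docker-py SDK that could mean building the process
> image once at install/deploy time and starting a container per Execute
> request (a sketch; the image tag and build path are placeholders):
>
>     import docker
>
>     client = docker.from_env()
>
>     # at PyWPS install/deploy time: build the process image once
>     client.images.build(path="processes/myprocess", tag="pywps/myprocess")
>
>     # at Execute time: run the process in its own container
>     container = client.containers.run("pywps/myprocess", detach=True)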
>
> Those were my 2 cents :)
>
> J.
>
>
>
> On 20-09-17 15:41, Adam Laža wrote:
>> Hi devs,
>>
>> I am a student of geoinformatics at CTU in Prague. Currently I'm
>> looking for a topic for my master's thesis. Yesterday I met with
>> Jachym and we discussed containerizing PyWPS processes
>> (probably with Docker). It could be handy for killing/pausing
>> a process, which as far as I know is quite crucial in WPS 2.0.
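>>
>> For instance, if each job ran in its own container, pausing and
>> killing could be delegated to Docker; a minimal sketch with the
>> docker-py SDK (image and command are placeholders):
>>
>>     import docker
>>
>>     client = docker.from_env()
>>     job = client.containers.run("alpine", "sleep 600", detach=True)
>>     job.pause()    # pause the job
>>     job.unpause()  # resume it
>>     job.kill()     # cancel it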
>>
>> I'd like to know if somebody has already researched this
>> possibility, or whether you have any suggestions or advice.
>>
>> Thanks in advance.
>> Adam
>>
>>
>
> _______________________________________________
> pywps-dev mailing list
> pywps-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/pywps-dev
>
--
Carsten Ehbrecht
Abteilung Datenmanagement
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45 a • D-20146 Hamburg • Germany
Phone: +49 40 460094-148
FAX: +49 40 460094-270
Email: ehbrecht at dkrz.de
URL: www.dkrz.de
Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784