[Pywps-dev] Containerization PyWPS processes

Carsten Ehbrecht ehbrecht at dkrz.de
Thu Sep 21 02:20:38 PDT 2017


I find it useful to be able to launch processes in Docker containers.
I would see this as an *optional extension* to PyWPS ... it shouldn't
be the default, just as I have done for the scheduler extension. We
would then have three ways to launch a processing job:

1. running locally on the PyWPS server (default).
2. launching a Docker container.
3. using a batch scheduler system like Slurm or GridEngine.

Options 2) and 3) might need additional Python dependencies ... and of
course a lot more surrounding infrastructure, which has to be installed
separately.

The "cancel" function necessary for WPS 2.0.0 needs to be implemented
differently for each of these "job delegation" mechanisms. I have
started this for the "batch scheduler" extension ... just the interface,
no implementation yet:

https://github.com/bird-house/pywps/blob/issue-277_scheduler-extension-v2/pywps/processing/basic.py#L21
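
To make the idea concrete, here is a rough sketch of how the three
mechanisms could share one interface while each implements "cancel" in
its own way. This is only an illustration: the class names, the "runner"
parameter, and the image/script arguments are assumptions of mine and
are not taken from the PyWPS code or from the linked branch. The Docker
and Slurm classes only call the standard command-line tools (docker
run / docker kill, sbatch / scancel).

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Callable, List, Optional, Sequence

# Anything that accepts an argv list; injectable so it can be faked in tests.
Runner = Callable[[List[str]], object]


class JobLauncher(ABC):
    """Hypothetical interface for one "job delegation" mechanism."""

    @abstractmethod
    def start(self) -> None:
        """Launch the processing job."""

    @abstractmethod
    def cancel(self) -> None:
        """Stop the job; each mechanism does this differently."""


class LocalLauncher(JobLauncher):
    """Option 1: run the job as a child process on the server itself."""

    def __init__(self, argv: Sequence[str]):
        self.argv = list(argv)
        self.proc: Optional[subprocess.Popen] = None

    def start(self) -> None:
        self.proc = subprocess.Popen(self.argv)

    def cancel(self) -> None:
        # Only terminate a process that was started and is still running.
        if self.proc is not None and self.proc.poll() is None:
            self.proc.terminate()
            self.proc.wait()


class DockerLauncher(JobLauncher):
    """Option 2: delegate to the docker command-line client."""

    def __init__(self, job_id: str, image: str,
                 runner: Runner = subprocess.check_call):
        self.job_id = job_id
        self.image = image
        self.runner = runner

    def start(self) -> None:
        # Name the container after the job so it can be found again later.
        self.runner(["docker", "run", "--detach",
                     "--name", self.job_id, self.image])

    def cancel(self) -> None:
        self.runner(["docker", "kill", self.job_id])


class SlurmLauncher(JobLauncher):
    """Option 3: delegate to a batch scheduler (Slurm in this sketch)."""

    def __init__(self, job_id: str, script: str,
                 runner: Runner = subprocess.check_call):
        self.job_id = job_id
        self.script = script
        self.runner = runner

    def start(self) -> None:
        self.runner(["sbatch", f"--job-name={self.job_id}", self.script])

    def cancel(self) -> None:
        # Cancels every job of the current user that carries this name.
        self.runner(["scancel", f"--name={self.job_id}"])
```

The calling code would only see start() and cancel(), so the choice of
mechanism could stay a configuration detail. Passing a different
"runner" (for example a function that just records the command) lets
you try this out without Docker or Slurm installed.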

I haven't looked into it yet ... but I guess the Docker extension might
look similar to the scheduler extension ... in the way it is handled
by the PyWPS code:

https://github.com/bird-house/pywps/blob/issue-277_scheduler-extension-v2/docs/extensions.rst

Cheers,
Carsten


On 09/21/2017 08:45 AM, Jorge Mendes de Jesus wrote:
> 
> Hi to all
> 
> With new systems you get new problems but also new possibilities. Another
> possibility would be accountability: a (mainly scientific) process can
> be run with all its logs inside the Docker instance, which is then
> committed (frozen) and used by another person to check the logs
> and/or data.
> 
> Disk space in Docker is a funny thing: an image can go from 6 MB to
> 600 MB in the blink of an eye by changing the OS, not cleaning up
> packages, etc., so a lot of effort has to go into optimizing it.
> Another advantage is that you can set CPU and other resource limits in
> Docker, so we get very fine-grained job resource control.
> 
> With the new batch job support we could extend this to run jobs
> in Docker swarms.
> 
> A bit from experience: Docker systems need a bit of "love and
> attention" in the beginning, and then things run without problems. The
> other issue is the extremely fast pace of Docker development: you
> prepare things, the Docker community makes some changes, and everything
> breaks. We once had a situation where docker-machine was internally
> calling some scripts for package updates; during the night someone in
> the Docker community made a small change, and for a couple of hours we
> couldn't run docker-machine (it was a big problem until we tracked it
> down).
> 
> If this project goes ahead I would ask if Geocat could sponsor it with
> working hours. 
> 
> Cheers
> Jorge
> 
> On Wed, Sep 20, 2017 at 6:38 PM, Jachym Cepicky
> <jachym.cepicky at gmail.com> wrote:
> 
>     Hi,
> 
>     I'm in touch with Adam. 
> 
>     It has a big impact on disk space, I agree, but AFAIK it opens up
>     new possibilities (imagine being able to deploy a running job to,
>     e.g., OpenShift instances), and the impact on other system resources
>     should not be that big?
> 
>     What can I say, all the points raised by Jorge are valid - so let's
>     give it a try?
> 
>     J
> 
>     On Wed, 20 Sep 2017 at 18:07, jorge.dejesus
>     <jorge.dejesus at geocat.net> wrote:
> 
>         Hi to all
> 
>         Interesting research topic, but you have a problem with that
>         approach: starting the process will have a massive overhead
>         (compared with a thread) and will consume a lot of disk space
>         and resources!
> 
>         You would have to create the Docker image and the process when
>         you install PyWPS, and then start the container when the user
>         calls the process. I am a bit against using such a big system in
>         PyWPS unless someone tries to implement and run it and shows
>         whether or not it is feasible. *We don't know until we try it.*
> 
>         Those were my 2 cents :)
> 
>         J.
> 
> 
> 
>         On 20-09-17 15:41, Adam Laža wrote:
>>         Hi devs,
>>
>>         I am a student of geoinformatics at CTU in Prague. Currently I'm
>>         looking for a topic for my master's thesis. Yesterday I met with
>>         Jachym and we discussed containerizing PyWPS processes
>>         (probably with Docker). It could be handy for killing/pausing
>>         a process, which as far as I know is quite crucial in WPS 2.0.
>>
>>         I'd like to know whether somebody has already researched this
>>         possibility or whether you have any suggestions or advice.
>>
>>         Thanks in advance.
>>         Adam
>>
>>
>>         _______________________________________________
>>         pywps-dev mailing list
>>         pywps-dev at lists.osgeo.org <mailto:pywps-dev at lists.osgeo.org>
>>         https://lists.osgeo.org/mailman/listinfo/pywps-dev
>>         <https://lists.osgeo.org/mailman/listinfo/pywps-dev>
> 
> 

-- 
Carsten Ehbrecht
Abteilung Datenmanagement

Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45 a • D-20146 Hamburg • Germany

Phone: +49 40 460094-148
FAX:   +49 40 460094-270
Email: ehbrecht at dkrz.de
URL:   www.dkrz.de

Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784

