[Pywps-dev] Guidelines for netCDF file and opendap access within pywps

David Huard huard.david at ouranos.ca
Fri Jun 29 08:03:35 PDT 2018


The URL idea sounds good. Will try it.

How do you feel about the dependency issue, though?

One option I've been playing around with is to dynamically add methods to
ComplexInput when the mimetype is discovered. That is, the various handlers
(href, file, data) could be methods of a MimeInput class that can be
specialized for different mimetypes. After create_complex_input determines
the input's mimetype, instead of doing a source.clone(), it would
instantiate a mixin class combining ComplexInput and MimeInput. By creating
a registry of mimetypes and their associated classes, users could
special-case the handlers (and the validators) for mimetypes not supported
out of the box by pywps.
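A minimal sketch of the registry-plus-mixin idea, assuming simplified stand-ins: `ComplexInput` here is a toy class, and `MimeInput`, `MIME_REGISTRY`, `register_mimetype`, `OpendapInput` and `specialise` are hypothetical names, not pywps API. Copying the instance dictionary plays the role of `source.clone()`.

```python
# Hypothetical sketch: none of these classes or functions are pywps API.


class ComplexInput:
    """Stand-in for pywps' ComplexInput: stores an identifier and mimetype."""

    def __init__(self, identifier, mimetype):
        self.identifier = identifier
        self.mimetype = mimetype


class MimeInput:
    """Default input handlers; subclasses specialise them per mimetype."""

    def handle_href(self, href):
        # Default behaviour: the referenced file will be downloaded.
        return ("download", href)


# Registry mapping a mimetype to its MimeInput specialisation.
MIME_REGISTRY = {}


def register_mimetype(mimetype):
    """Class decorator that adds a MimeInput subclass to the registry."""
    def decorator(cls):
        MIME_REGISTRY[mimetype] = cls
        return cls
    return decorator


@register_mimetype("application/x-ogc-dods")
class OpendapInput(MimeInput):
    def handle_href(self, href):
        # OPeNDAP links are kept as URLs instead of being downloaded.
        return ("keep-url", href)


def specialise(source):
    """Copy `source` into an instance of a class mixing the registered
    MimeInput subclass (or MimeInput itself) with type(source)."""
    mixin = MIME_REGISTRY.get(source.mimetype, MimeInput)
    cls = type(mixin.__name__ + type(source).__name__, (mixin, type(source)), {})
    obj = cls.__new__(cls)
    obj.__dict__.update(source.__dict__)  # plays the role of source.clone()
    return obj
```

For example, specialise(ComplexInput("resource", "application/x-ogc-dods")).handle_href(url) returns ("keep-url", url), while an unregistered mimetype falls back to the default download behaviour. Building the class with type() leaves ComplexInput itself untouched, so a plugin only has to add a registry entry.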

This functionality could be provided as plugins, so that users would run
pip install pywps.netcdf to get netCDF support.
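One way such plugins could be discovered is through Python entry points. The sketch below assumes a hypothetical entry-point group named `pywps.mimetypes`: a plugin package would declare its handler class under that group (for instance in the `[project.entry-points."pywps.mimetypes"]` table of its pyproject.toml), and each class is assumed to carry a `mimetype` attribute.

```python
from importlib.metadata import entry_points

# Hypothetical entry-point group that a plugin package such as
# pywps.netcdf could declare; pywps does not define it.
PLUGIN_GROUP = "pywps.mimetypes"


def _entry_points_for(group):
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        return eps.select(group=group)
    return eps.get(group, [])  # Python 3.8/3.9 return a dict of lists


def load_mimetype_plugins(registry, group=PLUGIN_GROUP):
    """Load every installed plugin class in `group` into `registry`.

    Each plugin class is assumed to expose a `mimetype` attribute, which
    becomes its key in the registry.
    """
    for ep in _entry_points_for(group):
        cls = ep.load()
        registry[cls.mimetype] = cls
    return registry
```

With no plugin installed the registry is returned unchanged, so the core package keeps working without netCDF4.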





On Fri, Jun 29, 2018 at 10:42 AM Jachym Cepicky <jachym.cepicky at gmail.com>
wrote:

> Hi David,
> I do not have much insight into the netCDF format and OPeNDAP. I can
> imagine that, besides the current validators, which validate files
> already available on disk, we could add some pre-fetch validators too.
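A pre-fetch validator could inspect the URL before anything is downloaded. This sketch assumes a DAP2 server, which answers requests for `<dataset URL>.dds` with a plain-text structure description beginning with `Dataset {`. `looks_like_opendap` is a hypothetical helper, and the `fetch` parameter exists only so the network call can be replaced in tests; DAP4-only servers would need a different request.

```python
from urllib.request import urlopen


def _fetch_text(url, timeout=10):
    """Return the first 4 KiB of the response body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read(4096).decode("utf-8", errors="replace")


def looks_like_opendap(url, fetch=_fetch_text):
    """Pre-fetch check: ask a DAP2 server for the dataset's DDS.

    Any network error or unexpected response yields False, so the caller
    can fall back to a regular download.
    """
    try:
        text = fetch(url + ".dds")
    except (OSError, ValueError):  # URLError/HTTPError are OSError subclasses
        return False
    return text.lstrip().startswith("Dataset")
```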
>
>
> If I understand correctly, PyWPS first parses the request and builds a
> WPSRequest object; then, based on this structure, the Process instance,
> along with its inputs and outputs, is constructed. Currently the data are
> downloaded [2] and the resulting file object is set on the complex input,
> so we would need to change pywps so that it does not download the data.
>
> Could we add set_url and get_url setter and getter methods to IOHandler?
> They would behave like set_file and get_file or set_data and get_data (and
> memory_object), and could implement the special behaviour.
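A toy sketch of what url getters and setters could look like next to the file and data ones. `URLCapableIOHandler` and its `source_type` attribute are hypothetical and do not reproduce the real IOHandler; the point is only that setting a URL records it without fetching anything.

```python
class URLCapableIOHandler:
    """Hypothetical IOHandler-like class with a URL source next to file and
    data. Setting one source clears the others; a URL is never fetched."""

    def __init__(self):
        self._url = None
        self._file = None
        self._data = None
        self.source_type = None

    def set_url(self, url):
        self._url, self._file, self._data = url, None, None
        self.source_type = "url"

    def get_url(self):
        return self._url

    def set_file(self, path):
        self._url, self._file, self._data = None, path, None
        self.source_type = "file"

    def get_file(self):
        return self._file

    def set_data(self, data):
        self._url, self._file, self._data = None, None, data
        self.source_type = "data"

    def get_data(self):
        return self._data

    # Property access mirrors the getter/setter pairs.
    url = property(get_url, set_url)
    file = property(get_file, set_file)
    data = property(get_data, set_data)
```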
>
> J
>
> [2]
> https://github.com/geopython/pywps/blob/master/pywps/app/Service.py#L219
>
>
> On Wed, Jun 27, 2018 at 2:48 PM David Huard <huard.david at ouranos.ca>
> wrote:
>
>> I've got something working, but it's not pretty... If the input mimetype
>> is application/x-ogc-dods, the href handler skips the download and assigns
>> the link to the `data` attribute. If I ask for the file attribute, pywps
>> will download the file locally.
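The behaviour described above (keep the link for `application/x-ogc-dods`, download only when `file` is requested) can be sketched as a lazily evaluated property. `LazyHrefInput` is a hypothetical class, not pywps' href handler, and its `fetch` argument defaults to `urllib.request.urlretrieve` but can be swapped out for testing.

```python
import os
import tempfile
import urllib.request

DODS_MIMETYPE = "application/x-ogc-dods"


class LazyHrefInput:
    """Hypothetical input holding an href; downloads only when .file is read."""

    def __init__(self, href, mimetype, workdir=None, fetch=urllib.request.urlretrieve):
        self.href = href
        self.mimetype = mimetype
        self.workdir = workdir or tempfile.mkdtemp()
        self._fetch = fetch
        self._file = None
        # For OPeNDAP inputs the link itself is what the process consumes.
        self.data = href if mimetype == DODS_MIMETYPE else None

    @property
    def file(self):
        """Local path; the download happens on first access only."""
        if self._file is None:
            name = os.path.basename(self.href.split("?")[0]) or "input"
            path = os.path.join(self.workdir, name)
            self._fetch(self.href, path)
            self._file = path
        return self._file
```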
>>
>> Now, if I want my process to support both netCDF files and OPeNDAP links:
>> for file input, it's the file_handler that sets the file attribute, but
>> then the data attribute holds the actual file's content, not the path
>> to the file. I guess I could special-case the netCDF mimetype in the
>> file_handler to set data to the file path, but it feels clunky.
>>
>> I'm wondering if anyone has a better design idea in mind that could
>> extend gracefully to other mimetypes. Should ComplexInput be subclassed by
>> mimetype, so that file, stream and data handling, as well as validation,
>> are encapsulated in a class?
>>
>> One problem I can see cropping up is that as pywps extends support for
>> other "special" mimetypes, the dependencies will become harder to maintain.
>> Indeed, the netcdfvalidator requires netCDF4 to be installed, which is not
>> a light dependency. My guess is that pywps should support the "light"
>> mimetypes out of the box, and have a plugin mechanism for more
>> complicated ones.
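A common pattern for keeping a heavy dependency such as netCDF4 optional is to import it only inside the code path that needs it and to fail with an actionable message otherwise. Both functions below are hypothetical, and `pywps.netcdf` is only the plugin name floated in this thread, not a published package; the validator also assumes a failed open raises OSError or RuntimeError.

```python
import importlib


def optional_import(module_name, install_hint):
    """Import a heavy optional dependency only when it is actually needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is needed for this mimetype; "
            f"try: pip install {install_hint}"
        ) from err


def validate_with_netcdf4(path):
    """Return True if netCDF4 can open `path` (netCDF4 imported lazily)."""
    nc = optional_import("netCDF4", "pywps.netcdf")
    try:
        with nc.Dataset(path):
            return True
    except (OSError, RuntimeError):
        return False
```

Importing pywps itself then never requires netCDF4; only a request that actually uses the netCDF validator does.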
>>
>> David
>>
>>
>> On Tue, Jun 26, 2018 at 9:23 AM David Huard <huard.david at ouranos.ca>
>> wrote:
>>
>>> Thanks !
>>>
>>> I'll look at it and come back with a PR.
>>>
>>> On Tue, Jun 26, 2018 at 9:16 AM Jachym Cepicky <jachym.cepicky at gmail.com>
>>> wrote:
>>>
>>>> I believe this is the place where the data get downloaded:
>>>> https://github.com/geopython/pywps/blob/master/pywps/app/Service.py#L191
>>>>
>>> On Tue, Jun 26, 2018 at 2:57 PM David Huard <huard.david at ouranos.ca>
>>> wrote:
>>>>
>>>>> Hi Jachym,
>>>>>
>>>>> Thanks for the pointers; I've started writing validators for netCDF.
>>>>> I'm still wondering where the decision to download a file is made. Can I
>>>>> bypass that decision and avoid a file download if the href is a valid
>>>>> OPeNDAP link, i.e. if it passes the validatenetcdf checks?
>>>>>
>>>>>
>>>>> On Fri, Jun 22, 2018 at 4:53 AM Jachym Cepicky <
>>>>> jachym.cepicky at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Yes, ComplexInput should work for you: you can pass the URL of the
>>>>>> data using a "<Reference ... />" element; see [1] for an example.
>>>>>>
>>>>>> Any Format can have (and has by default) a `validator` function, which
>>>>>> returns whether the input data are valid or not [3]. You can also use
>>>>>> the `get_format` function [4] and set the validator there.
>>>>>>
>>>>>> For examples of what a validating function can look like, see the
>>>>>> shapefile or GML validators [5].
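A sketch of a netCDF validator in the style of the existing ones, which (as far as we can tell from [5]) take a `data_input` exposing a `.file` path and a validation `mode`. `MODE` here is a local stand-in assumed to mirror the levels of `pywps.validator.mode.MODE`, and the check only compares leading file signatures (classic, 64-bit offset and CDF-5 headers, plus the HDF5 signature of netCDF-4 files), so it is a cheap sanity check rather than a full parse.

```python
import os


class MODE:
    """Stand-in assumed to mirror the levels of pywps.validator.mode.MODE."""
    NONE = 0
    SIMPLE = 1
    STRICT = 2
    VERYSTRICT = 3


# Leading bytes of netCDF classic (CDF-1), 64-bit offset (CDF-2) and CDF-5
# files, plus the HDF5 signature used by netCDF-4 files (assumed at offset 0).
NETCDF_SIGNATURES = (b"CDF\x01", b"CDF\x02", b"CDF\x05", b"\x89HDF\r\n\x1a\n")


def validatenetcdf(data_input, mode):
    """Return True if data_input.file looks like a netCDF file.

    MODE.NONE accepts everything; any stricter mode checks that the file
    exists and starts with a known netCDF/HDF5 signature.
    """
    if mode == MODE.NONE:
        return True
    if not os.path.isfile(data_input.file):
        return False
    with open(data_input.file, "rb") as f:
        head = f.read(8)
    return head.startswith(NETCDF_SIGNATURES)
```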
>>>>>>
>>>>>> You should probably extend formats [2] with a NetCDF mimetype.
>>>>>>
>>>>>> But this will check the file only after it has been downloaded to
>>>>>> PyWPS, not the URL. Still, is that sufficient?
>>>>>>
>>>>>> Jachym
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/geopython/pywps/blob/master/tests/requests/wps_execute_request-responsedocument-1.xml#L24
>>>>>> [2]
>>>>>> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py
>>>>>> [3]
>>>>>> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py#L42
>>>>>> [4]
>>>>>> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py#L215
>>>>>> [5]
>>>>>> https://github.com/geopython/pywps/blob/master/pywps/validator/complexvalidator.py
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 21, 2018 at 5:15 PM David Huard <huard.david at ouranos.ca>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'd like to contribute a pull request to better handle netCDF files
>>>>>>> in pywps but I don't know where to start.
>>>>>>>
>>>>>>> We have a number of processes taking netCDF
>>>>>>> <https://www.unidata.ucar.edu/software/netcdf/> files as inputs.
>>>>>>> For those less familiar with the format, netCDF-4 is built on HDF5,
>>>>>>> and files usually follow a set of conventions <http://cfconventions.org/>.
>>>>>>> It is the standard data format in oceanography and climatology.
>>>>>>> netCDF files are usually stored on servers with support for OPeNDAP
>>>>>>> <https://www.opendap.org/>. This means that users can either download
>>>>>>> the netCDF file and then open it locally, or use the OPeNDAP protocol
>>>>>>> to open it remotely. In practice, you can do
>>>>>>>
>>>>>>> import netCDF4 as nc
>>>>>>> ds1 = nc.Dataset("<path to local file>")
>>>>>>> ds2 = nc.Dataset("<link to opendap address>")
>>>>>>>
>>>>>>> and both ds1 and ds2 will behave identically. However, ds2 is not
>>>>>>> downloaded locally but read remotely on demand. If a file contains
>>>>>>> a 3D array (time, lat, lon), you can read one slice of it without
>>>>>>> downloading the whole thing.
>>>>>>>
>>>>>>> Some of our pywps processes support both netCDF file and OPeNDAP
>>>>>>> access. We define a ComplexInput for the address of an actual netCDF
>>>>>>> file, and a LiteralInput for the OPeNDAP address.
>>>>>>>
>>>>>>> My question is whether there is a clean way for pywps to support
>>>>>>> both modes with one ComplexInput. Internally, pywps would check whether
>>>>>>> the address supports OPeNDAP (just check whether nc.Dataset(url) works),
>>>>>>> and if not, would download the file locally to the server.
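The "try OPeNDAP first, otherwise download" logic described above might look like the following sketch. `resolve_netcdf_source` and its helpers are hypothetical; it assumes a failed remote open raises OSError or RuntimeError, and the `open_remote` and `download` hooks exist so the decision can be tested without a server.

```python
import os
import tempfile
import urllib.request


def _open_with_netcdf4(url):
    import netCDF4  # heavy dependency, imported only when actually needed
    return netCDF4.Dataset(url)


def _download(url):
    fd, path = tempfile.mkstemp(suffix=".nc")
    os.close(fd)
    urllib.request.urlretrieve(url, path)
    return path


def resolve_netcdf_source(href, open_remote=_open_with_netcdf4, download=_download):
    """Return ("opendap", href) if the URL opens remotely, else download it
    and return ("file", local_path)."""
    try:
        ds = open_remote(href)
    except (OSError, RuntimeError):
        # Not reachable as OPeNDAP: fall back to a local copy.
        return "file", download(href)
    ds.close()
    return "opendap", href
```

A process handler could then pass either the URL or the local path to netCDF4.Dataset, since both behave the same once opened.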
>>>>>>>
>>>>>>> In both cases, we could do
>>>>>>>
>>>>>>> ds = nc.Dataset(request.inputs['resource'][0].file)
>>>>>>>
>>>>>>> I'm willing to put the time to do it, I just don't know where to
>>>>>>> start.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> pywps-dev mailing list
>>>>>>> pywps-dev at lists.osgeo.org
>>>>>>> https://lists.osgeo.org/mailman/listinfo/pywps-dev
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jachym Cepicky
>>>>>> e-mail: jachym.cepicky gmail com
>>>>>> URL: http://les-ejk.cz
>>>>>> GPG: http://les-ejk.cz/pgp/JachymCepicky.pgp
>>>>>
>>>>>
>>>>
>>>
>>>
>