[Pywps-dev] Guidelines for netCDF file and opendap access within pywps

Jachym Cepicky jachym.cepicky at gmail.com
Fri Jun 29 08:34:14 PDT 2018


Hmm,

I actually like the idea that Format [1] would "serve" the
file/data/memory_object methods to the input (and output). This, by the way,
could also be used for a future implementation of WFS/WCS services as an
output option.
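
To make this concrete, here is a minimal sketch of a format object deciding
how a referenced input is turned into something the process can open. It is
not pywps code: SketchFormat, resolve_href, download_to_workdir,
pass_url_through and file_for_input are hypothetical names, the download is
only simulated, and application/x-netcdf is assumed as the netCDF mimetype.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-in for a pywps Format; only the delegation idea is shown.
@dataclass
class SketchFormat:
    mime_type: str
    # Called with the href of a referenced input; returns what the process
    # should see as its "file" (a local path, or the URL itself).
    resolve_href: Callable[[str], str]


def download_to_workdir(href: str) -> str:
    """Simulated download: a real handler would fetch href and return the path."""
    return "/tmp/workdir/" + href.rstrip("/").rsplit("/", 1)[-1]


def pass_url_through(href: str) -> str:
    """OPeNDAP-style handling: hand the URL to the process untouched."""
    return href


NETCDF = SketchFormat("application/x-netcdf", download_to_workdir)
DODS = SketchFormat("application/x-ogc-dods", pass_url_through)


def file_for_input(fmt: SketchFormat, href: str) -> str:
    """The input delegates to its format instead of always downloading."""
    return fmt.resolve_href(href)
```

With something like this, the input code would not need to special-case
mimetypes; a format that can be read remotely simply supplies a resolver that
skips the download.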

The registry of mimetypes should be in pywps/inout/formats/__init__.py [2], and
our GSoC student Jan Pišl is working on extending it too [3], so this
would fit, IMHO.
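
As an illustration only, a mimetype registry that plugins can extend might
look like the sketch below. FormatEntry, register_format, lookup_format and
both validators are hypothetical names; the registry actually kept in
pywps/inout/formats/__init__.py may be shaped quite differently.

```python
from typing import Callable, Dict, NamedTuple


class FormatEntry(NamedTuple):
    # Hypothetical record, not the pywps data structure.
    mime_type: str
    extension: str
    validator: Callable[[str], bool]


_REGISTRY: Dict[str, FormatEntry] = {}


def register_format(entry: FormatEntry) -> None:
    """Add or override the entry for a mimetype."""
    _REGISTRY[entry.mime_type] = entry


def lookup_format(mime_type: str) -> FormatEntry:
    """Return the registered entry; raises KeyError for unknown mimetypes."""
    return _REGISTRY[mime_type]


def accept_anything(path: str) -> bool:
    """Trivial validator for formats that need no checking."""
    return True


def looks_like_netcdf(path: str) -> bool:
    """Placeholder check on the file name only; a real validator would open the file."""
    return path.endswith(".nc")


# A "light" format that could ship with the core package.
register_format(FormatEntry("text/plain", ".txt", accept_anything))
# What an optional plugin might do when it is imported.
register_format(FormatEntry("application/x-netcdf", ".nc", looks_like_netcdf))
```

The point of the design is that heavy dependencies stay in the plugin that
registers the entry, while the core only ever looks entries up by mimetype.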

J

[1]
https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py#L26
[2]
https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py#L161
[3]
https://github.com/janpisl/pywps/blob/master/pywps/inout/formats/__init__.py#L213

On Fri, Jun 29, 2018 at 17:03, David Huard <huard.david at ouranos.ca>
wrote:

> The URL idea sounds good. Will try it.
>
> How do you feel about the dependency issue though ?
>
> One option I've been playing around with is to dynamically add methods to
> ComplexInput once the mimetype is discovered. That is, the various handlers
> (href, file, data) could be methods of a MimeInput class that can be
> specialized for different mimetypes. After create_complex_input determines
> the input's mimetype, instead of doing a source.clone(), it would
> instantiate a mixin class combining ComplexInput and MimeInput. By creating
> a registry of mimetypes and their associated classes, users could
> special-case the handlers (and the validators) for mimetypes not supported
> out of the box by pywps.
>
> These functionalities could be provided as plugins, so that users would
> pip install pywps.netcdf to get the netcdf support.
>
> On Fri, Jun 29, 2018 at 10:42 AM Jachym Cepicky <jachym.cepicky at gmail.com>
> wrote:
>
>> Hi David,
>> I do not have much insight into the netCDF format and opendap. I can
>> imagine that, besides the current validators, which validate files
>> already available on disk, we could add some pre_fetch validators too.
>>
>> If I understand correctly, PyWPS first parses the request and makes a
>> WPSRequest object; then, based on this structure, the Process instance along
>> with its inputs and outputs is constructed. We would need to rewrite pywps so
>> that it does not always download the data [2] before the file object is set
>> on the complex input.
>>
>> We could add set_url and get_url setter and getter methods to IOHandler,
>> which would behave like set_file and get_file or set_data and get_data (and
>> memory_object), and which could implement the special behaviour?
>>
>> J
>>
>> [2]
>> https://github.com/geopython/pywps/blob/master/pywps/app/Service.py#L219
>>
>>
>> On Wed, Jun 27, 2018 at 14:48, David Huard <huard.david at ouranos.ca>
>> wrote:
>>
>>> I've got something working, but it's not pretty... If the input mime
>>> type is application/x-ogc-dods, the href handler skips the download and
>>> assigns the link to the `data` attribute. If I ask for the file attribute,
>>> pywps will download the file locally.
>>>
>>> Now if I want my process to support both netCDF files and opendap links,
>>> for file input it's the file_handler that'll set the file attribute, but
>>> then the data attribute will hold the actual file's content, not the path
>>> to the file. I guess I could special-case the netcdf mime type in the
>>> file_handler to set data to the file path, but it feels clunky.
>>>
>>> I'm wondering if anyone has a better design idea in mind, that could
>>> extend gracefully to other mime types? Should ComplexInput be subclassed by
>>> mimetype, so that the file, stream and data handling as well as validation
>>> is encapsulated in a class ?
>>>
>>> One problem I can see cropping up is that as pywps extends support for
>>> other "special" mimetypes, the dependencies will become harder to maintain.
>>> Indeed, the netcdfvalidator requires netCDF4 to be installed, which is not
>>> a light dependency. My guess is that pywps should support out of the box
>>> the "light" mime types, and have a plugin mechanism for more complicated
>>> ones.
>>>
>>> David
>>>
>>>
>>> On Tue, Jun 26, 2018 at 9:23 AM David Huard <huard.david at ouranos.ca>
>>> wrote:
>>>
>>>> Thanks !
>>>>
>>>> I'll look at it and come back with a PR.
>>>>
>>>> On Tue, Jun 26, 2018 at 9:16 AM Jachym Cepicky <
>>>> jachym.cepicky at gmail.com> wrote:
>>>>
>>>>> I believe this is the place where the data gets downloaded:
>>>>> https://github.com/geopython/pywps/blob/master/pywps/app/Service.py#L191
>>>>>
>>>>> On Tue, Jun 26, 2018 at 14:57, David Huard <huard.david at ouranos.ca>
>>>>> wrote:
>>>>>
>>>>>> Hi Jachym,
>>>>>>
>>>>>> Thanks for the pointers, I've started writing validators for netCDF.
>>>>>> I'm still wondering where the decision to download a file is made. Can I
>>>>>> short-circuit that decision and avoid a file download if the href is a
>>>>>> valid opendap link, i.e. it passes the validatenetcdf checks?
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 22, 2018 at 4:53 AM Jachym Cepicky <
>>>>>> jachym.cepicky at gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Yes, ComplexInput should work for you: you can pass the URL of the
>>>>>>> data using the "<Reference ... />" element; see [1] for an example.
>>>>>>>
>>>>>>> Any Format can have (and has by default) a `validator` function, which
>>>>>>> returns whether the input data are valid or not [3]. You can also use the
>>>>>>> `get_format` function [4] and set the validator there.
>>>>>>>
>>>>>>> For examples of what a validating function can look like, see the
>>>>>>> shapefile or GML validators [5].
>>>>>>>
>>>>>>> You should probably extend formats [2] with a NetCDF mimetype.
>>>>>>>
>>>>>>> But this will check the file only after it has been downloaded to PyWPS,
>>>>>>> not the URL. Still, is that sufficient?
>>>>>>>
>>>>>>> Jachym
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/geopython/pywps/blob/master/tests/requests/wps_execute_request-responsedocument-1.xml#L24
>>>>>>> [2]
>>>>>>> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py
>>>>>>> [3]
>>>>>>> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py#L42
>>>>>>> [4]
>>>>>>> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py#L215
>>>>>>> [5]
>>>>>>> https://github.com/geopython/pywps/blob/master/pywps/validator/complexvalidator.py
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 21, 2018 at 17:15, David Huard <
>>>>>>> huard.david at ouranos.ca> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'd like to contribute a pull request to better handle netCDF files
>>>>>>>> in pywps but I don't know where to start.
>>>>>>>>
>>>>>>>> We have a number of processes taking netCDF
>>>>>>>> <https://www.unidata.ucar.edu/software/netcdf/> files as inputs.
>>>>>>>> For those less familiar with the format, netCDF is based on HDF5 and a set
>>>>>>>> of conventions <http://cfconventions.org/>. It is the standard
>>>>>>>> data format in oceanography and climatology. netCDF files are usually
>>>>>>>> stored on servers with support for opendap
>>>>>>>> <https://www.opendap.org/>. This means that users can either
>>>>>>>> download the netCDF file and then open it locally, or use the opendap
>>>>>>>> protocol to open it remotely. What that means is that you can do
>>>>>>>>
>>>>>>>> import netCDF4 as nc
>>>>>>>> ds1 = nc.Dataset("<path to local file>")
>>>>>>>> ds2 = nc.Dataset("<link to opendap address>")
>>>>>>>>
>>>>>>>> and both ds1 and ds2 will behave identically. However ds2 is not
>>>>>>>> downloaded locally, but rather read remotely on demand. If a file contains
>>>>>>>> a 3D matrix (time, lat, lon), you can read one slice of the matrix without
>>>>>>>> downloading it all.
>>>>>>>>
>>>>>>>> Some of our pywps.Process classes support both netCDF file and opendap
>>>>>>>> access. We define a ComplexInput for the address to an actual netCDF file,
>>>>>>>> and a LiteralInput for the opendap address.
>>>>>>>>
>>>>>>>> My question is whether there would be a clean way for pywps to
>>>>>>>> support both modes with one ComplexInput? Internally, pywps would check if
>>>>>>>> the address supports opendap (just check if nc.Dataset(url) works), and if
>>>>>>>> not, would download the file locally to the server.
>>>>>>>>
>>>>>>>> In both cases, we could do
>>>>>>>>
>>>>>>>> ds = nc.Dataset(request.inputs['resource'][0].file)
>>>>>>>>
>>>>>>>> I'm willing to put in the time to do it; I just don't know where to
>>>>>>>> start.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> pywps-dev mailing list
>>>>>>>> pywps-dev at lists.osgeo.org
>>>>>>>> https://lists.osgeo.org/mailman/listinfo/pywps-dev
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Jachym Cepicky
>>>>>>> e-mail: jachym.cepicky gmail com
>>>>>>> URL: http://les-ejk.cz
>>>>>>> GPG: http://les-ejk.cz/pgp/JachymCepicky.pgp
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>
>
>

-- 
Jachym Cepicky
e-mail: jachym.cepicky gmail com
URL: http://les-ejk.cz
GPG: http://les-ejk.cz/pgp/JachymCepicky.pgp