[Pywps-dev] Guidelines for netCDF file and opendap access within pywps

David Huard huard.david at ouranos.ca
Fri Jun 29 08:46:21 PDT 2018


Excellent. Will do.

On Fri, Jun 29, 2018 at 11:34 AM Jachym Cepicky <jachym.cepicky at gmail.com>
wrote:

> Hmm,
>
> I actually like the idea that Format [1] would "serve" the
> file/data/memory_object methods to the input (and output). This, by the way,
> could be used for a future implementation of WFS/WCS services as an output
> option.
>
> The registry of mimetypes should be in pywps/inout/formats/__init__.py [2], and
> our GSoC student Jan Pišl is working on its extension too [3], so this
> would fit, IMHO.
>
> J
>
> [1]
> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py#L26
> [2]
> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py#L161
> [3]
> https://github.com/janpisl/pywps/blob/master/pywps/inout/formats/__init__.py#L213
>
> On Fri, Jun 29, 2018 at 17:03, David Huard <huard.david at ouranos.ca>
> wrote:
>
>> The URL idea sounds good. Will try it.
>>
>> How do you feel about the dependency issue though ?
>>
>> One option I've been playing around with is to dynamically add methods to
>> ComplexInput when the mimetype is discovered. That is, the various handlers
>> (href, file, data) could be methods of a MimeInput class that can be
>> specialized for different mimetypes. After create_complex_input determines
>> the input's mimetype, instead of doing a source.clone(), we would
>> instantiate a mixin class combining ComplexInput and MimeInput. By creating
>> a registry of mimetypes and their associated classes, users could
>> special-case the handlers (and the validators) for mimetypes not supported
>> out of the box by pywps.
>>
>> These functionalities could be provided as plugins, so that users would
>> pip install pywps.netcdf to get the netcdf support.
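For illustration, the registry-plus-mixin idea could be sketched roughly as follows. All names here (MIME_HANDLERS, register, DodsInput, create_input) are made up for the example, and ComplexInput is a bare stand-in, not the real pywps class. It only shows how a class could be assembled once the mimetype is known.

```python
MIME_HANDLERS = {}  # hypothetical registry: mimetype -> handler class


def register(mimetype):
    """Class decorator that records a handler class for ``mimetype``."""
    def decorator(cls):
        MIME_HANDLERS[mimetype] = cls
        return cls
    return decorator


class MimeInput:
    """Default handlers: an href is always downloaded."""
    def href_handler(self, href):
        return ("download", href)


@register("application/x-ogc-dods")
class DodsInput(MimeInput):
    """OPeNDAP links are kept as links instead of being downloaded."""
    def href_handler(self, href):
        return ("link", href)


class ComplexInput:
    """Stand-in for the real ComplexInput; it only holds an identifier."""
    def __init__(self, identifier):
        self.identifier = identifier


def create_input(identifier, mimetype):
    """Build an input whose class combines the handlers for ``mimetype``."""
    handler_cls = MIME_HANDLERS.get(mimetype, MimeInput)
    # Handler class goes first so its methods win in the resolution order.
    combined = type(handler_cls.__name__ + "ComplexInput",
                    (handler_cls, ComplexInput), {})
    return combined(identifier)
```

A plugin package would then only need to import the registry and decorate its own handler class; unknown mimetypes fall back to the default MimeInput behaviour.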
>>
>>
>>
>>
>>
>> On Fri, Jun 29, 2018 at 10:42 AM Jachym Cepicky <jachym.cepicky at gmail.com>
>> wrote:
>>
>>> Hi David,
>>> I do not have much insight into the netCDF format and opendap. I can
>>> imagine that, besides the current validators, which validate files
>>> available on disk, we could add some pre_fetch validators too.
>>>
>>>
>>> If I understand correctly, PyWPS first parses the request and makes a
>>> WPSRequest object; then, based on this structure, the Process instance
>>> along with its inputs and outputs is constructed. We need to rewrite pywps
>>> so that it does not download the data [2] and then set the file object on
>>> the complex input.
>>>
>>> We could add set_url and get_url setter and getter methods to IOHandler,
>>> which could behave like set_file and get_file or set_data and get_data (and
>>> memory_object), and which could implement the special behaviour?
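A minimal sketch of the URL getter/setter idea, assuming the download is deferred until the file is actually requested. LazyInput and its `fetch` callable are hypothetical names for this example and do not reflect the real IOHandler; a property is used here in place of explicit set_url/get_url methods.

```python
class LazyInput:
    """Hypothetical stand-in for an input handler; not the pywps IOHandler."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable taking a URL and returning a local path
        self._url = None
        self._file = None

    @property
    def url(self):
        return self._url

    @url.setter
    def url(self, value):
        # Setting the URL only records it; nothing is downloaded yet.
        self._url = value
        self._file = None

    @property
    def file(self):
        # The download happens on first access to ``file`` and is cached.
        if self._file is None and self._url is not None:
            self._file = self._fetch(self._url)
        return self._file
```

A process that can work with an OPeNDAP link would read `url` and never trigger a download, while a process that needs a local copy would read `file`.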
>>>
>>> J
>>>
>>> [2]
>>> https://github.com/geopython/pywps/blob/master/pywps/app/Service.py#L219
>>>
>>>
>>> On Wed, Jun 27, 2018 at 14:48, David Huard <huard.david at ouranos.ca>
>>> wrote:
>>>
>>>> I've got something working, but it's not pretty... If the input mime
>>>> type is application/x-ogc-dods, the href handler skips the download and
>>>> assigns the link to the `data` attribute. If I ask for the file attribute,
>>>> pywps will download the file locally.
>>>>
>>>> Now if I want my process to support both netCDF files and opendap links,
>>>> for file input it's the file_handler that'll set the file attribute, but
>>>> then the data attribute will hold the actual file's content, not the path
>>>> to the file. I guess I could special-case the netcdf mime type in the
>>>> file_handler to set data to the file path, but it feels clunky.
>>>>
>>>> I'm wondering if anyone has a better design idea in mind, that could
>>>> extend gracefully to other mime types? Should ComplexInput be subclassed by
>>>> mimetype, so that the file, stream and data handling as well as validation
>>>> is encapsulated in a class ?
>>>>
>>>> One problem I can see cropping up is that as pywps extends support for
>>>> other "special" mimetypes, the dependencies will become harder to maintain.
>>>> Indeed, the netcdfvalidator requires netCDF4 to be installed, which is not
>>>> a light dependency. My guess is that pywps should support out of the box
>>>> the "light" mime types, and have a plugin mechanism for more complicated
>>>> ones.
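One way the "light core, optional plugins" split could be checked at startup is sketched below. The function name and the shape of the `requirements` mapping are assumptions made for this example; only `importlib.util.find_spec` is a real standard-library call.

```python
import importlib.util


def available_plugins(requirements):
    """Map each mimetype to True if its required module can be imported.

    ``requirements`` maps a mimetype to the name of the top-level module
    it depends on, or to None when it needs nothing beyond the core.
    """
    status = {}
    for mimetype, module_name in requirements.items():
        if module_name is None:
            status[mimetype] = True
        else:
            # find_spec returns None when a top-level module is not installed.
            status[mimetype] = importlib.util.find_spec(module_name) is not None
    return status
```

With something like this, a heavy dependency such as netCDF4 would only be needed by the installations that actually enable the corresponding mimetype.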
>>>>
>>>> David
>>>>
>>>>
>>>> On Tue, Jun 26, 2018 at 9:23 AM David Huard <huard.david at ouranos.ca>
>>>> wrote:
>>>>
>>>>> Thanks !
>>>>>
>>>>> I'll look at it and come back with a PR.
>>>>>
>>>>> On Tue, Jun 26, 2018 at 9:16 AM Jachym Cepicky <
>>>>> jachym.cepicky at gmail.com> wrote:
>>>>>
>>>>>> I believe this is the place where the data get downloaded:
>>>>>> https://github.com/geopython/pywps/blob/master/pywps/app/Service.py#L191
>>>>>>
>>>>>> On Tue, Jun 26, 2018 at 14:57, David Huard <huard.david at ouranos.ca>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Jachym,
>>>>>>>
>>>>>>> Thanks for the pointers, I've started writing validators for netCDF.
>>>>>>> I'm still wondering where the decision to download a file is made. Can I
>>>>>>> short-circuit that decision and avoid a file download if the href is a valid
>>>>>>> opendap link, i.e. it passes the validatenetcdf checks?
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 22, 2018 at 4:53 AM Jachym Cepicky <
>>>>>>> jachym.cepicky at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> yes, ComplexInput should work for you. You can pass the URL of
>>>>>>>> the data using the "<Reference ... />" element; see [1] for an example.
>>>>>>>>
>>>>>>>> Any Format can have (and has by default) a `validator` function,
>>>>>>>> which returns whether the input data are valid or not [3]. You can also
>>>>>>>> use the `get_format` function [4] and set the validator there.
>>>>>>>>
>>>>>>>> For examples of what a validating function can look like, see the
>>>>>>>> shapefile or gml validators [5].
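As a rough idea of what a dependency-free netCDF check could look like, here is a sketch that only inspects the file signature. The function names are hypothetical, and the assumption that the input object exposes a `file` path attribute should be checked against the existing validators. It does not handle HDF5 files that start with a user block, and it says nothing about whether the content is a well-formed dataset.

```python
# Hypothetical helpers, not part of pywps: they check only the file signature.
NETCDF_CLASSIC_MAGIC = (b"CDF\x01", b"CDF\x02", b"CDF\x05")
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # netCDF-4 files are HDF5 files


def looks_like_netcdf(path):
    """Return True if the first bytes of ``path`` match a netCDF signature."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    return head.startswith(NETCDF_CLASSIC_MAGIC) or head == HDF5_MAGIC


def validate_netcdf(data_input, mode=None):
    """Validator-shaped wrapper; ``mode`` is accepted but ignored here."""
    return looks_like_netcdf(data_input.file)
```

A stricter mode could then try to open the file with netCDF4 when that package is installed.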
>>>>>>>>
>>>>>>>> You should probably extend formats [2] with the NetCDF mimetype.
>>>>>>>>
>>>>>>>> But this will check the file only after it has been downloaded to PyWPS,
>>>>>>>> not the URL. Still, is that sufficient?
>>>>>>>>
>>>>>>>> Jachym
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://github.com/geopython/pywps/blob/master/tests/requests/wps_execute_request-responsedocument-1.xml#L24
>>>>>>>> [2]
>>>>>>>> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py
>>>>>>>> [3]
>>>>>>>> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py#L42
>>>>>>>> [4]
>>>>>>>> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py#L215
>>>>>>>> [5]
>>>>>>>> https://github.com/geopython/pywps/blob/master/pywps/validator/complexvalidator.py
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 21, 2018 at 17:15, David Huard <
>>>>>>>> huard.david at ouranos.ca> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I'd like to contribute a pull request to better handle netCDF
>>>>>>>>> files in pywps but I don't know where to start.
>>>>>>>>>
>>>>>>>>> We have a number of processes taking netCDF
>>>>>>>>> <https://www.unidata.ucar.edu/software/netcdf/> files as inputs.
>>>>>>>>> For those less familiar with the format, netCDF is based on HDF5 and a set
>>>>>>>>> of conventions <http://cfconventions.org/>. It is the standard
>>>>>>>>> data format in oceanography and climatology. netCDF files are usually
>>>>>>>>> stored on servers with support for opendap
>>>>>>>>> <https://www.opendap.org/>. This means that users can either
>>>>>>>>> download the netCDF file and then open it locally, or use the opendap
>>>>>>>>> protocol to open it remotely. What that means is that you can do
>>>>>>>>>
>>>>>>>>> import netCDF4 as nc
>>>>>>>>> ds1 = nc.Dataset("<path to local file>")
>>>>>>>>> ds2 = nc.Dataset("<link to opendap address>")
>>>>>>>>>
>>>>>>>>> and both ds1 and ds2 will behave identically. However ds2 is not
>>>>>>>>> downloaded locally, but rather read remotely on demand. If a file contains
>>>>>>>>> a 3D matrix (time, lat, lon), you can read one slice of the matrix without
>>>>>>>>> downloading it all.
>>>>>>>>>
>>>>>>>>> Some of our pywps.Process classes support both netCDF file and opendap
>>>>>>>>> access. We define a ComplexInput for the address of an actual netCDF file,
>>>>>>>>> and a LiteralInput for the opendap address.
>>>>>>>>>
>>>>>>>>> My question is whether there would be a clean way for pywps to
>>>>>>>>> support both modes with one ComplexInput? Internally, pywps would check if
>>>>>>>>> the address supports opendap (just check if nc.Dataset(url) works), and if
>>>>>>>>> not, would download the file locally to the server.
>>>>>>>>>
>>>>>>>>> In both cases, we could do
>>>>>>>>>
>>>>>>>>> ds = nc.Dataset(request.inputs['resource'][0].file)
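The "try opendap first, download otherwise" logic described above could be sketched like this. Everything here is hypothetical: resolve_resource, try_open and download are illustrative names, not pywps API. In practice try_open might wrap nc.Dataset(url) and download might wrap the existing pywps download step.

```python
from typing import Callable


def resolve_resource(url: str,
                     try_open: Callable[[str], object],
                     download: Callable[[str], str]) -> str:
    """Return something that can be passed to the dataset opener.

    If ``try_open(url)`` succeeds, the URL itself is returned so the data
    can be read remotely on demand. Otherwise the file is downloaded and
    the local path is returned.
    """
    try:
        handle = try_open(url)
    except (OSError, RuntimeError):
        # Assumption: the opener signals "not an opendap endpoint" with
        # OSError or RuntimeError; adjust to the library actually used.
        return download(url)
    # Close the probe handle if the opener returned a closable object.
    close = getattr(handle, "close", None)
    if callable(close):
        close()
    return url
```

The process code would then open whatever comes back in the same way, whether it is a remote link or a local path.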
>>>>>>>>>
>>>>>>>>> I'm willing to put in the time to do it, I just don't know where to
>>>>>>>>> start.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> pywps-dev mailing list
>>>>>>>>> pywps-dev at lists.osgeo.org
>>>>>>>>> https://lists.osgeo.org/mailman/listinfo/pywps-dev
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jachym Cepicky
>>>>>>>> e-mail: jachym.cepicky gmail com
>>>>>>>> URL: http://les-ejk.cz
>>>>>>>> GPG: http://les-ejk.cz/pgp/JachymCepicky.pgp