[Pywps-dev] Guidelines for netCDF file and opendap access within pywps

Jachym Cepicky jachym.cepicky at gmail.com
Fri Jun 29 07:42:18 PDT 2018


Hi David,
I do not have much insight into the netCDF format and opendap. I can
imagine that, besides the current validators, which validate files
already available on disk, we could add some pre-fetch validators too.
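A pre-fetch validator would inspect the URL itself instead of a downloaded file. Below is a minimal sketch, assuming the server speaks DAP2, where appending `.dds` to a dataset URL returns a small text document (the Dataset Descriptor Structure) that starts with the keyword `Dataset`. The name `prefetch_validate_opendap` and the injectable `fetch` parameter are hypothetical and not part of PyWPS.

```python
from typing import Callable, Optional
from urllib.parse import urlparse
from urllib.request import urlopen


def _http_get_text(url: str) -> str:
    """Fetch at most 4 KiB of a URL and return it as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read(4096).decode("utf-8", errors="replace")


def prefetch_validate_opendap(
    href: str, fetch: Optional[Callable[[str], str]] = None
) -> bool:
    """Hypothetical pre-fetch validator: True if `href` looks like a DAP2 dataset.

    Only the small DDS document is requested, never the data itself.
    Assumes `href` has no query string (".dds" must precede any "?").
    """
    if urlparse(href).scheme not in ("http", "https"):
        return False  # OPeNDAP endpoints are served over HTTP(S)
    fetch = fetch or _http_get_text
    try:
        dds = fetch(href + ".dds")
    except OSError:
        return False  # HTTP error, unreachable host, timeout, ...
    # A DAP2 DDS response begins with the keyword "Dataset"
    return dds.lstrip().startswith("Dataset")
```

A validator like this could run before any download is attempted, leaving the existing on-disk validators untouched.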


If I understand correctly, PyWPS first parses the request and creates a
WPSRequest object; then, based on this structure, the Process instance along
with its inputs and outputs is constructed. We need to rewrite pywps so that
it does not download the data [2] and then set the file object on the complex input.

We could add set_url and get_url setter and getter methods to IOHandler,
which would behave like set_file/get_file or set_data/get_data (and
memory_object) and could implement the special behaviour?
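A minimal sketch of the proposed setter/getter pair, using a stand-alone toy class rather than PyWPS's real IOHandler. `UrlAwareHandler`, its `set_url`/`get_url` methods and the `downloader` callable are hypothetical. The URL is stored as-is, and a local copy is fetched only when the `file` property is first read.

```python
import os
import tempfile
from typing import Callable, Optional
from urllib.request import urlretrieve


class UrlAwareHandler:
    """Toy stand-in for an IOHandler that can hold a URL (hypothetical)."""

    def __init__(self, workdir: Optional[str] = None,
                 downloader: Callable[[str, str], object] = urlretrieve):
        self.workdir = workdir or tempfile.mkdtemp()
        self._downloader = downloader
        self._url: Optional[str] = None
        self._file: Optional[str] = None

    def set_url(self, url: str) -> None:
        """Store the reference only; no network access happens here."""
        self._url = url
        self._file = None  # an earlier local copy belonged to another URL

    def get_url(self) -> Optional[str]:
        return self._url

    @property
    def file(self) -> str:
        """Download lazily, the first time a local path is actually needed."""
        if self._file is None:
            if self._url is None:
                raise ValueError("no URL has been set")
            name = os.path.basename(self._url.rstrip("/")) or "input"
            target = os.path.join(self.workdir, name)
            self._downloader(self._url, target)
            self._file = target
        return self._file
```

A process that only needs the link would call `get_url()` and never trigger the download.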

J

[2] https://github.com/geopython/pywps/blob/master/pywps/app/Service.py#L219


On Wed, Jun 27, 2018 at 2:48 PM David Huard <huard.david at ouranos.ca>
wrote:

> I've got something working, but it's not pretty... If the input mime type
> is application/x-ogc-dods, the href handler skips the download and assigns
> the link to the `data` attribute. If I ask for the `file` attribute, pywps
> will download the file locally.
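The behaviour described above amounts to a dispatch on the declared mime type. The sketch below assumes a simplified model: `handle_href`, `ResolvedInput` and the `download` callable are hypothetical stand-ins for PyWPS's internal href handling. For other mime types it sets only the local path and leaves `data` unset.

```python
from typing import Callable, NamedTuple, Optional

DODS_MIME_TYPE = "application/x-ogc-dods"


class ResolvedInput(NamedTuple):
    data: Optional[str]  # for OPeNDAP: the link itself
    file: Optional[str]  # local path, only set when something was downloaded


def handle_href(href: str, mime_type: str,
                download: Callable[[str], str]) -> ResolvedInput:
    """Hypothetical href handler that skips the download for OPeNDAP links."""
    if mime_type == DODS_MIME_TYPE:
        # Keep the remote reference; netCDF4.Dataset(href) can open it directly.
        return ResolvedInput(data=href, file=None)
    # Any other mime type: fetch the resource and expose the local path.
    return ResolvedInput(data=None, file=download(href))
```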
>
> Now if I want my process to support both netCDF files and opendap links,
> then for file inputs it's the file_handler that sets the file attribute, but
> then the data attribute holds the actual file's content, not the path
> to the file. I guess I could special-case the netCDF mime type in the
> file_handler to set data to the file path, but it feels clunky.
>
> I'm wondering if anyone has a better design idea in mind that could
> extend gracefully to other mime types. Should ComplexInput be subclassed by
> mime type, so that file, stream and data handling as well as validation
> are encapsulated in a class?
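One possible reading of the subclassing question: a registry maps each mime type to a class that owns its own source handling and validation. The class names, the `register` decorator and `input_for` are hypothetical illustrations, not existing PyWPS code.

```python
from typing import Dict, Type


class ComplexSource:
    """Base class: one subclass per mime type (hypothetical design)."""

    mime_type = "application/octet-stream"
    needs_download = True

    def __init__(self, href: str):
        self.href = href

    def validate(self) -> bool:
        return True  # no format-specific checks by default


_REGISTRY: Dict[str, Type[ComplexSource]] = {}


def register(cls: Type[ComplexSource]) -> Type[ComplexSource]:
    """Class decorator: make `cls` the handler for its mime type."""
    _REGISTRY[cls.mime_type] = cls
    return cls


@register
class NetCDFSource(ComplexSource):
    mime_type = "application/x-netcdf"


@register
class OpendapSource(ComplexSource):
    mime_type = "application/x-ogc-dods"
    needs_download = False  # read remotely, slice by slice

    def validate(self) -> bool:
        return self.href.startswith(("http://", "https://"))


def input_for(mime_type: str, href: str) -> ComplexSource:
    # Unknown mime types fall back to the generic, download-first behaviour.
    return _REGISTRY.get(mime_type, ComplexSource)(href)
```

Adding a new "special" mime type then means adding one subclass, without touching the generic request handling.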
>
> One problem I can see cropping up is that as pywps extends support for
> other "special" mimetypes, the dependencies will become harder to maintain.
> Indeed, the netcdfvalidator requires netCDF4 to be installed, which is not
> a light dependency. My guess is that pywps should support the "light" mime
> types out of the box, and have a plugin mechanism for more complicated
> ones.
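A sketch of the plugin idea. It assumes heavy validators are registered under a mime type together with the package they need and are imported only on first use. `register_validator`, `get_validator` and the module path `my_plugins.netcdf` are hypothetical. `importlib.util.find_spec` and `importlib.import_module` are standard library.

```python
import importlib
import importlib.util
from typing import Callable, Dict, Optional, Tuple

# mime type -> (module providing the validator, attribute name, required package)
_PLUGINS: Dict[str, Tuple[str, str, str]] = {}


def register_validator(mime_type: str, module: str, attr: str, requires: str) -> None:
    """Record a validator without importing it (or its dependency) yet."""
    _PLUGINS[mime_type] = (module, attr, requires)


def get_validator(mime_type: str) -> Optional[Callable]:
    """Import the validator lazily; return None if its dependency is missing."""
    if mime_type not in _PLUGINS:
        return None
    module, attr, requires = _PLUGINS[mime_type]
    if importlib.util.find_spec(requires) is None:
        return None  # e.g. netCDF4 not installed: fall back to "no validation"
    return getattr(importlib.import_module(module), attr)


# Hypothetical plugin module; nothing is imported at registration time.
register_validator("application/x-netcdf", "my_plugins.netcdf", "validatenetcdf", "netCDF4")
```

With this split, the core package never imports netCDF4. Only deployments that install the plugin and its dependency pay that cost.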
>
> David
>
>
> On Tue, Jun 26, 2018 at 9:23 AM David Huard <huard.david at ouranos.ca>
> wrote:
>
>> Thanks !
>>
>> I'll look at it and come back with a PR.
>>
>> On Tue, Jun 26, 2018 at 9:16 AM Jachym Cepicky <jachym.cepicky at gmail.com>
>> wrote:
>>
>>> I believe this is where the data get downloaded:
>>> https://github.com/geopython/pywps/blob/master/pywps/app/Service.py#L191
>>>
>>> On Tue, Jun 26, 2018 at 2:57 PM David Huard <huard.david at ouranos.ca>
>>> wrote:
>>>
>>>> Hi Jachym,
>>>>
>>>> Thanks for the pointers, I've started writing validators for netCDF.
>>>> I'm still wondering where the decision to download a file is made. Can I
>>>> short-circuit that decision and avoid a file download if the href is a valid
>>>> opendap link, i.e. it passes the validatenetcdf checks?
>>>>
>>>>
>>>> On Fri, Jun 22, 2018 at 4:53 AM Jachym Cepicky <
>>>> jachym.cepicky at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Yes, ComplexInput should work for you: you can pass the URL of the
>>>>> data using the "<Reference ... />" element; see [1] for an example.
>>>>>
>>>>> Any Format can have (and has by default) a `validator` function, which
>>>>> returns whether the input data are valid or not [3]. You can also use the
>>>>> `get_format` function [4] and set the validator there.
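This toy format table illustrates the idea without depending on PyWPS internals. Each entry carries its own validator, and a `get_format`-style lookup returns a copy with a different validator attached. `ToyFormat`, `FORMATS` and this `get_format` signature are hypothetical and only modelled on the description above. `application/x-netcdf` and `.nc` are the conventional mime type and extension for netCDF.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional


def accept_all(path: str) -> bool:
    return True  # default validator: everything passes


@dataclass(frozen=True)
class ToyFormat:
    mime_type: str
    extension: str
    validate: Callable[[str], bool] = accept_all


FORMATS: Dict[str, ToyFormat] = {
    "GML": ToyFormat("application/gml+xml", ".gml"),
    "NETCDF": ToyFormat("application/x-netcdf", ".nc"),  # the new entry
}


def get_format(name: str, validator: Optional[Callable[[str], bool]] = None) -> ToyFormat:
    """Return the named format, optionally with a custom validator attached."""
    fmt = FORMATS[name]
    # replace() builds a new frozen instance, so the table entry is unchanged
    return replace(fmt, validate=validator) if validator else fmt
```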
>>>>>
>>>>> For examples of what a validating function can look like, see the
>>>>> shapefile or GML validators [5].
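A netCDF validator could follow the same pattern and check the file signature without importing netCDF4. Classic netCDF files begin with `CDF` followed by a version byte (1, 2 or 5). netCDF-4 files are HDF5 files and normally begin with `\x89HDF\r\n\x1a\n`. The `Mode` enum is defined locally and only echoes the idea of increasingly strict validation modes. `validate_netcdf` is a hypothetical name, not an existing PyWPS function.

```python
import os
from enum import IntEnum

CLASSIC_MAGIC = (b"CDF\x01", b"CDF\x02", b"CDF\x05")  # classic, 64-bit offset, CDF-5
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"                     # netCDF-4 is stored as HDF5


class Mode(IntEnum):
    NONE = 0    # accept everything
    SIMPLE = 1  # check the declared mime type only
    STRICT = 2  # also check the file signature


def validate_netcdf(path: str, mime_type: str, mode: Mode) -> bool:
    """Illustrative netCDF validator: cheap checks, no netCDF4 import needed."""
    if mode >= Mode.SIMPLE and mime_type != "application/x-netcdf":
        return False
    if mode >= Mode.STRICT:
        if not os.path.isfile(path):
            return False
        with open(path, "rb") as fh:
            head = fh.read(8)
        # An HDF5 user block could shift the signature to a later offset;
        # files written by the netCDF library normally start with it.
        if not (head[:4] in CLASSIC_MAGIC or head == HDF5_MAGIC):
            return False
    return True
```

A stricter mode could open the file with netCDF4 and check CF metadata. That is where the heavy dependency would come in.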
>>>>>
>>>>> You should probably extend the formats [2] with a NetCDF mime type.
>>>>>
>>>>> But this will check the file only after it has been downloaded to PyWPS,
>>>>> not the URL. Still, is that sufficient?
>>>>>
>>>>> Jachym
>>>>>
>>>>> [1]
>>>>> https://github.com/geopython/pywps/blob/master/tests/requests/wps_execute_request-responsedocument-1.xml#L24
>>>>> [2]
>>>>> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py
>>>>> [3]
>>>>> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py#L42
>>>>> [4]
>>>>> https://github.com/geopython/pywps/blob/master/pywps/inout/formats/__init__.py#L215
>>>>> [5]
>>>>> https://github.com/geopython/pywps/blob/master/pywps/validator/complexvalidator.py
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jun 21, 2018 at 5:15 PM David Huard <huard.david at ouranos.ca>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'd like to contribute a pull request to better handle netCDF files
>>>>>> in pywps but I don't know where to start.
>>>>>>
>>>>>> We have a number of processes taking netCDF
>>>>>> <https://www.unidata.ucar.edu/software/netcdf/> files as inputs. For
>>>>>> those less familiar with the format, netCDF-4 is built on top of HDF5, and
>>>>>> files usually follow a set of conventions <http://cfconventions.org/>. It
>>>>>> is the standard data format in oceanography and climatology. netCDF files
>>>>>> are usually stored on servers with support for opendap
>>>>>> <https://www.opendap.org/>. This means that users can either download the
>>>>>> netCDF file and then open it locally, or use the opendap protocol to open
>>>>>> it remotely. In practice, you can do
>>>>>>
>>>>>> import netCDF4 as nc
>>>>>> ds1 = nc.Dataset("<path to local file>")
>>>>>> ds2 = nc.Dataset("<link to opendap address>")
>>>>>>
>>>>>> and both ds1 and ds2 will behave identically. However, ds2 is not
>>>>>> downloaded locally; it is read remotely on demand. If a file contains
>>>>>> a 3D matrix (time, lat, lon), you can read one slice of the matrix without
>>>>>> downloading it all.
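A minimal sketch of such a partial read. Indexing a netCDF4 variable with a single time index requests only that (lat, lon) slice, and for an OPeNDAP URL only that hyperslab crosses the network. The helper `read_time_slice` and the variable name `tas` are hypothetical. The dataset is passed in, so the same function works for both `ds1` and `ds2`.

```python
def read_time_slice(ds, var_name: str, t: int):
    """Return the 2-D (lat, lon) field at time index `t` of a 3-D variable.

    `ds` is any netCDF4.Dataset-like object, opened from a local path or an
    OPeNDAP URL; only the requested slice is read.
    """
    var = ds.variables[var_name]
    if var.ndim != 3:
        raise ValueError(f"{var_name} has {var.ndim} dimensions, expected 3")
    return var[t, :, :]


# Usage (needs netCDF4 and a reachable server; the URL is a placeholder):
# import netCDF4 as nc
# with nc.Dataset("<link to opendap address>") as ds2:
#     field = read_time_slice(ds2, "tas", 0)
```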
>>>>>>
>>>>>> Some of our pywps.Process subclasses support both netCDF file and opendap
>>>>>> access. We define a ComplexInput for the address of an actual netCDF file,
>>>>>> and a LiteralInput for the opendap address.
>>>>>>
>>>>>> My question is whether there is a clean way for pywps to
>>>>>> support both modes with one ComplexInput. Internally, pywps would check if
>>>>>> the address supports opendap (just check if nc.Dataset(url) works), and if
>>>>>> not, would download the file locally to the server.
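The check-then-fall-back logic could be sketched as follows. `resolve_netcdf_source`, `opener` and `download` are hypothetical names. By default the opener is `netCDF4.Dataset`, imported lazily because it is a heavy dependency, and the download uses `urllib.request.urlretrieve`. The catch of both `OSError` and `RuntimeError` is an assumption about what a failed `Dataset(url)` raises. The function returns something that can be passed straight to `netCDF4.Dataset`: the URL if it opens remotely, otherwise a local path.

```python
import os
import tempfile
from typing import Callable, Optional
from urllib.request import urlretrieve


def _default_opener(url: str):
    # ImportError propagates: a missing netCDF4 is a setup problem, not a non-DAP URL.
    import netCDF4
    return netCDF4.Dataset(url)


def resolve_netcdf_source(
    url: str,
    workdir: Optional[str] = None,
    opener: Callable[[str], object] = _default_opener,
    download: Callable[[str, str], object] = urlretrieve,
) -> str:
    """Return `url` if it opens as an OPeNDAP dataset, else a downloaded copy's path."""
    try:
        ds = opener(url)
    except (OSError, RuntimeError):
        pass  # not an OPeNDAP endpoint (or unreachable): fall back to download
    else:
        close = getattr(ds, "close", None)
        if close is not None:
            close()  # the probe only needed to know that opening works
        return url
    workdir = workdir or tempfile.mkdtemp()
    target = os.path.join(workdir, os.path.basename(url.rstrip("/")) or "input.nc")
    download(url, target)
    return target
```

Either way, the process body stays the same: `ds = nc.Dataset(resolve_netcdf_source(href))`.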
>>>>>>
>>>>>> In both cases, we could do
>>>>>>
>>>>>> ds = nc.Dataset(request.inputs['resource'][0].file)
>>>>>>
>>>>>> I'm willing to put the time to do it, I just don't know where to
>>>>>> start.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> David
>>>>>>
>>>>>> _______________________________________________
>>>>>> pywps-dev mailing list
>>>>>> pywps-dev at lists.osgeo.org
>>>>>> https://lists.osgeo.org/mailman/listinfo/pywps-dev
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jachym Cepicky
>>>>> e-mail: jachym.cepicky gmail com
>>>>> URL: http://les-ejk.cz
>>>>> GPG: http://les-ejk.cz/pgp/JachymCepicky.pgp
>>>>
>>>>
>>>
>>
>>
