[pycsw-devel] Query endpoints like THREDDS, CKAN, GEONETWORK and harvest the results in a PyCSW catalog

Tom Kralidis tomkralidis at gmail.com
Wed Nov 27 03:03:10 PST 2013


On Tue, Nov 26, 2013 at 7:13 PM, epi <massimodisasha at gmail.com> wrote:
> Hi All,
>
> i’m pretty new to PyCSW and CSW itself, for my job i need to work with data
> coming from a multitude of data services
>
> US data.gov,  THREDDS (in use by USGS), Geonetwork opensource for global
> data etc ..
>
> i’m slowly going into this interesting world .. and my first aim is :
>
> is it possible to Query an endpoint based on a Spatio-Temporal query plus
> some keywords  and use the retrieved metadata to generate a pycsw catalog ?
>

An example of this query is at
https://github.com/geopython/pycsw/blob/master/tests/suites/default/post/GetRecords-filter-and-bbox-freetext.xml.
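The linked file is a raw CSW GetRecords POST body.  As a rough sketch
(not a copy of that file), a similar body for the bbox and keyword from
your question could be built with only the Python standard library.
The names build_getrecords and lat_first are made up for illustration,
csw:AnyText is assumed to be the queryable you want for free text, and
the corner order in the envelope (latitude first here) is an assumption
that should be checked against the target server:

```python
import xml.etree.ElementTree as ET

# Namespaces used by a CSW 2.0.2 GetRecords request.
NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "ogc": "http://www.opengis.net/ogc",
    "gml": "http://www.opengis.net/gml",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Return the namespace-qualified tag name ElementTree expects."""
    return "{%s}%s" % (NS[prefix], tag)


def build_getrecords(bbox, keyword, max_records=20, lat_first=True):
    """Build a GetRecords body: free-text match AND bounding box.

    bbox is [minx, miny, maxx, maxy]; lat_first (an assumption) controls
    whether the corners are written as "lat lon" or "lon lat".
    """
    minx, miny, maxx, maxy = bbox
    root = ET.Element(q("csw", "GetRecords"), {
        "service": "CSW", "version": "2.0.2",
        "resultType": "results", "maxRecords": str(max_records),
    })
    # ows: only appears inside text values, so declare it by hand.
    root.set("xmlns:ows", "http://www.opengis.net/ows")
    query = ET.SubElement(root, q("csw", "Query"), {"typeNames": "csw:Record"})
    ET.SubElement(query, q("csw", "ElementSetName")).text = "full"
    constraint = ET.SubElement(query, q("csw", "Constraint"), {"version": "1.1.0"})
    flt = ET.SubElement(constraint, q("ogc", "Filter"))
    both = ET.SubElement(flt, q("ogc", "And"))

    # Keyword part: wildcard match against all text.
    like = ET.SubElement(both, q("ogc", "PropertyIsLike"),
                         {"wildCard": "%", "singleChar": "_", "escapeChar": "\\"})
    ET.SubElement(like, q("ogc", "PropertyName")).text = "csw:AnyText"
    ET.SubElement(like, q("ogc", "Literal")).text = "%" + keyword + "%"

    # Spatial part: records whose bounding box intersects the envelope.
    box = ET.SubElement(both, q("ogc", "BBOX"))
    ET.SubElement(box, q("ogc", "PropertyName")).text = "ows:BoundingBox"
    env = ET.SubElement(box, q("gml", "Envelope"))
    if lat_first:
        lower, upper = (miny, minx), (maxy, maxx)
    else:
        lower, upper = (minx, miny), (maxx, maxy)
    ET.SubElement(env, q("gml", "lowerCorner")).text = "%s %s" % lower
    ET.SubElement(env, q("gml", "upperCorner")).text = "%s %s" % upper
    return ET.tostring(root, encoding="unicode")


body = build_getrecords([-141, 42, -52, 84], "temperature")
```

The resulting string would be POSTed to the CSW endpoint with a
Content-Type of application/xml.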

> at this link i started to use owslib to query  the US data.gov endpoint :
> 'http://geo.gov.ckan.org/csw'
>

I believe Angelos has more information on this, but FYI this is a beta
deployment of pycsw which does not include native PostGIS support, so
spatial queries on a catalog of this size *may* be slow.  The
forthcoming deployment solves this.

> where  i’m asking for datasets in this area :
>
> bbox=[-141,42,-52,84]
>
> with keywords : temperature
>
> i limit the results to 20 ..
>
> http://nbviewer.ipython.org/urls/gist.github.com/anonymous/d06033eb4363d2ac8403/raw/e09a30fd1870c45ce84c844ff85e62130e8fcede/csw_test.ipynb
>
> now i’m stuck on understanding how to use the owslib information in order to
> harvest the data retrieved by the query .. in a fresh new catalog in pycsw.
>

So the idea here is to run some targeted queries and store the results
to a local pycsw instance (kind of like a filtered harvest).

From your IPython notebook example, the next step would be to insert
each metadata XML result into pycsw.

Assuming you have pycsw installed and configured, this is then (using
OWSLib) simply:

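>>> from owslib.csw import CatalogueServiceWeb
>>> mypycsw = 'http://localhost:8000/'
>>> csw2 = CatalogueServiceWeb(mypycsw)
>>> # rec: one record object from the GetRecords results in your notebook
>>> csw2.transaction(ttype='insert', typename='gmd:MD_Metadata', record=rec.xml)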

This is a pure CSW-T insert operation.  Your pycsw instance needs to
have transactions enabled, as well as allow for the IP making the
CSW-T request to have transaction privileges.  See the manager section
in the pycsw configuration
(http://pycsw.org/docs/latest/configuration.html) for more info.
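
To repeat the insert for every record returned by a query, a small
loop is enough.  The following is only a sketch: insert_records is a
made-up helper name, the target is assumed to behave like an OWSLib
CatalogueServiceWeb object, and the default typename is simply the one
from the snippet above (it has to match the schema of the records you
actually retrieved).  Failures are collected rather than raised so one
rejected record does not stop the rest:

```python
def insert_records(target, records, typename="gmd:MD_Metadata"):
    """Insert each record's raw XML into `target` via CSW-T.

    target  -- object with a transaction(ttype=..., typename=...,
               record=...) method (assumed: OWSLib CatalogueServiceWeb)
    records -- mapping of identifier -> object with an `.xml` attribute
    Returns (inserted_ids, failures), where failures is a list of
    (identifier, error message) pairs.
    """
    inserted, failures = [], []
    for identifier, rec in records.items():
        try:
            target.transaction(ttype="insert", typename=typename,
                               record=rec.xml)
            inserted.append(identifier)
        except Exception as err:
            # Remember the problem and carry on with the next record.
            failures.append((identifier, str(err)))
    return inserted, failures
```

With OWSLib this might be called as insert_records(csw2, csw.records),
where csw is the object holding the results of your earlier query.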



> i’ll need to do the same also for THREDDS and  Geonetwork, but to get me
> started .. i chose CKAN.
>

GeoNetwork: use the GN CSW support here to achieve the same as the above.
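
As a hedged sketch of what that could look like with OWSLib: the
helper below runs one filtered GetRecords call and asks for ISO output
so that the returned XML matches a gmd:MD_Metadata insert.  The names
fetch_iso_records and example are made up, the GeoNetwork URL is a
placeholder, and the source is assumed to behave like an OWSLib
CatalogueServiceWeb object:

```python
ISO_SCHEMA = "http://www.isotc211.org/2005/gmd"


def fetch_iso_records(source, constraints, limit=20):
    """Run one filtered query and return {identifier: raw ISO XML}.

    source      -- assumed to behave like owslib.csw.CatalogueServiceWeb
    constraints -- OWSLib filter list, passed through unchanged
    """
    source.getrecords2(constraints=constraints, maxrecords=limit,
                       esn="full", outputschema=ISO_SCHEMA)
    return {identifier: rec.xml
            for identifier, rec in source.records.items()}


def example():
    """Not called here: needs OWSLib installed and a reachable server."""
    from owslib.csw import CatalogueServiceWeb
    from owslib.fes import BBox, PropertyIsLike

    # Placeholder URL; substitute your GeoNetwork CSW endpoint.
    gn = CatalogueServiceWeb("http://example.org/geonetwork/srv/eng/csw")
    # A nested list asks OWSLib to AND the two filters together.
    query = [[PropertyIsLike("csw:AnyText", "%temperature%"),
              BBox([-141, 42, -52, 84])]]
    return fetch_iso_records(gn, query)
```

Each XML string in the returned mapping can then be passed as the
record argument of a CSW-T insert against your own pycsw instance.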

THREDDS: there is an open ticket for THREDDS harvesting
(https://github.com/geopython/pycsw/issues/155).  It looks like most
of the work is already there, and what remains is integration into
pycsw proper.  Any help/contributions are certainly welcome.


> Thanks for any help or hints, i’d love to share the approach with you through
> IPython notebooks .. so i can keep track of advancement in documenting the
> needed steps to achieve the final target

Great idea -- what a cool concept!

