[pycsw-devel] Making Harvesting in a dedicated server PyCSW from catalogs CSW in GeoNode servers

Tom Kralidis tomkralidis at gmail.com
Fri Jun 19 12:28:12 PDT 2015


Hi Davi: thanks for moving this discussion here from geonode-users.
Comments interleaved.

On Fri, Jun 19, 2015 at 3:09 PM, Davi Custodio <davicustodio at gmail.com> wrote:
> Hello. I have a scenario consisting of 8 GeoNode servers, each with its
> respective PyCSW instance.
> To test, I use GetRecords as follows:
>
> http://localhost/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary
>
> and receive the corresponding xml correctly.
>
> I can also configure a harvest in GeoNetwork using:
>
> http://localhost/catalogue/csw?version=2.0.2&request=GetCapabilities&service=CSW
>
> and that harvest also runs properly.
>
> I created a dedicated server with the default installation of PyCSW 1.10.0,
> and I'm trying to get this server to collect all the metadata records from
> the 8 GeoNode servers.
>
> My intention is to have one PyCSW that gathers all the metadata of my
> organization. I'm avoiding "federatedcatalogues" because I do not want
> users who query the CSW to have to specify additional parameters.
>
> In a first test, I used the command:
>
> pycsw-admin.py -c post_xml -u http://localhost/pycsw/csw.py -x
> /var/www/html/pycsw/bin/request.xml
>
> where request.xml =
>
> <?xml version="1.0" encoding="UTF-8"?>
> <Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2"
>   xmlns:ogc="http://www.opengis.net/ogc"
>   xmlns:gmd="http://www.isotc211.org/2005/gmd"
>   xmlns:ows="http://www.opengis.net/ows"
>   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>   xmlns:dc="http://purl.org/dc/elements/1.1/"
>   xmlns:dct="http://purl.org/dc/terms/"
>   xmlns:gml="http://www.opengis.net/gml"
>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>   xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd"
>   service="CSW" version="2.0.2">
>   <Source>http://demo.geonode.org//catalogue/csw</Source>
>   <ResourceType>http://www.opengis.net/cat/csw/2.0.2</ResourceType>
>   <ResourceFormat>application/xml</ResourceFormat>
> </Harvest>
>
> pointing to http://demo.geonode.org//catalogue/csw ..
>
> I get the following error:
>
> Initializing static context
> Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
> http://localhost/pycsw/csw.py
> Traceback (most recent call last):
>   File "/usr/bin/pycsw-admin.py", line 246, in <module>
>     print admin.post_xml(CSW_URL, XML, TIMEOUT)
>   File "/usr/lib/python2.7/dist-packages/pycsw/admin.py", line 495, in post_xml
>     raise RuntimeError(err)
> RuntimeError: timed out

I'm not getting any timeout when running here (using 1.10.1 -- any
chance you can upgrade your single non-GeoNode pycsw instance to
1.10.1?) using the Harvest XML request like
https://github.com/geopython/pycsw/blob/master/tests/suites/harvesting/post/Harvest-csw-run1.xml,
but substituting the Source with http://demo.geonode.org/catalogue/csw

pycsw harvests 36 records (1 for the service and 35 metadata records).
Are you able to see http://demo.geonode.org/catalogue/csw from the box
on which you are running your harvests?
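
For reference, the same Harvest request can also be built and sent from
Python instead of going through pycsw-admin.py. This is only a sketch: the
function names and the 120 second timeout are assumptions made for
illustration and are not part of pycsw.

```python
import urllib.request
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def build_harvest_request(source_url):
    """Return a CSW 2.0.2 Harvest request body (bytes) for source_url."""
    ET.register_namespace("csw", CSW_NS)
    root = ET.Element("{%s}Harvest" % CSW_NS,
                      {"service": "CSW", "version": "2.0.2"})
    ET.SubElement(root, "{%s}Source" % CSW_NS).text = source_url
    ET.SubElement(root, "{%s}ResourceType" % CSW_NS).text = CSW_NS
    ET.SubElement(root, "{%s}ResourceFormat" % CSW_NS).text = "application/xml"
    return ET.tostring(root, encoding="utf-8")


def post_harvest(csw_url, source_url, timeout=120):
    """POST the Harvest request to csw_url and return the raw response body.

    The timeout value is an illustrative assumption; a slow or unreachable
    Source is one possible cause of a "timed out" error.
    """
    request = urllib.request.Request(
        csw_url,
        data=build_harvest_request(source_url),
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling post_harvest("http://localhost/pycsw/csw.py",
"http://demo.geonode.org/catalogue/csw") should then return either the
harvest response or an ExceptionReport as XML.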

> When I point to one of my GeoNode-PyCSW servers using request.xml as:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2"
>   xmlns:ogc="http://www.opengis.net/ogc"
>   xmlns:gmd="http://www.isotc211.org/2005/gmd"
>   xmlns:ows="http://www.opengis.net/ows"
>   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>   xmlns:dc="http://purl.org/dc/elements/1.1/"
>   xmlns:dct="http://purl.org/dc/terms/"
>   xmlns:gml="http://www.opengis.net/gml"
>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>   xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd"
>   service="CSW" version="2.0.2">
>   <Source>http://aguai.cnpm.embrapa.br/catalogue/csw</Source>
>   <ResourceType>http://www.opengis.net/cat/csw/2.0.2</ResourceType>
>   <ResourceFormat>application/xml</ResourceFormat>
> </Harvest>
>
> pointing to my server: http://aguai.cnpm.embrapa.br/catalogue/csw
>
> I get the error:
>
> Initializing static context
> Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
> http://localhost/pycsw/csw.py
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <!-- pycsw 1.10.0 -->
> <ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
>   xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
>   xmlns:atom="http://www.w3.org/2005/Atom"
>   xmlns:xs="http://www.w3.org/2001/XMLSchema"
>   xmlns:dct="http://purl.org/dc/terms/"
>   xmlns:ows="http://www.opengis.net/ows"
>   xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
>   xmlns:gml="http://www.opengis.net/gml"
>   xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
>   xmlns:xlink="http://www.w3.org/1999/xlink"
>   xmlns:gco="http://www.isotc211.org/2005/gco"
>   xmlns:gmd="http://www.isotc211.org/2005/gmd"
>   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>   xmlns:srv="http://www.isotc211.org/2005/srv"
>   xmlns:ogc="http://www.opengis.net/ogc"
>   xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
>   xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
>   xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>   xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
>   xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
>   xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
>   language="en-US" version="1.2.0"
>   xsi:schemaLocation="http://www.opengis.net/ows http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
> <ows:Exception exceptionCode="NoApplicableCode" locator="source">
> <ows:ExceptionText>Harvest (insert) failed: ERROR: null value in column
> "identifier" violates not-null constraint
> DETAIL: Failing row contains (null, csw:Record,
> http://www.opengis.net/cat/csw/2.0.2, local, 2015-06-19T15:19:28Z
> &lt;ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/" ...
> Missing keyword: service, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null,
> http://aguai.cnpm.embrapa.br/catalogue/csw, null, null, null, null, null,
> null, null, null, null, null, null, null, null, null, null, null, null,
> null, null, null, null, null, 'keyword':2 'miss':1 'servic':3, null).
> .</ows:ExceptionText></ows:Exception></ows:ExceptionReport>
> Done
>
>
> Can anyone help by explaining how best to implement this, and why these
> errors occur?
>

It looks like one of the layers in your pycsw serving GeoNode CSW does
not have an identifier?  Can you turn on pycsw logging from your
single pycsw to DEBUG and run the harvest again?  There might be
valuable information in the log that can help.  Feel free to send me
the log output offline.
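
Logging is controlled by the loglevel and logfile options in the [server]
section of the pycsw configuration file. A minimal sketch of switching them
on from Python is below; the helper name and the /tmp/pycsw.log path are
assumptions for illustration, and editing the file by hand works just as
well. Note that configparser drops comments when it rewrites a file.

```python
import configparser
import io


def enable_debug_logging(cfg_text, logfile="/tmp/pycsw.log"):
    """Return cfg_text with [server] loglevel=DEBUG and logfile set.

    The logfile default is an illustrative assumption; use any path the
    web server user can write to.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(cfg_text)
    if not parser.has_section("server"):
        parser.add_section("server")
    parser.set("server", "loglevel", "DEBUG")
    parser.set("server", "logfile", logfile)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

The returned text can be written back over the configuration used by the
harvesting pycsw before running the harvest again.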

As well, how many records does
http://aguai.cnpm.embrapa.br/catalogue/csw have?  Might be valuable to
do a full GetRecords request, i.e.:

http://aguai.cnpm.embrapa.br/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary&maxrecords=FOO

to inspect the actual CSW output that pycsw is trying to harvest.
Feel free to send me the result offline.
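
If it helps, the inspection can be scripted. The sketch below builds the
GetRecords URL above and counts records in a saved response that have no
gmd:fileIdentifier. The function names are assumptions for illustration,
and the check assumes the response contains gmd:MD_Metadata records, as
requested by the outputSchema.

```python
import urllib.parse
import xml.etree.ElementTree as ET

NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def getrecords_url(base_url, max_records):
    """Build the KVP GetRecords URL for ISO (gmd) summary records."""
    params = {
        "request": "GetRecords",
        "service": "CSW",
        "version": "2.0.2",
        "resultType": "results",
        "outputSchema": NS["gmd"],
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "maxrecords": str(max_records),
    }
    return base_url + "?" + urllib.parse.urlencode(params)


def summarize_identifiers(response_xml):
    """Return (record_count, positions_missing_identifier).

    Positions are 1-based and follow document order in the response.
    """
    root = ET.fromstring(response_xml)
    records = root.findall(".//gmd:MD_Metadata", NS)
    missing = []
    for position, record in enumerate(records, start=1):
        identifier = record.findtext(
            "gmd:fileIdentifier/gco:CharacterString",
            default="",
            namespaces=NS,
        )
        if not identifier.strip():
            missing.append(position)
    return len(records), missing
```

Any position reported as missing would be a candidate for the record that
fails the not-null constraint on insert.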

> --
> Davi de O. Custódio
>

