[pycsw-devel] Making Harvesting in a dedicated server PyCSW from catalogs CSW in GeoNode servers

Tom Kralidis tomkralidis at gmail.com
Mon Jun 22 07:39:34 PDT 2015


Hi Davi: my mistake: can you send GetRecords output on this request instead:

http://aguai.cnpm.embrapa.br/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=full&maxrecords=1000

Please email me the XML directly and not through the mailing list.
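
As a rough sketch (not part of pycsw itself), the request above could be built and saved to a file from Python 3 along these lines. The names build_getrecords_url and save_getrecords and the output file name are hypothetical, chosen only for illustration; the endpoint and parameters are the ones in the URL above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_getrecords_url(endpoint, element_set="full", max_records=1000):
    """Return a CSW 2.0.2 GetRecords KVP URL for the given endpoint."""
    params = [
        ("request", "GetRecords"),
        ("service", "CSW"),
        ("version", "2.0.2"),
        ("resultType", "results"),
        ("outputSchema", "http://www.isotc211.org/2005/gmd"),
        ("typeNames", "csw:Record"),
        ("elementSetName", element_set),
        ("maxrecords", str(max_records)),
    ]
    # safe=":/" leaves the schema URL and csw:Record unescaped, as in the URL above
    return endpoint + "?" + urlencode(params, safe=":/")


def save_getrecords(url, path, timeout=60):
    """Fetch the GetRecords response and write the raw XML bytes to path."""
    with urlopen(url, timeout=timeout) as response, open(path, "wb") as out:
        out.write(response.read())


url = build_getrecords_url("http://aguai.cnpm.embrapa.br/catalogue/csw")
print(url)
# To save the response (hypothetical file name):
# save_getrecords(url, "getrecords-full.xml")
```

The saved file can then be attached to an email as requested.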

Perhaps Angelos can comment on 1.10.1 packaging for UbuntuGIS.

Thanks

..Tom



On Mon, 22 Jun 2015, Davi Custodio wrote:

> Date: Mon, 22 Jun 2015 11:30:58 -0300
> From: Davi Custodio <davicustodio at gmail.com>
> To: Tom Kralidis <tomkralidis at gmail.com>
> Cc: "pycsw-devel at lists.osgeo.org" <pycsw-devel at lists.osgeo.org>
> Subject: Re: [pycsw-devel] Making Harvesting in a dedicated server PyCSW from
>     catalogs CSW in GeoNode servers
> 
> Hi Tom.
>
> I was indeed having trouble accessing external URLs from my server. I fixed
> that and ran the test again using the XML you pointed to:
>
> https://github.com/geopython/pycsw/blob/master/tests/suites/harvesting/post/Harvest-csw-run1.xml
>
> ... but the error was the same:
>
> pycsw-admin.py -c post_xml -u http://localhost/pycsw/csw.py -x /var/www/html/pycsw/bin/request.xml
>
> Initializing static context
> Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
> http://localhost/pycsw/csw.py
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <!-- pycsw 1.10.0 -->
> <ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
> xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
> xmlns:atom="http://www.w3.org/2005/Atom"
> xmlns:xs="http://www.w3.org/2001/XMLSchema"
> xmlns:dct="http://purl.org/dc/terms/"
> xmlns:ows="http://www.opengis.net/ows"
> xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
> xmlns:gml="http://www.opengis.net/gml"
> xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
> xmlns:xlink="http://www.w3.org/1999/xlink"
> xmlns:gco="http://www.isotc211.org/2005/gco"
> xmlns:gmd="http://www.isotc211.org/2005/gmd"
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> xmlns:srv="http://www.isotc211.org/2005/srv"
> xmlns:ogc="http://www.opengis.net/ogc"
> xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
> xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
> xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
> xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
> xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
> language="en-US" version="1.2.0"
> xsi:schemaLocation="http://www.opengis.net/ows http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
> <ows:Exception exceptionCode="NoApplicableCode" locator="source">
> <ows:ExceptionText>Harvest (insert) failed: ERROR: null value in column
> "identifier" violates not-null constraint
> DETAIL: Failing row contains (null, csw:Record,
> http://www.opengis.net/cat/csw/2.0.2, local, 2015-06-22T11:23:47Z
> &lt;ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/" ...
> Missing keyword: service, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null, http://demo.geonode.org/catalogue/csw,
> null, null, null, null, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null, null, null, null, null,
> 'keyword':2 'miss':1 'servic':3, null).
> .</ows:ExceptionText></ows:Exception></ows:ExceptionReport>
> Done
>
> Attached is the response to the request you suggested:
>
> http://aguai.cnpm.embrapa.br/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary&maxrecords=1000
>
> I am still using version 1.10.0. Could this be resolved by upgrading to
> 1.10.1? I'm using an Ubuntu 4.14 server. I tried to update pycsw using
> apt-get update and then apt-get upgrade, but pycsw remained at version
> 1.10.0. How do I upgrade using the Ubuntu packages?
>
> Thanks
>
> On Fri, Jun 19, 2015 at 4:28 PM, Tom Kralidis <tomkralidis at gmail.com> wrote:
>
>> Hi Davi: thanks for moving this discussion here from geonode-users.
>> Comments interleaved.
>>
>> On Fri, Jun 19, 2015 at 3:09 PM, Davi Custodio <davicustodio at gmail.com>
>> wrote:
>>> Hello. My setup consists of 8 GeoNode servers, each with its own PyCSW
>>> enabled. To test, I use GetRecords as follows:
>>>
>>>
>> http://localhost/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary
>>>
>>> and receive the corresponding xml correctly.
>>>
>>> I can also configure a harvest within GeoNetwork using:
>>>
>>> http://localhost/catalogue/csw?version=2.0.2&request=GetCapabilities&service=CSW
>>>
>>> and that harvest also runs properly.
>>>
>>> I created a dedicated server with the default installation of PyCSW 1.10.0,
>>> and I'm trying to get this server to collect all the metadata records from
>>> the 8 GeoNode servers.
>>>
>>> My intention is to have a single PyCSW gather all the metadata of my
>>> organization. I'm avoiding "federatedcatalogues" because I do not want
>>> users who query the CSW to need to specify additional parameters.
>>>
>>> In a first test, I used the command:
>>>
>>> pycsw-admin.py -c post_xml -u http://localhost/pycsw/csw.py -x /var/www/html/pycsw/bin/request.xml
>>>
>>> where request.xml =
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2"
>>> xmlns:ogc="http://www.opengis.net/ogc"
>>> xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>> xmlns:ows="http://www.opengis.net/ows"
>>> xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>> xmlns:dc="http://purl.org/dc/elements/1.1/"
>>> xmlns:dct="http://purl.org/dc/terms/"
>>> xmlns:gml="http://www.opengis.net/gml"
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd"
>>> service="CSW" version="2.0.2">
>>>   <Source>http://demo.geonode.org//catalogue/csw</Source>
>>>   <ResourceType>http://www.opengis.net/cat/csw/2.0.2</ResourceType>
>>>   <ResourceFormat>application/xml</ResourceFormat>
>>> </Harvest>
>>>
>>> pointing to http://demo.geonode.org//catalogue/csw ..
>>>
>>> I get the following error:
>>>
>>> Initializing static context
>>> Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
>>> http://localhost/pycsw/csw.py
>>> Traceback (most recent call last):
>>>    File "/usr/bin/pycsw-admin.py", line 246, in <module>
>>>      print admin.post_xml(CSW_URL, XML, TIMEOUT)
>>>    File "/usr/lib/python2.7/dist-packages/pycsw/admin.py", line 495, in post_xml
>>>      raise RuntimeError(err)
>>> RuntimeError: timed out
>>
>> I'm not getting any timeout when running here (using 1.10.1 -- any
>> chance you can upgrade your single non-GeoNode pycsw instance to
>> 1.10.1?) using the Harvest XML request like
>>
>> https://github.com/geopython/pycsw/blob/master/tests/suites/harvesting/post/Harvest-csw-run1.xml
>> ,
>> but substituting the Source with http://demo.geonode.org/catalogue/csw
>>
>> pycsw harvests 36 records (1 for the service and 35 metadata records).
>> Are you able to see http://demo.geonode.org/catalogue/csw from the box
>> on which you are running your harvests from?
>>
>>> when I point to one of my GeoNode-PyCSW servers using request.xml as:
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2"
>>> xmlns:ogc="http://www.opengis.net/ogc"
>>> xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>> xmlns:ows="http://www.opengis.net/ows"
>>> xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>> xmlns:dc="http://purl.org/dc/elements/1.1/"
>>> xmlns:dct="http://purl.org/dc/terms/"
>>> xmlns:gml="http://www.opengis.net/gml"
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd"
>>> service="CSW" version="2.0.2">
>>>   <Source>http://aguai.cnpm.embrapa.br/catalogue/csw</Source>
>>>   <ResourceType>http://www.opengis.net/cat/csw/2.0.2</ResourceType>
>>>   <ResourceFormat>application/xml</ResourceFormat>
>>> </Harvest>
>>>
>>> pointing to my server: http://aguai.cnpm.embrapa.br/catalogue/csw
>>>
>>> I get the error:
>>>
>>> Initializing static context
>>> Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
>>> http://localhost/pycsw/csw.py
>>> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
>>> <!-- pycsw 1.10.0 -->
>>> <ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
>>> xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
>>> xmlns:atom="http://www.w3.org/2005/Atom"
>>> xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>> xmlns:dct="http://purl.org/dc/terms/"
>>> xmlns:ows="http://www.opengis.net/ows"
>>> xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
>>> xmlns:gml="http://www.opengis.net/gml"
>>> xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
>>> xmlns:xlink="http://www.w3.org/1999/xlink"
>>> xmlns:gco="http://www.isotc211.org/2005/gco"
>>> xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>> xmlns:srv="http://www.isotc211.org/2005/srv"
>>> xmlns:ogc="http://www.opengis.net/ogc"
>>> xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
>>> xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
>>> xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
>>> xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
>>> xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
>>> language="en-US" version="1.2.0"
>>> xsi:schemaLocation="http://www.opengis.net/ows http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
>>> <ows:Exception exceptionCode="NoApplicableCode" locator="source">
>>> <ows:ExceptionText>Harvest (insert) failed: ERROR: null value in column
>>> "identifier" violates not-null constraint
>>> DETAIL: Failing row contains (null, csw:Record,
>>> http://www.opengis.net/cat/csw/2.0.2, local, 2015-06-19T15:19:28Z
>>> &lt;ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/" ...
>>> Missing keyword: service, null, null, null, null, null, null, null, null,
>>> null, null, null, null, null, null, null, null, null, null, null, null,
>>> null, null, null, null, null, null,
>>> http://aguai.cnpm.embrapa.br/catalogue/csw, null, null, null, null, null,
>>> null, null, null, null, null, null, null, null, null, null, null, null,
>>> null, null, null, null, null, 'keyword':2 'miss':1 'servic':3, null).
>>> .</ows:ExceptionText></ows:Exception></ows:ExceptionReport>
>>> Done
>>>
>>>
>>> Can anyone help by explaining how best to implement this, and what is
>>> causing these errors?
>>>
>>
>> It looks like one of the layers in your pycsw serving GeoNode CSW does
>> not have an identifier?  Can you turn on pycsw logging from your
>> single pycsw to DEBUG and run the harvest again?  There might be
>> valuable information in the log that can help.  Feel free to send me
>> the log output offline.
>>
>> As well, how many records does
>> http://aguai.cnpm.embrapa.br/catalogue/csw have?  Might be valuable to
>> do a full GetRecords request, i.e.:
>>
>>
>> http://aguai.cnpm.embrapa.br/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary&maxrecords=FOO
>>
>> to inspect the actual CSW output that pycsw is trying to harvest.
>> Feel free to send me the result offline.
>>
>>> --
>>> Davi de O. Custódio
>>>
>>
>
>
>
> -- 
> Davi de O. Custódio
>
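
For reference, a Harvest request like the ones quoted in the thread could be generated per source endpoint with a short Python 3 script. This is only a sketch under assumptions: the function name build_harvest_request is hypothetical, the extra namespace declarations from the quoted request are left out because the three child elements do not use them, and the resulting bytes would still need to be written to a file and posted with the pycsw-admin.py post_xml command shown in the thread.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def build_harvest_request(source_url):
    """Return a CSW 2.0.2 Harvest request document (bytes) for one CSW endpoint."""
    ET.register_namespace("csw", CSW_NS)
    root = ET.Element("{%s}Harvest" % CSW_NS,
                      {"service": "CSW", "version": "2.0.2"})
    # Source is the remote catalogue to harvest
    ET.SubElement(root, "{%s}Source" % CSW_NS).text = source_url
    # ResourceType tells the server that the source is itself a CSW 2.0.2 endpoint
    ET.SubElement(root, "{%s}ResourceType" % CSW_NS).text = CSW_NS
    ET.SubElement(root, "{%s}ResourceFormat" % CSW_NS).text = "application/xml"
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(root, encoding="utf-8")


# The two endpoints mentioned in the thread; a real list would hold all 8 servers.
sources = [
    "http://demo.geonode.org/catalogue/csw",
    "http://aguai.cnpm.embrapa.br/catalogue/csw",
]
requests_by_source = {url: build_harvest_request(url) for url in sources}
print(requests_by_source[sources[0]].decode("utf-8"))
```

Generating one request per endpoint keeps the harvests independent, so a failure on one GeoNode catalogue does not block the others.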

