[pycsw-devel] Making Harvesting in a dedicated server PyCSW from catalogs CSW in GeoNode servers

Angelos Tzotsos gcpp.kalxas at gmail.com
Mon Jun 22 08:24:24 PDT 2015


pycsw 1.10.1 has just been pushed to ubuntugis unstable. It should be
available in about 2 hours.

On 06/22/2015 05:30 PM, Davi Custodio wrote:
> Hi Tom.
>
> I was indeed having trouble accessing external URLs from my server. I fixed
> that and ran the test again, pointing to the XML you mentioned:
>
> https://github.com/geopython/pycsw/blob/master/tests/suites/harvesting/post/Harvest-csw-run1.xml
>
> But the error was the same:
>
> pycsw-admin.py -c post_xml -u http://localhost/pycsw/csw.py -x
> /var/www/html/pycsw/bin/request.xml
>
> Initializing static context
> Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
> http://localhost/pycsw/csw.py
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <!-- pycsw 1.10.0 -->
> <ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
> xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
> xmlns:atom="http://www.w3.org/2005/Atom"
> xmlns:xs="http://www.w3.org/2001/XMLSchema"
> xmlns:dct="http://purl.org/dc/terms/"
> xmlns:ows="http://www.opengis.net/ows"
> xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
> xmlns:gml="http://www.opengis.net/gml"
> xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
> xmlns:xlink="http://www.w3.org/1999/xlink"
> xmlns:gco="http://www.isotc211.org/2005/gco"
> xmlns:gmd="http://www.isotc211.org/2005/gmd"
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> xmlns:srv="http://www.isotc211.org/2005/srv"
> xmlns:ogc="http://www.opengis.net/ogc"
> xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
> xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
> xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
> xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
> xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
> language="en-US" version="1.2.0"
> xsi:schemaLocation="http://www.opengis.net/ows
> http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd"><ows:Exception
> exceptionCode="NoApplicableCode" locator="source"><ows:ExceptionText>Harvest
> (insert) failed: ERROR: null value in column "identifier" violates not-null
> constraint
> DETAIL: Failing row contains (null, csw:Record,
> http://www.opengis.net/cat/csw/2.0.2, local, 2015-06-22T11:23:47Z
> &lt;ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/" ...
> Missing keyword: service, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null, http://demo.geonode.org/catalogue/csw,
> null, null, null, null, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null, null, null, null, null, 'keyword':2
> 'miss':1 'servic':3, null).
> .</ows:ExceptionText></ows:Exception></ows:ExceptionReport>
> Done
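
For reading a response like the one quoted above without wading through the namespace declarations, a short script can pull out just the exception code, locator and text. This is only a minimal sketch in Python 3 using the standard library; the function name `extract_exceptions` and the shortened sample document are made up for illustration and are not part of pycsw.

```python
import xml.etree.ElementTree as ET

OWS_NS = "http://www.opengis.net/ows"


def extract_exceptions(xml_text):
    """Return (code, locator, text) tuples from an ows:ExceptionReport.

    Returns an empty list when the document is not an exception report.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}ExceptionReport" % OWS_NS:
        return []
    found = []
    for exc in root.findall("{%s}Exception" % OWS_NS):
        text = exc.findtext("{%s}ExceptionText" % OWS_NS, default="") or ""
        found.append((exc.get("exceptionCode"), exc.get("locator"), text.strip()))
    return found


# Shortened, hypothetical stand-in for the response quoted in this thread.
sample = """<ows:ExceptionReport xmlns:ows="http://www.opengis.net/ows" version="1.2.0">
  <ows:Exception exceptionCode="NoApplicableCode" locator="source">
    <ows:ExceptionText>Harvest (insert) failed: example text</ows:ExceptionText>
  </ows:Exception>
</ows:ExceptionReport>"""

for code, locator, text in extract_exceptions(sample):
    print(code, locator, text)
```

The same function could be applied to whatever the harvesting server returns, assuming the response is saved to a file or captured as a string first.
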
>
> Attached is the response to the request you suggested:
>
> http://aguai.cnpm.embrapa.br/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary&maxrecords=1000
>
> I am still using version 1.10.0. Could this be resolved by upgrading
> to 1.10.1? I'm using an Ubuntu 14.04 server. I tried to update pycsw
> with apt-get update followed by apt-get upgrade, but pycsw remained at
> version 1.10.0. How do I upgrade using the Ubuntu packages?
>
> Thanks
>
> On Fri, Jun 19, 2015 at 4:28 PM, Tom Kralidis <tomkralidis at gmail.com> wrote:
>
>> Hi Davi: thanks for moving this discussion here from geonode-users.
>> Comments interleaved.
>>
>> On Fri, Jun 19, 2015 at 3:09 PM, Davi Custodio <davicustodio at gmail.com>
>> wrote:
>>> Hello. I have a scenario consisting of 8 GeoNode servers, each with its
>>> own PyCSW enabled.
>>> To test, I use a GetRecords request such as:
>>>
>>> http://localhost/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary
>>>
>>> and receive the corresponding XML correctly.
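
A GetRecords KVP request like the one quoted above can also be assembled programmatically, which avoids escaping mistakes when the URL is pasted between tools. A minimal sketch in Python 3; `getrecords_url` and its `max_records` argument are hypothetical names, and the parameter values are simply the ones used in this thread.

```python
from urllib.parse import urlencode


def getrecords_url(endpoint, max_records=10):
    """Build a CSW 2.0.2 GetRecords KVP URL asking for ISO (gmd) summaries."""
    params = [
        ("service", "CSW"),
        ("version", "2.0.2"),
        ("request", "GetRecords"),
        ("resultType", "results"),
        ("outputSchema", "http://www.isotc211.org/2005/gmd"),
        ("typeNames", "csw:Record"),
        ("elementSetName", "summary"),
        ("maxRecords", str(max_records)),
    ]
    # urlencode percent-encodes ':' and '/' inside the parameter values.
    return endpoint + "?" + urlencode(params)


print(getrecords_url("http://localhost/catalogue/csw", max_records=5))
```
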
>>>
>>> I can also configure harvesting from within GeoNetwork using:
>>>
>>> http://localhost/catalogue/csw?version=2.0.2&request=GetCapabilities&service=CSW
>>>
>>> and that harvest also runs properly.
>>>
>>> I created a dedicated server with the default installation of PyCSW
>>> 1.10.0, and I'm trying to get this server to collect all the metadata
>>> records of the 8 GeoNode servers.
>>>
>>> My intention is to have a single PyCSW that gathers all the metadata of
>>> my organization. I'm avoiding "federatedcatalogues" because I do not want
>>> users who query the CSW to have to specify additional parameters.
>>>
>>> In a first test, I used the command:
>>>
>>> pycsw-admin.py -c post_xml -u http://localhost/pycsw/csw.py -x
>>> /var/www/html/pycsw/bin/request.xml
>>>
>>> where request.xml =
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2"
>>> xmlns:ogc="http://www.opengis.net/ogc"
>>> xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>> xmlns:ows="http://www.opengis.net/ows"
>>> xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>> xmlns:dc="http://purl.org/dc/elements/1.1/"
>>> xmlns:dct="http://purl.org/dc/terms/"
>>> xmlns:gml="http://www.opengis.net/gml"
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2
>>> http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd"
>>> service="CSW" version="2.0.2">
>>>    <Source>http://demo.geonode.org//catalogue/csw</Source>
>>>    <ResourceType>http://www.opengis.net/cat/csw/2.0.2</ResourceType>
>>>    <ResourceFormat>application/xml</ResourceFormat>
>>> </Harvest>
>>>
>>> pointing to http://demo.geonode.org//catalogue/csw
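
Since the request file is easy to mangle by hand, one option is to generate the Harvest document and POST it from a script. What follows is a sketch under assumptions, not the pycsw API: `build_harvest_request` and `post_harvest` are hypothetical helpers written with the Python 3 standard library, and the 30-second default timeout is an arbitrary choice for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def build_harvest_request(source_url):
    """Return a CSW 2.0.2 Harvest request (as a str) for one remote CSW."""
    ET.register_namespace("csw", CSW_NS)
    root = ET.Element("{%s}Harvest" % CSW_NS, {"service": "CSW", "version": "2.0.2"})
    ET.SubElement(root, "{%s}Source" % CSW_NS).text = source_url
    # Same ResourceType value as in the request.xml quoted in this thread.
    ET.SubElement(root, "{%s}ResourceType" % CSW_NS).text = CSW_NS
    ET.SubElement(root, "{%s}ResourceFormat" % CSW_NS).text = "application/xml"
    return ET.tostring(root, encoding="unicode")


def post_harvest(csw_url, xml_text, timeout=30):
    """POST the request to the harvesting server and return the raw response."""
    request = urllib.request.Request(
        csw_url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the document needs no network access:
harvest_xml = build_harvest_request("http://demo.geonode.org/catalogue/csw")
print(harvest_xml)
# Sending it would look like this (URL taken from the thread):
# print(post_harvest("http://localhost/pycsw/csw.py", harvest_xml, timeout=120))
```

A longer timeout may be worth trying when the remote catalogue is slow to answer, since the harvesting server has to fetch the records before it can reply.
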
>>>
>>> I get the following error:
>>>
>>> Initializing static context
>>> Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
>>> http://localhost/pycsw/csw.py
>>> Traceback (most recent call last):
>>>     File "/usr/bin/pycsw-admin.py", line 246, in <module>
>>>       print admin.post_xml(CSW_URL, XML, TIMEOUT)
>>>     File "/usr/lib/python2.7/dist-packages/pycsw/admin.py", line 495, in
>>> post_xml
>>>       raise RuntimeError(err)
>>> RuntimeError: timed out
>> I'm not getting any timeout when running here (using 1.10.1 -- any
>> chance you can upgrade your single non-GeoNode pycsw instance to
>> 1.10.1?) using the Harvest XML request like
>>
>> https://github.com/geopython/pycsw/blob/master/tests/suites/harvesting/post/Harvest-csw-run1.xml
>> ,
>> but substituting the Source with http://demo.geonode.org/catalogue/csw
>>
>> pycsw harvests 36 records (1 for the service and 35 metadata records).
>> Are you able to see http://demo.geonode.org/catalogue/csw from the box
>> on which you are running your harvests from?
>>
>>> When I point to one of my GeoNode-PyCSW servers using request.xml as:
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2"
>>> xmlns:ogc="http://www.opengis.net/ogc"
>>> xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>> xmlns:ows="http://www.opengis.net/ows"
>>> xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>> xmlns:dc="http://purl.org/dc/elements/1.1/"
>>> xmlns:dct="http://purl.org/dc/terms/"
>>> xmlns:gml="http://www.opengis.net/gml"
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2
>>> http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd"
>>> service="CSW" version="2.0.2">
>>>    <Source>http://aguai.cnpm.embrapa.br/catalogue/csw</Source>
>>>    <ResourceType>http://www.opengis.net/cat/csw/2.0.2</ResourceType>
>>>    <ResourceFormat>application/xml</ResourceFormat>
>>> </Harvest>
>>>
>>> pointing to my server: http://aguai.cnpm.embrapa.br/catalogue/csw
>>>
>>> I get the error:
>>>
>>> Initializing static context
>>> Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
>>> http://localhost/pycsw/csw.py
>>> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
>>> <!-- pycsw 1.10.0 -->
>>> <ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
>>> xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
>>> xmlns:atom="http://www.w3.org/2005/Atom"
>>> xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>> xmlns:dct="http://purl.org/dc/terms/"
>>> xmlns:ows="http://www.opengis.net/ows"
>>> xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
>>> xmlns:gml="http://www.opengis.net/gml"
>>> xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
>>> xmlns:xlink="http://www.w3.org/1999/xlink"
>>> xmlns:gco="http://www.isotc211.org/2005/gco"
>>> xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>> xmlns:srv="http://www.isotc211.org/2005/srv"
>>> xmlns:ogc="http://www.opengis.net/ogc"
>>> xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
>>> xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
>>> xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
>>> xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
>>> xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
>>> language="en-US" version="1.2.0"
>>> xsi:schemaLocation="http://www.opengis.net/ows
>>> http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd"><ows:Exception
>>> exceptionCode="NoApplicableCode" locator="source"><ows:ExceptionText>Harvest
>>> (insert) failed: ERROR: null value in column "identifier" violates not-null
>>> constraint
>>> DETAIL: Failing row contains (null, csw:Record,
>>> http://www.opengis.net/cat/csw/2.0.2, local, 2015-06-19T15:19:28Z
>>> &lt;ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/" ...
>>> Missing keyword: service, null, null, null, null, null, null, null, null,
>>> null, null, null, null, null, null, null, null, null, null, null, null,
>>> null, null, null, null, null, null,
>>> http://aguai.cnpm.embrapa.br/catalogue/csw, null, null, null, null, null,
>>> null, null, null, null, null, null, null, null, null, null, null, null,
>>> null, null, null, null, null, 'keyword':2 'miss':1 'servic':3, null).
>>> .</ows:ExceptionText></ows:Exception></ows:ExceptionReport>
>>> Done
>>>
>>>
>>> Can anyone help by explaining how best to implement this, and why these
>>> errors occur?
>>>
>> It looks like one of the layers in your pycsw serving GeoNode CSW does
>> not have an identifier?  Can you turn on pycsw logging from your
>> single pycsw to DEBUG and run the harvest again?  There might be
>> valuable information in the log that can help.  Feel free to send me
>> the log output offline.
>>
>> As well, how many records does
>> http://aguai.cnpm.embrapa.br/catalogue/csw have?  Might be valuable to
>> do a full GetRecords request, i.e.:
>>
>>
>> http://aguai.cnpm.embrapa.br/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary&maxrecords=FOO
>>
>> to inspect the actual CSW output that pycsw is trying to harvest.
>> Feel free to send me the result offline.
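
One way to inspect such a GetRecords result for the missing-identifier hypothesis raised above is to scan it for ISO records whose `gmd:fileIdentifier` is absent or empty. This is a sketch only, in Python 3 with the standard library; `records_missing_identifier` and the two-record sample response are invented for illustration, and an empty result would not rule out other causes of the failed insert.

```python
import xml.etree.ElementTree as ET

NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def records_missing_identifier(getrecords_xml):
    """Return 1-based positions of gmd:MD_Metadata records lacking an identifier."""
    root = ET.fromstring(getrecords_xml)
    missing = []
    for position, record in enumerate(root.iterfind(".//gmd:MD_Metadata", NS), start=1):
        ident = record.findtext(
            "gmd:fileIdentifier/gco:CharacterString", default="", namespaces=NS
        )
        if not (ident or "").strip():
            missing.append(position)
    return missing


# Hypothetical response with one good record and one without an identifier.
sample = """<csw:GetRecordsResponse
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <csw:SearchResults numberOfRecordsMatched="2" numberOfRecordsReturned="2">
    <gmd:MD_Metadata>
      <gmd:fileIdentifier><gco:CharacterString>abc-123</gco:CharacterString></gmd:fileIdentifier>
    </gmd:MD_Metadata>
    <gmd:MD_Metadata>
      <gmd:fileIdentifier><gco:CharacterString/></gmd:fileIdentifier>
    </gmd:MD_Metadata>
  </csw:SearchResults>
</csw:GetRecordsResponse>"""

print(records_missing_identifier(sample))
```
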
>>
>>> --
>>> Davi de O. Custódio
>>>


-- 
Angelos Tzotsos
Remote Sensing Laboratory
National Technical University of Athens
http://users.ntua.gr/tzotsos
