[pycsw-devel] Harvesting CSW catalogs from GeoNode servers into a dedicated pycsw server

Tom Kralidis tomkralidis at gmail.com
Mon Jun 29 17:26:14 PDT 2015


Hi Davi: FYI, I was able to track this down. For some reason, some
records from demo.geonode.org are malformed:

http://demo.geonode.org/catalogue/csw?service=CSW&version=2.0.2&request=GetRecords&typenames=csw:Record&elementsetname=full&outputschema=http://www.isotc211.org/2005/gmd&resulttype=results&maxrecords=5

I've updated the master branch to catch this gracefully and skip
harvesting when this happens [1]. If you can test the master branch and
verify, I will also backport the fix to the 1.10 branch.
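To see which records in a response like the one above would hit the NOT NULL constraint on the identifier column, a quick standalone check can help. The sketch below is not the code from the commit in [1]. It is a hypothetical helper (`split_records`) that parses an ISO 19139 GetRecords response with the Python standard library, assumes the identifier comes from gmd:fileIdentifier/gco:CharacterString, and skips records lacking one instead of failing the whole batch.

```python
import xml.etree.ElementTree as ET

# Namespaces used by CSW 2.0.2 responses carrying ISO 19139 records
NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def split_records(getrecords_xml):
    """Return (identifiers, skipped) for the gmd:MD_Metadata records in a response.

    Hypothetical helper: a record counts as malformed when it has no
    non-empty gmd:fileIdentifier/gco:CharacterString.
    """
    root = ET.fromstring(getrecords_xml)
    identifiers, skipped = [], 0
    for md in root.iter("{%s}MD_Metadata" % NS["gmd"]):
        ident = md.findtext("gmd:fileIdentifier/gco:CharacterString",
                            namespaces=NS)
        if ident is None or not ident.strip():
            skipped += 1  # skip instead of inserting a NULL identifier
            continue
        identifiers.append(ident.strip())
    return identifiers, skipped


SAMPLE = """<csw:GetRecordsResponse
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <csw:SearchResults numberOfRecordsMatched="2" numberOfRecordsReturned="2"
                     nextRecord="0">
    <gmd:MD_Metadata>
      <gmd:fileIdentifier>
        <gco:CharacterString>abc-123</gco:CharacterString>
      </gmd:fileIdentifier>
    </gmd:MD_Metadata>
    <gmd:MD_Metadata>
      <gmd:fileIdentifier/>
    </gmd:MD_Metadata>
  </csw:SearchResults>
</csw:GetRecordsResponse>"""

print(split_records(SAMPLE))  # (['abc-123'], 1)
```

The sample response is invented for illustration; feeding the real demo.geonode.org output through the same function would show how many records lack an identifier.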

Thanks

..Tom

[1] https://github.com/geopython/pycsw/commit/64876838e39ccdaac59172ffc01e026a6ad4e0a1

On Tue, Jun 23, 2015 at 12:57 PM, Davi Custodio <davicustodio at gmail.com> wrote:
> Tom, I was able to upgrade to version 1.10.1, but the problem persists. I
> noticed that I get the same error no matter which CSW I point to. Below is
> the error I get when pointing to http://demo.geonode.org/catalogue/csw:
>
> <!-- pycsw 1.10.1 -->
> <ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
>   xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
>   xmlns:atom="http://www.w3.org/2005/Atom"
>   xmlns:xs="http://www.w3.org/2001/XMLSchema"
>   xmlns:dct="http://purl.org/dc/terms/"
>   xmlns:ows="http://www.opengis.net/ows"
>   xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
>   xmlns:gml="http://www.opengis.net/gml"
>   xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
>   xmlns:xlink="http://www.w3.org/1999/xlink"
>   xmlns:gco="http://www.isotc211.org/2005/gco"
>   xmlns:gmd="http://www.isotc211.org/2005/gmd"
>   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>   xmlns:srv="http://www.isotc211.org/2005/srv"
>   xmlns:ogc="http://www.opengis.net/ogc"
>   xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
>   xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
>   xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>   xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
>   xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
>   xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
>   language="en-US" version="1.2.0"
>   xsi:schemaLocation="http://www.opengis.net/ows http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
> <ows:Exception exceptionCode="NoApplicableCode" locator="source">
> <ows:ExceptionText>Harvest (insert) failed: ERROR: null value in column
> "identifier" violates not-null constraint
> DETAIL: Failing row contains (null, csw:Record,
> http://www.opengis.net/cat/csw/2.0.2, local, 2015-06-23T13:53:57Z
> &lt;ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/" ...
> Missing keyword: service, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null, http://demo.geonode.org/catalogue/csw,
> null, null, null, null, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null, null, null, null, null, 'keyword':2
> 'miss':1 'servic':3, null).
> .</ows:ExceptionText></ows:Exception></ows:ExceptionReport>
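When a harvest fails like this, the informative part is the ows:ExceptionText; in the report above, the row pycsw tried to insert appears to contain another ExceptionReport ("Missing keyword: service") rather than a metadata record. A minimal, hypothetical helper (`parse_exception_report`, standard library only) that pulls exceptionCode, locator and text out of an OWS 1.0 ExceptionReport:

```python
import xml.etree.ElementTree as ET

OWS = "http://www.opengis.net/ows"


def parse_exception_report(xml_text):
    """Return a list of (exceptionCode, locator, text) tuples from an OWS ExceptionReport."""
    root = ET.fromstring(xml_text)
    results = []
    for exc in root.iter("{%s}Exception" % OWS):
        text = exc.findtext("{%s}ExceptionText" % OWS, default="").strip()
        results.append((exc.get("exceptionCode"), exc.get("locator"), text))
    return results


# Trimmed-down report in the shape of the one above (most namespaces omitted)
REPORT = """<ows:ExceptionReport xmlns:ows="http://www.opengis.net/ows" version="1.2.0">
  <ows:Exception exceptionCode="NoApplicableCode" locator="source">
    <ows:ExceptionText>Harvest (insert) failed: ERROR: null value in column "identifier" violates not-null constraint</ows:ExceptionText>
  </ows:Exception>
</ows:ExceptionReport>"""

for code, locator, text in parse_exception_report(REPORT):
    print(code, locator, text)
```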
>
> I found nothing in the Apache logs either. Do you suspect anything?
>
> thanks
>
> On Mon, Jun 22, 2015 at 12:24 PM, Angelos Tzotsos <gcpp.kalxas at gmail.com>
> wrote:
>>
>> pycsw 1.10.1 has just been pushed to ubuntugis unstable. It will be
>> available in ~2h.
>>
>>
>> On 06/22/2015 05:30 PM, Davi Custodio wrote:
>>
>> Hi Tom.
>>
>> I was having trouble accessing external URLs from my server. I fixed that
>> and ran the test again, pointing to the XML you mentioned:
>>
>>
>> https://github.com/geopython/pycsw/blob/master/tests/suites/harvesting/post/Harvest-csw-run1.xml
>>
>> ...but the error was the same:
>>
>> pycsw-admin.py -c post_xml -u http://localhost/pycsw/csw.py -x /var/www/html/pycsw/bin/request.xml
>>
>> Initializing static context
>> Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
>> http://localhost/pycsw/csw.py
>> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
>> <!-- pycsw 1.10.0 -->
>> <ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
>>   xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
>>   xmlns:atom="http://www.w3.org/2005/Atom"
>>   xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>   xmlns:dct="http://purl.org/dc/terms/"
>>   xmlns:ows="http://www.opengis.net/ows"
>>   xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
>>   xmlns:gml="http://www.opengis.net/gml"
>>   xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
>>   xmlns:xlink="http://www.w3.org/1999/xlink"
>>   xmlns:gco="http://www.isotc211.org/2005/gco"
>>   xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>   xmlns:srv="http://www.isotc211.org/2005/srv"
>>   xmlns:ogc="http://www.opengis.net/ogc"
>>   xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
>>   xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
>>   xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>   xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
>>   xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
>>   xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
>>   language="en-US" version="1.2.0"
>>   xsi:schemaLocation="http://www.opengis.net/ows http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
>> <ows:Exception exceptionCode="NoApplicableCode" locator="source">
>> <ows:ExceptionText>Harvest (insert) failed: ERROR: null value in column
>> "identifier" violates not-null constraint
>> DETAIL: Failing row contains (null, csw:Record,
>> http://www.opengis.net/cat/csw/2.0.2, local, 2015-06-22T11:23:47Z
>> &lt;ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/" ...
>> Missing keyword: service, null, null, null, null, null, null, null, null,
>> null, null, null, null, null, null, null, null, null, null, null, null,
>> null, null, null, null, null, null, http://demo.geonode.org/catalogue/csw,
>> null, null, null, null, null, null, null, null, null, null, null, null,
>> null, null, null, null, null, null, null, null, null, null, 'keyword':2
>> 'miss':1 'servic':3, null).
>> .</ows:ExceptionText></ows:Exception></ows:ExceptionReport>
>> Done
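For anyone reproducing this outside pycsw-admin.py, the same Harvest document can be POSTed with the Python standard library. This is a hypothetical sketch: `build_post_request` and `post_csw_request` are illustrative names, not pycsw's admin API, and the 30-second default timeout and UTF-8 response decoding are assumptions. A synchronous Harvest (no ResponseHandler) only answers after the remote catalogue has been fetched, so a slow source can push the call past the client timeout.

```python
import urllib.request


def build_post_request(csw_url, body):
    """Build (but do not send) an HTTP POST carrying a CSW XML request body."""
    return urllib.request.Request(
        csw_url,
        data=body,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def post_csw_request(csw_url, xml_path, timeout=30):
    """POST the XML file at xml_path to csw_url and return the response text.

    timeout (seconds) bounds connecting and each socket read; if the server
    sends nothing for longer than that, urlopen raises a timeout error.
    """
    with open(xml_path, "rb") as f:
        req = build_post_request(csw_url, f.read())
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")  # assumes a UTF-8 response
```

Usage, with the paths from this thread: `print(post_csw_request("http://localhost/pycsw/csw.py", "/var/www/html/pycsw/bin/request.xml", timeout=120))`.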
>>
>> Attached is the response to the request you suggested:
>>
>>
>> http://aguai.cnpm.embrapa.br/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary&maxrecords=1000
>>
>> I am still using version 1.10.0. Could this be resolved by upgrading to
>> 1.10.1? I'm using an Ubuntu 14.04 server. I tried to update pycsw with
>> apt-get update followed by apt-get upgrade, but pycsw remained at version
>> 1.10.0. How do I upgrade using the Ubuntu packages?
>>
>> Thanks
>>
>> On Fri, Jun 19, 2015 at 4:28 PM, Tom Kralidis <tomkralidis at gmail.com>
>> wrote:
>>
>> Hi Davi: thanks for moving this discussion here from geonode-users.
>> Comments interleaved.
>>
>> On Fri, Jun 19, 2015 at 3:09 PM, Davi Custodio <davicustodio at gmail.com>
>> wrote:
>>
>> Hello. I have a setup consisting of 8 GeoNode servers, each with its own
>> PyCSW instance.
>> To test, I use GetRecords as follows:
>>
>>
>>
>> http://localhost/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary
>>
>> and I receive the corresponding XML correctly.
>>
>> Harvesting can also be configured in GeoNetwork using:
>>
>> http://localhost/catalogue/csw?version=2.0.2&request=GetCapabilities&service=CSW
>>
>> and that harvest also runs properly.
>>
>> I created a dedicated server with a default installation of PyCSW 1.10.0,
>> and I'm trying to get this server to collect all the metadata records from
>> the 8 GeoNode servers.
>>
>> My goal is to have a single PyCSW that gathers all of my organization's
>> metadata. I'm avoiding "federatedcatalogues" because I do not want users
>> querying the CSW to have to specify additional parameters.
>>
>> In a first test, I used the command:
>>
>> pycsw-admin.py -c post_xml -u http://localhost/pycsw/csw.py -x /var/www/html/pycsw/bin/request.xml
>>
>> where request.xml is:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2"
>>   xmlns:ogc="http://www.opengis.net/ogc"
>>   xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>   xmlns:ows="http://www.opengis.net/ows"
>>   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>   xmlns:dc="http://purl.org/dc/elements/1.1/"
>>   xmlns:dct="http://purl.org/dc/terms/"
>>   xmlns:gml="http://www.opengis.net/gml"
>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>   xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd"
>>   service="CSW" version="2.0.2">
>>   <Source>http://demo.geonode.org//catalogue/csw</Source>
>>   <ResourceType>http://www.opengis.net/cat/csw/2.0.2</ResourceType>
>>   <ResourceFormat>application/xml</ResourceFormat>
>> </Harvest>
>>
>> pointing to http://demo.geonode.org//catalogue/csw.
>>
>> I get the following error:
>>
>> Initializing static context
>> Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
>> http://localhost/pycsw/csw.py
>> Traceback (most recent call last):
>>    File "/usr/bin/pycsw-admin.py", line 246, in <module>
>>      print admin.post_xml(CSW_URL, XML, TIMEOUT)
>>    File "/usr/lib/python2.7/dist-packages/pycsw/admin.py", line 495, in post_xml
>>      raise RuntimeError(err)
>> RuntimeError: timed out
>>
>> I'm not getting any timeout when running here (using 1.10.1 -- any
>> chance you can upgrade your single non-GeoNode pycsw instance to
>> 1.10.1?) using the Harvest XML request like
>>
>>
>> https://github.com/geopython/pycsw/blob/master/tests/suites/harvesting/post/Harvest-csw-run1.xml
>> ,
>> but substituting the Source with http://demo.geonode.org/catalogue/csw
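Since only the Source changes between runs, the Harvest document can also be generated. A minimal sketch with xml.etree.ElementTree; `build_harvest_request` is a hypothetical helper, and it deliberately omits the optional xsi:schemaLocation and the extra namespace declarations present in the request.xml quoted earlier in this thread.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def build_harvest_request(source,
                          resource_type="http://www.opengis.net/cat/csw/2.0.2",
                          resource_format="application/xml"):
    """Return a CSW 2.0.2 Harvest request document as a string.

    Hypothetical helper; element names follow the Harvest XML in this thread.
    """
    ET.register_namespace("", CSW_NS)  # serialize CSW as the default namespace
    root = ET.Element("{%s}Harvest" % CSW_NS,
                      {"service": "CSW", "version": "2.0.2"})
    for tag, text in (("Source", source),
                      ("ResourceType", resource_type),
                      ("ResourceFormat", resource_format)):
        # strip() avoids stray whitespace inside URLs and MIME types
        ET.SubElement(root, "{%s}%s" % (CSW_NS, tag)).text = text.strip()
    return ET.tostring(root, encoding="unicode")


xml_text = build_harvest_request("http://demo.geonode.org/catalogue/csw")
print(xml_text)
```

Writing `xml_text` to request.xml gives a document equivalent in content to the hand-written one, with the Source swapped as suggested.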
>>
>> pycsw harvests 36 records (1 for the service and 35 metadata records).
>> Are you able to see http://demo.geonode.org/catalogue/csw from the box
>> on which you are running your harvests from?
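One way to answer that question from the harvesting machine itself is a tiny reachability check. This is a hypothetical diagnostic (`can_reach` is an illustrative name) for http(s) URLs, using only the standard library; the 10-second default timeout is an arbitrary assumption.

```python
import urllib.error
import urllib.request


def can_reach(url, timeout=10):
    """Return (ok, detail) for a GET of an http(s) url from this machine."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, "HTTP %d" % resp.getcode()
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status
        return True, "HTTP %d" % err.code
    except OSError as err:
        # URLError (DNS failure, refused connection) and socket timeouts
        # are all OSError subclasses
        return False, str(err)
```

For example, `can_reach("http://demo.geonode.org/catalogue/csw?service=CSW&version=2.0.2&request=GetCapabilities")` run on the dedicated pycsw box; a `False` result points at network or proxy issues rather than at pycsw.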
>>
>> When I point to one of my GeoNode/PyCSW servers using this request.xml:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2"
>>   xmlns:ogc="http://www.opengis.net/ogc"
>>   xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>   xmlns:ows="http://www.opengis.net/ows"
>>   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>   xmlns:dc="http://purl.org/dc/elements/1.1/"
>>   xmlns:dct="http://purl.org/dc/terms/"
>>   xmlns:gml="http://www.opengis.net/gml"
>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>   xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd"
>>   service="CSW" version="2.0.2">
>>   <Source>http://aguai.cnpm.embrapa.br/catalogue/csw</Source>
>>   <ResourceType>http://www.opengis.net/cat/csw/2.0.2</ResourceType>
>>   <ResourceFormat>application/xml</ResourceFormat>
>> </Harvest>
>>
>> pointing to my server: http://aguai.cnpm.embrapa.br/catalogue/csw
>>
>> I get the error:
>>
>> Initializing static context
>> Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
>> http://localhost/pycsw/csw.py
>> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
>> <!-- pycsw 1.10.0 -->
>> <ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
>>   xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
>>   xmlns:atom="http://www.w3.org/2005/Atom"
>>   xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>   xmlns:dct="http://purl.org/dc/terms/"
>>   xmlns:ows="http://www.opengis.net/ows"
>>   xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
>>   xmlns:gml="http://www.opengis.net/gml"
>>   xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
>>   xmlns:xlink="http://www.w3.org/1999/xlink"
>>   xmlns:gco="http://www.isotc211.org/2005/gco"
>>   xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>   xmlns:srv="http://www.isotc211.org/2005/srv"
>>   xmlns:ogc="http://www.opengis.net/ogc"
>>   xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
>>   xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
>>   xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>   xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
>>   xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
>>   xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
>>   language="en-US" version="1.2.0"
>>   xsi:schemaLocation="http://www.opengis.net/ows http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
>> <ows:Exception exceptionCode="NoApplicableCode" locator="source">
>> <ows:ExceptionText>Harvest (insert) failed: ERROR: null value in column
>> "identifier" violates not-null constraint
>> DETAIL: Failing row contains (null, csw:Record,
>> http://www.opengis.net/cat/csw/2.0.2, local, 2015-06-19T15:19:28Z
>> &lt;ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/" ...
>> Missing keyword: service, null, null, null, null, null, null, null, null,
>> null, null, null, null, null, null, null, null, null, null, null, null,
>> null, null, null, null, null, null, http://aguai.cnpm.embrapa.br/catalogue/csw,
>> null, null, null, null, null, null, null, null, null, null, null, null,
>> null, null, null, null, null, null, null, null, null, null, 'keyword':2
>> 'miss':1 'servic':3, null).
>> .</ows:ExceptionText></ows:Exception></ows:ExceptionReport>
>> Done
>>
>>
>> Can anyone help me understand how best to implement this, and why these
>> errors occur?
>>
>> It looks like one of the layers in your pycsw serving GeoNode CSW does
>> not have an identifier?  Can you turn on pycsw logging from your
>> single pycsw to DEBUG and run the harvest again?  There might be
>> valuable information in the log that can help.  Feel free to send me
>> the log output offline.
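Turning on DEBUG logging usually means editing the pycsw configuration file. The sketch below assumes an INI-style file (commonly default.cfg) whose [server] section takes `loglevel` and `logfile` keys, as in pycsw's sample configuration; the file path and the /tmp/pycsw.log default are assumptions, and `enable_debug_logging` is a hypothetical helper. Editing the file by hand works just as well.

```python
import configparser


def enable_debug_logging(cfg_path, logfile="/tmp/pycsw.log"):
    """Set [server] loglevel=DEBUG and logfile in a pycsw-style INI config.

    Assumes the [server] section and loglevel/logfile keys of pycsw's
    sample configuration. Note: configparser rewrites the whole file and
    drops any comments it contained.
    """
    cp = configparser.ConfigParser(interpolation=None)  # leave '%' in values alone
    cp.optionxform = str  # keep key case as written
    cp.read(cfg_path)
    if not cp.has_section("server"):
        cp.add_section("server")
    cp.set("server", "loglevel", "DEBUG")
    cp.set("server", "logfile", logfile)
    with open(cfg_path, "w") as f:
        cp.write(f)
```

Because comments are lost on rewrite, keep a backup copy of the original configuration before running it.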
>>
>> As well, how many records does
>> http://aguai.cnpm.embrapa.br/catalogue/csw have?  Might be valuable to
>> do a full GetRecords request, i.e.:
>>
>>
>>
>> http://aguai.cnpm.embrapa.br/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary&maxrecords=FOO
>>
>> to inspect the actual CSW output that pycsw is trying to harvest.
>> Feel free to send me the result offline.
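Rather than a single very large maxrecords value, the full result set can be walked page by page: in CSW 2.0.2, GetRecords accepts startPosition (1-based) and maxRecords, and csw:SearchResults reports numberOfRecordsMatched and nextRecord (0 when no records remain). A hypothetical sketch; `fetch` is any callable you supply that returns the response body for a URL, and the page size of 100 is an arbitrary assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

CSW = "http://www.opengis.net/cat/csw/2.0.2"


def page_url(endpoint, start, page_size=100):
    """GetRecords KVP URL for one page of ISO summary records."""
    params = {
        "service": "CSW", "version": "2.0.2", "request": "GetRecords",
        "resultType": "results", "typeNames": "csw:Record",
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "elementSetName": "summary",
        "startPosition": str(start), "maxRecords": str(page_size),
    }
    return endpoint + "?" + urlencode(params)


def next_start(response_xml):
    """Return (matched, next_record) from csw:SearchResults."""
    root = ET.fromstring(response_xml)
    sr = root.find("{%s}SearchResults" % CSW)
    if sr is None:
        raise ValueError("no csw:SearchResults (possibly an ows:ExceptionReport)")
    return (int(sr.get("numberOfRecordsMatched", "0")),
            int(sr.get("nextRecord", "0")))


def fetch_all_pages(endpoint, fetch, page_size=100):
    """Collect every page's response body, following nextRecord."""
    pages, start = [], 1
    while True:
        body = fetch(page_url(endpoint, start, page_size))
        pages.append(body)
        _, nxt = next_start(body)
        # nextRecord="0" means done; also stop if the server does not
        # advance, so a misbehaving endpoint cannot loop forever
        if nxt <= start:
            break
        start = nxt
    return pages
```

With a live server, something like `fetch_all_pages("http://aguai.cnpm.embrapa.br/catalogue/csw", lambda u: urllib.request.urlopen(u, timeout=60).read())` (after `import urllib.request`) would return the raw pages for inspection.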
>>
>> --
>> Davi de O. Custódio
>>
>>
>>
>>
>> _______________________________________________
>> pycsw-devel mailing list
>> pycsw-devel at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/pycsw-devel
>>
>>
>>
>> --
>> Angelos Tzotsos
>> Remote Sensing Laboratory
>> National Technical University of Athens
>> http://users.ntua.gr/tzotsos
>>
>>
>
>
>
>
> --
> Davi de O. Custódio

