[pycsw-devel] Making Harvesting in a dedicated server PyCSW from catalogs CSW in GeoNode servers

Davi Custodio davicustodio at gmail.com
Mon Jun 22 07:30:58 PDT 2015


Hi Tom.

I was indeed having trouble accessing external URLs from my server. I fixed
that and ran the test again using the XML you pointed to:

https://github.com/geopython/pycsw/blob/master/tests/suites/harvesting/post/Harvest-csw-run1.xml

But the error was the same:

pycsw-admin.py -c post_xml -u http://localhost/pycsw/csw.py -x /var/www/html/pycsw/bin/request.xml

Initializing static context
Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
http://localhost/pycsw/csw.py
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- pycsw 1.10.0 -->
<ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:dct="http://purl.org/dc/terms/"
  xmlns:ows="http://www.opengis.net/ows"
  xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
  xmlns:gml="http://www.opengis.net/gml"
  xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:gco="http://www.isotc211.org/2005/gco"
  xmlns:gmd="http://www.isotc211.org/2005/gmd"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:srv="http://www.isotc211.org/2005/srv"
  xmlns:ogc="http://www.opengis.net/ogc"
  xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
  xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
  xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
  xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
  xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
  language="en-US" version="1.2.0"
  xsi:schemaLocation="http://www.opengis.net/ows http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
<ows:Exception exceptionCode="NoApplicableCode" locator="source">
<ows:ExceptionText>Harvest (insert) failed: ERROR: null value in column
"identifier" violates not-null constraint
DETAIL: Failing row contains (null, csw:Record,
http://www.opengis.net/cat/csw/2.0.2, local, 2015-06-22T11:23:47Z
&lt;ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/" ...
Missing keyword: service, null, null, null, null, null, null, null, null,
null, null, null, null, null, null, null, null, null, null, null, null,
null, null, null, null, null, null, http://demo.geonode.org/catalogue/csw,
null, null, null, null, null, null, null, null, null, null, null, null,
null, null, null, null, null, null, null, null, null, null,
'keyword':2 'miss':1 'servic':3, null).
.</ows:ExceptionText></ows:Exception></ows:ExceptionReport>
Done
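
For anyone reading a report like the one above, here is a minimal sketch (Python 3, standard library only) of pulling the exception code, locator and message out of an OWS ExceptionReport. The function name and the shortened SAMPLE document are illustrative assumptions, not part of pycsw.

```python
import xml.etree.ElementTree as ET

# OWS namespace as declared in the exception report above.
OWS_NS = "http://www.opengis.net/ows"


def parse_exception_report(xml_text):
    """Return a list of (exceptionCode, locator, text) tuples.

    Returns an empty list when the document root is not an
    ows:ExceptionReport. Hypothetical helper, not a pycsw API.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}ExceptionReport" % OWS_NS:
        return []
    results = []
    for exc in root.findall("{%s}Exception" % OWS_NS):
        text_el = exc.find("{%s}ExceptionText" % OWS_NS)
        text = ""
        if text_el is not None and text_el.text:
            text = text_el.text.strip()
        results.append((exc.get("exceptionCode"), exc.get("locator"), text))
    return results


# Shortened, hand-written stand-in for the real response (assumption).
SAMPLE = """<ows:ExceptionReport xmlns:ows="http://www.opengis.net/ows" version="1.2.0">
  <ows:Exception exceptionCode="NoApplicableCode" locator="source">
    <ows:ExceptionText>Harvest (insert) failed: null value in column "identifier"</ows:ExceptionText>
  </ows:Exception>
</ows:ExceptionReport>"""

for code, locator, text in parse_exception_report(SAMPLE):
    print(code, locator, text)
```

The locator ("source" here) names the request parameter the server blames, which can be a quicker pointer than reading the whole message.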

Attached is the response to the request you suggested:

http://aguai.cnpm.embrapa.br/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary&maxrecords=1000
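
As a rough sketch of how a response like the attached one could be checked for records without an identifier, the Python 3 snippet below builds the same GetRecords URL and counts gmd:MD_Metadata elements that lack a gmd:fileIdentifier. The helper names and the tiny SAMPLE_RESPONSE are made up for illustration, and treating gmd:fileIdentifier as "the identifier" is an assumption about which field matters here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


def getrecords_url(endpoint, maxrecords=10):
    """Build a GetRecords KVP URL with the parameters used in this thread."""
    params = [
        ("request", "GetRecords"),
        ("service", "CSW"),
        ("version", "2.0.2"),
        ("resultType", "results"),
        ("outputSchema", GMD),
        ("typeNames", "csw:Record"),
        ("elementSetName", "summary"),
        ("maxrecords", str(maxrecords)),
    ]
    # urlencode percent-encodes ':' and '/' inside the values.
    return endpoint + "?" + urlencode(params)


def records_missing_identifier(xml_text):
    """Return (total, missing) counts of gmd:MD_Metadata records."""
    root = ET.fromstring(xml_text)
    total = 0
    missing = 0
    for md in root.iter("{%s}MD_Metadata" % GMD):
        total += 1
        ident = md.find("{%s}fileIdentifier/{%s}CharacterString" % (GMD, GCO))
        if ident is None or not (ident.text or "").strip():
            missing += 1
    return total, missing


# Hand-written two-record response for demonstration only (assumption).
SAMPLE_RESPONSE = """<csw:GetRecordsResponse
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <csw:SearchResults numberOfRecordsMatched="2" numberOfRecordsReturned="2">
    <gmd:MD_Metadata>
      <gmd:fileIdentifier>
        <gco:CharacterString>abc-123</gco:CharacterString>
      </gmd:fileIdentifier>
    </gmd:MD_Metadata>
    <gmd:MD_Metadata/>
  </csw:SearchResults>
</csw:GetRecordsResponse>"""

print(getrecords_url("http://aguai.cnpm.embrapa.br/catalogue/csw", maxrecords=1000))
print(records_missing_identifier(SAMPLE_RESPONSE))
```

To check the real attachment, the saved XML file could be read and passed to records_missing_identifier in place of SAMPLE_RESPONSE.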

I am still using version 1.10.0. Could this be resolved by upgrading
to 1.10.1? I'm using an Ubuntu 14.04 server. I tried to update pycsw with
apt-get update followed by apt-get upgrade, but pycsw remained at
version 1.10.0. How do I upgrade using the Ubuntu packages?
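
One way to confirm which version the Python interpreter actually sees after a package upgrade is sketched below. It assumes that the installed pycsw package exposes a `__version__` attribute; the helper names are hypothetical, and the comparison only looks at the first three numeric parts of the version string.

```python
import re


def version_tuple(version):
    """Turn '1.10.1' into (1, 10, 1), using at most three numeric parts."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def needs_upgrade(installed, target="1.10.1"):
    """True when the installed version is older than the target version."""
    return version_tuple(installed) < version_tuple(target)


try:
    import pycsw
    # Assumption: the package defines __version__; fall back to None if not.
    installed = getattr(pycsw, "__version__", None)
except ImportError:
    installed = None

if installed is None:
    print("pycsw version could not be determined from this interpreter")
elif needs_upgrade(installed):
    print("%s is older than 1.10.1" % installed)
else:
    print("%s is at or above 1.10.1" % installed)
```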

Thanks

On Fri, Jun 19, 2015 at 4:28 PM, Tom Kralidis <tomkralidis at gmail.com> wrote:

> Hi Davi: thanks for moving this discussion here from geonode-users.
> Comments interleaved.
>
> On Fri, Jun 19, 2015 at 3:09 PM, Davi Custodio <davicustodio at gmail.com>
> wrote:
> > Hello. My scenario consists of 8 GeoNode servers, each with its own
> > PyCSW instance.
> > To test, I use a GetRecords request such as:
> >
> >
> > http://localhost/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary
> >
> > and receive the corresponding XML correctly.
> >
> > I can also configure harvesting in GeoNetwork using:
> >
> > http://localhost/catalog/csw?version=2.0.2&request=GetCapabilities&service=CSW
> >
> > and that harvest also runs properly.
> >
> > I created a dedicated server with the default installation of PyCSW
> > 1.10.0, and I'm trying to get this server to collect all the metadata
> > records from the 8 GeoNode servers.
> >
> > My intention is to have a single PyCSW that gathers all the metadata of my
> > organization. I'm avoiding "federatedcatalogues" because I do not want
> > users who query the CSW to have to specify additional parameters.
> >
> > In a first test, I used the command:
> >
> > pycsw-admin.py -c post_xml -u http://localhost/pycsw/csw.py -x /var/www/html/pycsw/bin/request.xml
> >
> > where request.xml =
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2"
> >   xmlns:ogc="http://www.opengis.net/ogc"
> >   xmlns:gmd="http://www.isotc211.org/2005/gmd"
> >   xmlns:ows="http://www.opengis.net/ows"
> >   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
> >   xmlns:dc="http://purl.org/dc/elements/1.1/"
> >   xmlns:dct="http://purl.org/dc/terms/"
> >   xmlns:gml="http://www.opengis.net/gml"
> >   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >   xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd"
> >   service="CSW" version="2.0.2">
> >   <Source>http://demo.geonode.org//catalogue/csw</Source>
> >   <ResourceType>http://www.opengis.net/cat/csw/2.0.2</ResourceType>
> >   <ResourceFormat>application/xml</ResourceFormat>
> > </Harvest>
> >
> > pointing to http://demo.geonode.org//catalogue/csw.
> >
> > I get the following error:
> >
> > Initializing static context
> > Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
> > http://localhost/pycsw/csw.py
> > Traceback (most recent call last):
> >   File "/usr/bin/pycsw-admin.py", line 246, in <module>
> >     print admin.post_xml(CSW_URL, XML, TIMEOUT)
> >   File "/usr/lib/python2.7/dist-packages/pycsw/admin.py", line 495, in post_xml
> >     raise RuntimeError(err)
> > RuntimeError: timed out
>
> I'm not getting any timeout when running here (using 1.10.1 -- any
> chance you can upgrade your single non-GeoNode pycsw instance to
> 1.10.1?) using the Harvest XML request like
>
> https://github.com/geopython/pycsw/blob/master/tests/suites/harvesting/post/Harvest-csw-run1.xml
> ,
> but substituting the Source with http://demo.geonode.org/catalogue/csw
>
> pycsw harvests 36 records (1 for the service and 35 metadata records).
> Are you able to see http://demo.geonode.org/catalogue/csw from the box
> on which you are running your harvests from?
>
> > When I point to one of my GeoNode PyCSW servers, using this request.xml:
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <Harvest xmlns="http://www.opengis.net/cat/csw/2.0.2"
> >   xmlns:ogc="http://www.opengis.net/ogc"
> >   xmlns:gmd="http://www.isotc211.org/2005/gmd"
> >   xmlns:ows="http://www.opengis.net/ows"
> >   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
> >   xmlns:dc="http://purl.org/dc/elements/1.1/"
> >   xmlns:dct="http://purl.org/dc/terms/"
> >   xmlns:gml="http://www.opengis.net/gml"
> >   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >   xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-publication.xsd"
> >   service="CSW" version="2.0.2">
> >   <Source>http://aguai.cnpm.embrapa.br/catalogue/csw</Source>
> >   <ResourceType>http://www.opengis.net/cat/csw/2.0.2</ResourceType>
> >   <ResourceFormat>application/xml</ResourceFormat>
> > </Harvest>
> >
> > pointing to my server: http://aguai.cnpm.embrapa.br/catalogue/csw
> >
> > I get the error:
> >
> > Initializing static context
> > Executing HTTP POST request /var/www/html/pycsw/bin/request.xml on server
> > http://localhost/pycsw/csw.py
> > <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> > <!-- pycsw 1.10.0 -->
> > <ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
> >   xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
> >   xmlns:atom="http://www.w3.org/2005/Atom"
> >   xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >   xmlns:dct="http://purl.org/dc/terms/"
> >   xmlns:ows="http://www.opengis.net/ows"
> >   xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
> >   xmlns:gml="http://www.opengis.net/gml"
> >   xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
> >   xmlns:xlink="http://www.w3.org/1999/xlink"
> >   xmlns:gco="http://www.isotc211.org/2005/gco"
> >   xmlns:gmd="http://www.isotc211.org/2005/gmd"
> >   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> >   xmlns:srv="http://www.isotc211.org/2005/srv"
> >   xmlns:ogc="http://www.opengis.net/ogc"
> >   xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
> >   xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
> >   xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
> >   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >   xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
> >   xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
> >   xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
> >   language="en-US" version="1.2.0"
> >   xsi:schemaLocation="http://www.opengis.net/ows http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
> > <ows:Exception exceptionCode="NoApplicableCode" locator="source">
> > <ows:ExceptionText>Harvest (insert) failed: ERROR: null value in column
> > "identifier" violates not-null constraint
> > DETAIL: Failing row contains (null, csw:Record,
> > http://www.opengis.net/cat/csw/2.0.2, local, 2015-06-19T15:19:28Z
> > &lt;ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/" ...
> > Missing keyword: service, null, null, null, null, null, null, null, null,
> > null, null, null, null, null, null, null, null, null, null, null, null,
> > null, null, null, null, null, null,
> > http://aguai.cnpm.embrapa.br/catalogue/csw, null, null, null, null, null,
> > null, null, null, null, null, null, null, null, null, null, null, null,
> > null, null, null, null, null, 'keyword':2 'miss':1 'servic':3, null).
> > .</ows:ExceptionText></ows:Exception></ows:ExceptionReport>
> > Done
> >
> >
> > Can anyone help by explaining how best to implement this, and why these
> > errors occur?
> >
>
> It looks like one of the layers in your pycsw serving GeoNode CSW does
> not have an identifier?  Can you turn on pycsw logging from your
> single pycsw to DEBUG and run the harvest again?  There might be
> valuable information in the log that can help.  Feel free to send me
> the log output offline.
>
> As well, how many records does
> http://aguai.cnpm.embrapa.br/catalogue/csw have?  Might be valuable to
> do a full GetRecords request, i.e.:
>
>
> http://aguai.cnpm.embrapa.br/catalogue/csw?request=GetRecords&service=CSW&version=2.0.2&resultType=results&outputSchema=http://www.isotc211.org/2005/gmd&typeNames=csw:Record&elementSetName=summary&maxrecords=FOO
>
> to inspect the actual CSW output that pycsw is trying to harvest.
> Feel free to send me the result offline.
>
> > --
> > Davi de O. Custódio
> >
>



-- 
Davi de O. Custódio
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pycsw_getrecords_result.xml
Type: text/xml
Size: 623972 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/pycsw-devel/attachments/20150622/24054095/attachment-0001.xml>

