[pycsw-devel] PyCSW Harvesting

Tom Kralidis tomkralidis at hotmail.com
Mon Dec 17 19:32:40 PST 2012






> From: rhodges at ecotrust.org
> To: pycsw-devel at lists.osgeo.org
> Date: Sun, 16 Dec 2012 12:03:56 -0800
> Subject: [pycsw-devel] PyCSW Harvesting
> 
> I’ve been testing out harvesting with PyCSW and I’m pretty impressed: right off the bat, harvesting from CSW sources (including GeoNetwork and GeoPortal, ISO 19139 and FGDC) was fairly smooth. However, I didn’t have any success harvesting a CKAN 1.8 instance or harvesting between PyCSW instances, and I also failed to harvest from PyCSW using CKAN 1.8, GeoPortal, or GeoNetwork (all of the failed tests were with ISO 19139 documents).
> 
> Does this sound about right?
> 
> I can understand failing on harvesting CKAN 1.8, as it was (as I understand) only set up to serve CSW for harvesting by GeoNetwork instances, though CKAN 2.0 should be better thanks to PyCSW.

Yes, for the record, some of the CKAN issues were discussed on the ckan-dev mailing list last month:

http://lists.okfn.org/pipermail/ckan-dev/2012-November/003406.html

Having said this, CKAN is moving ahead with pycsw integration (Adrià: any update?), so I imagine some of these issues may go away in terms of CSW interoperability.

> However, I'm not sure if the problems I'm having harvesting between PyCSW instances are due to me improperly configuring one or both of my PyCSW instances, or if they're indicative of something bigger. Has anyone successfully performed CSW harvesting between two PyCSW instances, and if so, do you have a server somewhere I could attempt to harvest from (preferably with only a few records in it)?
> 

You can always test out the endpoints at http://demo.pycsw.org, which serve Dublin Core, FGDC, and ISO documents as examples. 


> If there is something specific that you'd like to see in my logs from these tests, let me know. I'll be better able to get those on Monday.
> 

Ryan and I discussed these problems on IRC today, and found the following:

- sbin/pycsw-admin.py's post_xml operation has a default HTTP timeout of 10 seconds, which was causing timeout errors.  This has since been fixed (see https://github.com/geopython/pycsw/issues/96)

- pycsw does CSW harvesting with the following logic:
 - client sends CSW Harvest request asking server A to harvest server B
 - server A sends 1..n GetRecords requests to server B, asking for all typenames supported by server A
 - if server B does not support all of server A's typenames, server B throws an exception, which server A then passes back to the client.  Initial tests of a fix did resolve the issue, but this needs more thought before a fix is committed.  I have since filed an issue at https://github.com/geopython/pycsw/issues/99
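
To make the typename mismatch concrete, here is a rough sketch in plain Python. It is not pycsw's code and not necessarily the fix being considered in issue 99: all function names are hypothetical, the typename lists are only illustrative, and it assumes server A can find out which typenames server B supports (for example, from server B's GetCapabilities response).

```python
# Hypothetical simulation of the harvest logic described above.
# No network calls; none of these names come from pycsw itself.


class UnsupportedTypeName(Exception):
    """Stands in for the exception server B raises on an unknown typename."""


def get_records(remote_typenames, requested_typenames):
    """Simulate server B answering a GetRecords request."""
    unsupported = [t for t in requested_typenames if t not in remote_typenames]
    if unsupported:
        raise UnsupportedTypeName(", ".join(unsupported))
    return ["record for " + t for t in requested_typenames]


def harvest_naive(local_typenames, remote_typenames):
    """Ask server B for every typename server A supports."""
    return get_records(remote_typenames, local_typenames)


def harvest_filtered(local_typenames, remote_typenames):
    """Ask server B only for the typenames both servers support."""
    common = [t for t in local_typenames if t in remote_typenames]
    return get_records(remote_typenames, common)


# Illustrative typename lists (assumed, not taken from a real server).
server_a = ["csw:Record", "gmd:MD_Metadata", "fgdc:metadata"]
server_b = ["csw:Record", "gmd:MD_Metadata"]
```

With these lists, harvest_naive(server_a, server_b) raises UnsupportedTypeName because server B does not know fgdc:metadata, while harvest_filtered(server_a, server_b) only requests the two shared typenames. Whether silently skipping typenames is the right behaviour is exactly the part that needs more thought.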

Thanks for the testing and reporting.

..Tom

> Thanks,
> Ryan Hodges
> Applications Developer at Ecotrust 
> rhodges at ecotrust.org | +1-503-467-0800 | www.ecotrust.org
> _______________________________________________
> pycsw-devel mailing list
> pycsw-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/pycsw-devel

 		 	   		  