[pycsw-devel] PyCSW Harvesting
Tom Kralidis
tomkralidis at hotmail.com
Mon Dec 17 19:32:40 PST 2012
> From: rhodges at ecotrust.org
> To: pycsw-devel at lists.osgeo.org
> Date: Sun, 16 Dec 2012 12:03:56 -0800
> Subject: [pycsw-devel] PyCSW Harvesting
>
> I’ve been testing out harvesting with PyCSW and I’m pretty impressed: right off the bat, harvesting from CSW sources (including GeoNetwork and GeoPortal, ISO 19139 and FGDC) was fairly smooth. However, I didn’t have any success harvesting a CKAN 1.8 instance or between PyCSW instances, and I also failed to harvest from PyCSW using CKAN 1.8, GeoPortal, or GeoNetwork (all of the failed tests were with ISO 19139 documents).
>
> Does this sound about right?
>
> I can understand failing on harvesting CKAN 1.8, as it was (as I understand) only set up to serve CSW to be harvested from by GeoNetwork instances, though CKAN 2.0 should be better thanks to PyCSW.
Yes, for the record, some of the CKAN issues were discussed on the ckan-dev mailing list last month:
http://lists.okfn.org/pipermail/ckan-dev/2012-November/003406.html
Having said this, CKAN is moving ahead with pycsw integration (Adrià: any update?), so I imagine some of these issues may go away in terms of CSW interoperability.
> However, I'm not sure if the problems I'm having harvesting between PyCSW instances are due to me improperly configuring one/both of my PyCSW instances, or if they're indicative of something bigger. Has anyone successfully performed CSW harvesting between two PyCSW instances, and if so, do you have a server somewhere I could attempt to harvest from (preferably with only a few records in it)?
>
You can always test out the endpoints at http://demo.pycsw.org, which serve Dublin Core, FGDC, and ISO documents as examples.
> If there is something specific that you'd like to see in my logs from these tests, let me know. I'll be better able to get those on Monday.
>
Ryan and I discussed this on IRC today and found the following issues:
- sbin/pycsw-admin.py's post_xml operation has a default HTTP timeout of 10 seconds, which was causing timeout errors. This has since been fixed (see https://github.com/geopython/pycsw/issues/96).
- pycsw does CSW harvesting with the following logic:
  - the client sends a CSW Harvest request asking server A to harvest server B
  - server A sends 1..n GetRecords requests to server B, asking for all typenames supported by server A
  - if server B does not support all of server A's typenames, server B throws an exception, which server A subsequently passes back to the client. Initial tests of a fix did resolve the issue, but this needs more thought before a fix is committed. I have since filed an issue at https://github.com/geopython/pycsw/issues/99
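To make the typename mismatch concrete, here is a rough Python sketch of the logic described above. It is not pycsw code: the function names, the exception class, and the idea of intersecting typenames are all assumptions made for illustration, and the typename lists are only examples. It also assumes server A already knows which typenames server B supports, which a real implementation would have to discover first. The eventual fix tracked in the issue may well look different.

```python
# Hypothetical sketch only: none of these names come from pycsw itself.

class UnsupportedTypeName(Exception):
    """Raised by the stand-in for server B when asked for an unknown typename."""


def get_records(remote_typenames, requested):
    """Stand-in for a GetRecords request against server B (no network involved)."""
    unsupported = [t for t in requested if t not in remote_typenames]
    if unsupported:
        # Server B rejects the whole request, as described in the list above.
        raise UnsupportedTypeName(", ".join(unsupported))
    return ["record for %s" % t for t in requested]


def harvest_all(local_typenames, remote_typenames):
    """Behaviour described above: server A asks for every typename it supports."""
    return get_records(remote_typenames, local_typenames)


def harvest_common(local_typenames, remote_typenames):
    """One possible mitigation (an assumption): ask only for shared typenames."""
    common = [t for t in local_typenames if t in remote_typenames]
    return get_records(remote_typenames, common)


if __name__ == "__main__":
    server_a = ["csw:Record", "gmd:MD_Metadata", "fgdc:metadata"]  # example values
    server_b = ["csw:Record", "gmd:MD_Metadata"]                   # example values
    try:
        harvest_all(server_a, server_b)
    except UnsupportedTypeName as err:
        print("harvest failed, unsupported typename(s):", err)
    print(harvest_common(server_a, server_b))
```

With these example lists, harvest_all fails because server B does not know fgdc:metadata, while harvest_common requests only the two shared typenames. Silently skipping typenames is a design choice with its own trade-offs, which is part of why this needs more thought.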
Thanks for the testing and reporting.
..Tom
> Thanks,
> Ryan Hodges
> Applications Developer at Ecotrust
> rhodges at ecotrust.org | +1-503-467-0800 | www.ecotrust.org
> _______________________________________________
> pycsw-devel mailing list
> pycsw-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/pycsw-devel