[pycsw-devel] pycsw does not store all the harvested metadata into the database

Tomas Kliment tomas.kliment at gmail.com
Sat Sep 3 10:59:44 PDT 2016


Hi Tom,

Thanks for the quick answer.

Attached is the export from my database for the services in question: the
WMS 1.1.1 services harvested by pycsw (980 records).

As I mentioned, I collected and stored all the information returned in the
CSW Harvest responses issued by pycsw.
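For anyone who wants to reproduce that collection step without the attached PHP script, here is a minimal Python sketch under stated assumptions. It builds a CSW 2.0.2 Harvest request for one WMS and reads the per-response counts from `csw:TransactionSummary` (`totalInserted`, `totalUpdated`, `totalDeleted`), which appear to correspond to the spreadsheet's metadataInserted/metadataUpdated/metadataDeleted columns. The function names are hypothetical, and the sketch assumes the pycsw instance accepts transactions (Harvest) from the calling host.

```python
import urllib.request
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
NS = {"csw": CSW_NS}
ET.register_namespace("csw", CSW_NS)


def build_harvest_request(source_url: str,
                          resource_type: str = "http://www.opengis.net/wms") -> bytes:
    """Build a CSW 2.0.2 Harvest request body for a single OWS endpoint."""
    root = ET.Element(f"{{{CSW_NS}}}Harvest", {"service": "CSW", "version": "2.0.2"})
    ET.SubElement(root, f"{{{CSW_NS}}}Source").text = source_url
    ET.SubElement(root, f"{{{CSW_NS}}}ResourceType").text = resource_type
    ET.SubElement(root, f"{{{CSW_NS}}}ResourceFormat").text = "application/xml"
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def parse_transaction_summary(response_xml: bytes) -> dict:
    """Extract the insert/update/delete totals from a HarvestResponse."""
    root = ET.fromstring(response_xml)
    summary = root.find(".//csw:TransactionSummary", NS)
    if summary is None:
        raise ValueError("no csw:TransactionSummary in response")
    totals = {}
    for field in ("totalInserted", "totalUpdated", "totalDeleted"):
        elem = summary.find(f"csw:{field}", NS)
        # Treat a missing or empty element as 0.
        totals[field] = int(elem.text) if elem is not None and elem.text else 0
    return totals


def harvest(csw_url: str, source_url: str, timeout: float = 120) -> dict:
    """POST one Harvest request (hypothetical helper) and return its totals."""
    req = urllib.request.Request(
        csw_url,
        data=build_harvest_request(source_url),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_transaction_summary(resp.read())
```

Calling `harvest()` once per WMS URL and storing each returned dict would give the same per-service rows as the attached export.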

metadataInserted is in column L; metadataUpdated and metadataDeleted are
0, since this was the first harvest.

Column K holds the per-service total: (metadataInserted + metadataUpdated)
- metadataDeleted.

If you sum column K you get *16525*.
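The column K arithmetic and the grand total can be sketched as follows. This is a minimal illustration only: the CSV header names `service`, `inserted`, `updated` and `deleted` are hypothetical stand-ins, since the layout of the attached export is not reproduced here.

```python
import csv
import io


def net_change(inserted: int, updated: int, deleted: int) -> int:
    """Per-service value as described for column K."""
    return (inserted + updated) - deleted


def total_reported(csv_text: str) -> int:
    """Sum the per-service net change over all rows (hypothetical column names)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(
        net_change(int(row["inserted"]), int(row["updated"]), int(row["deleted"]))
        for row in reader
    )
```

This number is what the Harvest responses *report*; comparing it with the hit count from the catalogue itself is the consistency check described in this message.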

If you send a GetRecords request for all records to the same endpoint
<https://bolegweb.geof.unizg.hr/pycsw_wms>, which was used to harvest those
980 WMS services, you get *231*.
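One way to get that count without paging through every record is a KVP GetRecords request with `resultType=hits` and reading `numberOfRecordsMatched` from `csw:SearchResults`. A minimal sketch, assuming the endpoint accepts CSW 2.0.2 KVP GetRecords with `typeNames=csw:Record`; the helper names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def hits_url(endpoint: str) -> str:
    """Build a GetRecords KVP URL that asks only for the number of matches."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "brief",
        "resultType": "hits",
    }
    return endpoint + "?" + urllib.parse.urlencode(params)


def parse_hits(response_xml: bytes) -> int:
    """Read numberOfRecordsMatched from a GetRecordsResponse."""
    root = ET.fromstring(response_xml)
    results = root.find(f".//{{{CSW_NS}}}SearchResults")
    if results is None:
        raise ValueError("no csw:SearchResults in response")
    return int(results.get("numberOfRecordsMatched"))


def count_records(endpoint: str, timeout: float = 60) -> int:
    """Ask the catalogue how many records it holds in total."""
    with urllib.request.urlopen(hits_url(endpoint), timeout=timeout) as resp:
        return parse_hits(resp.read())
```

For example, `count_records("https://bolegweb.geof.unizg.hr/pycsw_wms")` performs the same check as the GetRecords call described above.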

What do you think?

I would like to resolve this, since I wish to use your implementation to
create metadata for thousands of other OGC services I have collected.

Thank you!
Tomas


On 3 September 2016 at 19:42, Tom Kralidis <tomkralidis at gmail.com> wrote:

> Hi Tomas:
>
> On Fri, Sep 2, 2016 at 4:26 PM, Tomas Kliment <tomas.kliment at gmail.com>
> wrote:
>
>> Hi Tom,
>>
>> First of all, let me thank you for a great, lightweight CSW implementation;
>> I recently switched from GeoNetwork.
>>
>>
> Thank you for the kind words!
>
>
>> However, in my project Bolegweb <https://bolegweb.geof.unizg.hr> I have
>> planned to use pycsw as a catalogue of metadata harvested from the OGC
>> services I collected from the Google search engine using a scraper and crawler.
>>
>> Today I finished and ran a PHP script (attached) that sends a CSW Harvest
>> request to my pycsw <https://bolegweb.geof.unizg.hr/pycsw_wms>
>> instance for every online WMS 1.1.1 service in my database. I
>> record the number of metadata records inserted, updated and deleted from
>> each Harvest response. Summing these counts over all the CSW response XML
>> documents gives more than 16 thousand records.
>>
>> However, when I look at the underlying PostgreSQL database, or when I
>> call GetRecords for all records, I get only 231 records.
>>
>> Could you please advise me what I am doing wrong? The pycsw CSW Harvest
>> responses report more than 16k records, but the database actually contains
>> only 231.
>>
>>
> With the exception of WPS, all other OWS harvesting should store all of a
> service's resources as well as a record for the service itself (e.g. for a
> WMS with 16 layers, 16+1 records).  Can you send a sample list of services
> so I can test locally and try to reproduce?
>
> Thanks
>
> ..Tom
>
>
>
>> Thank you in advance for any hints,
>> Tomas
>>
>> --
>> ------------------------------------------------------------------
>>
>>
>>
>> Ing. Tomas Kliment, PhD.
>> about.me/klimeto
>>   <http://about.me/klimeto>
>>
>>
>>
>
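Tom's rule of thumb quoted above (a WMS with 16 layers yields 16+1 records) suggests a per-service cross-check against the Harvest totals: count the layers in each WMS 1.1.1 capabilities document and add one for the service record. A minimal sketch, assuming that only layers carrying a `<Name>` element become records; that assumption is not confirmed in this thread, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def expected_record_count(capabilities_xml: bytes) -> int:
    """Estimate records for one WMS 1.1.1 service: named layers + 1 service record.

    Assumption: only layers that have a <Name> child are stored as records.
    """
    root = ET.fromstring(capabilities_xml)
    # WMS 1.1.1 capabilities are not namespaced, so plain tag names match.
    named_layers = [
        layer for layer in root.iter("Layer") if layer.find("Name") is not None
    ]
    return len(named_layers) + 1
```

Comparing this estimate with each service's totalInserted, and with the catalogue's overall hit count, may help narrow down which services account for the gap between 16525 and 231.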


-------------- next part --------------
A non-text attachment was scrubbed...
Name: wms_services_harvested_pycsw.csv
Type: text/csv
Size: 277980 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/pycsw-devel/attachments/20160903/e6ac84cf/attachment-0001.csv>

