[pycsw-devel] SOS Harvesting Error

Tom Kralidis tomkralidis at gmail.com
Wed Oct 29 12:05:39 PDT 2014


Dan: thanks for the report and for opening a ticket on GitHub.  This is
tough to deal with, given that it's a very specific failure case
(PostgreSQL backend, CPU/VM configuration, big SOS) that comes down to
stuffing such a big Capabilities response into a CSW backend.  Perhaps
we can lessen what is actually harvested (let's continue in the
ticket).

https://github.com/geopython/pycsw/issues/279
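
For discussion, a minimal sketch of the "one offering at a time" idea
Dan describes below. It is not how pycsw currently parses SOS: it
uses the stdlib xml.etree.ElementTree (not lxml, which the traceback
shows pycsw using), iter_offerings is a hypothetical helper, and the
sample document is made up. It streams the Capabilities document and
serializes each ObservationOffering on its own instead of calling
tostring() on the whole tree.

```python
import io
import xml.etree.ElementTree as ET

# Namespaces used by SOS 1.0.0 Capabilities documents.
SOS_NS = "http://www.opengis.net/sos/1.0"
GML_NS = "http://www.opengis.net/gml"
OFFERING_TAG = "{%s}ObservationOffering" % SOS_NS


def iter_offerings(source):
    """Yield (gml:id, serialized XML) for one offering at a time.

    `source` is a filename or a binary file-like object holding a
    Capabilities document. Hypothetical helper, not part of pycsw.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == OFFERING_TAG:
            offering_id = elem.get("{%s}id" % GML_NS)
            xml_bytes = ET.tostring(elem)
            yield offering_id, xml_bytes
            # Drop the offering's children so the parsed tree does not
            # keep every full offering subtree in memory.
            elem.clear()


# Made-up two-station document standing in for a real (much larger) one.
SAMPLE = b"""<sos:Capabilities xmlns:sos="http://www.opengis.net/sos/1.0"
    xmlns:gml="http://www.opengis.net/gml">
  <sos:Contents>
    <sos:ObservationOfferingList>
      <sos:ObservationOffering gml:id="station-a"><gml:name>a</gml:name></sos:ObservationOffering>
      <sos:ObservationOffering gml:id="station-b"><gml:name>b</gml:name></sos:ObservationOffering>
    </sos:ObservationOfferingList>
  </sos:Contents>
</sos:Capabilities>"""

offerings = list(iter_offerings(io.BytesIO(SAMPLE)))
offering_ids = [oid for oid, _xml in offerings]
print(offering_ids)
```

In a real harvest each yielded offering would be written to the
repository before the next one is read, rather than collected in a
list as done here for the demo.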

Thanks

..Tom




On Tue, Oct 28, 2014 at 3:36 PM,  <dan at inlet.geol.sc.edu> wrote:
> The virtual server I was running only had 1 GB of memory and it was
> running out. I bumped it up to 4 GB and the processing is now working
> much better.
>
> Since the SOS parsing is grabbing all the records, this could continue to
> be an issue. I don't know the entire data flow, but I was thinking a less
> memory-intensive approach would be to run through the offerings, fully
> processing one station and then the next, so the memory footprint would
> not keep growing with the station count.
>
>
> Dan
>> Some additional logging on line 1792 of server.py
>> turned up this traceback:
>> Traceback (most recent call last):
>>   File "/home/madrona/src/pycsw/pycsw/server.py", line 1792, in harvest
>>     pagesize=self.csw_harvest_pagesize)
>>   File "/home/madrona/src/pycsw/pycsw/metadata.py", line 91, in parse_record
>>     return _parse_sos(context, repos, record, identifier, '1.0.0')
>>   File "/home/madrona/src/pycsw/pycsw/metadata.py", line 700, in _parse_sos
>>     _set(context, recobj, 'pycsw:XML', etree.tostring(md._capabilities))
>>   File "lxml.etree.pyx", line 3157, in lxml.etree.tostring (src/lxml/lxml.etree.c:69517)
>>   File "serializer.pxi", line 143, in lxml.etree._tostring (src/lxml/lxml.etree.c:114600)
>> MemoryError
>>
>> A quick and dirty "top" showed that the server was most likely
>> running out of memory. The NDBC station where it finally died was
>> station-42915; I am still harvesting against the NDBC SOS.
>>
>>
>> Dan
>>
>>> I've apparently taken a step further back: I can't make the parsing
>>> happen at all now.
>>> On the "client end", when I run the command
>>> python bin/pycsw-admin.py -c post_xml -u http://129.252.139.196:8080 -x Harvest-sos100.xml
>>>
>>> I get the error:
>>> Executing HTTP POST request Harvest-sos100.xml on server http://129.252.139.196:8080
>>> Traceback (most recent call last):
>>>   File "bin/pycsw-admin.py", line 246, in <module>
>>>     print admin.post_xml(CSW_URL, XML, TIMEOUT)
>>>   File "/usr/local/virtualenv/venv-2.7.8/lib/python2.7/site-packages/pycsw/admin.py", line 495, in post_xml
>>>     raise RuntimeError(err)
>>> RuntimeError: timed out
>>>
>>> On the local server I see:
>>> Traceback (most recent call last):
>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 86, in run
>>>     self.finish_response()
>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 128, in finish_response
>>>     self.write(data)
>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 212, in write
>>>     self.send_headers()
>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 270, in send_headers
>>>     self.send_preamble()
>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 194, in send_preamble
>>>     'Date: %s\r\n' % format_date_time(time.time())
>>>   File "/usr/local/src/python/lib/python2.7/socket.py", line 324, in write
>>>     self.flush()
>>>   File "/usr/local/src/python/lib/python2.7/socket.py", line 303, in flush
>>>     self._sock.sendall(view[write_offset:write_offset+buffer_size])
>>> error: [Errno 32] Broken pipe
>>> 129.252.139.68 - - [28/Oct/2014 08:38:15] "POST / HTTP/1.1" 500 59
>>> ----------------------------------------
>>> Exception happened during processing of request from ('129.252.139.68', 51289)
>>> Traceback (most recent call last):
>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
>>>     self.process_request(request, client_address)
>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 321, in process_request
>>>     self.finish_request(request, client_address)
>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 334, in finish_request
>>>     self.RequestHandlerClass(request, client_address, self)
>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 653, in __init__
>>>     self.finish()
>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 712, in finish
>>>     self.wfile.close()
>>>   File "/usr/local/src/python/lib/python2.7/socket.py", line 279, in close
>>>     self.flush()
>>>   File "/usr/local/src/python/lib/python2.7/socket.py", line 303, in flush
>>>     self._sock.sendall(view[write_offset:write_offset+buffer_size])
>>> error: [Errno 32] Broken pipe
>>> ----------------------------------------
>>>
>>> and finally in the log:
>>> file=/home/madrona/src/pycsw/pycsw/server.py line=2331 module=server
>>> function=_write_response Response:
>>> <ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
>>> xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
>>> xmlns:atom="http://www.w3.org/2005/Atom"
>>> xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>> xmlns:dct="http://purl.org/dc/terms/"
>>> xmlns:ows="http://www.opengis.net/ows"
>>> xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
>>> xmlns:gml="http://www.opengis.net/gml"
>>> xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
>>> xmlns:xlink="http://www.w3.org/1999/xlink"
>>> xmlns:gco="http://www.isotc211.org/2005/gco"
>>> xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>> xmlns:srv="http://www.isotc211.org/2005/srv"
>>> xmlns:ogc="http://www.opengis.net/ogc"
>>> xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
>>> xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
>>> xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
>>> xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
>>> xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
>>> language="en-US" version="1.2.0"
>>> xsi:schemaLocation="http://www.opengis.net/ows
>>> http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
>>>   <ows:Exception exceptionCode="NoApplicableCode" locator="source">
>>>     <ows:ExceptionText>Harvest failed: record parsing failed: </ows:ExceptionText>
>>>   </ows:Exception>
>>> </ows:ExceptionReport>

