[pycsw-devel] SOS Harvesting Error

dan at inlet.geol.sc.edu dan at inlet.geol.sc.edu
Wed Oct 29 13:19:35 PDT 2014


Tom,

Again, I have not followed the flow all the way through, but instead of
building all the records at one time in _parse_sos, the problem could be
alleviated greatly by batching them, or doing one at a time.
I suspect this is a non-starter, since the other parse_xxxx methods are
built around running an array of records.
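
To make the batching idea concrete, here is a rough sketch. It is not
pycsw code: build_record, insert_batch and the batch size of 50 are
hypothetical stand-ins for whatever _parse_sos and the repository layer
actually do. The only point is that a bounded number of records is held
in memory at any one time.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items taken from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_record(offering):
    # Hypothetical stand-in for the per-offering work done while parsing.
    return {"identifier": offering, "xml": "<record id='%s'/>" % offering}


def harvest_in_batches(offerings, insert_batch, size=50):
    """Build and hand off records `size` at a time; return the total count."""
    total = 0
    for chunk in batched(offerings, size):
        records = [build_record(o) for o in chunk]
        insert_batch(records)  # e.g. one database insert per batch
        total += len(records)
        # `records` is rebound on the next pass, so each batch can be freed
    return total
```

Called with a generator of 120 station ids and size=50, insert_batch
would be invoked three times, with 50, 50 and 20 records.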

Dan

> Dan: thanks for the report and issuing a ticket on GitHub.  This is
> tough to deal with, given that it's a very specific case for failure
> (PostgreSQL backend, CPU/VM configuration, big SOS), in terms of
> stuffing such a big Capabilities response into a CSW backend.  Perhaps
> we can lessen what is actually harvested (let's continue in the
> ticket).
>
> https://github.com/geopython/pycsw/issues/279
>
> Thanks
>
> ..Tom
>
>
>
>
> On Tue, Oct 28, 2014 at 3:36 PM,  <dan at inlet.geol.sc.edu> wrote:
>> The virtual server I was running only had 1 GB of memory and it was
>> running out. I bumped it up to 4 GB and the processing is now working
>> much better.
>>
>> Since the SOS parsing is grabbing all the records, this could continue
>> to be an issue. I don't know the entire data flow, but I was thinking
>> a less memory-intensive approach would be to run through the
>> offerings, wholly processing one station, then the next, so the memory
>> footprint would not continue to grow with the station count.
>>
>>
>> Dan
>>> Some additional logging on line 1792 of server.py
>>> turned up a traceback of:
>>> Traceback (most recent call last):
>>>   File "/home/madrona/src/pycsw/pycsw/server.py", line 1792, in harvest
>>>     pagesize=self.csw_harvest_pagesize)
>>>   File "/home/madrona/src/pycsw/pycsw/metadata.py", line 91, in parse_record
>>>     return _parse_sos(context, repos, record, identifier, '1.0.0')
>>>   File "/home/madrona/src/pycsw/pycsw/metadata.py", line 700, in _parse_sos
>>>     _set(context, recobj, 'pycsw:XML', etree.tostring(md._capabilities))
>>>   File "lxml.etree.pyx", line 3157, in lxml.etree.tostring (src/lxml/lxml.etree.c:69517)
>>>   File "serializer.pxi", line 143, in lxml.etree._tostring (src/lxml/lxml.etree.c:114600)
>>> MemoryError
>>>
>>> Doing a down and dirty "top", I could see that the server was most
>>> likely running out of memory. The NDBC station where it finally died
>>> was station-42915. I am still harvesting against the NDBC SOS.
>>>
>>>
>>> Dan
>>>
>>>> I've apparently taken a step further back; I can't make the parsing
>>>> happen at all now.
>>>> On the "client end" when I run the command
>>>> python bin/pycsw-admin.py -c post_xml -u http://129.252.139.196:8080 -x Harvest-sos100.xml
>>>>
>>>> I get the error:
>>>> Executing HTTP POST request Harvest-sos100.xml on server http://129.252.139.196:8080
>>>> Traceback (most recent call last):
>>>>   File "bin/pycsw-admin.py", line 246, in <module>
>>>>     print admin.post_xml(CSW_URL, XML, TIMEOUT)
>>>>   File "/usr/local/virtualenv/venv-2.7.8/lib/python2.7/site-packages/pycsw/admin.py", line 495, in post_xml
>>>>     raise RuntimeError(err)
>>>> RuntimeError: timed out
>>>>
>>>> On the local server I see:
>>>> Traceback (most recent call last):
>>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 86, in run
>>>>     self.finish_response()
>>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 128, in finish_response
>>>>     self.write(data)
>>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 212, in write
>>>>     self.send_headers()
>>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 270, in send_headers
>>>>     self.send_preamble()
>>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 194, in send_preamble
>>>>     'Date: %s\r\n' % format_date_time(time.time())
>>>>   File "/usr/local/src/python/lib/python2.7/socket.py", line 324, in write
>>>>     self.flush()
>>>>   File "/usr/local/src/python/lib/python2.7/socket.py", line 303, in flush
>>>>     self._sock.sendall(view[write_offset:write_offset+buffer_size])
>>>> error: [Errno 32] Broken pipe
>>>> 129.252.139.68 - - [28/Oct/2014 08:38:15] "POST / HTTP/1.1" 500 59
>>>> ----------------------------------------
>>>> Exception happened during processing of request from ('129.252.139.68', 51289)
>>>> Traceback (most recent call last):
>>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
>>>>     self.process_request(request, client_address)
>>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 321, in process_request
>>>>     self.finish_request(request, client_address)
>>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 334, in finish_request
>>>>     self.RequestHandlerClass(request, client_address, self)
>>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 653, in __init__
>>>>     self.finish()
>>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 712, in finish
>>>>     self.wfile.close()
>>>>   File "/usr/local/src/python/lib/python2.7/socket.py", line 279, in close
>>>>     self.flush()
>>>>   File "/usr/local/src/python/lib/python2.7/socket.py", line 303, in flush
>>>>     self._sock.sendall(view[write_offset:write_offset+buffer_size])
>>>> error: [Errno 32] Broken pipe
>>>> ----------------------------------------
>>>>
>>>> and finally in the log:
>>>> file=/home/madrona/src/pycsw/pycsw/server.py line=2331 module=server
>>>> function=_write_response Response:
>>>> <ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
>>>> xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
>>>> xmlns:atom="http://www.w3.org/2005/Atom"
>>>> xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>>> xmlns:dct="http://purl.org/dc/terms/"
>>>> xmlns:ows="http://www.opengis.net/ows"
>>>> xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
>>>> xmlns:gml="http://www.opengis.net/gml"
>>>> xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
>>>> xmlns:xlink="http://www.w3.org/1999/xlink"
>>>> xmlns:gco="http://www.isotc211.org/2005/gco"
>>>> xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>> xmlns:srv="http://www.isotc211.org/2005/srv"
>>>> xmlns:ogc="http://www.opengis.net/ogc"
>>>> xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
>>>> xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
>>>> xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
>>>> xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
>>>> xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
>>>> language="en-US" version="1.2.0"
>>>> xsi:schemaLocation="http://www.opengis.net/ows
>>>> http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
>>>>   <ows:Exception exceptionCode="NoApplicableCode" locator="source">
>>>>     <ows:ExceptionText>Harvest failed: record parsing failed:
>>>> </ows:ExceptionText>
>>>>   </ows:Exception>
>>>> </ows:ExceptionReport>
>>>>
>>>>
>>>> _______________________________________________
>>>> pycsw-devel mailing list
>>>> pycsw-devel at lists.osgeo.org
>>>> http://lists.osgeo.org/mailman/listinfo/pycsw-devel
>>>>



