[pycsw-devel] SOS Harvesting Error
Tom Kralidis
tomkralidis at gmail.com
Wed Oct 29 12:05:39 PDT 2014
Dan: thanks for the report and for opening a ticket on GitHub. This is
tough to deal with, given that it's a very specific failure case
(PostgreSQL backend, CPU/VM configuration, big SOS), in terms of
stuffing such a big Capabilities response into a CSW backend. Perhaps
we can lessen what is actually harvested (let's continue in the
ticket).
https://github.com/geopython/pycsw/issues/279
Thanks
..Tom
On Tue, Oct 28, 2014 at 3:36 PM, <dan at inlet.geol.sc.edu> wrote:
> The virtual server I was running only had 1 GB of memory, and it was running
> out. I bumped it up to 4 GB and the processing now works much better.
>
> Since the SOS parsing grabs all the records at once, this could continue to
> be an issue. I don't know the entire data flow, but I was thinking a less
> memory-intensive approach would be to run through the offerings, fully
> processing one station before moving on to the next, so the memory footprint
> would not keep growing with the station count.
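Dan's station-by-station idea can be sketched with a streaming parser. This is a minimal illustration under stated assumptions, not pycsw's code: it assumes the SOS 1.0.0 Capabilities lists stations as ObservationOffering elements in the http://www.opengis.net/sos/1.0 namespace, each carrying a gml:id, and iter_offerings is a hypothetical helper. It uses the standard library's xml.etree.ElementTree.iterparse to stay self-contained; lxml offers an iterparse with a similar interface.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: SOS 1.0.0 offerings live in this namespace and carry a gml:id.
SOS_NS = "http://www.opengis.net/sos/1.0"
GML_NS = "http://www.opengis.net/gml"
OFFERING_TAG = "{%s}ObservationOffering" % SOS_NS


def iter_offerings(source):
    """Hypothetical helper: yield (gml:id, offering XML bytes) one at a time.

    The Capabilities document is streamed instead of being loaded whole, and
    each offering's subtree is cleared after it has been serialized, so fully
    built offerings do not pile up as the station count grows (empty element
    shells still stay attached to their parent).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == OFFERING_TAG:
            offering_id = elem.get("{%s}id" % GML_NS)
            yield offering_id, ET.tostring(elem)
            # Drop the children, text and attributes we no longer need.
            elem.clear()


# Usage with a tiny in-memory Capabilities document (illustrative only).
CAPABILITIES = b"""<sos:Capabilities xmlns:sos="http://www.opengis.net/sos/1.0"
    xmlns:gml="http://www.opengis.net/gml">
  <sos:Contents>
    <sos:ObservationOfferingList>
      <sos:ObservationOffering gml:id="station-1"/>
      <sos:ObservationOffering gml:id="station-2"/>
    </sos:ObservationOfferingList>
  </sos:Contents>
</sos:Capabilities>"""

for station_id, offering_xml in iter_offerings(io.BytesIO(CAPABILITIES)):
    print(station_id, len(offering_xml))
```

Each yielded fragment could then become one catalogue record, instead of serializing the whole Capabilities document as the etree.tostring(md._capabilities) call in the traceback below does.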
>
>
> Dan
>> Some additional logging on line 1792 of server.py
>> turned up this traceback:
>> Traceback (most recent call last):
>>   File "/home/madrona/src/pycsw/pycsw/server.py", line 1792, in harvest
>>     pagesize=self.csw_harvest_pagesize)
>>   File "/home/madrona/src/pycsw/pycsw/metadata.py", line 91, in parse_record
>>     return _parse_sos(context, repos, record, identifier, '1.0.0')
>>   File "/home/madrona/src/pycsw/pycsw/metadata.py", line 700, in _parse_sos
>>     _set(context, recobj, 'pycsw:XML', etree.tostring(md._capabilities))
>>   File "lxml.etree.pyx", line 3157, in lxml.etree.tostring (src/lxml/lxml.etree.c:69517)
>>   File "serializer.pxi", line 143, in lxml.etree._tostring (src/lxml/lxml.etree.c:114600)
>> MemoryError
>>
>> Running a quick-and-dirty "top", I could see that the server was most likely
>> running out of memory. The NDBC station where it finally died was
>> station-42915; I am still harvesting against the NDBC SOS.
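Instead of watching top, peak memory can be logged around each parsing step. This is a minimal, Unix-only sketch for Python 3 using the standard resource module; peak_rss_mib and log_peak_memory are hypothetical helpers, and wrapping the per-station parsing with them is an assumption about where to hook in. Because ru_maxrss measures the whole process, memory allocated inside libxml2 (which lxml uses) is included, whereas Python-level tools such as tracemalloc would not see it.

```python
import logging
import resource
import sys

LOGGER = logging.getLogger(__name__)


def peak_rss_mib():
    """Peak resident set size of this process so far, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in KiB on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_peak_memory(label, func, *args, **kwargs):
    """Hypothetical wrapper: call func and log how far the peak RSS moved."""
    before = peak_rss_mib()
    result = func(*args, **kwargs)
    after = peak_rss_mib()
    LOGGER.info("%s: peak RSS %.1f MiB (+%.1f MiB)", label, after, after - before)
    return result


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in workload; in a harvest this might wrap parsing one station.
    log_peak_memory("allocate 50 MiB", lambda: len(bytearray(50 * 1024 * 1024)))
```

A steadily rising peak from one station to the next would support the idea that memory grows with the station count rather than with any single offering.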
>>
>>
>> Dan
>>
>>> I've apparently taken a step further back: I can't make the parsing
>>> happen at all now.
>>> On the client end, when I run the command
>>>   python bin/pycsw-admin.py -c post_xml -u http://129.252.139.196:8080 -x Harvest-sos100.xml
>>>
>>> I get the error:
>>> Executing HTTP POST request Harvest-sos100.xml on server http://129.252.139.196:8080
>>> Traceback (most recent call last):
>>>   File "bin/pycsw-admin.py", line 246, in <module>
>>>     print admin.post_xml(CSW_URL, XML, TIMEOUT)
>>>   File "/usr/local/virtualenv/venv-2.7.8/lib/python2.7/site-packages/pycsw/admin.py", line 495, in post_xml
>>>     raise RuntimeError(err)
>>> RuntimeError: timed out
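The client-side "timed out", together with the server-side broken pipes below, is consistent with the client giving up before a long Harvest finished, after which the server fails while writing its response to the closed socket. A minimal sketch of posting a CSW 2.0.2 Harvest request with a longer, explicit timeout, using only the Python 3 standard library rather than pycsw-admin.py: the body follows the CSW 2.0.2 Harvest element (Source, ResourceType, ResourceFormat), while the SOS resource type URI, the helper names build_harvest_xml and post_harvest, the 600-second default, and the sos.example.org source URL are assumptions or placeholders, not the contents of Harvest-sos100.xml.

```python
import urllib.request
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def build_harvest_xml(source_url, resource_type="http://www.opengis.net/sos/1.0"):
    """Build a CSW 2.0.2 Harvest body (the resource_type default is an assumption)."""
    ET.register_namespace("csw", CSW_NS)
    root = ET.Element("{%s}Harvest" % CSW_NS, service="CSW", version="2.0.2")
    ET.SubElement(root, "{%s}Source" % CSW_NS).text = source_url
    ET.SubElement(root, "{%s}ResourceType" % CSW_NS).text = resource_type
    ET.SubElement(root, "{%s}ResourceFormat" % CSW_NS).text = "application/xml"
    return ET.tostring(root, encoding="utf-8")


def post_harvest(csw_url, body, timeout=600):
    """POST the Harvest body; timeout (seconds) has to cover the whole harvest."""
    request = urllib.request.Request(
        csw_url,
        data=body,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    body = build_harvest_xml("http://sos.example.org/sos")  # placeholder SOS URL
    print(body.decode("utf-8"))
    # post_harvest("http://129.252.139.196:8080", body)  # would send the request
```

A longer timeout only hides the symptom if the server itself runs out of memory, so it complements, rather than replaces, reducing what is harvested.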
>>>
>>> On the local server I see:
>>> Traceback (most recent call last):
>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 86, in run
>>>     self.finish_response()
>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 128, in finish_response
>>>     self.write(data)
>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 212, in write
>>>     self.send_headers()
>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 270, in send_headers
>>>     self.send_preamble()
>>>   File "/usr/local/src/python/lib/python2.7/wsgiref/handlers.py", line 194, in send_preamble
>>>     'Date: %s\r\n' % format_date_time(time.time())
>>>   File "/usr/local/src/python/lib/python2.7/socket.py", line 324, in write
>>>     self.flush()
>>>   File "/usr/local/src/python/lib/python2.7/socket.py", line 303, in flush
>>>     self._sock.sendall(view[write_offset:write_offset+buffer_size])
>>> error: [Errno 32] Broken pipe
>>> 129.252.139.68 - - [28/Oct/2014 08:38:15] "POST / HTTP/1.1" 500 59
>>> ----------------------------------------
>>> Exception happened during processing of request from ('129.252.139.68', 51289)
>>> Traceback (most recent call last):
>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
>>>     self.process_request(request, client_address)
>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 321, in process_request
>>>     self.finish_request(request, client_address)
>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 334, in finish_request
>>>     self.RequestHandlerClass(request, client_address, self)
>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 653, in __init__
>>>     self.finish()
>>>   File "/usr/local/src/python/lib/python2.7/SocketServer.py", line 712, in finish
>>>     self.wfile.close()
>>>   File "/usr/local/src/python/lib/python2.7/socket.py", line 279, in close
>>>     self.flush()
>>>   File "/usr/local/src/python/lib/python2.7/socket.py", line 303, in flush
>>>     self._sock.sendall(view[write_offset:write_offset+buffer_size])
>>> error: [Errno 32] Broken pipe
>>> ----------------------------------------
>>>
>>> and finally in the log:
>>> file=/home/madrona/src/pycsw/pycsw/server.py line=2331 module=server function=_write_response Response:
>>> <ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/"
>>> xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0"
>>> xmlns:atom="http://www.w3.org/2005/Atom"
>>> xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>> xmlns:dct="http://purl.org/dc/terms/"
>>> xmlns:ows="http://www.opengis.net/ows"
>>> xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0"
>>> xmlns:gml="http://www.opengis.net/gml"
>>> xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"
>>> xmlns:xlink="http://www.w3.org/1999/xlink"
>>> xmlns:gco="http://www.isotc211.org/2005/gco"
>>> xmlns:gmd="http://www.isotc211.org/2005/gmd"
>>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>> xmlns:srv="http://www.isotc211.org/2005/srv"
>>> xmlns:ogc="http://www.opengis.net/ogc"
>>> xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm"
>>> xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0"
>>> xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
>>> xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
>>> xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
>>> language="en-US" version="1.2.0"
>>> xsi:schemaLocation="http://www.opengis.net/ows
>>> http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
>>> <ows:Exception exceptionCode="NoApplicableCode" locator="source">
>>> <ows:ExceptionText>Harvest failed: record parsing failed:
>>> </ows:ExceptionText>
>>> </ows:Exception>
>>> </ows:ExceptionReport>
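When harvests are driven from a script, an ExceptionReport like the one above can be detected programmatically instead of by reading the log. A minimal sketch, assuming the OWS namespace used in this report (http://www.opengis.net/ows); exception_details is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

OWS_NS = "http://www.opengis.net/ows"


def exception_details(response_xml):
    """Return (exceptionCode, locator, text) tuples from an OWS ExceptionReport.

    Returns an empty list when the response is not an ExceptionReport.
    """
    root = ET.fromstring(response_xml)
    if root.tag != "{%s}ExceptionReport" % OWS_NS:
        return []
    details = []
    for exc in root.findall("{%s}Exception" % OWS_NS):
        # Join and trim every ExceptionText child of this Exception.
        texts = [t.text.strip() for t in exc.findall("{%s}ExceptionText" % OWS_NS) if t.text]
        details.append((exc.get("exceptionCode"), exc.get("locator"), " ".join(texts)))
    return details


SAMPLE = b"""<ows:ExceptionReport xmlns:ows="http://www.opengis.net/ows" version="1.2.0">
  <ows:Exception exceptionCode="NoApplicableCode" locator="source">
    <ows:ExceptionText>Harvest failed: record parsing failed:
    </ows:ExceptionText>
  </ows:Exception>
</ows:ExceptionReport>"""

print(exception_details(SAMPLE))
```

In this report nothing follows "record parsing failed:", so the underlying cause has to be recovered from the server-side logging.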
>>>
>>>
>>> _______________________________________________
>>> pycsw-devel mailing list
>>> pycsw-devel at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/pycsw-devel
>>>
>>
>>
>>
>
>