[pycsw-devel] configuration design options

Tom Kralidis tomkralidis at hotmail.com
Wed Apr 27 15:32:33 EDT 2011


Hi: I'd like to get some input and thoughts w.r.t. our current configuration mechanism.

Currently, we use the ConfigParser approach to set runtime options.  This has proven to be a simple and lightweight approach in the spirit of pycsw.

Reasoning: I would like to make APISO core (not optional) functionality, enabled by default.  The APISO code can stay where it is, but I think most deployments will call for the APISO implementation.  Having said this, we currently load APISO (repository, core queryables, etc.) in a space separate from core, at least initially.  So what happens if a user wants to load APISO and does not have any DC records to load?  The code currently _always_ loads csw:Record.  I'd like repositories to be set/accessed as a list (self.repositories).  As well, I think harmonizing to one main configuration (instead of core config, then APISO config) will be cleaner in the long run.

With the above in mind, there are issues with the ConfigParser approach in the code:

- the code needs numerous checks against missing or unset options
- ConfigParser does not allow repeatable section names or options.  How can we set multiple [repository] objects?
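To illustrate the second point, here is a minimal sketch using Python 3's configparser module (the successor to the Python 2 ConfigParser module discussed in this post, so behaviour may differ from what pycsw saw at the time).  The option names and values in SAMPLE are made up for illustration and are not pycsw's actual settings.

```python
import configparser

# Hypothetical config with two [repository] sections (illustrative values only).
SAMPLE = """
[repository]
typename = csw:Record
db = records.db

[repository]
typename = gmd:MD_Metadata
db = apiso.db
"""


def load_strict(text):
    """Default (strict) parser: a repeated section name is rejected."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.DuplicateSectionError as err:
        return "error: %s" % err.section
    return "ok"


def load_lenient(text):
    """Non-strict parser: repeated sections are merged, later values win."""
    parser = configparser.ConfigParser(strict=False)
    parser.read_string(text)
    return dict(parser.items("repository"))


if __name__ == "__main__":
    print(load_strict(SAMPLE))
    print(load_lenient(SAMPLE))
```

Either way only one [repository] survives: the strict parser refuses the file, and the lenient one silently drops the first repository's settings.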

Options:
1) still use ConfigParser, and namespace [repository] objects like [repository-csw:Record], [repository-gmd:MD_Metadata] to make them unique.  This would take some tweaking in the config parsing, but is certainly doable.  Not sure how user-friendly or error-prone this would be
2) use JSON as a config format.  This can be integrated into the code easily enough, but I think hand-editing it is prone to error given the complexity of the format
3) XML.  Using XML (with XML Schema) allows us to:
 - perform validation (offline with sbin/validate_xml.py or at runtime) of the configuration to ensure validity
 - have repeatable objects (like repository) and properties
 - either parse the XML and convert it to a Python dict (which is what we do with ConfigParser), or work directly with the etree object(s) in the code (and bypass extra parsing).  Working with etree directly could give us some performance gains (although parsing an XML file is more overhead than ConfigParser).  I've attached sample XML and XSD files as examples.
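As a rough sketch of what the "tweaking" in option 1 might involve, the following collects every section whose name starts with a "repository-" prefix into a list.  It uses Python 3's configparser; the [server] section, the option names and the load_repositories() helper are hypothetical and only there to make the example runnable.

```python
import configparser

PREFIX = "repository-"

# Hypothetical namespaced config (section contents are illustrative only).
SAMPLE = """
[server]
home = /var/www/pycsw

[repository-csw:Record]
db = records.db

[repository-gmd:MD_Metadata]
db = apiso.db
"""


def load_repositories(text):
    """Return one dict per [repository-<typename>] section, in file order."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    repositories = []
    for section in parser.sections():
        if section.startswith(PREFIX):
            repo = dict(parser.items(section))
            # The part after the prefix is treated as the typename.
            repo["typename"] = section[len(PREFIX):]
            repositories.append(repo)
    return repositories


if __name__ == "__main__":
    for repo in load_repositories(SAMPLE):
        print(repo)
```

The cost is that the typename now lives in the section name, so a typo in the prefix silently hides a repository, which is the user-friendliness concern raised under option 1.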

In the end, the goal of one main configuration and repeatable repository objects would benefit pycsw, and would even open up options to enable OGC:WFS support (not that this is in scope for pycsw, but perhaps for a Python OGC Web Services framework which can implement n service types).  Just a thought.
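For a feel of how repeatable repository objects in a single configuration could end up as a list (along the lines of self.repositories), here is a minimal sketch that converts an XML config to a Python dict.  The element and attribute names are invented for this example and are not taken from the attached config.xml/config.xsd; schema validation is left out because the standard library's ElementTree does not validate against XSD (lxml's etree.XMLSchema would be one way to add it).

```python
import xml.etree.ElementTree as ET

# Hypothetical XML config; element and attribute names are assumptions.
SAMPLE = """<config>
  <server>
    <home>/var/www/pycsw</home>
  </server>
  <repository typename="csw:Record">
    <db>records.db</db>
  </repository>
  <repository typename="gmd:MD_Metadata">
    <db>apiso.db</db>
  </repository>
</config>"""


def load_config(text):
    """Parse the XML into a dict with a list of repository dicts."""
    root = ET.fromstring(text)
    config = {"server": {}, "repositories": []}
    server = root.find("server")
    if server is not None:
        for child in server:
            config["server"][child.tag] = (child.text or "").strip()
    # <repository> may repeat, so each one becomes its own list entry.
    for node in root.findall("repository"):
        repo = {"typename": node.get("typename")}
        for child in node:
            repo[child.tag] = (child.text or "").strip()
        config["repositories"].append(repo)
    return config


if __name__ == "__main__":
    print(load_config(SAMPLE)["repositories"])
```

A user with no DC records would simply omit the csw:Record element, and the list would hold only the APISO repository.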

I'm mildly leaning towards option 3 (it would take some major code rework, which I am willing to implement), but would like to see what others think in terms of functionality and user-friendliness.  OGC CITE support wouldn't be affected, but this would be a change to the current configuration design; we are still in the early phases, so better now than later :)

Thoughts?  I hope this explanation is clear enough.  Are there other options we can/should consider?

..Tom

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.xml
Type: text/xml
Size: 2041 bytes
Desc: not available
Url : http://lists.osgeo.org/pipermail/pycsw-devel/attachments/20110427/f48cf7cb/config.xml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.xsd
Type: application/octet-stream
Size: 5019 bytes
Desc: not available
Url : http://lists.osgeo.org/pipermail/pycsw-devel/attachments/20110427/f48cf7cb/config.obj

