MapFile Parsing into DOM, MapFile Schema, and MapFile multilingual support
jdoyon at NRCAN.GC.CA
Thu Oct 4 17:45:20 EDT 2007
Good to see many of you at FOSS4G!
As discussed with Steve & Hobu and others, I wanted to put in writing my thoughts on this concept I have of parsing MapFiles into a DOM, making it look like XML without it actually BEING XML.
- I was writing, just for fun, a MapFile editor (in Python/wxPython). I ended up settling on the idea that simply treating the MapFile as a structured document (XML-style) was the best and simplest way for me to go. The MapFile already being a fairly structured document, this shouldn't be very hard. And it wasn't.
- It is based on Python's ElementTree functionality (a minimalist, Python-specific XML API, see here http://effbot.org/zone/element-index.htm and here http://docs.python.org/lib/module-xml.etree.ElementTree.html)
What I did:
- Implemented my own parser for ElementTree. The MapFile is parsed into a DOM-like structure, and XPath-like expressions can be used to get to nodes, edit their text, etc ...
- Problem: it's newline-sensitive, which it shouldn't be. The benefit is that it is not syntax-aware at all: it doesn't care what the text content of the map file is, and just uses some basic rules about what's expected on a line (a keyword, or a keyword plus one or more values).
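To make the line-based idea concrete, here is a minimal sketch of such a parser. It is not my actual implementation. It assumes one simple rule: a line holding a lone keyword opens a block, END closes it, and anything else is a keyword followed by its values. The "mapfile" wrapper element and the parse_mapfile name are made up for the example, and lines that hold only a quoted string (as inside PROJECTION) are not handled properly.

```python
import xml.etree.ElementTree as ET


def parse_mapfile(text):
    """Parse MapFile text into an ElementTree element, one line at a time."""
    root = ET.Element("mapfile")  # hypothetical wrapper element
    stack = [root]                # stack[-1] is the block currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        value = parts[1].strip() if len(parts) > 1 else None
        if keyword == "END" and value is None:
            if len(stack) == 1:
                raise ValueError("END without a matching block")
            stack.pop()
        elif value is None:
            # Assumed rule: a lone keyword on a line opens a block.
            stack.append(ET.SubElement(stack[-1], keyword))
        else:
            # keyword + one or more values, kept as one uninterpreted string
            ET.SubElement(stack[-1], keyword).text = value
    return root


SAMPLE = """
MAP
  NAME "demo"
  LAYER
    NAME "roads"
    TYPE LINE
    CLASS
      COLOR 255 0 0
    END
  END
END
"""

root = parse_mapfile(SAMPLE)
layer_name = root.find("MAP/LAYER/NAME").text
class_color = root.find("MAP/LAYER/CLASS/COLOR").text
```

Once the tree exists, paths such as "MAP/LAYER/CLASS/COLOR" reach a node directly, without the parser ever knowing what COLOR means.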
The larger, more abstract idea:
- There have been many debates around XML mapfiles. Implementing something like this for Python, and possibly other languages, could achieve the best of both worlds: simple, fast, human-friendly MapFiles that are loadable with XML tools! Most decent XML libraries should allow you to re-implement the parser to some degree, so I have to hope this could be done for Java, Perl, C#, etc ... My personal interest is focused only on Python and ElementTree, and maybe lxml. Python is good for prototyping anyway :)
Load/Save MapFile to/from DOM or DOM-like structures. Some examples of how this might be useful:
- Batch processing of mapfiles (find and replace, for example). I've used my implementation to migrate ~500 mapfiles from MapServer 3.6.6 to 4.8. A real life saver.
- Loading into XML tools/SDKs of your liking, to do all those XML things you might like (editing, validation, transformation, etc ...)
- MapFile editors: GUIs and Web systems for editing structured documents are nice. wxPython's Tree implementation, for example, can be tied to a Document tree. Similar concepts can apply on the web (think MapStorer)
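As an illustration of the batch-processing case, here is a rough sketch of a find and replace over a tree, followed by writing it back out in MapFile syntax. The tree is built by hand so the example stands alone; replace_text and to_mapfile are hypothetical helper names, the paths are invented, and the writer assumes that any element with children is a block (so an empty block would be written as a plain keyword line).

```python
import xml.etree.ElementTree as ET


def replace_text(root, path, old, new):
    """Replace old with new in the text of every node matching path."""
    count = 0
    for node in root.findall(path):
        if node.text and old in node.text:
            node.text = node.text.replace(old, new)
            count += 1
    return count


def to_mapfile(elem, depth=0):
    """Return the MapFile lines for elem: blocks get END, leaves get their text."""
    pad = "  " * depth
    if len(elem):  # assumption: an element with children is a block
        lines = [pad + elem.tag]
        for child in elem:
            lines.extend(to_mapfile(child, depth + 1))
        lines.append(pad + "END")
        return lines
    return [pad + elem.tag + (" " + elem.text if elem.text else "")]


# A hand-built tree standing in for a parsed MapFile.
map_elem = ET.Element("MAP")
ET.SubElement(map_elem, "NAME").text = '"demo"'
layer = ET.SubElement(map_elem, "LAYER")
ET.SubElement(layer, "NAME").text = '"roads"'
ET.SubElement(layer, "DATA").text = '"/old/data/roads.shp"'

changed = replace_text(map_elem, ".//DATA", "/old/data/", "/new/data/")
output = "\n".join(to_mapfile(map_elem))
print(output)
```

Run over a directory of files, the same two functions would cover a simple migration: parse, patch the nodes that changed between versions, write back.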
Has the question of multilingual MapFiles ever been addressed? One short-term interest for us in this is devising a better way to handle multiple languages. Unless there's interest in putting this in the MapServer core, my best idea so far is to use the above-mentioned approach to have a master map file and 2 separate language-specific (en/fr) files, and a way to merge them and generate 1 mapfile per language.
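One possible shape for that merge step, purely as a sketch: the master tree holds everything shared, and each language contributes a set of overrides keyed by path. The localize function, the override format and the sample text are all assumptions for illustration; real language files would presumably be parsed MapFiles rather than Python dicts.

```python
import copy
import xml.etree.ElementTree as ET


def localize(master, overrides):
    """Return a copy of master with node text replaced per {path: text}."""
    tree = copy.deepcopy(master)  # never modify the master itself
    for path, text in overrides.items():
        node = tree.find(path)
        if node is None:
            raise KeyError("no node at path: " + path)
        node.text = text
    return tree


# Hand-built master tree standing in for a parsed master MapFile.
master = ET.Element("MAP")
layer = ET.SubElement(master, "LAYER")
ET.SubElement(layer, "TYPE").text = "LINE"
cls = ET.SubElement(layer, "CLASS")
ET.SubElement(cls, "NAME").text = '"(untranslated)"'

# Hypothetical per-language overrides, keyed by path relative to MAP.
LANGUAGES = {
    "en": {"LAYER/CLASS/NAME": '"Roads"'},
    "fr": {"LAYER/CLASS/NAME": '"Routes"'},
}

localized = {lang: localize(master, ov) for lang, ov in LANGUAGES.items()}
en_name = localized["en"].find("LAYER/CLASS/NAME").text
fr_name = localized["fr"].find("LAYER/CLASS/NAME").text
```

Each resulting tree would then be written out as its own mapfile, one per language.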
The initial challenge:
The parser shouldn't be line-aware. That means it has to be keyword/token-aware instead, which means that information has to come from somewhere. Right now it looks like it lives in the lexer definition. Maybe this could be used by other parser generators, or something like that? TBD ...
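A sketch of what token-aware could look like, assuming the keyword list is handed to the parser from outside. The BLOCKS and KEYWORDS sets below are a tiny made-up subset rather than MapServer's real list, and an unquoted value that happens to match a keyword would be misread, which is exactly why the real list has to come from somewhere authoritative.

```python
import re
import xml.etree.ElementTree as ET

# Quoted strings, comments, or runs of non-whitespace, in that order.
TOKEN_RE = re.compile(r'''"[^"]*"|'[^']*'|#[^\n]*|\S+''')

# Assumption: illustrative subsets only.
BLOCKS = {"MAP", "LAYER", "CLASS"}
KEYWORDS = BLOCKS | {"END", "NAME", "TYPE", "COLOR", "STATUS"}


def tokenize(text):
    """Split text into tokens, ignoring line breaks and dropping comments."""
    return [t for t in TOKEN_RE.findall(text) if not t.startswith("#")]


def parse_tokens(tokens):
    """Build a tree from tokens using only the keyword sets above."""
    root = ET.Element("mapfile")  # hypothetical wrapper element
    stack = [root]
    current = None  # the keyword element currently collecting values
    for tok in tokens:
        word = tok.upper()
        if word == "END":
            if len(stack) == 1:
                raise ValueError("END without a matching block")
            stack.pop()
            current = None
        elif word in BLOCKS:
            stack.append(ET.SubElement(stack[-1], word))
            current = None
        elif word in KEYWORDS:
            current = ET.SubElement(stack[-1], word)
        elif current is not None:
            # a value token: append it to the current keyword's text
            current.text = tok if current.text is None else current.text + " " + tok
        else:
            raise ValueError("value without a keyword: " + tok)
    return root


# Everything on one line, plus a comment: newlines no longer matter.
ONE_LINE = 'MAP NAME "demo" LAYER NAME "roads" TYPE LINE CLASS COLOR 255 0 0 END END END # done'
tree = parse_tokens(tokenize(ONE_LINE))
color = tree.find("MAP/LAYER/CLASS/COLOR").text
```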
Also, defining the exact schema. Once a MapFile has been DOM-ified, should COLOR hold the string "255 255 255", or child elements for "Red", "Green" and "Blue" with separate values? How would this be expressed? As an XSD (I think that's what I'd done in my MapFile editor, I'll have to dig it up)? Or not at all (a dumb keyword -> value setup)?
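To show the two options side by side, here is a small sketch that converts a COLOR node from the plain-string form to the structured form and back. The RED/GREEN/BLUE tag names and both helper functions are assumptions, not a proposed schema.

```python
import xml.etree.ElementTree as ET

CHANNELS = ("RED", "GREEN", "BLUE")  # hypothetical child element names


def split_color(node):
    """Turn <COLOR>r g b</COLOR> into a COLOR element with three children."""
    values = node.text.split()
    if len(values) != 3:
        raise ValueError("expected three color components")
    node.text = None
    for tag, value in zip(CHANNELS, values):
        ET.SubElement(node, tag).text = value
    return node


def join_color(node):
    """Rebuild the "r g b" string from the structured form."""
    return " ".join(node.find(tag).text for tag in CHANNELS)


color = ET.Element("COLOR")
color.text = "255 255 255"            # the dumb keyword -> value form

split_color(color)                     # the structured form
structured = ET.tostring(color, encoding="unicode")
flat = join_color(color)               # and back to the MapFile string
```

The structured form is what an XSD could actually validate (three integers in a range), while the string form keeps the parser free of any knowledge about COLOR.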
Oh and right, the first thing I'd asked Steve: replace "END" with something matching the opening tag such as "MAPEND" or "CLASSEND" or maybe using Apache conf style "<Map></Map>" (though these would be tokens NOT XML, and the rest of the map file would retain its current syntax).
Could a language neutral way to define lexical and grammar rules be put in place, that both the C core, and other languages could use?
If I don't work on this before then, I just might get a student in January to work on precisely this!
Thoughts, comments, constructive criticism?
Data Dissemination Division | Division de la diffusion des données
Data Management and Dissemination Branch | Direction de la gestion et de la diffusion des données
Earth Sciences Sector | Secteur des sciences de la Terre
Natural Resources Canada | Ressources naturelles Canada
Ottawa, Canada K1A 0E9
jdoyon at nrcan-rncan.gc.ca
Telephone | Téléphone 613-992-4902
Facsimile | Télécopieur 613-947-2410
Teletypewriter | Téléimprimeur 613-996-4397
Government of Canada | Gouvernement du Canada