[pycsw-devel] Initial plan to implement ISO and INSPIRE
compatibility for PyCSW
Tom.Kralidis at ec.gc.ca
Fri Mar 4 15:32:39 EST 2011
> -----Original Message-----
> From: Angelos Tzotsos [mailto:gcpp.kalxas at gmail.com]
> Sent: Tuesday, 01 March 2011 18:43
> To: pycsw-devel at lists.sourceforge.net
> Subject: [pycsw-devel] Initial plan to implement ISO and
> INSPIRE compatibility for PyCSW
> Hi all,
> I was reading the code lately and was wondering how we should
> proceed in order to implement ISO and INSPIRE for PyCSW.
> I think the first thing that has to be seen is the database schema:
> - For now, Tom has implemented Dublin Core + CSW 2.0.2 using
> SQLAlchemy, Shapely and SQLite (which can be any db
> actually). The basic db schema for this includes the Dublin
> Core queryable columns in one single table, as well as the
> Geographic Extent in an extra field. The rest of the xml
> metadata file is stored in a separate db column (which can be
> queryable through XPath).
> Almost the same strategy is implemented in GeoNetwork CSW.
FYI clarification: Although SQLAlchemy saves us from being tied to a
single database, Python's sqlite lib allows for calling Python functions
(via create_function), which the code uses for bbox (passing to Shapely)
at this point. So we'll have to work on moving away from this and find
an alternative.
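
For reference, the create_function mechanism looks roughly like the sketch
below. The table layout, the bbox_intersects name and the
"minx,miny,maxx,maxy" string encoding are assumptions made for
illustration, and a plain-Python overlap test stands in for the Shapely
call so the example has no dependencies.

```python
import sqlite3


def bbox_intersects(stored, query):
    """Return 1 if two 'minx,miny,maxx,maxy' strings overlap, else 0."""
    if stored is None:
        return 0
    aminx, aminy, amaxx, amaxy = (float(v) for v in stored.split(","))
    bminx, bminy, bmaxx, bmaxy = (float(v) for v in query.split(","))
    overlaps = (aminx <= bmaxx and bminx <= amaxx
                and aminy <= bmaxy and bminy <= amaxy)
    return int(overlaps)


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the same name (2 arguments).
conn.create_function("bbox_intersects", 2, bbox_intersects)

# Hypothetical table: one bbox string per record.
conn.execute("CREATE TABLE records (identifier TEXT PRIMARY KEY, bbox TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)", [
    ("rec-1", "-10,40,5,55"),
    ("rec-2", "100,-40,150,-10"),
    ("rec-3", None),
])


def search_bbox(query_bbox):
    """Return identifiers whose stored bbox overlaps query_bbox."""
    rows = conn.execute(
        "SELECT identifier FROM records "
        "WHERE bbox_intersects(bbox, ?) = 1 ORDER BY identifier",
        (query_bbox,),
    )
    return [row[0] for row in rows]
```

The filter runs as Python inside SQLite, which is exactly why it does not
carry over to other backends.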
Having said this, you are right; the strategy is similar to GN in terms
of storing an entire XML document in a db record. We are additionally
storing the core queryables (Dublin Core) so as to be able to query more
efficiently. When returning GetRecords results, if
elementsetname='full', then we return the entire XML document as stored;
else, we present (either brief or summary) as per
> - Another solution would be to store only the xml files in
> the database and use only xpath queries. I suppose this would
> have an effect on performance for large datasets.
We should test this further, although yes it may be bad for performance.
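
As a starting point for such a test, here is a minimal sketch of the
XPath-only approach. It assumes a single-column records table and uses
the limited XPath subset in Python's xml.etree.ElementTree; the
find_by_xpath helper and the sample records are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = {"dc": "http://purl.org/dc/elements/1.1/"}

RECORD = (
    '<csw:Record xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:identifier>{id}</dc:identifier><dc:title>{title}</dc:title>"
    "</csw:Record>"
)

conn = sqlite3.connect(":memory:")
# Only the XML document is stored; there are no queryable columns.
conn.execute("CREATE TABLE records (xml TEXT)")
conn.executemany("INSERT INTO records VALUES (?)", [
    (RECORD.format(id="rec-1", title="Land cover of Greece"),),
    (RECORD.format(id="rec-2", title="Ocean temperature"),),
])


def find_by_xpath(path, text):
    """Scan every stored document; return identifiers where an element
    matched by `path` contains `text` (case-insensitive)."""
    hits = []
    for (xml,) in conn.execute("SELECT xml FROM records"):
        root = ET.fromstring(xml)
        values = [el.text or "" for el in root.findall(path, NS)]
        if any(text.lower() in v.lower() for v in values):
            hits.append(root.findtext("dc:identifier", namespaces=NS))
    return hits
```

Every query parses every stored document, so the cost grows with the size
of the catalogue; that is the performance concern in a nutshell.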
> - Also, there is the example of MDWeb project, another open
> source Java implementation of CSW that has implemented the
> full ISO 19115 schema within postgres (more than 1300 lines
> of SQL) and stores all the info in this schema. My opinion is
> that this would lead to unmaintainable code or we would lose
> the current advantage to use all RDBMS systems available
> through SQLAlchemy.
Ouch! IMHO that would be prone to error along the metadata lifecycle.
> - Finally, I see another possible solution, a middle path.
> Store the main queryables in db columns (with extra tables
> for "one to many" connections) but without following the ISO
> UMLs. At the same time the xml will be stored in separate
> column in the main table. This will make the xml import and
> export functions more complicated, but will lead to better
> performance. I think this direction would lead to ~15 extra
> tables (eg one for "Keywords", one for "Resource Language",
> one for "Topic Category" etc)
Good idea. For ISO/INSPIRE support, we could start by mapping the ISO
core queryables to the Dublin Core core queryables. Core queryables
over and above would need to be exposed specifically for ISO/INSPIRE.
I would think this work would best be done as a plugin to pycsw.
Having said this, a plugin architecture would be valuable here, so pycsw
can support n profiles over time.
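
To make the middle path concrete, a rough sketch of the storage side
follows. It uses the stdlib sqlite3 module rather than SQLAlchemy to stay
self-contained, and the table names, column names and helper functions
are all assumptions, not the current pycsw schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Main table: core queryables as columns, plus the full XML document.
CREATE TABLE records (
    identifier TEXT PRIMARY KEY,
    title      TEXT,
    abstract   TEXT,
    xml        TEXT NOT NULL
);
-- One extra table per "one to many" queryable, e.g. keywords.
CREATE TABLE keywords (
    record_id TEXT NOT NULL REFERENCES records(identifier),
    keyword   TEXT NOT NULL
);
""")


def insert_record(identifier, title, abstract, keywords, xml):
    """Store the queryables and the untouched XML in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO records VALUES (?, ?, ?, ?)",
            (identifier, title, abstract, xml),
        )
        conn.executemany(
            "INSERT INTO keywords VALUES (?, ?)",
            [(identifier, k) for k in keywords],
        )


def find_by_keyword(keyword):
    """Query via the keyword table, return the stored XML as-is."""
    rows = conn.execute(
        "SELECT r.identifier, r.xml FROM records r "
        "JOIN keywords k ON k.record_id = r.identifier "
        "WHERE k.keyword = ? ORDER BY r.identifier",
        (keyword,),
    )
    return rows.fetchall()
```

Queries touch only the indexed columns and side tables, while a 'full'
response can still hand back the XML column unchanged.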
> But what kind of data need to be stored in the db in order to
> have basic compliance with ISO 19115 and INSPIRE for datasets?
> For ISO core needs are: (with * are the core queryables in CSW)
> 1.* Dataset Title (M - Mandatory)
> 2. Dataset reference date (M)
> 3. Dataset responsible party (O - Optional)
> 4.* Geographic location of the dataset (C - Conditional)
> 5. Dataset language (M)
> 6. Dataset character set (C)
> 7.* Dataset topic category (M) (includes keywords in CSW queryable)
> 8. Spatial resolution of the dataset (O)
> 9.* Abstract describing the dataset (M)
> 10.* Distribution format (O)
> 11. Additional extent information for the dataset (O)
> 12. Spatial representation type (O)
> 13.* Reference system (O)
> 14. Lineage (O)
> 15. On-line resource (O)
> 16.* Metadata file identifier (O)
> 17. Metadata standard name (O)
> 18. Metadata standard version (O)
> 19. Metadata language (C)
> 20. Metadata character set (C)
> 21. Metadata point of contact (M)
> 22.* Metadata date stamp (M)
> plus CSW queryables
> * "Any text"
> * Type (default "dataset")
> For INSPIRE the same list is: (numbers indicate mapping with
> the above and * is for queryables)
> 1.* Resource title (M) 
> 2.* Temporal reference (C) [0..n]
> 3.* Responsible organization (M) including both name of the
> organization and contact e-mail 
> 4.* Geographic Bounding Box (M) [1..n]
> 5. Resource language (C) [0..n]
> 7.* Topic category (M) [1..n]
> 8.* Spatial resolution (C) [0..n]
> 9.* Resource abstract (M) 
> 11. Temporal extent (C) [0..n]
> 14.* Lineage (M) 
> 15. Resource Locator (C) [0..n]
> 19. Metadata Language (M) 
> 21. Metadata point of contact (M) including both name of the
> organization and contact e-mail [1..n]
> 22. Metadata Date (M) 
> 23.* Resource Type (M) 
> 24.* Unique Resource Identifier (M) [1..n]
> 25.* Keyword (M) [1..n]
> 26.* Conformity (M) 
> 27.* Conditions for access and use (M) [1..n]
> 28.* Limitations on public access (M) [1..n]
> [1..n] indicates "1 to many"
> The resources of the above are:
> Another interesting document to read is:
> Any thoughts, ideas, proposals on how to proceed?
I think it would be a good idea to:
- start mapping out the core queryables (we should start with the ISO
profile first IMHO, since it looks like INSPIRE is an extension based on
the ISO profile)
- establish a framework on adding plugins to the code
We could use the wiki at https://sourceforge.net/apps/trac/pycsw/wiki to
start and flesh out requirements.
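
As a first stab at both points, something like the sketch below might
work. The register_profile/resolve functions are hypothetical, and the
apiso-to-Dublin-Core pairs are an illustrative guess that should be
checked against the ISO application profile before being relied on.

```python
# Registry of profile name -> {profile queryable: core queryable}.
PROFILES = {}


def register_profile(name, queryable_map):
    """Register a profile plugin with its mapping onto core queryables."""
    PROFILES[name] = dict(queryable_map)


def resolve(profile, queryable):
    """Return the core (Dublin Core) queryable behind a profile
    queryable, or None if the profile must handle it itself."""
    return PROFILES[profile].get(queryable)


# Assumed mapping for illustration only; verify against the spec.
register_profile("apiso", {
    "apiso:Title": "dc:title",
    "apiso:Abstract": "dct:abstract",
    "apiso:Identifier": "dc:identifier",
    "apiso:Subject": "dc:subject",
    "apiso:Format": "dc:format",
    "apiso:Type": "dc:type",
    "apiso:Modified": "dct:modified",
})
```

Queryables that resolve to None (topic category, lineage and so on) are
the ones a profile plugin would need to expose on its own.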
> Angelos Tzotsos
> Remote Sensing Laboratory
> National Technical University of Athens