[pycsw-devel] Initial plan to implement ISO and INSPIRE compatibility for PyCSW

Fri Mar 4 15:32:39 EST 2011

> -----Original Message-----
> From: Angelos Tzotsos [mailto:gcpp.kalxas at gmail.com] 
> Sent: Tuesday, 01 March 2011 18:43
> To: pycsw-devel at lists.sourceforge.net
> Subject: [pycsw-devel] Initial plan to implement ISO and 
> INSPIRE compatibility for PyCSW
> 
> Hi all,
> 
> I have been reading the code lately and was wondering how we should 
> proceed in order to implement ISO and INSPIRE support in PyCSW.
> 
> I think the first thing that has to be seen is the database schema:
> 
>  - For now, Tom has implemented Dublin Core + CSW 2.0.2 using 
> SQLAlchemy, Shapely and SQLite (which can be any db 
> actually). The basic db schema for this includes the Dublin 
> Core queryable columns in one single table, as well as the 
> Geographic Extent in an extra field. The rest of the XML 
> metadata file is stored in a separate db column (which can be 
> queried through XPath). 
> Almost the same strategy is implemented in GeoNetwork CSW.
> 

FYI clarification: although SQLAlchemy saves us from being tied to a
single database, Python's sqlite3 module allows calling Python functions
from SQL (via create_function), which the code currently uses for the
bbox filter (passing it to Shapely).  So we'll have to work on moving
away from this and find an alternative.
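To make the portability point concrete, here is a minimal sketch of the create_function pattern using only the standard library. The bbox test is a plain-Python stand-in for the Shapely call, the comma-separated "minx,miny,maxx,maxy" storage format is an assumption, and the SQL function name query_bbox is hypothetical.

```python
import sqlite3


def bbox_intersects(stored_bbox, query_bbox):
    """Return 1 if two 'minx,miny,maxx,maxy' strings overlap, else 0.

    Plain-Python stand-in for the Shapely call; the string format is an
    assumption made for this sketch.
    """
    if stored_bbox is None or query_bbox is None:
        return 0
    a = [float(v) for v in stored_bbox.split(",")]
    b = [float(v) for v in query_bbox.split(",")]
    # Two boxes overlap unless one lies entirely to one side of the other.
    overlaps = not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])
    return int(overlaps)


conn = sqlite3.connect(":memory:")
# This hook belongs to Python's sqlite3 driver, so SQL that calls
# query_bbox() does not carry over to other databases behind SQLAlchemy.
conn.create_function("query_bbox", 2, bbox_intersects)
conn.execute("CREATE TABLE records (identifier TEXT PRIMARY KEY, bbox TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [("rec-1", "20,35,30,42"), ("rec-2", "-10,50,2,60")],
)
rows = conn.execute(
    "SELECT identifier FROM records WHERE query_bbox(bbox, ?) = 1",
    ("19,34,25,40",),
).fetchall()
print(rows)  # [('rec-1',)]
```

Any query calling query_bbox() fails on a database that lacks this registration, which is exactly the lock-in we'd need to remove.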

Having said this, you are right; the strategy is similar to GN in terms
of storing an entire XML document in a db record.  We are additionally
storing the core queryables (Dublin Core) so as to be able to query more
efficiently.  When returning GetRecords results, if
elementsetname='full', we return the entire XML document as stored;
otherwise, we present a brief or summary record as per
http://schemas.opengis.net/csw/2.0.2/record.xsd.
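A rough sketch of the elementsetname switch, assuming the Dublin Core queryables are already available as a dict keyed by prefixed element name (the function name present() and that dict are hypothetical). It emits only a subset of what record.xsd allows in csw:BriefRecord and csw:SummaryRecord; bounding boxes and the remaining summary elements are left out.

```python
import xml.etree.ElementTree as etree

NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}
for _prefix, _uri in NS.items():
    etree.register_namespace(_prefix, _uri)

# Subsets of the BriefRecord / SummaryRecord content, in schema order.
BRIEF = ["dc:identifier", "dc:title", "dc:type"]
SUMMARY = BRIEF + ["dc:subject", "dc:format", "dct:modified", "dct:abstract"]


def _qname(prefixed):
    """Turn 'dc:title' into '{http://purl.org/dc/elements/1.1/}title'."""
    prefix, local = prefixed.split(":")
    return "{%s}%s" % (NS[prefix], local)


def present(record_xml, queryables, elementsetname):
    """Return the stored XML for 'full', else build a brief/summary record."""
    if elementsetname == "full":
        return record_xml  # the stored document, untouched
    if elementsetname not in ("brief", "summary"):
        raise ValueError("unknown elementsetname: %s" % elementsetname)
    brief = elementsetname == "brief"
    root = etree.Element(_qname("csw:BriefRecord" if brief else "csw:SummaryRecord"))
    for field in BRIEF if brief else SUMMARY:
        value = queryables.get(field)
        if value is not None:  # skip elements the record does not have
            etree.SubElement(root, _qname(field)).text = value
    return etree.tostring(root, encoding="unicode")
```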

>  - Another solution would be to store only the XML files in 
> the database and use only XPath queries. I suppose this would 
> have an effect on performance for large datasets.
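For comparison, an XML-only store might look like this minimal sketch (the table layout and find_by_xpath() are hypothetical). ElementTree only supports a subset of XPath, and every query parses every stored document, which is where the performance concern comes from.

```python
import sqlite3
import xml.etree.ElementTree as etree

NS = {"dc": "http://purl.org/dc/elements/1.1/"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (identifier TEXT PRIMARY KEY, xml TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [
        ("rec-1", '<r xmlns:dc="http://purl.org/dc/elements/1.1/">'
                  "<dc:title>Land cover</dc:title></r>"),
        ("rec-2", '<r xmlns:dc="http://purl.org/dc/elements/1.1/">'
                  "<dc:title>Roads</dc:title></r>"),
    ],
)


def find_by_xpath(conn, path, expected):
    """Scan every stored document and evaluate a (limited) XPath on it."""
    matches = []
    for identifier, xml in conn.execute("SELECT identifier, xml FROM records"):
        doc = etree.fromstring(xml)  # one full parse per record, per query
        if doc.findtext(path, namespaces=NS) == expected:
            matches.append(identifier)
    return matches


print(find_by_xpath(conn, "dc:title", "Roads"))  # ['rec-2']
```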
> 

We should test this further, although yes, it may well hurt performance.

>  - Also, there is the example of MDWeb project, another open 
> source Java implementation of CSW that has implemented the 
> full ISO 19115 schema within postgres (more than 1300 lines 
> of SQL) and stores all the info in this schema. My opinion is 
> that this would lead to unmaintainable code, or we would lose 
> the current advantage of being able to use any RDBMS available 
> through SQLAlchemy.

Ouch!  IMHO that would be prone to error along the metadata lifecycle.

>  - Finally, I see another possible solution, a middle path. 
> Store the main queryables in db columns (with extra tables 
> for "one to many" connections) but without following the ISO 
> UMLs. At the same time, the XML will be stored in a separate 
> column in the main table. This will make the XML import and 
> export functions more complicated, but will lead to better 
> performance. I think this direction would lead to ~15 extra 
> tables (e.g. one for "Keywords", one for "Resource Language", 
> one for "Topic Category", etc.)
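A minimal sketch of the middle path, using plain sqlite3 instead of SQLAlchemy to stay self-contained. The table and column names are assumptions; only the keywords table is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (
        identifier TEXT PRIMARY KEY,
        title TEXT,
        abstract TEXT,
        date_stamp TEXT,
        xml TEXT                 -- the full metadata document, kept as-is
    );
    -- one of the "one to many" tables; Resource Language, Topic
    -- Category, etc. would follow the same pattern
    CREATE TABLE keywords (
        record_id TEXT NOT NULL REFERENCES records(identifier),
        keyword TEXT NOT NULL
    );
    CREATE INDEX idx_keywords_keyword ON keywords (keyword);
""")

# Import: the main row plus one child row per keyword.
conn.execute(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
    ("rec-1", "Land cover", "Sample abstract", "2011-03-01",
     "<gmd:MD_Metadata>...</gmd:MD_Metadata>"),
)
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?)",
    [("rec-1", "land cover"), ("rec-1", "Greece")],
)

# Query: filter on the indexed child table, never touching the XML.
rows = conn.execute(
    """SELECT DISTINCT r.identifier, r.title
       FROM records r JOIN keywords k ON k.record_id = r.identifier
       WHERE k.keyword = ?""",
    ("Greece",),
).fetchall()
print(rows)  # [('rec-1', 'Land cover')]
```

The XML column is only read back when the full document is requested; the extra complexity lands in the import code, which has to populate the child tables.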
> 
> 

Good idea.  For ISO/INSPIRE support, we could start by mapping the ISO
core queryables to the Dublin Core core queryables.  Any ISO/INSPIRE
queryables without a Dublin Core equivalent would need to be exposed
specifically as part of ISO/INSPIRE support.
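The mapping could start as a simple lookup table. The apiso:* names below are assumed for illustration (they should be checked against the ISO Application Profile), as is the split between queryables covered by Dublin Core and those that are ISO-only.

```python
# Hypothetical mapping: ISO queryable name -> existing Dublin Core queryable.
ISO_TO_DC = {
    "apiso:Title": "dc:title",
    "apiso:Abstract": "dct:abstract",
    "apiso:Subject": "dc:subject",
    "apiso:Format": "dc:format",
    "apiso:Identifier": "dc:identifier",
    "apiso:Modified": "dct:modified",
    "apiso:Type": "dc:type",
    "apiso:AnyText": "csw:AnyText",
}

# Hypothetical ISO-only queryables with no Dublin Core counterpart;
# these would need their own columns, exposed by the ISO/INSPIRE profile.
ISO_ONLY = {"apiso:TopicCategory", "apiso:ResourceLanguage", "apiso:OrganisationName"}


def resolve_queryable(name):
    """Return (column, handled_by_core) for an ISO queryable name."""
    if name in ISO_TO_DC:
        return ISO_TO_DC[name], True
    if name in ISO_ONLY:
        return name, False
    raise KeyError("unsupported queryable: %s" % name)
```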

I think this work would best be done as a plugin to pycsw.
Having said this, a plugin architecture would be valuable here, so pycsw
can support n profiles over time.

> But what kind of data needs to be stored in the db in order to 
> have basic compliance with ISO 19115 and INSPIRE for datasets?
> 
> For the ISO core, the elements are (* marks the CSW core queryables):
> 1.* Dataset Title (M - Mandatory)
> 2. Dataset reference date (M)
> 3. Dataset responsible party (O - Optional)
> 4.* Geographic location of the dataset (C - Conditional)
> 5. Dataset language (M)
> 6. Dataset character set (C)
> 7.* Dataset topic category (M) (includes keywords in the CSW queryable)
> 8. Spatial resolution of the dataset (O)
> 9.* Abstract describing the dataset (M)
> 10.* Distribution format (O)
> 11. Additional extent information for the dataset (O)
> 12. Spatial representation type (O)
> 13.* Reference system (O)
> 14. Lineage (O)
> 15. On-line resource (O)
> 16.* Metadata file identifier (O)
> 17. Metadata standard name (O)
> 18. Metadata standard version (O)
> 19. Metadata language (C)
> 20. Metadata character set (C)
> 21. Metadata point of contact (M)
> 22.* Metadata date stamp (M)
> 
> plus CSW queryables
> * "Any text" 
> * Type (default "dataset")
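Once the list is captured as data, a basic completeness check is straightforward. A sketch, with hypothetical field names and records held as plain dicts:

```python
# Obligation of each ISO core element, transcribed from the list above:
# M = mandatory, O = optional, C = conditional. Field names are hypothetical.
ISO_CORE = {
    "dataset_title": "M",
    "dataset_reference_date": "M",
    "dataset_responsible_party": "O",
    "geographic_location": "C",
    "dataset_language": "M",
    "dataset_character_set": "C",
    "topic_category": "M",
    "spatial_resolution": "O",
    "dataset_abstract": "M",
    "distribution_format": "O",
    "additional_extent": "O",
    "spatial_representation_type": "O",
    "reference_system": "O",
    "lineage": "O",
    "online_resource": "O",
    "metadata_file_identifier": "O",
    "metadata_standard_name": "O",
    "metadata_standard_version": "O",
    "metadata_language": "C",
    "metadata_character_set": "C",
    "metadata_point_of_contact": "M",
    "metadata_date_stamp": "M",
}


def missing_mandatory(record):
    """Return the mandatory elements that are absent or empty in `record`.

    Conditional (C) elements are not checked, since their conditions are
    not part of this list.
    """
    return sorted(
        name for name, obligation in ISO_CORE.items()
        if obligation == "M" and not record.get(name)
    )
```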
> 
> 
> For INSPIRE, the equivalent list is (numbers indicate the mapping to 
> the ISO list above and * marks queryables):
> 1.* Resource title (M) [1]
> 2.* Temporal reference (C) [0..n] 
> 3.* Responsible organization (M) including both name of the 
> organization and contact e-mail [1]
> 4.* Geographic Bounding Box (M) [1..n]
> 5. Resource language (C) [0..n]
> 7.* Topic category (M) [1..n]
> 8.* Spatial resolution (C) [0..n]
> 9.* Resource abstract (M) [1]
> 11. Temporal extent (C) [0..n]
> 14.* Lineage (M) [1]
> 15. Resource Locator (C) [0..n]
> 19. Metadata Language (M) [1]
> 21. Metadata point of contact (M) including both name of the 
> organization and contact e-mail [1..n]
> 22. Metadata Date (M) [1]
> 23.* Resource Type (M) [1]
> 24.* Unique Resource Identifier (M) [1..n]
> 25.* Keyword (M) [1..n]
> 26.* Conformity (M) [1]
> 27.* Conditions for access and use (M) [1..n]
> 28.* Limitations on public access (M) [1..n]
> 
>  [1] indicates exactly one, [0..n] "0 to many" and [1..n] "1 to many"
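The INSPIRE multiplicities could be checked the same way, assuming each element is held as a list of values. Only a subset of the elements is transcribed, and the field names are hypothetical.

```python
import re

# Multiplicities copied from the INSPIRE list above (a subset, for
# illustration). Field names are hypothetical.
INSPIRE_CARDINALITY = {
    "resource_title": "[1]",
    "temporal_reference": "[0..n]",
    "geographic_bounding_box": "[1..n]",
    "resource_language": "[0..n]",
    "lineage": "[1]",
    "keyword": "[1..n]",
}

_CARDINALITY = re.compile(r"\[(\d+)(?:\.\.(\d+|n))?\]")


def parse_cardinality(spec):
    """Turn '[1]', '[0..n]' or '[1..n]' into (min, max); max None = unbounded."""
    match = _CARDINALITY.fullmatch(spec)
    if match is None:
        raise ValueError("bad cardinality: %r" % spec)
    low = int(match.group(1))
    high = match.group(2)
    if high is None:  # '[1]' means exactly one
        return low, low
    return low, (None if high == "n" else int(high))


def violations(record):
    """List (element, spec, count) for elements breaking their multiplicity."""
    problems = []
    for name, spec in INSPIRE_CARDINALITY.items():
        count = len(record.get(name, []))
        low, high = parse_cardinality(spec)
        if count < low or (high is not None and count > high):
            problems.append((name, spec, count))
    return problems
```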
> 
> 
> The sources for the above are:
> http://portal.opengeospatial.org/files/?artifact_id=21460
> http://inspire.jrc.ec.europa.eu/documents/Metadata/INSPIRE_MD_IR_and_ISO_v1_2_20100616.pdf
> http://inspire.jrc.ec.europa.eu/documents/Network_Services/Technical_Guidance_Discovery_Services_v2.12.pdf
> 
> Another interesting document to read is:
> http://www.neogeo-online.net/blog/wp-content/uploads/2011/01/201012_geonetwork_inspire_v0.6.pdf
> 
> Any thoughts, ideas, proposals on how to proceed? 
> 

I think it would be a good idea to:

- start mapping out the core queryables (we should start with the ISO
profile first IMHO, since it looks like INSPIRE is an extension based on
the ISO profile)
- establish a framework for adding plugins to the code

We could use the wiki at https://sourceforge.net/apps/trac/pycsw/wiki to
start fleshing out requirements.

Thoughts?

..Tom

> Regards,
> Angelos
> 
> -- 
> Angelos Tzotsos
> Remote Sensing Laboratory
> National Technical University of Athens
> http://users.ntua.gr/tzotsos
>