[pycsw-devel] Fwd: [geonode-users] Semantically enriched search in GeoNode using Ontologies

Angelos Tzotsos gcpp.kalxas at gmail.com
Thu Jan 10 15:27:31 PST 2013


I thought I'd forward this here.
We should keep an eye on CSW/OWL.

Angelos


-------- Original Message --------
Subject: 	[geonode-users] Semantically enriched search in GeoNode using 
Ontologies
Date: 	Thu, 10 Jan 2013 08:16:00 -0800 (PST)
From: 	LJauregui <leojauregui.geog at gmail.com>
Reply-To: 	geonode-users at googlegroups.com
To: 	geonode-users at googlegroups.com



*This note, written by Heikki Doeleman from Amsterdam, describes 4
possible approaches to ontology-based semantic search in GeoNode.
Please let's discuss them.*

*Introduction*

In my opinion, there are 4 possible solutions:


    - GeoNetwork keyword support
    - CSW/ebRIM
    - CSW/OWL
    - Data Catalog Vocabulary services


I'll briefly describe each of these.


*GeoNetwork keyword support*

This is by far the easiest (and cheapest) option, but it is also by far the
weakest. As you probably know, you can upload an RDF thesaurus in
GeoNetwork, and subsequently the keywords that it contains can be used to
"tag" each metadata record. These tags can be used as search criteria. However,
GeoNetwork does not use any of the more interesting semantic information
from the thesaurus (e.g. it completely ignores information about
subclasses, synonyms, and other thesaurus relations).

pros:

    - already fully available in GeoNetwork
    - no need to do any development


cons:

    - ignores all semantic information except actual keywords



*CSW/ebRIM*

This is the solution we've researched and described in
http://geonetwork.tv/owl. It makes use of the CSW/ebRIM implementation that
I and others have developed in a project for the European Space Agency, you
can see a presentation of that here: http://geonetwork.tv/ebrim. In brief,
it works like this:

ebRIM (the ebXML Registry Information Model, part of the ebXML family of
specifications) is a highly generic data model, which can make use of
"Extension Packs" to model data and relations. The specification was
developed by OASIS and is now an ISO standard (ISO 15000). Separate
specifications (OGC 07-038, 07-110, 07-144) describe how CSW should be used
with ebRIM.

The implementation we made for ESA fully implements the required parts of
the OGC specifications. This implementation, although developed under the
umbrella of GeoNetwork, is actually a completely separate web application
that maintains its own database and Lucene index. It takes as input
ISO19139 metadata coming from GeoNetwork (or from any other source, for
that matter) and transforms that into ebRIM objects according to the OGC
specifications. Then this metadata can be searched and browsed using the
CSW discovery operations.
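
To make the transformation step more concrete, here is a minimal Python sketch of pulling the record identifier and keywords out of an ISO19139 document and turning them into simple classification entries. The gmd/gco namespaces and element paths are those of ISO19139, but the sample record and the dictionary layout returned by the hypothetical to_registry_objects function are illustrative assumptions only. They are not the actual ebRIM objects produced by the application or defined in the OGC specifications.

```python
import xml.etree.ElementTree as ET

# ISO19139 namespaces.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, heavily trimmed ISO19139 record used only for illustration.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>rec-1</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:descriptiveKeywords>
        <gmd:MD_Keywords>
          <gmd:keyword><gco:CharacterString>typhoon</gco:CharacterString></gmd:keyword>
          <gmd:keyword><gco:CharacterString>rainfall</gco:CharacterString></gmd:keyword>
        </gmd:MD_Keywords>
      </gmd:descriptiveKeywords>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def to_registry_objects(iso_xml):
    """Map one ISO19139 record to a simplified, hypothetical registry entry."""
    root = ET.fromstring(iso_xml)
    record_id = root.findtext(
        "gmd:fileIdentifier/gco:CharacterString", namespaces=NS
    )
    keyword_path = ".//gmd:MD_Keywords/gmd:keyword/gco:CharacterString"
    keywords = [
        el.text.strip() for el in root.iterfind(keyword_path, NS) if el.text
    ]
    # One classification entry per keyword, each pointing back at the record.
    return {
        "id": record_id,
        "classifications": [
            {"classifiedObject": record_id, "keyword": kw} for kw in keywords
        ],
    }
```

Calling to_registry_objects(SAMPLE) yields one entry for "rec-1" with two classifications, one per keyword. A real transformation would of course map far more of the metadata than this.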

You can find the OGC specifications here
http://www.opengeospatial.org/standards/cat, and the OASIS specification in
this list here https://www.oasis-open.org/standards.

The work we did with Juliet Gwenzi built on this in the following way. We
created a transformation from OWL thesaurus documents to ebRIM objects,
preserving semantic information (such as subClassOf, synonymOf, etc.). Once
the thesaurus is loaded into the ebRIM application, metadata can be tagged
using its keywords. Then you can search for this metadata using the keywords
directly (as in the standard GeoNetwork solution) but also using the
semantic relations from the thesaurus. For example, suppose you have a
thesaurus which expresses that both "typhoon" and "cyclone" are subclasses
of "tropical storm". You can then search for "tropical storm", and find all
metadata tagged with either "typhoon" or "cyclone".
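
The "tropical storm" example can be sketched in a few lines of Python. This is only an illustration of the idea of expanding a search term through subclass relations. It is not how the ebRIM application implements it, and the thesaurus pairs, record ids and function names below are all made up for the example.

```python
from collections import defaultdict

# Hypothetical thesaurus: (narrower term, broader term) pairs, standing in
# for the subClassOf relations an OWL thesaurus would declare.
SUBCLASS_OF = [
    ("typhoon", "tropical storm"),
    ("cyclone", "tropical storm"),
    ("tropical storm", "storm"),
]

# Hypothetical catalogue: metadata record id -> keywords it is tagged with.
RECORDS = {
    "rec-1": {"typhoon"},
    "rec-2": {"cyclone"},
    "rec-3": {"drought"},
}


def expand_term(term, subclass_pairs):
    """Return the term plus every term that is transitively a subclass of it."""
    children = defaultdict(set)
    for narrower, broader in subclass_pairs:
        children[broader].add(narrower)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def semantic_search(term, records, subclass_pairs):
    """Return sorted ids of records tagged with the term or any subclass of it."""
    wanted = expand_term(term, subclass_pairs)
    return sorted(rid for rid, tags in records.items() if tags & wanted)
```

With this data, semantic_search("tropical storm", RECORDS, SUBCLASS_OF) finds rec-1 and rec-2, although neither record is tagged with "tropical storm" itself. A plain keyword match on the same term would find nothing.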

This works quite well, though the transformation of OWL to ebRIM does not
yet completely cover the rich OWL language, so some semantic information is
currently lost in the transformation. Depending on your use cases, you might
want to extend the set of OWL constructs covered by this transformation. If
you choose this solution, I would foresee that you need to do some
development work to integrate it well with GeoNode. In particular, I think
the following points would need attention:

- link GeoNetwork to the ebRIM application so that all create, update and
delete actions on metadata are propagated to ebRIM
- link the GeoNetwork (or GeoNode) search/browse function (at least the
parts that are about keywords and semantic relations) to generate CSW/ebRIM
queries and search in ebRIM
- (optional) extend OWL coverage in ebRIM transformation
- (probably) improve the GUI we made to enable tagging metadata with
keywords from the ontology in ebRIM

pros:

    - ebRIM store already implemented and tested
    - OWL to ebRIM proof of concept already done


cons:

    - requires development efforts to integrate in GeoNode



*CSW/OWL*

A new OGC specification (09-010) aims to integrate CSW directly with OWL.
Although I don't think this is considered a "standard" yet (09-010 is a
"discussion paper"), it looks to be a very promising integration, as both
CSW and OWL are standards that are very widely used in their respective
domains. However, I'm not aware of any efforts to date to implement
this specification, so to use this solution you would need to develop it.
You can find the specification here:
http://portal.opengeospatial.org/files/?artifact_id=32620.

pros:

    - probably the best solution


cons:

    - not implemented



*Data Catalog Vocabulary services*

These are new services available in development versions of GeoNetwork,
which allow exporting all or part of the catalogue content in an RDF
format, which can then be loaded into separate software such as Virtuoso
that can run SPARQL queries against this RDF content. In this way
you get full support for querying using all semantic relations in the
thesaurus. The drawbacks of using this in your GeoNode architecture are that
you'll need to extend it with SPARQL search software and integrate that,
both to "harvest" the GeoNetwork catalogue in RDF format and to link the
GeoNetwork (or GeoNode) keyword search GUI to it. You can see a
description of these new services here:
http://trac.osgeo.org/geonetwork/wiki/proposals/DCATandRDFServices.
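
As a rough illustration of the kind of query the keyword search GUI would have to generate, the Python sketch below builds a SPARQL query that finds datasets tagged with a concept or with any concept narrower than it. It rests on several assumptions that are not confirmed by the proposal above: that the RDF export describes records as dcat:Dataset with dcat:theme pointing at SKOS concepts, that the thesaurus uses skos:broader, and that the SPARQL endpoint supports SPARQL 1.1 property paths. The function name and the example URI are hypothetical.

```python
from string import Template

# Query skeleton. "skos:broader*" is a SPARQL 1.1 property path matching the
# concept itself or any concept reachable through a chain of skos:broader.
QUERY_TEMPLATE = Template("""\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?dataset WHERE {
  ?dataset a dcat:Dataset ;
           dcat:theme ?concept .
  ?concept skos:broader* <$concept_uri> .
}
""")

# Characters that may not appear inside a SPARQL IRI reference.
FORBIDDEN = set('<>"{}|^`\\ ')


def build_theme_query(concept_uri):
    """Return a SPARQL query for datasets tagged at or below concept_uri."""
    if any(ch in FORBIDDEN or ord(ch) < 0x20 for ch in concept_uri):
        raise ValueError("not a usable IRI: %r" % concept_uri)
    return QUERY_TEMPLATE.substitute(concept_uri=concept_uri)


if __name__ == "__main__":
    # Hypothetical concept URI, for illustration only.
    print(build_theme_query("http://example.org/thesaurus#TropicalStorm"))
```

The resulting query string would then be sent to whatever SPARQL endpoint you add to your architecture. The check on the URI is only there to avoid building a malformed query from user input.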

pros:

    - most development work already done


cons:

    - requires development work for integration
    - requires additional SPARQL-aware software in your architecture



*Conclusions*

I think any of the four solutions can be valid for you, but it completely
depends, of course, on what your requirements are and how much effort you
want to expend to fulfill them.

My preferred solution would be CSW/OWL, as I think that is the solution
that completely fits the use case of ontological semantic enrichment;
however I have no idea how much development effort would be needed for it
without doing a more detailed analysis. This would give you a solution that
fully implements a standard CSW interface.

Second best may be Data Catalog Vocabulary Services, as it would support
querying for all semantic relations in your ontology. Development efforts
at first glance seem to be less than for CSW/OWL (although analysis is
needed of course). The downside is that it does not use or expose any
standard interfaces.

Third, CSW/ebRIM can solve the requirement, though I think the development
effort to more fully support all OWL relations (if you want that) is
possibly larger than for Data Catalog Vocabulary Services. On the plus
side, you'll have a fully functional ebRIM registry which can also be used
for other purposes, and which is exposed using an official standard CSW
interface.

Lastly, you could just use the existing keyword support in GeoNetwork. This
is by far the cheapest, as you don't need to do any development work for
it, but it will also give you by far the weakest semantic support, as no
ontological relations are supported.





