[cat-interop] GeoBlacklight Solr schema

Steve Richard steve.richard at azgs.az.gov
Fri Mar 14 10:30:47 PDT 2014


I am looking through Chris's comments sent to Kim and Darren, and the discussion about how to put distribution information in metadata for service-based access to datasets is one I've wrestled with as well. There's a current activity at http://github/cat-interop , and a related discussion in the data.gov projectOpenData (https://github.com/project-open-data/project-open-data.github.io/issues/291).

My own thinking about this is summarized in a discussion paper on GitHub (https://github.com/usgin/usginspecs/raw/master/MetadataAsHypermediaApp.docx , MS Word doc... :( ).

I think the best course of action for putting links to resources in Dublin Core XML (should they be in dct:references or in dc:relation???) is to make the dc element content a JSON blob that provides a machine-actionable description of the link.


Here are some examples of what the content of a dct:references (or dc:relation) element would look like if a JSON representation of the link were used as the encoding for a machine-actionable link:

{ "link":"http://mirador.gsfc.nasa.gov/mirador-plugin-search.xml", 
"rel":"documentation", 
"title":"OpenSearch description for accessing this collection", 
"type":"application/opensearchdescription+xml", 
"profile":"http://commons.esipfed.org/ns/discovery/1.2/collectionCast#", 
"description":"point to a search service from a ESIP collection cast entry" }

{ "link":"http://kgs.uky.edu/arcgis/services/aasggeothermal/ARSeismicHypocenters/MapServer/WFSServer?request=GetFeature&service=WFS&TypeName=Hypocenter&MaxFeatures=10", 
"rel":"download", 
"title":"Example WFS getFeature request for NGDS seismic event hypocenters", 
"type":"application/gml+xml", 
"overlayAPI":"OGC:WFS", 
"profile":"http://stategeothermaldata.org/uri-gin/aasg/xmlschema/hypocenter/1.7", 
"parameter":"typeName=Hypocenter" }

{ "link":"http://kgs.uky.edu/arcgis/services/aasggeothermal/ARSeismicHypocenters/MapServer/WFSServer?request=GetCapabilities", 
"rel":"documentation", 
"title":"Get the capabilities document for seismic event hypocenter WFS service", 
"type":"application/xml", 
"overlayAPI":"OGC:WFS" }

OpenSearch link in ESIP discovery/data cast:

{ "link":"http://mirador.gsfc.nasa.gov/cgi-bin/mirador/collectionlist.pl?keyword={searchTerms}", 
"rel":"search", 
"title":"Search template for Mirador keyword search", 
"type":"application/atom+xml", 
"template":"http://a9.com/-/spec/opensearch/1.1", 
"profile":"http://commons.esipfed.org/ns/discovery/1.2/collectionCast#", 
"description":"Search service for a collection cast entry" }
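
To make the idea concrete, here is a minimal Python sketch of how a client might write and read such links. It assumes the JSON blob is carried as the text content of a dct:references element in the Dublin Core terms namespace; the "record" wrapper element and the function names are hypothetical, chosen only for illustration. Values that are not JSON (plain URLs, as in existing records) are skipped rather than treated as errors.

```python
import json
import xml.etree.ElementTree as ET

# Dublin Core terms namespace; the element choice (dct:references) is an assumption.
DCT_NS = "http://purl.org/dc/terms/"
REFERENCES_TAG = "{%s}references" % DCT_NS


def make_references_element(link):
    """Serialize a link description (a dict) as the text of a dct:references element."""
    elem = ET.Element(REFERENCES_TAG)
    elem.text = json.dumps(link, sort_keys=True)
    return elem


def read_links(record_xml):
    """Return the JSON link descriptions found in dct:references elements.

    Elements whose content is not a JSON object with a "link" key are skipped,
    so records that still hold plain URLs remain readable.
    """
    root = ET.fromstring(record_xml)
    links = []
    for elem in root.iter(REFERENCES_TAG):
        try:
            value = json.loads(elem.text or "")
        except ValueError:
            continue  # not JSON, e.g. a bare URL
        if isinstance(value, dict) and "link" in value:
            links.append(value)
    return links


# Build a hypothetical record holding one JSON link and one plain-URL value.
record = ET.Element("record")
record.append(make_references_element({
    "link": "http://kgs.uky.edu/arcgis/services/aasggeothermal/"
            "ARSeismicHypocenters/MapServer/WFSServer?request=GetCapabilities",
    "rel": "documentation",
    "title": "Get the capabilities document for seismic event hypocenter WFS service",
    "type": "application/xml",
    "overlayAPI": "OGC:WFS",
}))
plain = ET.SubElement(record, REFERENCES_TAG)
plain.text = "http://purl.stanford.edu/vr593vj7147"

links = read_links(ET.tostring(record, encoding="unicode"))
wfs_links = [l for l in links if l.get("overlayAPI") == "OGC:WFS"]
```

A client can then pick the link it knows how to use by filtering on "rel", "type", or "overlayAPI" instead of guessing from the URL string. Any XML serializer will escape the ampersands in query strings, so the JSON survives the round trip.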

Stephen M Richard
Arizona Geological Survey
416 W. Congress  #100
Tucson, AZ
AZGS: 520-770-3500
Office: 520-209-4127
FAX: 520-770-3505

> -----Original Message-----
> From: opengeoportal-request at elist.tufts.edu [mailto:opengeoportal-
> request at elist.tufts.edu] On Behalf Of Darren Hardy
> Sent: Thursday, March 13, 2014 1:43 PM
> To: opengeoportal at elist.tufts.edu
> Subject: Re: GeoBlacklight Solr schema
> 
> Hi Chris,
> 
> Thanks so much for the great feedback. I've attempted to address your feedback
> below...
> 
> On Mar 13, 2014, at 11:38 AM, Barnett, Christopher S
> <Christopher.Barnett at tufts.edu> wrote:
> 
> > Hi Kim and Darren,
> >
> > I think this looks great!  In particular I'm happy to see the use of Dublin Core,
> as it seems like a good fit for the terms we're hoping to search.  It also has good
> mappings with CSW core queryables, which might become important.  Building
> on an existing standard also has the potential to aid in compatibility with other
> similar projects.
> >
> > I have a few specific questions, most of which have to do with the layer
> specific portions. Some of these are questions that I've also been asking myself
> about the current OGP schema, as I've thought about ways in which it may be
> improved.
> >
> >> dc_relation_url: URL to related item: Multiple values allowed. Example:
> >> "http://purl.stanford.edu/vr593vj7147"
> >
> > In the example you've given, the url is to the (ISO) metadata for the data
> object?  Is this common convention in Dublin Core?  I'm wondering if there is
> room in the schema for semantics to define what type of object is being linked
> to.
> >
> 
> Kim elaborated on this point in an earlier email. The semantics of DC.Relation is
> quite vague unfortunately. I like the idea of looking at the INSPIRE profile for
> discovery. Perhaps they have a solution for links to additional resources.
> 
> 
> >
> >
> >> layer_geom: Shape of the layer as a Point, LineString, or Polygon WKT.
> >> Example: "POLYGON((76.76 19.91705, 84.76618 19.91705, 84.76618
> 12.62309, 76.76 12.62309, 76.76 19.91705))"
> >
> > Is this the actual geometry for the layer? A generalized representation?  If the
> former, how are you handling layers with millions of multipart features?  If the
> latter, how are you generalizing?
> >
> >
> 
> My example is just the bounding box. If you wanted to represent the actual
> geometry, you could generalize it at indexing time, or compute a convex hull.
> This field is more experimental in my evaluation document. When you use Solr4
> + JTS you can represent (more or less) arbitrary geometry WKTs and issue spatial
> predicates on them. Solr 4.7, however, upgraded to Spatial4J 0.4 which has
> some initial support for geometry WKTs, so this avenue may be promising as far
> as the core Solr implementation goes.
> 
> 
> >
> >> layer_id_s. The complete identifier for the WMS/WFS/WCS layer.
> >> Example: "druid:vr593vj7147",
> >
> > This makes the assumption that the layer name will be the same for all OGC
> services (as OGP does).  It's probably a reasonable assumption, but worth
> thinking about.  I'm glad to see WorkspaceName go away.
> >
> 
> Yes, we're assuming the layer name is the same across all WxS services.
> 
> 
> >
> >> layer_srs_s: The spatial reference system for the layer. Example: EPSG:4326.
> >
> > Do you get this from the web service or use a library to translate WKT from the
> metadata to EPSG, or something else?  I wouldn't be surprised to hear that ISO
> has a place to put EPSG codes, since ISO can represent virtually anything, but
> I've also seen a lot of ISO metadata that does not have this info.  Also, is this the
> native EPSG for the original object or the web service? The web service of
> course, can have many available projections.  The end user may want to know
> about the layer's projection, but the front end also needs to know what
> projections are available for display.  I've been dismayed to find services that
> don't support web mercator or 4326.
> >
> 
> We've found this to be a nasty problem. In our "data wrangling" phase, we
> manually normalize the data into a 4326 projection, so in our case, the SRS for
> the web services is always 4326. If they want the original projection, we have
> another page outside the scope of the discovery service where they can
> download that. But overall, my assumption is that layer_srs_s is the projection
> used by the web service.
> 
> >> layer_geom_type_s. Valid values are: "Point", "Line", "Polygon", and
> "Raster".
> >
> > Is there room in the schema for other "data types"?  Something like "Scanned
> Map"  is easy enough (maybe not... how do you differentiate between a
> georeferenced and ungeoreferenced map? Are these distinct data types?), since
> it could be classified as a geometry type.  What about documents or data sets
> with clear geospatial extents, but nothing that could rightly be called
> "geometry"? A lot of folks in the scientific community use what we would call
> geospatial metadata to document such things.  You are also likely to run into
> metadata that won't specify more than "raster" or "vector".
> >
> 
> I was hoping to use the OGC simple feature types for these geometry types. The
> extensions to GML provide for all sorts of coverage types which might be
> suitable to handle a non-georectified scanned map. We do, however, want to
> use a controlled vocabulary for the geometry type.
> 
> Part of our initial assumption for the discovery service was that if the record
> does not have a WMS service, then it's not cataloged in the discovery service.
> We're revisiting that to require only a bounding box and not both a bounding
> box and WMS service.
> 
> For the "vector" data sets, we look at the actual data to determine the
> geometry type during our "data wrangling" phase.
> 
> 
> >
> >> layer_wcs_url: Service root for the WCS service that holds this layer. If
> applicable. Example:
> >> "http://geowebservices-restricted.stanford.edu/geoserver/wcs"
> >> layer_wfs_url: Service root for the WFS service that holds this layer. If
> applicable. Example:
> >> "http://geowebservices-restricted.stanford.edu/geoserver/wfs"
> >> layer_wms_url: Service root for the WMS service that holds this layer
> "http://geowebservices-restricted.stanford.edu/geoserver/wms"
> >
> > If there are non-ogc services (ArcGIS Server REST services, HGL's Open Delivery
> for scanned maps, Berkeley's service for ungeoreferenced maps, etc.),  links to
> zip files, browse graphics, or other resources would those be represented here
> (with additional elements like "layer_arcgisrest_url" ) or in the repeatable
> "dc_relation_url" element? One could also imagine a multipart/multilayer object
> more properly represented by an OGC WMC or (upcoming) OWS Context.  I
> wonder if there is a more generic way to define service url as a schema element
> that would at least pair a url with a descriptor.   It may be that there is not a
> good way to do this in Solr and "layer_${service_type}_url" is the best current
> approach.  Is there a way of crafting a Solr query that would return all
> "layer_${service_type}_url" fields, but not, say "layer_geom"?
> >
> 
> As Kim mentioned, we're revisiting this issue and I'm hoping to use DC.Relation
> plus a verb to manage these links. A preview image, for example. I'm not sure
> about using other non-WxS services for our web map -- we haven't thought
> about that frankly.
> 
> 
> > Two last non-field specific questions:
> >
> > CSW supports "anytext" queries.  It seems like that would require an indexed
> field in Solr with the entire text of the metadata record (ISO or FGDC, etc.),
> minus the xml entities. It's not something that tools like OGP or GeoBlacklight
> must support as a matter of course, but I'm interested in thinking through what
> possibilities/problems might emerge.  I'm just curious if this came up in
> discussions, and if so, what were the key decision points?
> 
> I'm not familiar with the semantics of the CSW anytext queries, but what we do
> is copy various fields into our generic "text" field -- akin to an any text query
> perhaps. We copy about a dozen fields into the text field. If you look at the
> bottom of the Solr schema.xml you will see the specific copyField directives.
> 
>   https://github.com/sul-dlss/geohydra/blob/master/solr/kurma-app-
> test/conf/schema.xml
> 
> 
> >
> > Mike Graves has talked some in the past about separating the OGP schema
> from its Solr implementation.  Think ISO 19115-1 vs. its implementation in ISO
> 19139;  at least that's my read on what I've heard Mike say.  Did you guys have
> any thoughts about having a more conceptual schema that's an information
> model of sorts vs. concrete implementation as a Solr XML schema? I don't
> anticipate OGP moving away from Solr anytime soon, but there may be other
> potential partners using different search technologies.
> >
> 
> This was part of my thinking behind using Dublin Core as they do have a
> conceptual schema, plus they have various extensions and profiles, etc.  Perhaps
> that INSPIRE discovery profile might have some of this information model-level
> work. But it would be a great asset to have a conceptual model for what
> information is required for geospatial discovery services.
> 
> Thanks,
> -Darren
> 
> 
> --
> Darren Hardy, Ph.D.
> GIS Software Engineer
> Digital Library Systems & Services
> Stanford University
> drh at stanford.edu
> www.stanford.edu/~drh
> 
> 
> > Thanks for this!  There are a lot of great things here.  I understand that there
> are many intractable problems that won't be solved in one iteration, or many!  I
> look forward to exploring the spatial search components in more detail.
> >
> > Just as a side-note, I was at a meeting last week for an NSF initiative with
> some ISO luminaries and there was a lot of talk about discovery profiles for ISO.
> Unfortunately, I can't contribute much more than to say that it's a thing that
> exists or may/will exist.  Anyone on the list know anything about ISO discovery
> profiles?
> >
> > Chris
> >
> > --
> > Christopher Barnett
> > Geospatial Analyst, Research & Geospatial Technology Services Tufts
> > Technology Services (TTS)
> > 16 Dearborn Rd.
> > Somerville, MA 02144
> > http://gis.tufts.edu
> >
> >
> >
> >
> >
> > On Mar 12, 2014, at 3:05 PM, Kimberly A Durante <kdurante at stanford.edu>
> wrote:
> >
> >> Hello OGP,
> >>
> >> Stanford Libraries is in the process of developing GeoBlacklight- a geospatial
> data discovery application, and a plugin to Blacklight.
> >>
> >> As part of this development our GIS developer, Darren Hardy, has created a
> Solr metadata schema which we would like to share with the larger OGP
> community in the hopes of gathering feedback on the schema design and the
> proposed elements.
> >> The schema is based on elements from Dublin Core for the descriptive
> >> metadata and also contains a set of layer-specific fields (prefixed
> >> with 'layer_'.)
> >>
> >> The current version of the schema can be found here:
> >> http://goo.gl/UTIzRl
> >>
> >> Comments regarding the proposed GeoBlacklight schema are welcome from
> members of entire OGP community. We are interested in any feedback or insight
> into the elements, their definitions, as well as the possibilities for future use by
> other institutions. Please feel free to review the schema and add your comments
> directly to the Google doc using the comment feature.
> >> You can also send comments to this list, or directly to us by email
> (kdurante at stanford.edu, drh at stanford.edu).
> >>
> >> In order to promote metadata exchange, this schema is intended to
> crosswalk with the OGP schema. We are also developing an FGDC to MODS
> crosswalk to manage the source metadata transformation.
> >>
> >> Any comments or feedback are appreciated. Thanks for your time.
> >>
> >> Kim Durante
> >> Metadata Librarian for Geographic and Scientific Data Stanford
> >> University Libraries
> >> 650.724.5686
> >>
> >>
> >>
> >>
> >
> 
> 


