[cat-interop] GeoBlacklight Solr schema

Susan Powell spowell at library.berkeley.edu
Fri Mar 14 11:47:21 PDT 2014


Hi all,
This is a very interesting discussion. Thanks for your answers about the
choice of Dublin Core, Darren. I also like the fact that it is widely used
outside of the geospatial realm.

One further question/comment: In looking at the semantic definition of
dc:source, I wonder if dc:provenance might be a better fit for the source
institution? In the DC definitions, dc:source is "A Reference to a resource
from which the present resource is derived," which could be stretched to
include an institution, but does seem like a bit of a stretch. Provenance,
on the other hand, is more about the chain of custody, and seems to me like
perhaps a better fit.

Best,
Susan

Susan Powell
GIS & Map Librarian
UC Berkeley


On Fri, Mar 14, 2014 at 10:30 AM, Steve Richard
<steve.richard at azgs.az.gov> wrote:

> I am looking through Chris's comments sent to Kim and Darren, and the
> discussion about how to put distribution information in metadata for
> service-based access to datasets is one I've wrestled with as well. There's
> a current activity at http://github/cat-interop , and a related
> discussion in the data.gov projectOpenData (
> https://github.com/project-open-data/project-open-data.github.io/issues/291
> ).
>
> My own thinking about this is summarized in a discussion paper on gitHub (
> https://github.com/usgin/usginspecs/raw/master/MetadataAsHypermediaApp.docx, MS Word doc... :( )
>
> I think the best course of action for putting links to resources in Dublin
> Core XML (should they be in dct:references or in dc:relation???) is to make
> the dc element content a JSON blob that provides a machine-actionable
> description of the link.
>
>
> Here are some examples of what the content of a dct:references (or
> dc:relation) element would look like if a JSON representation of the link
> were used as the encoding for a machine-actionable link:
>
> { "link":"http://mirador.gsfc.nasa.gov/mirador-plugin-search.xml",
> "rel":"documentation",
> "title":"OpenSearch description for accessing this collection",
> "type":"application/opensearchdescription+xml",
> "profile":"http://commons.esipfed.org/ns/discovery/1.2/collectionCast#",
> "description":"point to a search service from a ESIP collection cast
> entry" }
>
> { "link":"
> http://kgs.uky.edu/arcgis/services/aasggeothermal/ARSeismicHypocenters/MapServer/WFSServer?request=GetFeature&service=WFS&TypeName=Hypocenter&MaxFeatures=10
> ",
> "rel":"download",
> "title":"Example WFS getFeature request for NGDS seismic event
> hypocenters",
> "type":"application/gml+xml",
> "overlayAPI":"OGC:WFS",
> "profile":"
> http://stategeothermaldata.org/uri-gin/aasg/xmlschema/hypocenter/1.7",
> "parameter":"typeName=Hypocenter" }
>
> { "link":"
> http://kgs.uky.edu/arcgis/services/aasggeothermal/ARSeismicHypocenters/MapServer/WFSServer?request=GetCapabilities
> ",
> "rel":"documentation",
> "title":"Get the capabilities document for seismic event hypocenter WFS
> service",
> "type":"application/xml",
> "overlayAPI":"OGC:WFS" }
>
> /* OpenSearch link in ESIP discovery/data cast */
>
> { "link":"
> http://mirador.gsfc.nasa.gov/cgi-bin/mirador/collectionlist.pl?keyword={searchTerms}
> ",
> "rel":"search",
> "title":"Search template for Mirador keyword search",
> "type":"application/atom+xml",
> "template":"http://a9.com/-/spec/opensearch/1.1",
> "profile":"http://commons.esipfed.org/ns/discovery/1.2/collectionCast#",
> "description":"Search service for a collection cast entry" }
>
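To make the JSON-in-an-element idea above concrete, here is a minimal Python sketch, not anyone's actual implementation. It assumes the link description is stored as the text content of a dct:references element and that a client parses that text back as JSON. The helper names make_reference and read_reference are hypothetical; the link values are taken from the GetCapabilities example above.

```python
import json
import xml.etree.ElementTree as ET

DCT_NS = "http://purl.org/dc/terms/"  # DCMI Metadata Terms namespace
ET.register_namespace("dct", DCT_NS)


def make_reference(link: dict) -> ET.Element:
    """Wrap a link description in a dct:references element as JSON text."""
    elem = ET.Element(f"{{{DCT_NS}}}references")
    elem.text = json.dumps(link, sort_keys=True)
    return elem


def read_reference(elem: ET.Element) -> dict:
    """Parse the JSON link description back out of the element text."""
    return json.loads(elem.text)


# Values copied from the GetCapabilities example in this thread.
link = {
    "link": "http://kgs.uky.edu/arcgis/services/aasggeothermal/"
            "ARSeismicHypocenters/MapServer/WFSServer?request=GetCapabilities",
    "rel": "documentation",
    "title": "Get the capabilities document for seismic event hypocenter WFS service",
    "type": "application/xml",
    "overlayAPI": "OGC:WFS",
}

# Serialize to XML text, then parse it again as a harvester would.
xml_text = ET.tostring(make_reference(link), encoding="unicode")
round_tripped = read_reference(ET.fromstring(xml_text))
print(round_tripped["rel"], round_tripped["overlayAPI"])
```

The XML serializer escapes characters such as "&" in the element text, so a link with several query parameters should survive the round trip without manual escaping.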
> Stephen M Richard
> Arizona Geological Survey
> 416 W. Congress  #100
> Tucson, AZ
> AZGS: 520-770-3500
> Office: 520-209-4127
> FAX: 520-770-3505
>
> > -----Original Message-----
> > From: opengeoportal-request at elist.tufts.edu [mailto:opengeoportal-
> > request at elist.tufts.edu] On Behalf Of Darren Hardy
> > Sent: Thursday, March 13, 2014 1:43 PM
> > To: opengeoportal at elist.tufts.edu
> > Subject: Re: GeoBlacklight Solr schema
> >
> > Hi Chris,
> >
> > Thanks so much for the great feedback. I've attempted to address your feedback
> > below...
> >
> > On Mar 13, 2014, at 11:38 AM, Barnett, Christopher S
> > <Christopher.Barnett at tufts.edu> wrote:
> >
> > > Hi Kim and Darren,
> > >
> > > I think this looks great!  In particular I'm happy to see the use of Dublin Core,
> > > as it seems like a good fit for the terms we're hoping to search.  It also has good
> > > mappings with CSW core queryables, which might become important.  Building
> > > on an existing standard also has the potential to aid in compatibility with other
> > > similar projects.
> > >
> > > I have a few specific questions, most of which have to do with the
> > > layer-specific portions. Some of these are questions that I've also been asking myself
> > > about the current OGP schema, as I've thought about ways in which it may be
> > > improved.
> > >
> > >> dc_relation_url: URL to related item: Multiple values allowed. Example:
> > >> "http://purl.stanford.edu/vr593vj7147"
> > >
> > > In the example you've given, the url is to the (ISO) metadata for the data
> > > object?  Is this common convention in Dublin Core?  I'm wondering if there is
> > > room in the schema for semantics to define what type of object is being linked
> > > to.
> > >
> >
> > Kim elaborated on this point in an earlier email. The semantics of DC.Relation are
> > quite vague unfortunately. I like the idea of looking at the INSPIRE profile for
> > discovery. Perhaps they have a solution for links to additional resources.
> >
> >
> > >
> > >
> > >> layer_geom: Shape of the layer as a Point, LineString, or Polygon WKT.
> > >> Example: "POLYGON((76.76 19.91705, 84.76618 19.91705, 84.76618
> > >> 12.62309, 76.76 12.62309, 76.76 19.91705))"
> > >
> > > Is this the actual geometry for the layer? A generalized representation?  If the
> > > former, how are you handling layers with millions of multipart features?  If the
> > > latter, how are you generalizing?
> > >
> > >
> >
> > My example is just the bounding box. If you wanted to represent the actual
> > geometry, you could generalize it at indexing time, or compute a convex hull.
> > This field is more experimental in my evaluation document. When you use Solr4
> > + JTS you can represent (more or less) arbitrary geometry WKTs and issue spatial
> > predicates on them. Solr 4.7, however, upgraded to Spatial4J 0.4 which has
> > some initial support for geometry WKTs, so this avenue may be promising as far
> > as the core Solr implementation goes.
> >
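As a rough illustration of the bounding-box case described above, the following Python sketch builds a layer_geom style POLYGON WKT from four extents. It is only a sketch under stated assumptions: the function name bbox_to_wkt is hypothetical, coordinates are taken to be longitude/latitude in degrees, and boxes that cross the antimeridian are not handled.

```python
def bbox_to_wkt(west: float, south: float, east: float, north: float) -> str:
    """Return a closed POLYGON WKT ring for a simple bounding box.

    Assumes west <= east and south <= north, so a box that crosses the
    antimeridian is rejected rather than split.
    """
    if west > east or south > north:
        raise ValueError("expected west <= east and south <= north")
    # Same corner order as the schema example: start at the upper-left
    # corner, go clockwise, and repeat the first point to close the ring.
    corners = [
        (west, north),
        (east, north),
        (east, south),
        (west, south),
        (west, north),
    ]
    ring = ", ".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON(({ring}))"


# Extents taken from the layer_geom example in the proposed schema.
wkt = bbox_to_wkt(76.76, 12.62309, 84.76618, 19.91705)
print(wkt)
```

A generalized outline or convex hull would replace the five corners with the simplified ring, but the WKT framing would stay the same.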
> >
> > >
> > >> layer_id_s. The complete identifier for the WMS/WFS/WCS layer.
> > >> Example: "druid:vr593vj7147",
> > >
> > > This makes the assumption that the layer name will be the same for all OGC
> > > services (as OGP does).  It's probably a reasonable assumption, but worth
> > > thinking about.  I'm glad to see WorkspaceName go away.
> > >
> >
> > Yes, we're assuming the layer name is the same across all WxS services.
> >
> >
> > >
> > >> layer_srs_s: The spatial reference system for the layer. Example: EPSG:4326.
> > >
> > > Do you get this from the web service or use a library to translate WKT from the
> > > metadata to EPSG, or something else?  I wouldn't be surprised to hear that ISO
> > > has a place to put EPSG codes, since ISO can represent virtually anything, but
> > > I've also seen a lot of ISO metadata that does not have this info.  Also, is this the
> > > native EPSG for the original object or the web service? The web service of
> > > course, can have many available projections.  The end user may want to know
> > > about the layer's projection, but the front end also needs to know what
> > > projections are available for display.  I've been dismayed to find services that
> > > don't support web mercator or 4326.
> > >
> >
> > We've found this to be a nasty problem. In our "data wrangling" phase, we
> > manually normalize the data into a 4326 projection, so in our case, the SRS for
> > the web services is always 4326. If they want the original projection, we have
> > another page outside the scope of the discovery service where they can
> > download that. But overall, my assumption is that layer_srs_s is the projection
> > used by the web service.
> >
> > >> layer_geom_type_s. Valid values are: "Point", "Line", "Polygon", and
> > >> "Raster".
> > >
> > > Is there room in the schema for other "data types"?  Something like "Scanned
> > > Map"  is easy enough (maybe not... how do you differentiate between a
> > > georeferenced and ungeoreferenced map? Are these distinct data types?), since
> > > it could be classified as a geometry type.  What about documents or data sets
> > > with clear geospatial extents, but nothing that could rightly be called
> > > "geometry"? A lot of folks in the scientific community use what we would call
> > > geospatial metadata to document such things.  You are also likely to run into
> > > metadata that won't specify more than "raster" or "vector".
> > >
> >
> > I was hoping to use the OGC simple feature types for these geometry types. The
> > extensions to GML provide for all sorts of coverage types which might be
> > suitable to handle a non-georectified scanned map. We do, however, want to
> > use a controlled vocabulary for the geometry type.
> >
> > Part of our initial assumption for the discovery service was that if the record
> > does not have a WMS service, then it's not cataloged in the discovery service.
> > We're revisiting that to require only a bounding box and not both a bounding
> > box and WMS service.
> >
> > For the "vector" data sets, we look at the actual data to determine the
> > geometry type during our "data wrangling" phase.
> >
> >
> > >
> > >> layer_wcs_url: Service root for the WCS service that holds this layer. If
> > >> applicable. Example:
> > >> "http://geowebservices-restricted.stanford.edu/geoserver/wcs"
> > >> layer_wfs_url: Service root for the WFS service that holds this layer. If
> > >> applicable. Example:
> > >> "http://geowebservices-restricted.stanford.edu/geoserver/wfs"
> > >> layer_wms_url: Service root for the WMS service that holds this layer. Example:
> > >> "http://geowebservices-restricted.stanford.edu/geoserver/wms"
> > >
> > > If there are non-OGC services (ArcGIS Server REST services, HGL's Open Delivery
> > > for scanned maps, Berkeley's service for ungeoreferenced maps, etc.), links to
> > > zip files, browse graphics, or other resources, would those be represented here
> > > (with additional elements like "layer_arcgisrest_url") or in the repeatable
> > > "dc_relation_url" element? One could also imagine a multipart/multilayer object
> > > more properly represented by an OGC WMC or (upcoming) OWS Context.  I
> > > wonder if there is a more generic way to define service url as a schema element
> > > that would at least pair a url with a descriptor.   It may be that there is not a
> > > good way to do this in Solr and "layer_${service_type}_url" is the best current
> > > approach.  Is there a way of crafting a Solr query that would return all
> > > "layer_${service_type}_url" fields, but not, say "layer_geom"?
> > >
> >
> > As Kim mentioned, we're revisiting this issue and I'm hoping to use DC.Relation
> > plus a verb to manage these links. A preview image, for example. I'm not sure
> > about using other non-WxS services for our web map -- we haven't thought
> > about that frankly.
> >
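On the question above of returning every "layer_${service_type}_url" field without "layer_geom": one possible approach is a glob pattern in Solr's fl (field list) parameter. The Python sketch below only builds the request URL. It rests on the assumption that the Solr version in use accepts globs such as layer_*_url in fl, which should be checked against that version's documentation. The core URL and the helper name build_select_url are hypothetical.

```python
from typing import Sequence
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(core_url: str, query: str, fields: Sequence[str]) -> str:
    """Build a Solr /select URL that limits the returned fields via fl."""
    params = {"q": query, "fl": ",".join(fields), "wt": "json"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical core URL; the field names come from the proposed schema.
url = build_select_url(
    "http://localhost:8983/solr/geoblacklight",
    'layer_id_s:"druid:vr593vj7147"',
    ["layer_id_s", "layer_*_url"],  # glob support in fl is an assumption
)

# Decode the query string again to show the field list Solr would receive.
sent = parse_qs(urlsplit(url).query)
print(sent["fl"][0])
```

If globs turn out not to be supported, the same function works with the URL fields listed explicitly, at the cost of updating the list whenever a new service type is added.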
> >
> > > Two last non-field specific questions:
> > >
> > > CSW supports "anytext" queries.  It seems like that would require an indexed
> > > field in Solr with the entire text of the metadata record (ISO or FGDC, etc.),
> > > minus the XML entities. It's not something that tools like OGP or GeoBlacklight
> > > must support as a matter of course, but I'm interested in thinking through what
> > > possibilities/problems might emerge.  I'm just curious if this came up in
> > > discussions, and if so, what were the key decision points?
> >
> > I'm not familiar with the semantics of the CSW anytext queries, but what we do
> > is copy various fields into our generic "text" field -- akin to an any text query
> > perhaps. We copy about a dozen fields into the text field. If you look at the
> > bottom of the Solr schema.xml you will see the specific copyField directives.
> >
> >   https://github.com/sul-dlss/geohydra/blob/master/solr/kurma-app-test/conf/schema.xml
> >
> >
> > >
> > > Mike Graves has talked some in the past about separating the OGP schema
> > > from its Solr implementation.  Think ISO 19115-1 vs. its implementation in ISO
> > > 19139;  at least that's my read on what I've heard Mike say.  Did you guys have
> > > any thoughts about having a more conceptual schema that's an information
> > > model of sorts vs. concrete implementation as a Solr XML schema? I don't
> > > anticipate OGP moving away from Solr anytime soon, but there may be other
> > > potential partners using different search technologies.
> > >
> >
> > This was part of my thinking behind using Dublin Core as they do have a
> > conceptual schema, plus they have various extensions and profiles, etc.  Perhaps
> > that INSPIRE discovery profile might have some of this information model-level
> > work. But it would be a great asset to have a conceptual model for what
> > information is required for geospatial discovery services.
> >
> > Thanks,
> > -Darren
> >
> >
> > --
> > Darren Hardy, Ph.D.
> > GIS Software Engineer
> > Digital Library Systems & Services
> > Stanford University
> > drh at stanford.edu
> > www.stanford.edu/~drh
> >
> >
> > > Thanks for this!  There are a lot of great things here.  I understand that there
> > > are many intractable problems that won't be solved in one iteration, or many!  I
> > > look forward to exploring the spatial search components in more detail.
> > >
> > > Just as a side-note, I was at a meeting last week for an NSF initiative with
> > > some ISO luminaries and there was a lot of talk about discovery profiles for ISO.
> > > Unfortunately, I can't contribute much more than to say that it's a thing that
> > > exists or may/will exist.  Anyone on the list know anything about ISO discovery
> > > profiles?
> > >
> > > Chris
> > >
> > > --
> > > Christopher Barnett
> > > Geospatial Analyst, Research & Geospatial Technology Services Tufts
> > > Technology Services (TTS)
> > > 16 Dearborn Rd.
> > > Somerville, MA 02144
> > > http://gis.tufts.edu
> > >
> > >
> > >
> > >
> > >
> > > On Mar 12, 2014, at 3:05 PM, Kimberly A Durante <kdurante at stanford.edu> wrote:
> > >
> > >> Hello OGP,
> > >>
> > >> Stanford Libraries is in the process of developing GeoBlacklight, a geospatial
> > >> data discovery application and a plugin to Blacklight.
> > >>
> > >> As part of this development our GIS developer, Darren Hardy, has created a
> > >> Solr metadata schema which we would like to share with the larger OGP
> > >> community in the hopes of gathering feedback on the schema design and the
> > >> proposed elements.
> > >> The schema is based on elements from Dublin Core for the descriptive
> > >> metadata and also contains a set of layer-specific fields (prefixed
> > >> with 'layer_'.)
> > >>
> > >> The current version of the schema can be found here:
> > >> http://goo.gl/UTIzRl
> > >>
> > >> Comments regarding the proposed GeoBlacklight schema are welcome from
> > >> members of the entire OGP community. We are interested in any feedback or insight
> > >> into the elements, their definitions, as well as the possibilities for future use by
> > >> other institutions. Please feel free to review the schema and add your comments
> > >> directly to the Google doc using the comment feature.
> > >> You can also send comments to this list, or directly to us by email
> > >> (kdurante at stanford.edu, drh at stanford.edu).
> > >>
> > >> In order to promote metadata exchange, this schema is intended to
> > >> crosswalk with the OGP schema. We are also developing an FGDC to MODS
> > >> crosswalk to manage the source metadata transformation.
> > >>
> > >> Any comments or feedback are appreciated. Thanks for your time.
> > >>
> > >> Kim Durante
> > >> Metadata Librarian for Geographic and Scientific Data Stanford
> > >> University Libraries
> > >> 650.724.5686
> > >>
> > >>
> > >>
> > >>
> > >
> >
> >
>
>
>