[cat-interop] GeoBlacklight Solr schema
Darren Hardy
drh at stanford.edu
Fri Mar 14 12:09:22 PDT 2014
Hi Susan,
Yes, I saw your comment earlier, and I think that's a good idea. Initially, I was only considering the core Dublin Core elements. We can certainly expand to the full Dublin Core element set.
Thanks,
-Darren
On Mar 14, 2014, at 11:47 AM, Susan Powell <spowell at library.berkeley.edu> wrote:
> Hi all,
> This is a very interesting discussion. Thanks for your answers about the choice of Dublin Core, Darren. I also like the fact that it is widely used outside of the geospatial realm.
>
> One further question/comment: In looking at the semantic definition of dc:source, I wonder if dc:provenance might be a better fit for the source institution? In the DC definitions, dc:source is "A Reference to a resource from which the present resource is derived," which could be stretched to include an institution, but does seem like a bit of a stretch. Provenance, on the other hand, is more about the chain of custody, and seems to me like perhaps a better fit.
>
> Best,
> Susan
>
> Susan Powell
> GIS & Map Librarian
> UC Berkeley
>
>
> On Fri, Mar 14, 2014 at 10:30 AM, Steve Richard <steve.richard at azgs.az.gov> wrote:
> I am looking through Chris's comments sent to Kim and Darren, and the discussion about how to put distribution information in metadata for service-based access to datasets is one I've wrestled with as well. There's a current activity at http://github/cat-interop , and a related discussion in the data.gov projectOpenData (https://github.com/project-open-data/project-open-data.github.io/issues/291).
>
> My own thinking about this is summarized in a discussion paper on GitHub (https://github.com/usgin/usginspecs/raw/master/MetadataAsHypermediaApp.docx , MS Word doc... :( )
>
> I think the best course of action for putting links to resources in Dublin Core XML (should they be in dct:references or in dc:relation???) is to make the dc element content a JSON blob that provides a machine-actionable description of the link
>
>
> Here are some examples of what the content of a dct:references (or dc:relation) element would look like if a JSON representation of the link were used as the encoding for a machine-actionable link:
>
> { "link":"http://mirador.gsfc.nasa.gov/mirador-plugin-search.xml",
> "rel":"documentation",
> "title":"OpenSearch description for accessing this collection",
> "type":"application/opensearchdescription+xml",
> "profile":"http://commons.esipfed.org/ns/discovery/1.2/collectionCast#",
> "description":"point to a search service from a ESIP collection cast entry" }
>
> { "link":"http://kgs.uky.edu/arcgis/services/aasggeothermal/ARSeismicHypocenters/MapServer/WFSServer?request=GetFeature&service=WFS&TypeName=Hypocenter&MaxFeatures=10",
> "rel":"download",
> "title":"Example WFS getFeature request for NGDS seismic event hypocenters",
> "type":"application/gml+xml",
> "overlayAPI":"OGC:WFS",
> "profile":"http://stategeothermaldata.org/uri-gin/aasg/xmlschema/hypocenter/1.7",
> "parameter":"typeName=Hypocenter" }
>
> { "link":"http://kgs.uky.edu/arcgis/services/aasggeothermal/ARSeismicHypocenters/MapServer/WFSServer?request=GetCapabilities",
> "rel":"documentation",
> "title":"Get the capabilities document for seismic event hypocenter WFS service",
> "type":"application/xml",
> "overlayAPI":"OGC:WFS" }
>
> OpenSearch link in ESIP discovery/data cast:
>
> { "link":"http://mirador.gsfc.nasa.gov/cgi-bin/mirador/collectionlist.pl?keyword={searchTerms}",
> "rel":"search",
> "title":"Search template for Mirador keyword search",
> "type":"application/atom+xml",
> "template":"http://a9.com/-/spec/opensearch/1.1",
> "profile":"http://commons.esipfed.org/ns/discovery/1.2/collectionCast#",
> "description":"Search service for a collection cast entry" }
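A minimal sketch of the JSON-in-Dublin-Core idea above, assuming the blob is stored as the text content of a dct:references element. The helper names (link_to_element, element_to_link) are hypothetical and only illustrate the round trip; the link data is the WFS GetFeature example from this message.

```python
import json
import xml.etree.ElementTree as ET

# Namespace URI for DCMI Metadata Terms (the "dct" prefix used in the thread).
DCT_NS = "http://purl.org/dc/terms/"
ET.register_namespace("dct", DCT_NS)


def link_to_element(link):
    """Hypothetical helper: wrap a link description as JSON text
    inside a dct:references element."""
    elem = ET.Element("{%s}references" % DCT_NS)
    elem.text = json.dumps(link, sort_keys=True)
    return elem


def element_to_link(elem):
    """Hypothetical helper: recover the link description from the element."""
    return json.loads(elem.text)


link = {
    "link": "http://kgs.uky.edu/arcgis/services/aasggeothermal/"
            "ARSeismicHypocenters/MapServer/WFSServer"
            "?request=GetFeature&service=WFS&TypeName=Hypocenter&MaxFeatures=10",
    "rel": "download",
    "title": "Example WFS getFeature request for NGDS seismic event hypocenters",
    "type": "application/gml+xml",
    "overlayAPI": "OGC:WFS",
    "profile": "http://stategeothermaldata.org/uri-gin/aasg/xmlschema/hypocenter/1.7",
    "parameter": "typeName=Hypocenter",
}

# Serialize to XML text, then parse it back and decode the JSON payload.
xml_text = ET.tostring(link_to_element(link), encoding="unicode")
round_tripped = element_to_link(ET.fromstring(xml_text))
print(xml_text)
```

Note that the XML serializer escapes the ampersands in the URL, so a consumer has to XML-decode the element content before JSON-decoding it.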
>
> Stephen M Richard
> Arizona Geological Survey
> 416 W. Congress #100
> Tucson, AZ
> AZGS: 520-770-3500
> Office: 520-209-4127
> FAX: 520-770-3505
>
> > -----Original Message-----
> > From: opengeoportal-request at elist.tufts.edu [mailto:opengeoportal-
> > request at elist.tufts.edu] On Behalf Of Darren Hardy
> > Sent: Thursday, March 13, 2014 1:43 PM
> > To: opengeoportal at elist.tufts.edu
> > Subject: Re: GeoBlacklight Solr schema
> >
> > Hi Chris,
> >
> > Thanks so much for the great feedback. I've attempted to address your feedback
> > below...
> >
> > On Mar 13, 2014, at 11:38 AM, Barnett, Christopher S
> > <Christopher.Barnett at tufts.edu> wrote:
> >
> > > Hi Kim and Darren,
> > >
> > > I think this looks great! In particular I'm happy to see the use of Dublin Core,
> > as it seems like a good fit for the terms we're hoping to search. It also has good
> > mappings with CSW core queryables, which might become important. Building
> > on an existing standard also has the potential to aid in compatibility with other
> > similar projects.
> > >
> > > I have a few specific questions, most of which have to do with the layer
> > specific portions. Some of these are questions that I've also been asking myself
> > about the current OGP schema, as I've thought about ways in which it may be
> > improved.
> > >
> > >> dc_relation_url: URL to related item: Multiple values allowed. Example:
> > >> "http://purl.stanford.edu/vr593vj7147"
> > >
> > > In the example you've given, the url is to the (ISO) metadata for the data
> > object? Is this common convention in Dublin Core? I'm wondering if there is
> > room in the schema for semantics to define what type of object is being linked
> > to.
> > >
> >
> > Kim elaborated on this point in an earlier email. The semantics of DC.Relation is
> > quite vague unfortunately. I like the idea of looking at the INSPIRE profile for
> > discovery. Perhaps they have a solution for links to additional resources.
> >
> >
> > >
> > >
> > >> layer_geom: Shape of the layer as a Point, LineString, or Polygon WKT.
> > >> Example: "POLYGON((76.76 19.91705, 84.76618 19.91705, 84.76618
> > 12.62309, 76.76 12.62309, 76.76 19.91705))"
> > >
> > > Is this the actual geometry for the layer? A generalized representation? If the
> > former, how are you handling layers with millions of multipart features? If the
> > latter, how are you generalizing?
> > >
> > >
> >
> > My example is just the bounding box. If you wanted to represent the actual
> > geometry, you could generalize it at indexing time, or compute a convex hull.
> > This field is more experimental in my evaluation document. When you use Solr4
> > + JTS you can represent (more or less) arbitrary geometry WKTs and issue spatial
> > predicates on them. Solr 4.7, however, upgraded to Spatial4J 0.4 which has
> > some initial support for geometry WKTs, so this avenue may be promising as far
> > as the core Solr implementation goes.
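A minimal sketch of producing a bounding-box value for layer_geom, assuming the corner order used in the example above (upper-left first, then clockwise, closing the ring). The function name bbox_to_wkt is hypothetical, and the sketch assumes the box does not cross the antimeridian.

```python
def bbox_to_wkt(west, south, east, north):
    """Hypothetical helper: format a bounding box as a closed POLYGON WKT ring.

    Corner order follows the example in this thread:
    (west north), (east north), (east south), (west south), (west north).
    """
    if west > east or south > north:
        raise ValueError("expected west <= east and south <= north")
    corners = [
        (west, north),
        (east, north),
        (east, south),
        (west, south),
        (west, north),  # repeat the first corner to close the ring
    ]
    ring = ", ".join("%s %s" % (x, y) for x, y in corners)
    return "POLYGON((%s))" % ring


# Reproduces the layer_geom example quoted above.
wkt = bbox_to_wkt(76.76, 12.62309, 84.76618, 19.91705)
print(wkt)
```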
> >
> >
> > >
> > >> layer_id_s. The complete identifier for the WMS/WFS/WCS layer.
> > >> Example: "druid:vr593vj7147",
> > >
> > > This makes the assumption that the layer name will be the same for all OGC
> > services (as OGP does). It's probably a reasonable assumption, but worth
> > thinking about. I'm glad to see WorkspaceName go away.
> > >
> >
> > Yes, we're assuming the layer name is the same across all WxS services.
> >
> >
> > >
> > >> layer_srs_s: The spatial reference system for the layer. Example: EPSG:4326.
> > >
> > > Do you get this from the web service or use a library to translate WKT from the
> > metadata to EPSG, or something else? I wouldn't be surprised to hear that ISO
> > has a place to put EPSG codes, since ISO can represent virtually anything, but
> > I've also seen a lot of ISO metadata that does not have this info. Also, is this the
> > native EPSG for the original object or the web service? The web service of
> > course, can have many available projections. The end user may want to know
> > about the layer's projection, but the front end also needs to know what
> > projections are available for display. I've been dismayed to find services that
> > don't support web mercator or 4326.
> > >
> >
> > We've found this to be a nasty problem. In our "data wrangling" phase, we
> > manually normalize the data into a 4326 projection, so in our case, the SRS for
> > the web services is always 4326. If they want the original projection, we have
> > another page outside the scope of the discovery service where they can
> > download that. But overall, my assumption is that layer_srs_s is the projection
> > used by the web service.
> >
> > >> layer_geom_type_s. Valid values are: "Point", "Line", "Polygon", and
> > "Raster".
> > >
> > > Is there room in the schema for other "data types"? Something like "Scanned
> > Map" is easy enough (maybe not... how do you differentiate between a
> > georeferenced and ungeoreferenced map? Are these distinct data types?), since
> > it could be classified as a geometry type. What about documents or data sets
> > with clear geospatial extents, but nothing that could rightly be called
> > "geometry"? A lot of folks in the scientific community use what we would call
> > geospatial metadata to document such things. You are also likely to run into
> > metadata that won't specify more than "raster" or "vector".
> > >
> >
> > I was hoping to use the OGC simple feature types for these geometry types. The
> > extensions to GML provide for all sorts of coverage types which might be
> > suitable to handle a non-georectified scanned map. We do, however, want to
> > use a controlled vocabulary for the geometry type.
> >
> > Part of our initial assumption for the discovery service was that if the record
> > does not have a WMS service, then it's not cataloged in the discovery service.
> > We're revisiting that to require only a bounding box and not both a bounding
> > box and WMS service.
> >
> > For the "vector" data sets, we look at the actual data to determine the
> > geometry type during our "data wrangling" phase.
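A minimal sketch of enforcing a controlled vocabulary for layer_geom_type_s at indexing time, assuming the four values listed in the draft schema ("Point", "Line", "Polygon", "Raster"). The class and function names are hypothetical.

```python
from enum import Enum


class LayerGeomType(Enum):
    """The four values the draft schema allows for layer_geom_type_s."""
    POINT = "Point"
    LINE = "Line"
    POLYGON = "Polygon"
    RASTER = "Raster"


def parse_geom_type(value):
    """Hypothetical helper: return the matching vocabulary term,
    or raise ValueError for anything outside the vocabulary."""
    try:
        return LayerGeomType(value)
    except ValueError:
        allowed = ", ".join(t.value for t in LayerGeomType)
        raise ValueError(
            "layer_geom_type_s must be one of: %s; got %r" % (allowed, value)
        ) from None


print(parse_geom_type("Raster"))
```

A value such as "Scanned Map" would be rejected until the vocabulary is deliberately extended, which keeps the question Chris raises an explicit schema decision.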
> >
> >
> > >
> > >> layer_wcs_url: Service root for the WCS service that holds this layer. If
> > applicable. Example:
> > >> "http://geowebservices-restricted.stanford.edu/geoserver/wcs"
> > >> layer_wfs_url: Service root for the WFS service that holds this layer. If
> > applicable. Example:
> > >> "http://geowebservices-restricted.stanford.edu/geoserver/wfs"
> > >> layer_wms_url: Service root for the WMS service that holds this layer. Example:
> > "http://geowebservices-restricted.stanford.edu/geoserver/wms"
> > >
> > > If there are non-ogc services (ArcGIS Server REST services, HGL's Open Delivery
> > for scanned maps, Berkeley's service for ungeoreferenced maps, etc.), links to
> > zip files, browse graphics, or other resources would those be represented here
> > (with additional elements like "layer_arcgisrest_url" ) or in the repeatable
> > "dc_relation_url" element? One could also imagine a multipart/multilayer object
> > more properly represented by an OGC WMC or (upcoming) OWS Context. I
> > wonder if there is a more generic way to define service url as a schema element
> > that would at least pair a url with a descriptor. It may be that there is not a
> > good way to do this in Solr and "layer_${service_type}_url" is the best current
> > approach. Is there a way of crafting a Solr query that would return all
> > "layer_${service_type}_url" fields, but not, say "layer_geom"?
> > >
> >
> > As Kim mentioned, we're revisiting this issue and I'm hoping to use DC.Relation
> > plus a verb to manage these links. A preview image, for example. I'm not sure
> > about using other non-WxS services for our web map -- we haven't thought
> > about that frankly.
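On Chris's question about returning all "layer_${service_type}_url" fields but not "layer_geom": as far as I know, Solr 4.x accepts glob patterns in the fl (field list) parameter, so a pattern such as layer_*_url should select only the stored service URL fields. A minimal sketch of building such a request follows; the Solr endpoint is hypothetical and the glob behavior should be verified against the Solr version in use.

```python
from urllib.parse import urlencode


def build_service_url_query(solr_select_url, query="*:*", rows=10):
    """Hypothetical helper: build a Solr select URL that returns the layer
    identifier plus every field matching layer_*_url, and nothing else."""
    params = {
        "q": query,
        # Glob in the field list: matches layer_wms_url, layer_wfs_url,
        # layer_wcs_url, but not layer_geom.
        "fl": "layer_id_s,layer_*_url",
        "wt": "json",
        "rows": rows,
    }
    return solr_select_url + "?" + urlencode(params)


# Hypothetical local Solr endpoint, for illustration only.
url = build_service_url_query("http://localhost:8983/solr/select")
print(url)
```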
> >
> >
> > > Two last non-field specific questions:
> > >
> > > CSW supports "anytext" queries. It seems like that would require an indexed
> > field in Solr with the entire text of the metadata record (ISO or FGDC, etc.),
> > minus the xml entities. It's not something that tools like OGP or GeoBlacklight
> > must support as a matter of course, but I'm interested in thinking through what
> > possibilities/problems might emerge. I'm just curious if this came up in
> > discussions, and if so, what were the key decision points?
> >
> > I'm not familiar with the semantics of the CSW anytext queries, but what we do
> > is copy various fields into our generic "text" field -- akin to an any text query
> > perhaps. We copy about a dozen fields into the text field. If you look at the
> > bottom of the Solr schema.xml you will see the specific copyField directives.
> >
> > https://github.com/sul-dlss/geohydra/blob/master/solr/kurma-app-test/conf/schema.xml
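A minimal sketch of what the copyField approach amounts to, emulated in Python rather than Solr configuration: the values of selected source fields are gathered into one multi-valued catch-all "text" field. The function name and the source field names (dc_title_s, dc_subject_sm, dc_description_s) are assumptions for illustration; the actual list is in the schema.xml linked above.

```python
def add_catch_all_text(doc, source_fields):
    """Hypothetical helper: return a copy of doc with a multi-valued 'text'
    field holding the values of the listed source fields, in order."""
    parts = []
    for name in source_fields:
        value = doc.get(name)
        if value is None:
            continue  # field absent from this record, nothing to copy
        if isinstance(value, (list, tuple)):
            parts.extend(str(v) for v in value)
        else:
            parts.append(str(value))
    indexed = dict(doc)  # leave the caller's document untouched
    indexed["text"] = parts
    return indexed


# Field names below are assumed for illustration, except layer_id_s.
doc = {
    "dc_title_s": "Hypocenters",
    "dc_subject_sm": ["seismology", "earthquakes"],
    "layer_id_s": "druid:vr593vj7147",
}
indexed = add_catch_all_text(doc, ["dc_title_s", "dc_subject_sm", "dc_description_s"])
print(indexed["text"])
```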
> >
> >
> > >
> > > Mike Graves has talked some in the past about separating the OGP schema
> > from its Solr implementation. Think ISO 19115-1 vs. its implementation in ISO
> > 19139; at least that's my read on what I've heard Mike say. Did you guys have
> > any thoughts about having a more conceptual schema that's an information
> > model of sorts vs. concrete implementation as a Solr XML schema? I don't
> > anticipate OGP moving away from Solr anytime soon, but there may be other
> > potential partners using different search technologies.
> > >
> >
> > This was part of my thinking behind using Dublin Core as they do have a
> > conceptual schema, plus they have various extensions and profiles, etc. Perhaps
> > that INSPIRE discovery profile might have some of this information model-level
> > work. But it would be a great asset to have a conceptual model for what
> > information is required for geospatial discovery services.
> >
> > Thanks,
> > -Darren
> >
> >
> > --
> > Darren Hardy, Ph.D.
> > GIS Software Engineer
> > Digital Library Systems & Services
> > Stanford University
> > drh at stanford.edu
> > www.stanford.edu/~drh
> >
> >
> > > Thanks for this! There are a lot of great things here. I understand that there
> > are many intractable problems that won't be solved in one iteration, or many! I
> > look forward to exploring the spatial search components in more detail.
> > >
> > > Just as a side-note, I was at a meeting last week for an NSF initiative with
> > some ISO luminaries and there was a lot of talk about discovery profiles for ISO.
> > Unfortunately, I can't contribute much more than to say that it's a thing that
> > exists or may/will exist. Anyone on the list know anything about ISO discovery
> > profiles?
> > >
> > > Chris
> > >
> > > --
> > > Christopher Barnett
> > > Geospatial Analyst, Research & Geospatial Technology Services Tufts
> > > Technology Services (TTS)
> > > 16 Dearborn Rd.
> > > Somerville, MA 02144
> > > http://gis.tufts.edu
> > >
> > >
> > >
> > >
> > >
> > > On Mar 12, 2014, at 3:05 PM, Kimberly A Durante <kdurante at stanford.edu>
> > wrote:
> > >
> > >> Hello OGP,
> > >>
> > >> Stanford Libraries is in the process of developing GeoBlacklight, a geospatial
> > data discovery application, and a plugin to Blacklight.
> > >>
> > >> As part of this development our GIS developer, Darren Hardy, has created a
> > Solr metadata schema which we would like to share with the larger OGP
> > community in the hopes of gathering feedback on the schema design and the
> > proposed elements.
> > >> The schema is based on elements from Dublin Core for the descriptive
> > >> metadata and also contains a set of layer-specific fields (prefixed
> > >> with 'layer_'.)
> > >>
> > >> The current version of the schema can be found here:
> > >> http://goo.gl/UTIzRl
> > >>
> > >> Comments regarding the proposed GeoBlacklight schema are welcome from
> > members of the entire OGP community. We are interested in any feedback or insight
> > into the elements, their definitions, as well as the possibilities for future use by
> > other institutions. Please feel free to review the schema and add your comments
> > directly to the Google doc using the comment feature.
> > >> You can also send comments to this list, or directly to us by email
> > (kdurante at stanford.edu, drh at stanford.edu).
> > >>
> > >> In order to promote metadata exchange, this schema is intended to
> > crosswalk with the OGP schema. We are also developing an FGDC to MODS
> > crosswalk to manage the source metadata transformation.
> > >>
> > >> Any comments or feedback are appreciated. Thanks for your time.
> > >>
> > >> Kim Durante
> > >> Metadata Librarian for Geographic and Scientific Data Stanford
> > >> University Libraries
> > >> 650.724.5686
> > >>
> > >>
> > >>
> > >>
> > >
> >
> >
>
>
>