Service Registration (SR) is a process whereby services offered by Member Nodes and other reliable resources can be registered with DataONE and so may be discovered through DataONE search mechanisms and utilized by DataONE Investigator Tools and other clients to perform actions on or provide access to data.
The fundamental issue being addressed is the discovery of services offered by Member Nodes. There is currently no consistent, convenient, universally available mechanism for discovering the services that data repositories offer to help users access data. Services are available for data subsetting and slicing, merging, analysis, visualization, summarization, and so on. DataONE will enable any such services to be registered and discovered through a search interface and, in some cases, incorporated into the functionality of data search interfaces to offer online processing of the data by a registered service.
A general goal of Service Registration is to assist clients in discovery of services exposed by Member Nodes that may be applicable to a particular type of object. The process of Service Registration involves the creation of a Service Description document and registering the document with DataONE through the normal content synchronization process described in DataONE Use Case 06.
Scenarios provide a high level outline of a range of activities anticipated to be supported by the functionality of the system.
Note
Scenarios are labelled with the convention S-XXnn, where XX = “SR” for Service Registration and nn = the ID of the scenario.
A user has a PID for a large dataset that is structured as a single table with many records. The user would like to retrieve a specific record from the dataset, identified by an identifier that is local to the data object.
For example, the entire data object might have the identifier “large_table-001.csv”, and the user knows (perhaps from colleagues or another source) that the record identified by “118422” is of interest.
The user performs a search on DataONE for “tabular record extraction” services and selects from the results a service co-located with the Member Node that holds the data.
Using a client tool, the user provides the data object PID, the record identifier, and the service description PID. The client tool retrieves the connection information from the service description document and invokes the service using the data PID and the record identifier. The expected record is returned to the client tool and used for further analysis.
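The client-side steps in this scenario can be sketched in Python. This is a minimal illustration, not a DataONE API: it assumes a hypothetical XML service description with `serviceType` and `endpoint` elements, and a hypothetical record extraction service that accepts `pid` and `record` query parameters. The element names, parameter names, and URL are placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical service description; DataONE does not define this schema.
SERVICE_DESCRIPTION = """
<serviceDescription>
  <title>Tabular record extraction</title>
  <serviceType>tabular-record-extraction</serviceType>
  <endpoint>https://mn.example.org/services/records</endpoint>
</serviceDescription>
"""


def connection_info(description_xml: str) -> dict:
    """Extract the endpoint and service type from a (hypothetical) service description."""
    root = ET.fromstring(description_xml)
    endpoint = root.findtext("endpoint")
    service_type = root.findtext("serviceType")
    if not endpoint or not service_type:
        raise ValueError("service description lacks an endpoint or service type")
    return {"endpoint": endpoint.strip(), "service_type": service_type.strip()}


def record_request_url(description_xml: str, data_pid: str, record_id: str) -> str:
    """Build the invocation URL; the `pid` and `record` parameter names are assumptions."""
    info = connection_info(description_xml)
    query = urlencode({"pid": data_pid, "record": record_id})
    return f"{info['endpoint']}?{query}"


url = record_request_url(SERVICE_DESCRIPTION, "large_table-001.csv", "118422")
print(url)  # https://mn.example.org/services/records?pid=large_table-001.csv&record=118422
```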
A user has a PID for a data object. Using a DataONE service-aware client tool, the user discovers that the object is a fairly large (system metadata) set of imagery (system metadata, science metadata) that provides high-resolution measurements of ground surface reflectance in the range of 400 to 700 nm (science metadata). The science metadata describing the imagery indicates a very broad spatial coverage, and the file is too large to justify downloading for the small area of terrain being worked on.
The client tool indicates to the user that a Member Node holding a copy of the data is known to be running a service (registered service) that will allow extraction of a spatial window from the large data object (the OGC Web Coverage Service, indicated by the service format identifier as recorded in the Service Description document) and also provides the URL of the Service Endpoint for the service.
The user utilizes an OGC-WCS client application to connect with the service operating on the Member Node and retrieves the desired window from the coverage identified by the PID.
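As a sketch of the request the user's WCS client might issue, the function below builds a WCS 2.0.1 GetCoverage request in key-value-pair form that trims the coverage to a spatial window. It assumes the Member Node's WCS publishes the coverage under the data object's PID (or an identifier mapped from it) and that the coverage axes are labelled `Lat` and `Long`; the real axis labels come from the service's DescribeCoverage response. The endpoint and coverage identifier are hypothetical.

```python
from urllib.parse import urlencode


def wcs_get_coverage_url(endpoint, coverage_id, lat_range, long_range, fmt="image/tiff"):
    """Build a WCS 2.0.1 GetCoverage KVP request that trims a coverage to a window.

    The "Lat" and "Long" axis labels are assumptions; actual labels depend on
    the coverage's CRS as reported by DescribeCoverage.
    """
    params = [
        ("SERVICE", "WCS"),
        ("VERSION", "2.0.1"),
        ("REQUEST", "GetCoverage"),
        ("COVERAGEID", coverage_id),
        # One SUBSET parameter per trimmed axis, in the form Axis(low,high).
        ("SUBSET", f"Lat({lat_range[0]},{lat_range[1]})"),
        ("SUBSET", f"Long({long_range[0]},{long_range[1]})"),
        ("FORMAT", fmt),
    ]
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and coverage identifier.
url = wcs_get_coverage_url(
    "https://mn.example.org/wcs", "reflectance_400_700", (35.0, 35.2), (-106.7, -106.5)
)
```

A list of pairs is used rather than a dict so that the SUBSET parameter can be repeated once per axis.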
A user finds a data set whose semantic description indicates temperature measurements suitable for their analysis, except that they are in the wrong units. Although the conversion is trivial, a service at the Member Node advertises the capability to transform the measurements to the expected units. This capability is described in the service description document, which includes sufficient semantic information about the input and output ports, as well as the acceptable input data formats, to determine that this service will operate correctly for the dataset of interest.
A user discovers a GeoTIFF layer using the ONEMercury search interface. Included with the search results is a list of services that may be utilized to process the GeoTIFF. One of these services is an Open Geospatial Consortium Web Map Service (OGC-WMS). The user opens their QGIS application, adds the WMS service endpoint to the QGIS WMS plugin configuration, and adds the identifier of the GeoTIFF data set as the parameter identifying the layer to process.
The WMS service recognizes the layer as one of the datasets it is able to process, and returns a rendering of the content according to the extent and style parameters set by the QGIS WMS client.
Note that it would also be possible to develop a QGIS plugin that, given the identifiers of a service and a data object, could manage the background configuration for content retrieval.
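For reference, the request a WMS client such as QGIS sends in this scenario is a WMS 1.3.0 GetMap call. The sketch below, with a hypothetical endpoint and a hypothetical layer name standing in for the GeoTIFF identifier, shows one detail worth handling explicitly: in WMS 1.3.0 the BBOX axis order follows the CRS, so for EPSG:4326 latitude comes before longitude.

```python
from urllib.parse import urlencode


def wms_get_map_url(endpoint, layer, bbox_lonlat, width, height, style=""):
    """Build a WMS 1.3.0 GetMap request for one layer in EPSG:4326.

    bbox_lonlat is (min_lon, min_lat, max_lon, max_lat). WMS 1.3.0 uses the
    axis order defined by the CRS, which for EPSG:4326 is latitude first,
    so the values are reordered before being placed in BBOX.
    """
    min_lon, min_lat, max_lon, max_lat = bbox_lonlat
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": style,  # empty string selects the server's default style
        "CRS": "EPSG:4326",
        "BBOX": f"{min_lat},{min_lon},{max_lat},{max_lon}",
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer identifier.
url = wms_get_map_url(
    "https://mn.example.org/wms", "geotiff-layer-001", (-106.7, 35.0, -106.5, 35.2), 512, 512
)
```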
A researcher uses data that was transformed by a service registered in DataONE (see S-SR04). The data has an associated provenance trace showing that it resulted from a transformation of another dataset held in DataONE, and that a registered service performed the transformation.
The provenance trace records this information using identifiers for each entity (original data, transformed data, and the transformation service), and so can unambiguously define the origin of the content and provide appropriate attribution as necessary.
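One way to record such a trace is as subject-predicate-object statements. The sketch below borrows W3C PROV-O property names for illustration (this document does not prescribe a provenance vocabulary) and uses hypothetical identifiers in place of DataONE PIDs; it shows how a client can recover both the source data and the service from the derived object's identifier.

```python
# Hypothetical identifiers; a real trace would use DataONE PIDs.
ORIGINAL = "urn:example:temperature-fahrenheit"
DERIVED = "urn:example:temperature-celsius"
SERVICE = "urn:example:service-description-unit-conversion"
ACTIVITY = "urn:example:conversion-run-1"

# (subject, predicate, object) statements using PROV-O property names.
TRACE = [
    (DERIVED, "prov:wasDerivedFrom", ORIGINAL),
    (DERIVED, "prov:wasGeneratedBy", ACTIVITY),
    (ACTIVITY, "prov:used", ORIGINAL),
    (ACTIVITY, "prov:wasAssociatedWith", SERVICE),  # the service acts as a software agent
]


def objects(trace, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in trace if s == subject and p == predicate]


def attribution(trace, entity):
    """Return the source entities and the services involved in generating `entity`."""
    sources = objects(trace, entity, "prov:wasDerivedFrom")
    services = [
        svc
        for act in objects(trace, entity, "prov:wasGeneratedBy")
        for svc in objects(trace, act, "prov:wasAssociatedWith")
    ]
    return {"sources": sources, "services": services}


print(attribution(TRACE, DERIVED))
```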
Using the DataONE search interfaces, a researcher discovers a number of data sets that spatially overlap the area of interest for their study.
The researcher checks the “show available services” checkbox, and the search interface renders information about services available for each of the search results.
The researcher then selects a “spatial data extraction” facet to constrain the types of services being displayed.
Five functional use cases are described for the registration and use of services within DataONE.
Note
Use cases are labelled with the convention UC-XXnn, where XX = “SR” for Service Registration and nn = the ID of the use case.
A service description document is added to a Member Node using the MNStorage.create() method. The service is registered once the content is synchronized by the Coordinating Nodes. This process is identical to the existing create and synchronize use cases defined for the DataONE infrastructure. Registration also includes indexing, which is considered part of the synchronization process.
Note
It is recommended that a Member Node verify the service description, including the availability of the service described therein.
A service may become unavailable or be decommissioned, in which case it should be “unregistered” from DataONE to prevent or discourage its use. The historical record of the service should remain available, as this information may be referenced by content within or outside of DataONE.
The process for “unregistering” a service is the same as “archiving” content. This prevents the service from being discovered through search services while retaining the record.
It may be desirable to obsolete the service description document with another service description that indicates the service is no longer available, while retaining the metadata describing the service capabilities when they were available.
There are two types of change that require updates: changes to low-level information about the service, such as access control or ownership, that can be reflected in system metadata; and changes to the service itself that must be reflected in the service description document. Once the changes are recorded by the Member Node, they will be picked up by the Coordinating Nodes through the synchronization process and propagated to any replicas.
The first case is covered by changes to system metadata (e.g. Manage Access Policies, Transfer Object Ownership) which can be performed by an authorized client interacting with the authoritative Member Node.
The second case is covered by Use Case 05 “update”, since updating content in DataONE means obsoleting the current object with a new one.
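Because unregistering is archiving and updating is obsolescence, a client holding an older service description PID can work out whether the service is still registered by walking the version chain. The sketch below uses an in-memory dictionary as a stand-in for system metadata that would in practice be retrieved with getSystemMetadata(); only the `obsoletedBy` and `archived` fields are modelled, and the PIDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SysMeta:
    """Subset of DataONE system metadata fields relevant to versioning."""
    pid: str
    obsoletedBy: Optional[str] = None
    archived: bool = False


def current_version(pid: str, store: Dict[str, SysMeta]) -> SysMeta:
    """Follow obsoletedBy links from `pid` to the newest service description."""
    seen = set()
    meta = store[pid]
    while meta.obsoletedBy is not None:
        if meta.pid in seen:
            raise ValueError(f"obsolescence cycle at {meta.pid}")
        seen.add(meta.pid)
        meta = store[meta.obsoletedBy]
    return meta


def is_registered(pid: str, store: Dict[str, SysMeta]) -> bool:
    """Treat the service as registered if the head of its chain is not archived."""
    return not current_version(pid, store).archived


# Hypothetical chain: v3 states the service is no longer available and is archived.
STORE = {
    "svc-v1": SysMeta("svc-v1", obsoletedBy="svc-v2"),
    "svc-v2": SysMeta("svc-v2", obsoletedBy="svc-v3"),
    "svc-v3": SysMeta("svc-v3", archived=True),
}
print(current_version("svc-v1", STORE).pid)  # svc-v3
print(is_registered("svc-v1", STORE))  # False
```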
Summary
Following discovery of a service (either through DataONE or by other means), a client would like to utilize the service to perform some action. The service description document provides all the information necessary (directly within the document or by reference to other resources such as a published specification or introspection capability offered by the service) for a client to determine how they may interact with the service.
The actual service interface definitions and modes of interaction are not defined by DataONE and hence are out of scope of this use case.
From DataONE’s perspective, then, the process for a client to utilize a service is to (1) discover the service; (2) retrieve and parse the service description document; and (3) interact with the service as described in the service description.
A service description document is a type of metadata document, and is treated much the same as a science metadata document in the DataONE infrastructure: it has a persistent unique identifier and associated system metadata, it is replicated to a Coordinating Node during the synchronization process, it is indexed, and it appears in search results.
To be useful to a client, the service description should minimally include the service endpoint and the service type. If the service does not support introspection, then the service description document should contain a description of the service interfaces (or a reference to documents that describe them), together with details of any peculiarities of the registered service or departures from the standard.
The interface definitions implemented by a particular service are outside of the scope of DataONE specifications, though it is expected that service implementations should, where possible, follow standards and design already implemented by others in the community.
Services take as input zero or more data objects and output one or more objects, which may or may not be data (e.g. an image generated by a WMS service). A service receives its inputs and delivers its outputs through ports.
In order to identify what types of data a service can operate on, it is necessary to associate the Input Port of the Service with a format type. Note that a given port can potentially accept many format types.
Similarly, it is necessary to associate object format types with the Service Output Ports.
In order to facilitate discovery by general search, human readable metadata such as a title and description should be included in the service description document.
Property | Description |
---|---|
Description | A human readable description of the service to assist discovery and to evaluate applicability. |
Title | A brief, human readable descriptive title for the service. |
Service Type | The type of service, drawn from a controlled vocabulary of service types that extends the existing DataONE list of format types. |
Service Endpoint | A URL that indicates how to access the service. |
Service Input Port | Aspect of the service that accepts a digital entity. |
Service Output Port | Aspect of the service that provides a digital entity resulting from operation of the service. |
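The properties in the table can be modelled as a simple data structure. The sketch below is a hypothetical in-memory representation, not a proposed serialization: it reports which of the two minimally required properties (service type and endpoint) are missing, and tests whether any input port accepts a given object format. The service type term, endpoint, and format identifiers in the example are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Port:
    name: str
    format_ids: List[str]  # object format identifiers the port accepts or produces


@dataclass
class ServiceDescription:
    title: str
    description: str
    service_type: str
    endpoint: str
    input_ports: List[Port] = field(default_factory=list)
    output_ports: List[Port] = field(default_factory=list)

    def missing_required(self) -> List[str]:
        """Service type and endpoint are the minimum a client needs."""
        return [name for name in ("service_type", "endpoint") if not getattr(self, name)]

    def accepts(self, format_id: str) -> bool:
        """True if any input port accepts objects of the given format."""
        return any(format_id in port.format_ids for port in self.input_ports)


wcs = ServiceDescription(
    title="Spatial window extraction",
    description="Extracts a spatial window from raster imagery.",
    service_type="OGC-WCS",  # hypothetical controlled-vocabulary term
    endpoint="https://mn.example.org/wcs",  # hypothetical endpoint
    input_ports=[Port("coverage", ["image/geotiff"])],
    output_ports=[Port("window", ["image/geotiff"])],
)
print(wcs.missing_required())  # []
print(wcs.accepts("image/geotiff"), wcs.accepts("text/csv"))  # True False
```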
There are many metadata standards that support at least some notion of describing software or programs, and that may be applied to services as defined in the context of this document. A selection of these is listed below:
ISO-19119 provides a framework for developers to create software that enables users to access and process geographic data from a variety of sources across a generic computing interface within an open information technology environment.
The EML software module contains general information that describes software resources. It is intended to “fully document software that is needed in order to view a resource (such as a dataset) or to process a dataset.”
codemeta is an emerging standard that defines a minimal metadata schema for science software and code.
OGC-cat specifies the interfaces, binding, and a framework for application profiles required to publish and access digital catalogues of metadata for geospatial data, services, and related resource information.
WSDL is an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information. It can be used to describe many types of web services, though it is most commonly associated with SOAP services. WSDL can also be used to describe simple HTTP POST or GET operations, and so can describe REST type services.
Like WSDL, RSDL is an XML description of web-based applications, though it is more directly targeted at REST type services.
Will it be necessary to define a new object type for service descriptions?
Currently, three object types are defined in DataONE: DATA, METADATA, and RESOURCE. These categories offer convenient groupings that indicate to processing agents how the content should be handled. For example, content categorized as “DATA” is not replicated to Coordinating Nodes during synchronization.
The advantage of a new object type (e.g. “SERVICE”) is that processing agents can easily determine how a particular entity should be handled without resorting to looking up the specific formatId.
The disadvantage, besides the necessary re-engineering, is the potential ambiguity between service metadata and science metadata.
The services will be identified by some identifier that is uniquely associated with that service type. Included with the entry will be connection information so that a client application of the service type is able to connect and perform the necessary operations.
Connection to, and interaction with, the advertised service are not defined by DataONE APIs. Only the availability of services that may be applicable to different object format types is advertised.
Note that there may be many services that can operate on a particular data type, and any particular service may support multiple types of content.
The association between service types and object format types will need to be maintained on a service by service basis, since particular service implementations may support different types of content. For example, one instance of OGC-WCS (Web Coverage Service) may support GeoTIFF imagery only, while another may support multiple raster formats.
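Because the association is many-to-many and maintained per service, a search interface could answer “which services can process this format?” by inverting the per-service lists. A minimal sketch, with hypothetical service identifiers and illustrative format identifiers that mirror the two WCS instances in the example above:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def index_by_format(services: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert {service_id: accepted format ids} into {format_id: service ids}."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for service_id, format_ids in services.items():
        for format_id in format_ids:
            index[format_id].add(service_id)
    return dict(index)


# Hypothetical registrations: two WCS instances with different format support.
SERVICES = {
    "wcs-mn-a": ["image/geotiff"],
    "wcs-mn-b": ["image/geotiff", "netCDF-4"],
    "csv-subset-mn-a": ["text/csv"],
}
INDEX = index_by_format(SERVICES)
print(sorted(INDEX["image/geotiff"]))  # ['wcs-mn-a', 'wcs-mn-b']
```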
A typical imagined user interaction scenario is described in MNSR-S01. Similar scenarios can be constructed for other combinations of data and service types. For example: a CSV subsetting service might return a selection of rows from a CSV file; a rendering service might provide an HTML rendering of different types of content (such as metadata, resource maps, and data objects).
Services advertised will likely be mostly third-party implementations, though it is conceivable that this mechanism for advertising the availability of services might also be applied to advertising the availability of DataONE defined services that may be optional for Member Node implementations.
codemeta https://github.com/mbjones/codemeta
EML https://knb.ecoinformatics.org/#external//emlparser/docs/eml-2.1.1/index.html
ISO-19115 http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=53798
ISO-19119 https://www.iso.org/obp/ui/#iso:std:iso:19119:ed-1:v1:en
OGC Catalog Service http://www.opengeospatial.org/standards/cat