Member Node Service Registration

Overview

Service Registration (SR) is a process whereby services offered by Member Nodes and other reliable resources can be registered with DataONE and so may be discovered through DataONE search mechanisms and utilized by DataONE Investigator Tools and other clients to perform actions on or provide access to data.

The fundamental issue being addressed is the discovery of services offered by Member Nodes. There is currently no consistent, convenient, universally available mechanism to discover the services that data repositories offer to assist users with access to data. Such services include data subsetting and slicing, merging, analysis, visualization, and summarization. DataONE will enable any such service to be registered and discovered through a search interface and, in some cases, incorporated into the functionality of data search interfaces to offer online processing of the data by a registered service.

A general goal of Service Registration is to assist clients in discovery of services exposed by Member Nodes that may be applicable to a particular type of object. The process of Service Registration involves the creation of a Service Description document and registering the document with DataONE through the normal content synchronization process described in DataONE Use Case 06.

Scenarios

Scenarios provide a high level outline of a range of activities anticipated to be supported by the functionality of the system.

Note

Scenarios are labelled with the convention S-XXnn, where XX = “SR” for Service Registration and nn = the ID of the scenario.

Scenario S-SR01: Retrieve Subset of Records

A user has a PID for a large dataset that is structured as a single table with many records. The user would like to retrieve a specific record from the dataset, identified by an identifier that is local to the data object.

For example, the entire data object might have the identifier “large_table-001.csv”, and the user knows (perhaps from colleagues or another source) that the record identified by “118422” is of interest.

The user performs a search on DataONE for “tabular record extraction” services and selects from the results a service co-located with the Member Node that holds the data.

Using a client tool, the user provides the data object PID, the record identifier, and the service description PID. The client tool retrieves the connection information from the service description document and invokes the service using the data PID and the record identifier. The expected record is returned to the client tool and used for further analysis.
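
The client-side step of this scenario might be sketched as follows. This is an illustration only: the endpoint URL, the query parameter names `pid` and `record`, and the helper `build_record_request` are hypothetical, since the actual interface is whatever the service description document declares.

```python
from urllib.parse import urlencode


def build_record_request(endpoint: str, data_pid: str, record_id: str) -> str:
    """Compose a GET URL asking a record-extraction service for one record.

    The parameter names "pid" and "record" are assumptions for illustration;
    a real client would take them from the service description.
    """
    query = urlencode({"pid": data_pid, "record": record_id})
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}{query}"


# Connection information as it might be read from a service description.
service_description = {"endpoint": "https://mn.example.org/services/extract"}

url = build_record_request(
    service_description["endpoint"], "large_table-001.csv", "118422"
)
print(url)
```

The client tool would then issue the request and hand the returned record to the user's analysis.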

Scenario S-SR02: Spatial Subset

A user has a PID for a data object. Using a DataONE service-aware client tool, the user discovers that the object is a fairly large (system metadata) set of imagery (system metadata, science metadata) that provides high resolution measurements of ground surface reflectance in the range of 400 to 700 nm (science metadata). The science metadata describing the imagery indicates a very broad spatial coverage, and the file is too large to justify downloading for the small area of terrain being worked on.

The client tool indicates to the user that a Member Node holding a copy of the data is known to be running a service (registered service) that will allow extraction of a spatial window from the large data object (the OGC Web Coverage Service, indicated by the service format identifier as recorded in the Service Description document) and also provides the URL of the Service Endpoint for the service.

The user utilizes an OGC-WCS client application to connect with the service operating on the Member Node and retrieves the desired window from the coverage identified by the PID.
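
As a rough illustration of the final step, a WCS 2.0.1 key-value GetCoverage request for a spatial window could be assembled as below. The endpoint, the PID, and the axis labels `Lat` and `Long` are assumptions; a real client would read the axis labels from the service's DescribeCoverage response, and the sketch assumes the service accepts the PID as the coverage identifier.

```python
from urllib.parse import urlencode


def wcs_get_coverage_url(endpoint, coverage_id, lat_range, lon_range):
    """Build a WCS 2.0.1 key-value GetCoverage URL for a spatial window."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        # Axis labels are assumptions; they vary between coverages.
        ("subset", f"Lat({lat_range[0]},{lat_range[1]})"),
        ("subset", f"Long({lon_range[0]},{lon_range[1]})"),
    ]
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and PID, as would be taken from the service
# description and the search result respectively.
wcs_url = wcs_get_coverage_url(
    "https://mn.example.org/wcs", "imagery-001", (34.0, 35.0), (-107.0, -106.0)
)
print(wcs_url)
```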

Scenario S-SR03: Transform Temperature Series

A user finds a data set whose semantic description indicates temperature measurements suitable for their analysis, except that they are in the wrong units. Although the conversion is trivial, a service at the Member Node advertises the capability to transform the measurements to the expected units. This capability is described in the service description document, which includes sufficient semantic information about the input and output ports, as well as the acceptable input data formats, to determine that the service will operate correctly for the dataset of interest.

Scenario S-SR04: Render Spatial Visualization

A user discovers a GeoTIFF layer using the ONEMercury search interface. Included with the search results is a list of services that may be utilized to process the GeoTIFF. One of these services is an Open Geospatial Consortium Web Map Service (OGC-WMS). The user opens their QGIS application, adds the WMS service endpoint to the QGIS-WMS plugin configuration, and adds the identifier of the GeoTIFF data set as a parameter identifying the layer to process.

The WMS service recognizes the layer as being one of the datasets that it is able to process, and returns a rendering of the content as specified by the extent and style parameters specified by the QGIS WMS client.
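
For illustration, the kind of request a WMS client issues on the user's behalf can be sketched as a WMS 1.3.0 GetMap URL. The endpoint and layer identifier are hypothetical, and `CRS:84` (longitude, latitude axis order) is chosen here only to keep the bounding box unambiguous.

```python
from urllib.parse import urlencode


def wms_get_map_url(endpoint, layer, bbox, width, height, style=""):
    """Build a WMS 1.3.0 GetMap URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,   # here, the identifier of the GeoTIFF data set
        "STYLES": style,   # empty string requests the default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer identifier.
wms_url = wms_get_map_url(
    "https://mn.example.org/wms", "geotiff-layer-001",
    (-107.0, 34.0, -106.0, 35.0), 512, 512,
)
print(wms_url)
```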

Note that it would also be possible to develop a QGIS plugin that, given the identifiers of a service and a data object, could manage the background configuration for content retrieval.

Scenario S-SR05: Describe Dataset Provenance

A researcher uses data that was transformed by a service registered in DataONE (see S-SR04). The data has an associated provenance trace that shows the data resulted from a transformation of another dataset held in DataONE, and that a registered service was used to perform the transformation.

The provenance trace records this information using identifiers for each entity (original data, transformed data, and the transformation service) and so is able to unambiguously define the origin of the content and provide appropriate attribution as necessary.
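
A minimal sketch of such a trace, held as subject-predicate-object triples, is shown below. The source does not specify a provenance vocabulary; the property names are borrowed from W3C PROV-O as an assumption, and all identifiers are hypothetical.

```python
# Each entity is referenced only by its identifier.
PROV_TRACE = [
    ("pid:transformed-001", "prov:wasDerivedFrom", "pid:original-001"),
    ("pid:transformed-001", "prov:wasGeneratedBy", "run:2001"),
    ("run:2001", "prov:used", "pid:original-001"),
    ("run:2001", "prov:wasAssociatedWith", "pid:service-description-001"),
]


def objects(trace, subject, predicate):
    """Return every object linked to subject by predicate."""
    return [o for s, p, o in trace if s == subject and p == predicate]


def describe_origin(trace, pid):
    """Report the source data and the services involved in producing pid."""
    runs = objects(trace, pid, "prov:wasGeneratedBy")
    return {
        "sources": objects(trace, pid, "prov:wasDerivedFrom"),
        "services": [
            service
            for run in runs
            for service in objects(trace, run, "prov:wasAssociatedWith")
        ],
    }


origin = describe_origin(PROV_TRACE, "pid:transformed-001")
print(origin)
```

Because every entity is named by an identifier, attribution reduces to resolving the identifiers in `origin`.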

Scenario S-SR06: Discover Relevant Services

Using the DataONE search interfaces, a researcher discovers a number of data sets that spatially overlap the area of interest for their study.

The researcher checks the “show available services” checkbox, and the search interface renders information about services available for each of the search results.

The researcher then selects a “spatial data extraction” facet to constrain the types of services being displayed.

Use Cases

actor "MN Administrator" as admin
actor "User" as user

package "DataONE" {
  actor "Coordinating Node" as CN
  actor "Member Node" as MN

usecase "UC SR01. Register Service" as SR01
usecase "UC SR02. Unregister Service" as SR02
usecase "UC SR03. Update Service Registration" as SR03
usecase "UC SR04. Discover Service" as SR04
usecase "UC SR05. Use Service" as SR05

admin -- SR01
admin -- SR02
admin -- SR03
admin -- SR04
user -- SR04
user -- SR05

MN -- SR01
MN -- SR02
MN -- SR03

CN -- SR04
}

Five functional use cases are described for registration and use of services within DataONE.

Note

Use cases are labelled with the convention UC-XXnn, where XX = “SR” for Service Registration and nn = the ID of the use case.

UC-SR01: Register Service

Goal
Register a service with DataONE such that it can be discovered by users.
Summary

A service description document is added to a Member Node using the MNStorage.create() method. The service is registered once the content is synchronized by the Coordinating Nodes. This process is identical to the existing create and synchronize use cases defined for the DataONE infrastructure. Registration includes indexing, which is considered part of the synchronization process.

Note

It is recommended that a Member Node verify the service description including the availability of the service described therein.

Actors
  • Client creating the service registration document and adding the content to a Member Node.
  • Member Node to which the content is added
  • Coordinating Node(s) that synchronize the content
Preconditions
  • Operational Member Node
  • Operational Coordinating Nodes
  • Member Node registered with DataONE
Triggers
Post Conditions
  • The service is registered with DataONE and available for discovery through the search services.

UC-SR02: Unregister Service

Goal
Prevent a service from being discovered, effectively unregistering the service from DataONE.
Summary

A service may become unavailable or be decommissioned, and so it should be “unregistered” from DataONE to prevent or discourage utilization. The historical record of the service should remain available, as this information may be referenced by content within or outside of DataONE.

The process for “unregistering” a service is the same as “archiving” content. This prevents the service from being discovered through search services but retains the record.

It may be desirable to obsolete the service description document with another service description that indicates the service is no longer available, yet retains metadata describing the service capabilities when they were available.

Actors
  • Client with authority to modify the service description
  • Member Node to accept the updated information
  • Coordinating Node to record the updated information
Preconditions
  • A service has been registered
  • Operational DataONE infrastructure
Triggers
  • A client wishes to deprecate or unregister a service
Post Conditions
  • The service is no longer discoverable through search interfaces
  • The service description remains available, though it is marked as archived
  • Recommended: the service description document is obsoleted by an updated service description indicating that a previously available service is no longer available.
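
The effect of archiving on discovery can be sketched with a toy in-memory registry. This is not DataONE code: the `ServiceRecord` class and both helper functions are hypothetical and only model an archived flag that hides a record from search while leaving it retrievable by identifier.

```python
from dataclasses import dataclass


@dataclass
class ServiceRecord:
    pid: str
    title: str
    archived: bool = False  # models the archived state of the record


def search(records, term):
    """Return PIDs of non-archived records whose title contains term."""
    return [
        r.pid
        for r in records
        if not r.archived and term.lower() in r.title.lower()
    ]


def get(records, pid):
    """Direct retrieval by PID still succeeds for archived records."""
    for r in records:
        if r.pid == pid:
            return r
    raise KeyError(pid)


registry = [
    ServiceRecord("svc-wcs-001", "Coverage subsetting service"),
    ServiceRecord("svc-csv-001", "CSV subsetting service"),
]
registry[1].archived = True  # "unregister" the CSV service

hits = search(registry, "subsetting")
print(hits)
print(get(registry, "svc-csv-001").archived)
```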

UC-SR03: Update Service Registration

Goal
Record changes in the operation, location, capabilities, access control or other characteristics of a registered service.
Summary

There are two types of change that require updates: changes to low level information about the service, such as access control or ownership, that can be reflected in system metadata; and changes to the service itself that must be reflected in the service description document. Once the changes are recorded by the Member Node, they will be picked up by the Coordinating Nodes through the synchronization process and propagated to any replicas.

The first case is covered by changes to system metadata (e.g. Manage Access Policies, Transfer Object Ownership) which can be performed by an authorized client interacting with the authoritative Member Node.

The second case is covered by Use Case 05 “update” in that the process of updating content in DataONE is to obsolete the current object with a new one.

Actors
  • Authorized client
  • Member Node
  • Authentication system
  • Coordinating Node(s)
Preconditions
  • A service has been registered
  • Operational DataONE infrastructure
Triggers
  • A client wishes to alter properties of the service description or update the service description document
Post Conditions
  • The service description information is updated and by default searches will return the current document.
  • If the service description document was altered, then the previous version remains available.
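
Because an updated service description obsoletes its predecessor, a client holding an old identifier can walk forward to the current document. The sketch below assumes each record exposes an `obsoletedBy` value (absent or `None` for the current version); the dictionary layout and identifiers are hypothetical.

```python
def current_version(sysmeta_by_pid, pid):
    """Follow obsoletedBy links from pid to the newest service description."""
    seen = set()
    while True:
        if pid in seen:
            raise ValueError(f"cycle in obsolescence chain at {pid}")
        seen.add(pid)
        successor = sysmeta_by_pid[pid].get("obsoletedBy")
        if successor is None:
            return pid
        pid = successor


# Three revisions of one service description; v1 and v2 remain retrievable.
sysmeta_by_pid = {
    "svc-desc-v1": {"obsoletedBy": "svc-desc-v2"},
    "svc-desc-v2": {"obsoletedBy": "svc-desc-v3"},
    "svc-desc-v3": {"obsoletedBy": None},
}

latest = current_version(sysmeta_by_pid, "svc-desc-v1")
print(latest)
```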

UC-SR04: Discover Service

Goal
Facilitate discovery of a service by a user or their agent.
Summary
Discovery in DataONE is performed through the search API, either by calling the API directly or through a client tool such as ONEMercury that leverages it.
Actors
  • Client
  • Coordinating Node
Preconditions
  • Functional DataONE environment
  • Indexing system
  • Registered service
Triggers
  • A client performs a search for services through the API or a tool that leverages the API.
Post Conditions
  • A client has search results that include zero or more references to service description documents.
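
A direct API search for services might be composed as in the following sketch, which builds a Solr-style query URL without sending it. The base URL is a placeholder, and the index field names `isService`, `serviceType`, and `serviceEndpoint` are assumptions rather than confirmed parts of the search index; only the generic Solr parameters (`q`, `fq`, `fl`, `rows`, `wt`) are standard.

```python
from urllib.parse import urlencode

# Placeholder Coordinating Node query endpoint; substitute the real one.
CN_SOLR_QUERY_URL = "https://cn.example.org/cn/v2/query/solr/"


def service_search_url(terms, service_type=None, rows=10):
    """Build a search URL restricted to service description documents."""
    params = [
        ("q", terms),
        ("fq", "isService:true"),  # assumed field name
        ("fl", "id,title,serviceType,serviceEndpoint"),  # assumed field names
        ("rows", rows),
        ("wt", "json"),
    ]
    if service_type:
        # Optional facet-like constraint on the type of service.
        params.append(("fq", f'serviceType:"{service_type}"'))
    return f"{CN_SOLR_QUERY_URL}?{urlencode(params)}"


search_url = service_search_url("tabular record extraction")
print(search_url)
```

A response with no matching documents is a valid outcome, consistent with the post condition of zero or more results.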

UC-SR05: Use Service

Goal
A client would like to utilize a service that has been registered with DataONE.

Summary

Following discovery of a service (either through DataONE or by other means), a client would like to utilize the service to perform some action. The service description document provides all the information necessary (directly within the document or by reference to other resources such as a published specification or introspection capability offered by the service) for a client to determine how they may interact with the service.

The actual service interface definitions and modes of interaction are not defined by DataONE and hence are out of scope of this use case.

From DataONE’s perspective, the process for a client to utilize a service is then to (1) discover the service; (2) retrieve and parse the service description document; and (3) interact with the service as described in the service description.

Actors
  • Client
  • A registered service
Preconditions
  • The service has been registered
  • A client has access to the service description document
  • The client is able to parse and utilize the information in the service description
Triggers
  • A client wishes to use a service
Post Conditions
  • A client has sufficient information to interact with the service
  • The client may have interacted with the service
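
Step (2) of the process in the summary above, retrieving and parsing the service description, can be sketched as follows. The XML vocabulary is invented for illustration; the source does not fix a service description format, so every element name and value here is an assumption.

```python
import xml.etree.ElementTree as ET

# A hypothetical service description document.
SERVICE_DESCRIPTION_XML = """\
<serviceDescription>
  <title>Tabular record extraction</title>
  <serviceType>example-record-extraction-1.0</serviceType>
  <endpoint>https://mn.example.org/services/extract</endpoint>
  <inputPort><formatId>text/csv</formatId></inputPort>
  <outputPort><formatId>text/csv</formatId></outputPort>
</serviceDescription>
"""


def parse_service_description(xml_text):
    """Extract the properties a client needs before contacting the service."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "service_type": root.findtext("serviceType"),
        "endpoint": root.findtext("endpoint"),
        "input_formats": [e.text for e in root.findall("inputPort/formatId")],
        "output_formats": [e.text for e in root.findall("outputPort/formatId")],
    }


parsed = parse_service_description(SERVICE_DESCRIPTION_XML)
print(parsed["service_type"], parsed["endpoint"])
```

Step (3) then depends entirely on the service type: the client selects a protocol implementation matching `service_type` and connects to `endpoint`.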

Implementation

A service description document is a type of metadata document. It is treated much the same as a science metadata document in the DataONE infrastructure: the document has a persistent unique identifier and associated system metadata, it is replicated to a Coordinating Node during the synchronization process, it is indexed, and it appears in search results.

To be useful to a client, the service description should minimally include the service endpoint and the service type. If the service does not support introspection then the service description document should contain a description of the service interfaces or a reference to such documents and details on any peculiarities or departures from the standard by the registered service.

The interface definitions implemented by a particular service are outside of the scope of DataONE specifications, though it is expected that service implementations should, where possible, follow standards and design already implemented by others in the community.

Services will take as input zero or more data objects and will output one or more objects that may or may not be data (e.g. an image generated by a WMS service). The inputs and outputs of a service take place through ports.

In order to identify what types of data a service can operate on, it is necessary to associate the Input Port of the Service with a format type. Note that a given port can potentially accept many format types.

Similarly, it is necessary to associate object format types with the Service Output Ports.
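
The association between ports and format types is what lets a client ask which services apply to a given object. A minimal sketch, using hypothetical service identifiers and illustrative format identifiers:

```python
# Each registered service lists the formats accepted by its input port and
# produced by its output port. All values are illustrative.
SERVICES = [
    {
        "pid": "svc-wcs-a",
        "input_formats": {"image/geotiff"},
        "output_formats": {"image/geotiff"},
    },
    {
        "pid": "svc-wcs-b",
        "input_formats": {"image/geotiff", "netCDF-4"},
        "output_formats": {"image/geotiff", "image/png"},
    },
    {
        "pid": "svc-csv-subset",
        "input_formats": {"text/csv"},
        "output_formats": {"text/csv"},
    },
]


def services_accepting(services, format_id):
    """PIDs of services whose input port accepts objects of format_id."""
    return [s["pid"] for s in services if format_id in s["input_formats"]]


def services_producing(services, format_id):
    """PIDs of services whose output port can emit objects of format_id."""
    return [s["pid"] for s in services if format_id in s["output_formats"]]


print(services_accepting(SERVICES, "image/geotiff"))
print(services_producing(SERVICES, "image/png"))
```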

In order to facilitate discovery by general search, human readable metadata such as a title and description should be included in the service description document.

Property              Description
--------------------  ------------------------------------------------------------
Description           A human readable description of the service to assist discovery and to evaluate applicability.
Title                 A brief, human readable descriptive title for the service.
Service Type          A controlled vocabulary of service types. Extends the existing DataONE list of format types.
Service Endpoint      A URL that indicates how to access the service.
Service Input Port    Aspect of the service that accepts a digital entity.
Service Output Port   Aspect of the service that provides a digital entity resulting from operation of the service.
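
The properties above could be modelled in a client as a small data structure, with a check for the two properties described earlier as the minimum (service endpoint and service type). The class and attribute names are illustrative assumptions, not a DataONE schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Port:
    """An input or output port and the format identifiers it handles."""
    name: str
    format_ids: List[str] = field(default_factory=list)


@dataclass
class ServiceDescription:
    title: str
    description: str
    service_type: str
    service_endpoint: str
    input_ports: List[Port] = field(default_factory=list)
    output_ports: List[Port] = field(default_factory=list)

    def missing_minimal_fields(self) -> List[str]:
        """Names of the minimally required properties that are empty."""
        required = {
            "service_type": self.service_type,
            "service_endpoint": self.service_endpoint,
        }
        return [name for name, value in required.items() if not value.strip()]


complete = ServiceDescription(
    title="Map rendering",
    description="Renders raster layers as images.",
    service_type="OGC-WMS",
    service_endpoint="https://mn.example.org/wms",
    input_ports=[Port("layer", ["image/geotiff"])],
    output_ports=[Port("map", ["image/png"])],
)
incomplete = ServiceDescription("Untitled", "", "OGC-WMS", "")

print(complete.missing_minimal_fields())
print(incomplete.missing_minimal_fields())
```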

There are many metadata standards that support at least some notion of a description of software or programs, and that may be applied to services as defined in the context of this document. A selection is listed below:

ISO-19119 provides a framework for developers to create software that enables users to access and process geographic data from a variety of sources across a generic computing interface within an open information technology environment.

The EML software module contains general information that describes software resources. It is intended to “fully document software that is needed in order to view a resource (such as a dataset) or to process a dataset.”

codemeta is an emerging standard that defines a minimal metadata schema for science software and code.

OGC-cat specifies the interfaces, binding, and a framework for application profiles required to publish and access digital catalogues of metadata for geospatial data, services, and related resource information.

WSDL is an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information. It can be used to describe many types of web services, though is most commonly associated with SOAP services. WSDL can also be used to describe simple HTTP POST or GET operations, and so can describe REST type services.

Like WSDL, RSDL is an XML description of web-based applications, though it is more directly targeted at REST type services.

Notes and Considerations

Will it be necessary to define a new object type for service descriptions?

Currently, three object types are defined in DataONE: DATA, METADATA, and RESOURCE. These categories offer convenient groupings that indicate to processing agents how the content should be handled. For example, content categorized as “DATA” is not replicated to Coordinating Nodes during synchronization.

The advantage of a new format type (e.g. “SERVICE”) is that processing agents may easily determine how a particular entity should be handled without resorting to looking up the specific formatID.

The disadvantage, besides the necessary re-engineering, is the potential ambiguity between service metadata and science metadata.

The services will be identified by some identifier that is uniquely associated with that service type. Included with the entry will be connection information so that a client application of the service type is able to connect and perform the necessary operations.

Connection to, and interaction with, the advertised service are not defined by DataONE APIs. Only the availability of services that may be applicable to different object format types is advertised.

Note that there may be many services that can operate on a particular data type, and any particular service may support multiple types of content.

The association between service types and object format types will need to be maintained on a service by service basis, since particular service implementations may support different types of content. For example, one instance of OGC-WCS (Web Coverage Service) may support GeoTIFF imagery only, while another may support multiple raster formats.

A typical imagined user interaction scenario is described in MNSR-S01. Similar scenarios can be constructed for other combinations of data and service types. For example: a CSV subsetting service might return a selection of rows from a CSV file; a rendering service might provide an HTML rendering of different types of content (such as metadata, resource maps, and data objects).

Services advertised will likely be mostly third-party implementations, though it is conceivable that this mechanism for advertising the availability of services might also be applied to advertising the availability of DataONE defined services that may be optional for Member Node implementations.