Data Management Use Cases
=========================

Curation Use Cases
------------------

1. submit brand new content

   - Tier 3 nodes: users use the create method
   - Tier 1 & 2 nodes: use the native API
   - outcome: content is available via get under a unique identifier, and
     its sysmeta has no obsoletes field

2. submit a data / metadata correction or change

   - Tier 3 nodes: users use the update method
   - Tier 1 & 2 nodes: use the native API
   - corrected content is available via get with a new identifier
   - sysmeta of the new item has its obsoletes field populated with the
     identifier of the previous item
   - sysmeta of the previous item has its obsoletedBy field populated with
     the identifier of the new item, and its archived field set to true
   - if a data object, an update of associated metadata under the data
     owner's control may be needed (especially if it contains the identifier
     of the previous item)
   - if named in resource maps under the owner's control, updates for these
     should also be submitted.

3. correcting resource map relationships

   - update the resource map to apply the correction
   - same outcome and recommendations as 2.
   - impact

     - the relationships in the obsoleted resource map are removed from the
       relevant records in CN Solr search

4. retracting content

   - archive the object
   - the sysmeta should have the archived field set to true.
   - ******** should access policy limit access?

5. limiting discovery (fewer users have access than before)

   - search results are subject to access policies. Don't archive unless it
     is a retraction (see 4. above)
   - set the access policy appropriately before submitting, or call
     CN.setAccessPolicy if the object is already on DataONE.

6. temporary retraction of content until corrections can be applied

   - archive cannot be reversed.
   - ***** make sure that update can be applied to an archived object
   - ***** review this with the other core developers

7. reformatting existing content, e.g.
   converting an Excel workbook to 5 CSV files

   - DataONE currently doesn't support "update" discovery when there is not
     a one-to-one replacement of content, because archiving of obsolete
     objects takes away the means to navigate from the DATA / METADATA
     component to the RESOURCEMAP
   - if the cardinality of obsoletes were not restricted to 1:1, finding the
     current data set via obsoletes chains could be done through the
     reorganization event.
   - ******* why is it 1:1?
   - ******* the following would work if archive events did not interfere
     with documents / isDocumentedBy relationships for out-of-date objects:

     - ensure that the original is part of a data package (is part of a
       resource map that you own)
     - modify the resource map to point to the new files (submit as update)
     - DO NOT archive the original file, or else the user can't get to the
       resource map and then find the resource map's update
     - submit the new files with create (optionally submit one as an update
       of the original file, so that the original is marked as obsolete)

8. providing alternate, concurrent formats for the same content

   - while both formats could be current versions (not obsoleted and not
     archived), DataONE does not model a relationship that would connect the
     two as related content.

9. adding new attributes to data records

   - this action does not add more data records to analyze, so it is
     probably best accomplished via an update to the original.
   - on the other hand, if a "primary key" is available in the original,
     creating a secondary dataset using this key and the additional
     attributes is equally valid, would spare existing users a refresh, and
     would take up less storage space.

10. adding new records to data set

    - context dependent, related to what a user might be expecting based on
      past experience or previous metadata:

      - a new object is preferable to appending to an existing one if it
        expands the extent of the dataset.
      - ******* what about a 2013-observations-YTD file (where 2013 is the
        current year, and the metadata describes the extent as Jan-1-2013 to
        Dec-31-2013)?
      - if filling in missing data, update is more appropriate

11. resurrecting retracted content

    - not possible through the DataONE API
    - ********** would need to apply an update as a workaround, even though
      it might be identical to the thing it obsoletes.
    - ********** what about Tier 1 and Tier 2 nodes? Can they remove the
      archived field in the system metadata?

12. restricting searches to data formats you can work with:

    - **** ``dataFormat:foo OR dataFormat:bar OR dataFormat:baz`` ?

Obsoletes Chains and Changes to Access Policy
---------------------------------------------

When fewer users are granted read access on an object than on previous
versions, the users left out will not be able to follow an obsoletes chain to
its logical end. This is acceptable, since they will be notified through
receipt of a NotAuthorized exception. The converse is true when access is
widened and the user is trying to get to the original.

Data retrieval Use Cases
------------------------

1. checking for content updates

   - from existing identifiers, follow the obsoletesChain to the latest.
   - check the archived flag of the latest (if all obsoleted content is also
     archived, then content reorganization is not supported, and so the next
     steps are not going to work)
   - determine if the latest version is part of a package that is also the
     latest; otherwise follow its obsoletesChain to its end, and use its
     package components (this indicates that the data was reorganized at
     some point)
   - caution: when going from data/metadata id to resourceMap, you might
     find more than one. In this case, you need to choose the one owned by
     the data id owner to determine the package that contains the
     reorganized data.

2. canned queries: using a search result as the starting point for analyses

   - users need to cache the identifiers returned to ensure reproducibility
   - search results are not guaranteed to be consistent over time, due to
     archiving events and changes in access policies.

3. retrieving data package from package component id

   - ``cn/v1/solr/?q=id:pid&fl=resourceMap,id`` gets the resourceMap(s)
   - ``cn/v1/solr/?q=resourceMap:rmPid&fl=id`` gets the ids of the objects
     in that resourceMap

4. finding the latest representation of content that has been reorganized

   - (not supported)

5. making sure the latest data from an obsoletes chain is the latest version
   (that the author has not reorganized the content)

   - (subquestion of 4., so not applicable)

6. getting to the reorganized content when the starting data object is in
   several resource map chains.

   - if obsoleted implies archived, then all resource maps returned for an
     object will be current, and no further traversals should be needed.

7. identifying the data owner's data package among others.

   - align rightsHolder / submitter / users with WRITE permission.

8. Q. so what does it mean if an object is archived? With obsoletedBy
   implying archived:

   - archived without obsoletedBy means retracted
   - archived plus obsoletedBy means replaced / updated

9. points:

   - archiving an object removes the ability to traverse documents /
     isDocumentedBy and resourceMap relationships
   - a Solr search on ``identifier:`` that returns no records does not
     indicate that the object doesn't exist. It could be deleted, or
     archived. The user should then pull the sysmeta

Impact of archive on documents / isDocumentedBy relationships
-------------------------------------------------------------

If the resourceMap is archived, do the documents, isDocumentedBy, and
resourceMap fields get removed from the Solr index? Probably not. Should
they? Probably not.

Using CN Solr index for traversing obsoletes chains
----------------------------------------------------

- not reliable in the face of archiving
- without archiving, is it reliable?
  (is the 1:1 cardinality retained in the Solr record, or is it
  denormalized, too?)
- what's the impact of delete?

  - removes systemMetadata, so we need to splice any obsoletes /
    obsoletedBy relationships if we will be relying on them.
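As a rough illustration of chain traversal that does not depend on the Solr index, the sketch below follows obsoletedBy links through system metadata and applies the archived / obsoletedBy reading from the retrieval use cases. The ``get_sysmeta`` callable, the dict-based records, and the in-memory ``STORE`` are hypothetical stand-ins, not DataONE client calls. A real lookup could also raise NotAuthorized partway along the chain, which this sketch simply lets propagate.

```python
from typing import Any, Callable, Dict

# Hypothetical, simplified system metadata record: only the fields these
# notes rely on (obsoletes, obsoletedBy, archived).
SysMeta = Dict[str, Any]


def follow_obsoletes_chain(pid: str, get_sysmeta: Callable[[str], SysMeta]) -> str:
    """Return the identifier at the end of the obsoletedBy chain starting at pid."""
    seen = set()
    current = pid
    while True:
        if current in seen:
            # Guard against malformed metadata that loops back on itself.
            raise ValueError(f"cycle in obsoletes chain at {current!r}")
        seen.add(current)
        successor = get_sysmeta(current).get("obsoletedBy")
        if not successor:
            return current
        current = successor


def classify(sysmeta: SysMeta) -> str:
    """Interpret the archived / obsoletedBy combination of one record."""
    archived = bool(sysmeta.get("archived"))
    replaced = bool(sysmeta.get("obsoletedBy"))
    if archived and replaced:
        return "replaced"    # archived plus obsoletedBy
    if archived:
        return "retracted"   # archived without obsoletedBy
    if replaced:
        return "obsoleted"   # label assumed here; the notes do not name this case
    return "current"


# Hypothetical in-memory stand-in for per-object system metadata lookups.
STORE: Dict[str, SysMeta] = {
    "v1": {"obsoletedBy": "v2", "archived": True},
    "v2": {"obsoletes": "v1", "obsoletedBy": "v3", "archived": True},
    "v3": {"obsoletes": "v2", "archived": False},
    "gone": {"archived": True},
}

latest = follow_obsoletes_chain("v1", STORE.__getitem__)
print(latest, classify(STORE[latest]))  # prints: v3 current
```

Because the traversal takes the lookup as a parameter, the same loop could be pointed at a cached copy of the system metadata, which matters if delete removes records in the middle of a chain.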