Version: SWORD 3.0
Last modified: 2021-09-01 09:54
See also: SWORD 3.0 Behaviours, which provides a denormalised view of the specification's protocol operations and is especially useful for implementers.
Technical Lead: Richard Jones, Cottage Labs
Community Lead: Neil Jefferies, University of Oxford
Funded By: NII, Jisc, EBSCO
Funder Liaisons: Masaharu Hayashi, NII; Dom Fripp, Jisc; Christopher Spalding, EBSCO
Technical Advisory Group: Adam Rehin, Adrian Stevenson, Alan Stiles, Alex Dutton, Catherine Jones, Claire Knowles, David Moles, David Wilcox, Eoghan Ó Carragáin, Erick Peirson, Gertjan Filarski, Goosyara Kovbasniy, Graham Triggs, Hideaki Takeda, Jan van Mansum, Jauco Noordzij, Jochen Schirrwagen, John Chodacki, Justin Simpson, Lars Holm Nielsen, Marisa Strong, Martin Wrigley, Masaharu Hayashi, Masud Khokhar, Mike Jackson, Morane Gruenpeter, Neil Chue Hong, Paul Walk, Peter Sefton, Ralf Claussnitzer, Ricardo Otelo Santos Saraiva Cruz, Richard Rodgers, Scott Wilson, Shannon Searle, Stephanie Taylor, Stuart Lewis, Tomasz Parkola, Vitali Peil
SWORD 3.0 is a protocol enabling clients and servers to communicate around complex digital objects, especially with regard to supporting the deposit of these objects into a service like a digital repository. Complex digital objects consist of both Metadata and File content, where the Files may be in a variety of formats, there may be many files, and some may be very large. The protocol defines semantics for creating, appending, replacing, deleting, and retrieving information about these complex resources. It also enables servers to communicate regarding the status of treatment of deposited content, such as exposing ingest workflow information.
The first major version of SWORD [SWORD 1.3] built upon the Resource creation aspects of AtomPub [AtomPub] to enable fire-and-forget package deposit onto a server.
This approach, where the depositor has no further interaction with the server, is of significant value in certain use cases, but there are others where it is insufficient. Consider, for example, a depositor who wishes to construct a digital artifact file by file over a period of time before deciding that it is time to archive it. In these cases, a higher level of interactivity between the participating systems is required, and this is the role that SWORD 2.0 [SWORD 2.0] was subsequently developed to fulfil.
As the use cases for SWORD developed further, it became clear that the increasing size of the files that repositories were being asked to handle was an issue. As a result, and because the technological approach of SWORD 2.0 was starting to show its age, a new version, SWORD 3.0, has been developed. This is a radical departure from SWORD 2.0, eliminating ties with AtomPub and moving to a much stricter REST+JSON approach, utilising JSON-LD for alignment with Linked Data. Its key functional differences from SWORD 2.0 are:
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
dc
for this namespace name; for example dc:title
dcterms
for the namespace name; for example dcterms:abstract
Objects, as represented by SWORD, have the following structure:
The SWORD Object is expressed as JSON via the Status Document, along with all its supporting metadata and workflow information.
Each of the three primary File categories can be identified by their rel values, as they appear in the Status Document:
These are the HTTP headers used by SWORD, and their meanings within the context of the protocol. Where a Default Value is specified, the client or server MUST assume that value if the header is not provided explicitly in a request or response.
Header | Usage |
---|---|
Authorization | To pass any HTTP authorization headers, such as the content for basic auth |
Content-Disposition | Used to transmit information to the server which tells it the nature of the deposit, and any associated parameters |
Content-Length | Length of the content in the current payload |
Content-Type | Mimetype of the content being delivered |
Digest | Checksum for the depositing content. MUST include SHA-256, and allows for other formats such as MD5 and SHA (SHA-1) if still needed by the server. |
ETag | Object version identifier, as provided by the server on GET requests and any requests which modify the object and return. |
If-Match | Used to provide the server’s Object version identifier (ETag) for the version on which this request is intended to act. If the supplied ETag does not match, the version on the server has changed since the client’s last operation and the server MUST reject the update. The client will need to retrieve the latest ETag and re-issue the request, taking into account any changes. |
In-Progress | Whether this operation is part of a larger deposit operation, and the server should expect subsequent related requests before injecting the item into any ingest workflows. Default Value: false |
Location | URI for the location where the requested or deposited content can be found |
On-Behalf-Of | Username of any user the action is being carried out on behalf of |
Packaging | URI unambiguously identifying the packaging profile. Default Value: http://purl.org/net/sword/3.0/package/Binary |
Slug | Suggested identifier for the item |
Metadata-Format | URI unambiguously identifying the metadata format/schema/profile. Default Value: http://purl.org/net/sword/3.0/types/Metadata |
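The Default Value rule for the headers above can be sketched as a small helper that fills in whatever the sender omitted. This is an illustration only: the function name `with_sword_defaults` is hypothetical, and matching header names case-insensitively is an assumption based on ordinary HTTP conventions rather than something this section states.

```python
# Default values copied from the header table above.
SWORD_HEADER_DEFAULTS = {
    "In-Progress": "false",
    "Packaging": "http://purl.org/net/sword/3.0/package/Binary",
    "Metadata-Format": "http://purl.org/net/sword/3.0/types/Metadata",
}


def with_sword_defaults(headers):
    """Return a copy of `headers` with omitted SWORD defaults filled in.

    Hypothetical helper: names are compared case-insensitively (an
    assumption), and values supplied by the sender are never overwritten.
    A real implementation would only apply the defaults that are relevant
    to the kind of request being handled.
    """
    present = {name.lower() for name in headers}
    resolved = dict(headers)
    for name, default in SWORD_HEADER_DEFAULTS.items():
        if name.lower() not in present:
            resolved[name] = default
    return resolved


if __name__ == "__main__":
    # The explicit In-Progress value is kept; the other two are defaulted.
    print(with_sword_defaults({"in-progress": "true"}))
```

Keeping the defaults in one table means the receiving side never has to distinguish between "header absent" and "header present with its default value".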
This section lists the actual on-the-wire protocol operations that are part of SWORDv3. Actual usage of each of these operations is dependent on the action that you wish to take. See Protocol Requirements for the rules which govern how to use these Protocol Operations.
The full set of protocol operations is available as an OpenAPI definition [OpenAPI], available as JSON and YAML.
The following error responses are possible against some or all of the HTTP Requests. In each case an Error Document MUST be returned by the server with details as to the root cause of the error.
Some requests may result in redirect codes being sent to the client; the server MAY respond to any request with a suitable redirect. These are the redirect codes that are used, and what they mean:
These are the HTTP requests that are covered by the SWORD protocol.
Each request MAY be responded to by the server with a redirect code (see above). Each request MAY also generate an error; possible errors are listed in each section. Please refer to the section above for details on the meanings of the errors.
Retrieve the Service Document
Headers
Responses
Code | Description |
---|---|
200 | Service Document |
401 | |
403 | |
404 | |
410 | |
Make a new Object
Headers
Body
Content used to create new Object. This can be one of: Metadata, By-Reference, Metadata+By-Reference, Binary File, Packaged Content, Empty Body
Responses
Code | Description |
---|---|
201 | Resource created, responds with Status Document |
202 | Resource accepted for processing, responds with Status Document |
400 | |
401 | |
403 | |
404 | |
405 | |
412 | |
413 | |
415 | |
Retrieve the Status information for the Object
Headers
Responses
Code | Description |
---|---|
200 | Status Document |
400 | |
401 | |
403 | |
404 | |
410 | |
412 | |
Append data to an Object
Headers
Body
Content to be appended to the Object. This can be one of: Metadata, By-Reference, Metadata+By-Reference, Binary File, Packaged Content, Empty Body
Responses
Code | Description |
---|---|
200 | Content appended, responds with Status Document |
202 | Content accepted for append, responds with Status Document |
400 | |
401 | |
403 | |
404 | |
405 | |
412 | |
413 | |
415 | |
Replace the Object
Headers
Body
Content to replace the Object. This can be one of: Metadata, By-Reference, Metadata+By-Reference, Binary File, Packaged Content, Empty Body
Responses
Code | Description |
---|---|
200 | Replace carried out, responds with Status Document |
202 | Replace accepted for action, responds with Status Document |
400 | |
401 | |
403 | |
404 | |
405 | |
412 | |
413 | |
415 | |
Delete the Object
Headers
Responses
Code | Description |
---|---|
202 | Delete request accepted for processing |
204 | Object Deleted |
400 | |
401 | |
403 | |
404 | |
405 | |
412 | |
Retrieve the Metadata
Headers
Responses
Code | Description |
---|---|
200 | Metadata Document |
400 | |
401 | |
403 | |
404 | |
405 | |
410 | |
412 | |
Replace the Metadata
Headers
Body
Content to replace the Metadata. This must be a Metadata Document.
Responses
Code | Description |
---|---|
204 | Metadata Replaced, no response body |
400 | |
401 | |
403 | |
404 | |
405 | |
412 | |
413 | |
415 | |
Delete the metadata of an Object
Headers
Responses
Code | Description |
---|---|
202 | Delete request accepted for processing |
204 | Metadata Deleted |
400 | |
401 | |
403 | |
404 | |
405 | |
412 | |
Replace the FileSet
Headers
Body
Content to replace the FileSet. This can be one of: By-Reference, Binary File, Empty Body
Responses
Code | Description |
---|---|
202 | FileSet replacement accepted for processing, no response body |
204 | FileSet Replaced, no response body |
400 | |
401 | |
403 | |
404 | |
405 | |
412 | |
413 | |
415 | |
Delete the FileSet
Headers
Responses
Code | Description |
---|---|
202 | Delete request accepted for processing |
204 | FileSet Deleted |
400 | |
401 | |
403 | |
404 | |
405 | |
412 | |
Retrieve an individual File
Headers
Responses
Code | Description |
---|---|
200 | Binary File |
400 | |
401 | |
403 | |
404 | |
405 | |
410 | |
412 | |
Replace an individual File
Headers
Body
Content to replace the File. This can be one of: By-Reference, Binary File, Empty Body
Responses
Code | Description |
---|---|
204 | Binary File replaced, no response body |
400 | |
401 | |
403 | |
404 | |
405 | |
410 | |
412 | |
413 | |
415 | |
Delete an individual File
Headers
Responses
Code | Description |
---|---|
202 | Delete request accepted for processing |
204 | Binary File Deleted |
400 | |
401 | |
403 | |
404 | |
405 | |
412 | |
Create a Temporary-URL for Segmented File Upload
Headers
Responses
Code | Description |
---|---|
201 | Temporary-URL created |
400 | |
401 | |
403 | |
404 | |
412 | |
413 | |
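Before asking for a Temporary-URL, a client may want to check its planned upload against the segment limits that the server advertises in the Service Document (`maxSegmentSize`, `minSegmentSize`, `maxAssembledSize`, `maxSegments`). The sketch below is an assumption-laden illustration, not part of the protocol: the function name `plan_segments` and its error messages are hypothetical, a missing limit is treated as "no limit", and only the nominal segment size is checked (the final segment may well be smaller).

```python
def plan_segments(file_size, segment_size, limits):
    """Return the number of segments needed, or raise ValueError.

    `limits` is a dict keyed by Service Document field names. This is a
    hypothetical client-side pre-check; the server remains the authority.
    """
    if file_size <= 0 or segment_size <= 0:
        raise ValueError("file_size and segment_size must be positive")

    max_assembled = limits.get("maxAssembledSize")
    if max_assembled is not None and file_size > max_assembled:
        raise ValueError("file exceeds maxAssembledSize")

    # Per the Service Document rules: minSegmentSize defaults to 1 byte,
    # maxSegmentSize defaults to maxUploadSize.
    min_segment = limits.get("minSegmentSize", 1)
    max_segment = limits.get("maxSegmentSize", limits.get("maxUploadSize"))
    if segment_size < min_segment:
        raise ValueError("segment smaller than minSegmentSize")
    if max_segment is not None and segment_size > max_segment:
        raise ValueError("segment larger than maxSegmentSize")

    # Ceiling division using integers only, safe for very large files.
    count = -(-file_size // segment_size)
    max_segments = limits.get("maxSegments")
    if max_segments is not None and count > max_segments:
        raise ValueError("too many segments for maxSegments")
    return count
```

Running a check like this first lets a client pick a different segment size locally instead of discovering the problem through an error response.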
Retrieve Information on a Segmented File Upload
Headers
Responses
Code | Description |
---|---|
200 | Segmented File Upload Document |
400 | |
401 | |
403 | |
404 | |
410 | |
Upload a File Segment
Headers
Body
Segment to be added to the Resource.
Responses
Code | Description |
---|---|
204 | Segment Received |
400 | |
401 | |
403 | |
404 | |
405 | |
412 | |
Abort a Segmented File Upload
Headers
Responses
Code | Description |
---|---|
202 | Delete request accepted for processing |
204 | Temporary File Deleted |
400 | |
401 | |
403 | |
404 | |
This section describes the requirements of every kind of operation that you can do with SWORDv3. Each section in Requirement Groups identifies which Request Conditions have what requirements. To determine the requirements for a specific request, identify each block below which is relevant to your request, and this will provide the overall protocol requirements for that operation.
Converting the below into a set of requirements for a specific request is time-consuming, so this has been done for you in the SWORDv3 Behaviours Document. If you are implementing a SWORD client or server it is STRONGLY RECOMMENDED that you work from that document rather than the normalised requirements below.
There are 3 key aspects of the Request Conditions where requirements can be applied, and these are:
When combined for a specific request, these aspects tell you the exact requirements. For example: Creating (Request) a new Object by request to the Service-URL (Resource) with Packaged Content (Content)
Each of these aspects of the Request Conditions is presented below according to a hierarchy. For a specific aspect, you must import the requirements for it and all its parents in the hierarchy to obtain all the requirements for the request.
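The rule that an aspect inherits the requirements of all its parents can be sketched as a walk up a parent map. The hierarchy entries and requirement strings below are hypothetical placeholders chosen for illustration (they are not the specification's actual hierarchy); only the collection logic is the point.

```python
# Hypothetical hierarchy: child -> parent (None marks the top of the tree).
PARENTS = {
    "All": None,
    "Content": "All",
    "Packaged Content": "Content",
}

# Hypothetical requirements attached directly to each node.
REQUIREMENTS = {
    "All": ["requirement that applies to every request"],
    "Content": ["requirement for any request with a body"],
    "Packaged Content": ["requirement specific to packages"],
}


def collect_requirements(node, parents, requirements):
    """Gather requirements for `node` and every ancestor, root first."""
    chain = []
    seen = set()
    while node is not None:
        if node in seen:
            raise ValueError("cycle in hierarchy at %r" % node)
        seen.add(node)
        chain.append(node)
        node = parents[node]
    collected = []
    # Walk from the root down so general requirements come first.
    for name in reversed(chain):
        collected.extend(requirements.get(name, []))
    return collected
```

The same walk would be repeated for each of the aspects of a request, and the results combined to give the full set of requirements for that request.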
For each Request Condition, up to 4 kinds of requirement are present:
See the document SWORDv3 Behaviours to see each of the behaviours SWORDv3 is capable of with its requirements fully expanded.
The hierarchy for the Request is:
The hierarchy for the Content is:
The hierarchy for the Resource is:
So, for example, when considering a Request Condition such as "Creating Objects with Packaged Content", this would take requirements as follows:
Request Conditions:
Request Requirements
Authorization and On-Behalf-Of headers (i.e. if authenticating this request)
Server Requirements
Authorization (and optionally On-Behalf-Of) headers are provided, MUST authenticate the request
Error Responses
401 (AuthenticationRequired)
403 (AuthenticationFailed)
405 (MethodNotAllowed)
On-Behalf-Of header has been provided, MAY respond with a 412 (OnBehalfOfNotAllowed)
Request Conditions:
Protocol Operation
Response Requirements
Authorization (and optionally On-Behalf-Of) headers are provided, MUST only list Service-URLs in the Service Document for which a deposit request would be permitted
Request Conditions:
Protocol Operation
Response Requirements
ETag header if implementing concurrency control
Request Conditions:
Response Requirements
ETag header if implementing concurrency control
Request Conditions:
Protocol Operation
Response Requirements
Request Conditions:
Protocol Operation
Response Requirements
Request Conditions:
Request Requirements
Content-Disposition header, with the appropriate value for the request
Request Conditions:
Response Requirements
ETag header if implementing concurrency control
Request Conditions:
Request Requirements
Digest header
Content-Length
Server Requirements
Digest header
Content-Length if this is provided
Error Responses
412 (DigestMismatch). Note that servers MAY NOT check digests in real-time.
400 (ContentMalformed)
Request Conditions:
Request Requirements
Content-Type header
Server Requirements
Response Requirements
Error Responses
Content-Type header contains a format that the server cannot accept, MUST respond with 415 (ContentTypeNotAcceptable)
413 (MaxUploadSizeExceeded)
Request Conditions:
Request Requirements
Metadata-Format header
Server Requirements
If no Metadata-Format header is provided, MUST assume this is the standard SWORD format: http://purl.org/net/sword/3.0/types/Metadata
Error Responses
Metadata-Format header indicates a format the server does not support, MUST return 415 (MetadataFormatNotAcceptable)
Request Conditions:
Error Responses
Metadata-Format header does not match the format found in the body content, MAY return 415 (FormatHeaderMismatch)
Request Conditions:
Request Requirements
Server Requirements
Error Responses
400 (ByReferenceFileSizeExceeded)
412 (ByReferenceNotAllowed)
Request Conditions:
Request Requirements
Server Requirements
Error Responses
400 (BadRequest)
Request Conditions:
Request Requirements
Metadata-Format header
Server Requirements
If no Metadata-Format header is provided, MUST assume this is the standard SWORD format: http://purl.org/net/sword/3.0/types/Metadata
Error Responses
400 (ByReferenceFileSizeExceeded)
412 (ByReferenceNotAllowed)
Request Conditions:
Error Responses
Metadata-Format header does not match the format found in the body content, MAY return 415 (FormatHeaderMismatch)
Metadata-Format header indicates a format the server does not support, MUST return 415 (MetadataFormatNotAcceptable)
Request Conditions:
Server Requirements
originalDeposit
Request Conditions:
Request Requirements
Packaging header, and if so MUST be the Binary format identifier
Server Requirements
Request Conditions:
Request Requirements
Packaging header
Server Requirements
derivedResources from it.
Error Responses
Packaging header, MUST respond with a 415 (PackagingFormatNotAcceptable)
Request Conditions:
Error Responses
Packaging header does not match the format found in the body content, SHOULD return 415 (FormatHeaderMismatch). Note that the server may not be able to inspect the package during the request-response, so MAY NOT return this response.
Request Conditions:
Request Requirements
Content-Length header with value 0
Request Conditions:
Protocol Operation
Request Requirements
Slug header
In-Progress header
Server Requirements
Slug header is provided, MAY use this as the identifier for the newly created Object.
If no In-Progress header is provided, MUST assume that it is false
If In-Progress is true, SHOULD expect further updates to the item, and not progress it through any ingest workflows yet.
Response Requirements
Location header, containing the Object-URL
Location header immediately (irrespective of whether this is a 201 or 202 response)
201 if the item was created immediately, a 202 if the item was queued for import, or raise an error.
Request Conditions:
Server Requirements
Request Conditions:
Server Requirements
Request Conditions:
Server Requirements
Request Conditions:
Protocol Operation
Server Requirements
Response Requirements
201 to indicate that the Segmented Upload has been initialised, or raise an error.
Location header containing the Temporary-URL where the client can upload file segments
Error Responses
400 (MaxAssembledSizeExceeded)
400 (InvalidSegmentSize)
400 (SegmentLimitExceeded)
Request Conditions:
Request Requirements
If-Match header, if the server implements concurrency control
Server Requirements
If-Match header does not match the current ETag of the resource
Error Responses
If-Match header does not match the current ETag, MUST respond with 412 (ETagNotMatched)
If-Match header is provided, MUST respond with 412 (ETagRequired)
Request Conditions:
Response Requirements
204 if the replacement was deposited immediately, a 202 if the replacement was queued for import, or raise an error.
Request Conditions:
Request Requirements
In-Progress header
Server Requirements
If no In-Progress header is provided, MUST assume that it is false
Response Requirements
ETag header if implementing concurrency control
200 if the request was accepted immediately, a 202 if the request was queued for processing, or raise an error.
Request Conditions:
Protocol Operation
Request Conditions:
Response Requirements
Location header, containing the File-URL of the Original Deposit File
Request Conditions:
Server Requirements
Request Conditions:
Protocol Operation
Request Conditions:
Server Requirements
originalDeposit. The server MUST also remove all Metadata, so the Metadata Resource contains no fields.
Request Conditions:
Server Requirements
Request Conditions:
Server Requirements
originalDeposits. The server MUST also remove all Metadata, so the Metadata Resource contains no fields.
Request Conditions:
Server Requirements
Request Conditions:
Server Requirements
Request Conditions:
Server Requirements
originalDeposits
Request Conditions:
Server Requirements
originalDeposit
Request Conditions:
Server Requirements
originalDeposit
Request Conditions:
Server Requirements
originalDeposit
Request Conditions:
Server Requirements
originalDeposits
Request Conditions:
Server Requirements
originalDeposits, and MUST add the Metadata to the item, and only treat this as an extension to existing Metadata. The server MUST NOT overwrite or otherwise remove existing Metadata.
Request Conditions:
Protocol Operation
Request Conditions:
Protocol Operation
Request Conditions:
Protocol Operation
Request Conditions:
Response Requirements
204 if the delete is successful, 202 if the delete is queued for processing, or raise an error
Request Conditions:
Protocol Operation
Request Conditions:
Protocol Operation
Request Conditions:
Protocol Operation
Request Conditions:
Protocol Operation
Request Conditions:
Protocol Operation
Request Conditions:
Protocol Operation
Request Requirements
In-Progress: false
Content-Length header with value 0
Server Requirements
Response Requirements
204 or a suitable error
Request Conditions:
Protocol Operation
Server Requirements
Response Requirements
204 or a suitable error
Error Responses
400 (InvalidSegmentSize)
Request Conditions:
Server Requirements
Error Responses
410 (SegmentedUploadTimedOut). Servers may also return 404 and no further explanation.
400 (UnexpectedSegment)
Request Conditions:
Protocol Operation
Response Requirements
200 or a suitable error
SWORD defines the semantics of its documents using JSON-LD [JSON-LD]. You can see the full JSON-LD Context here.
The Service Document defines the capabilities and operational parameters of the server as a whole, or of a particular Service-URL.
The Service Document consists of a set of properties at the root, and a list of "services". Each service may define a Service-URL and/or additional properties and further nested "services". For the purposes of normalising the data held in the Service Document (for brevity of the serialised document), the Service Document MAY specify at the root properties which MUST be taken to hold true for all nested "services" (at any level below) unless that lower service definition overrides the properties. A service which sits beneath the root of the Service Document and above another Service, MAY also redefine properties, and those overrides MUST be considered to cascade down to Services beneath that one.
A Service Document can be retrieved either for the root of the service, or from any Service within the hierarchy of Services available. If the root Service Document is requested, the full list of Services, including all their children, MUST be provided. If the URL of a Service is requested, it MUST only provide information about itself and its children.
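The cascade of properties from the root down through nested "services" can be sketched as a recursive walk that carries inherited values along and lets each Service override what it redefines. This is a minimal sketch under assumptions: `resolve_services` is a hypothetical helper, and the set of keys treated as describing a single node (rather than cascading) is a simplification chosen for this example, not something the text above specifies.

```python
# Keys assumed (for this sketch only) to describe one node rather than
# to cascade down to nested services.
NON_CASCADING = {"@id", "@context", "@type", "dc:title",
                 "dcterms:abstract", "parent", "services"}


def resolve_services(node, inherited=None):
    """Yield (service_id, effective_properties) for a node and its children."""
    effective = dict(inherited or {})
    # Properties defined on this node override anything inherited.
    for key, value in node.items():
        if key not in NON_CASCADING:
            effective[key] = value
    yield node.get("@id"), effective
    for child in node.get("services", []):
        yield from resolve_services(child, effective)


# Illustrative document: the root refuses deposits, one service allows them.
doc = {
    "@id": "http://example.com/service-document",
    "maxUploadSize": 16777216000,
    "acceptDeposits": False,
    "services": [
        {"@id": "http://example.com/deposit/43", "acceptDeposits": True},
    ],
}
resolved = dict(resolve_services(doc))
```

Here the nested service inherits `maxUploadSize` from the root but overrides `acceptDeposits`, which is the behaviour the cascade rule describes.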
The full JSON Schema [JSON-SCHEMA] can be downloaded here.
An example of the Service Document:
{
  "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",
  "@id" : "http://example.com/service-document",
  "@type" : "ServiceDocument",
  "dc:title" : "Site Name",
  "dcterms:abstract" : "Site Description",
  "root" : "http://example.com/service-document",
  "acceptDeposits": true,
  "version": "http://purl.org/net/sword/3.0",
  "maxUploadSize" : 16777216000,
  "maxByReferenceSize" : 30000000000000000,
  "maxSegmentSize" : 16777216000,
  "minSegmentSize" : 1,
  "maxAssembledSize" : 30000000000000,
  "maxSegments" : 1000,
  "accept" : ["*/*"],
  "acceptArchiveFormat" : ["application/zip"],
  "acceptPackaging" : ["*"],
  "acceptMetadata" : ["http://purl.org/net/sword/3.0/types/Metadata"],
  "collectionPolicy" : {
    "@id" : "http://www.myorg.ac.uk/collectionpolicy",
    "description" : "...."
  },
  "treatment" : {
    "@id" : "http://www.myorg.ac.uk/treatment",
    "description" : "..."
  },
  "staging" : "http://example.com/staging",
  "stagingMaxIdle" : 3600,
  "byReferenceDeposit" : true,
  "onBehalfOf" : true,
  "digest" : ["SHA-256", "SHA", "MD5"],
  "authentication": ["Basic", "OAuth", "Digest", "APIKey"],
  "services" : [
    {
      "@id": "http://swordapp.org/deposit/43",
      "dc:title" : "Deposit Service Name",
      "dcterms:abstract" : "Deposit Service Description",
      "root" : "http://example.com/service-document",
      "parent" : "http://example.com/service-document",
      "acceptDeposits": true,
      "services" : []
    }
  ]
}
The fields available are defined as follows:
Field | Type | Description |
---|---|---|
@context | string | The JSON-LD Context for this document. MUST be present. |
@id | string | The URL of the Service Document you are looking at. MUST be present. |
@type | string | JSON-LD identifier for the document type. This field is used to define the type of the document, and in this case should always be 'ServiceDocument'. MUST be present. |
accept | array | List of Content Types which are acceptable to the server. MUST be present. '*/*' for any content type, or a list of acceptable content types. |
acceptArchiveFormat | array | List of Archive Formats that the server can unpack. If the client sends a package using a different format, the server MAY treat it as a Binary File. SHOULD be present. '*' for any archive format (not recommended), or a list of acceptable formats. If this is omitted, the client MUST assume the server only supports application/zip. |
acceptDeposits | boolean | Does the Service accept deposits? SHOULD be present. If omitted, the client MUST assume that the service does not accept deposits. |
acceptMetadata | array | List of Metadata Formats which are acceptable to the server. SHOULD be present. '*' for any metadata format, or a list of acceptable metadata formats. Acceptable metadata formats SHOULD be an IRI for a known format, or any other identifying string if no IRI exists. If this is omitted, the client MUST assume the server only supports the standard SWORD metadata format: http://purl.org/net/sword/3.0/types/Metadata |
acceptPackaging | array | List of Packaging Formats which are acceptable to the server. SHOULD be present. '*' for any packaging format, or a list of acceptable packaging formats. Acceptable packaging formats SHOULD be an IRI for a known format, or any other identifying string if no IRI exists. If this is omitted, the client MUST assume the server only supports the 3 required SWORD packaging formats (see the section Packaging Formats). |
authentication | array | List of authentication schemes supported by the server. SHOULD be present. If not provided, the client MUST assume the server does not support authentication. |
byReferenceDeposit | boolean | Does the server support By-Reference deposit? SHOULD be present. If omitted, the client MUST assume the server does not support By-Reference deposit. |
collectionPolicy | object | URL and description of the server’s collection policy. MAY be present. |
collectionPolicy.@id | string | Collection Policy URL |
collectionPolicy.description | string | Collection Policy Description |
dc:title | string | The title or name of the Service. MUST be present. |
dcterms:abstract | string | A description of the service. MAY be present. |
digest | array | The list of digest formats that the server will accept. MUST be present, and MUST include SHA-256, MAY include any others. |
maxAssembledSize | integer | Maximum size in bytes as an integer for the total size of an assembled segmented upload. SHOULD be present. If omitted and segmented upload is supported, the client MUST assume the server will accept a file of any size. |
maxByReferenceSize | integer | Maximum size in bytes as an integer for files uploaded by reference. SHOULD be present. If omitted, the client MUST assume the server will accept a file of any size. |
maxSegmentSize | integer | Maximum size in bytes as an integer for an individual segment in a segmented upload. MAY be present. If omitted and segmented upload is supported, the client MUST assume the maximum segment size is the same as maxUploadSize. |
maxSegments | integer | Maximum number of segments that the server will accept for a single segmented upload, if segmented upload is supported. SHOULD be present. If omitted, the client MUST assume the server will accept any number of segments. |
maxUploadSize | integer | Maximum size in bytes as an integer for files being uploaded. SHOULD be present. If omitted, the client MUST assume the server will accept an upload of any size. |
minSegmentSize | integer | Minimum size in bytes as an integer for an individual segment in a segmented upload. MAY be present. If omitted and segmented upload is supported, the client MUST assume the minimum segment size is 1 byte. |
onBehalfOf | boolean | Does the server support deposit on behalf of other users (mediation)? SHOULD be present. If omitted, the client MUST assume the server does not support On-Behalf-Of deposit. |
root | string | The URL for the root Service Document. MUST be present. |
services | array | List of Services contained within the parent service. MAY be present. |
staging | string | The URL where clients may stage content prior to deposit, in particular for segmented upload. MAY be present. If omitted, the client MUST assume the server does not support Segmented Upload. |
stagingMaxIdle | integer | The minimum time a server will hold on to an incomplete Segmented File Upload, measured from when it last received any content, before deleting it. SHOULD be present. If omitted, the client MUST assume that the server will hold on to the incomplete file indefinitely. Servers MAY delete the unfinished upload at any time after the minimum time stated here has elapsed. |
treatment | object | URL and description of the treatment content can expect during deposit. MAY be present. |
treatment.@id | string | Treatment URL |
treatment.description | string | Treatment Description |
version | string | The version of the SWORD protocol this server supports. MUST be present. |
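The "If omitted, the client MUST assume ..." rules in the table above can be gathered into one client-side helper. The sketch below is illustrative only: `effective_capabilities` is a hypothetical name, `None` is used here to stand for "no limit" or "indefinitely", the `segmentedUpload` key is an invention of this example, and the default for `acceptPackaging` is left out because the three required packaging formats are defined in the Packaging Formats section rather than in this table.

```python
def effective_capabilities(service):
    """Apply the table's client-side defaults to a Service Document dict."""
    caps = dict(service)
    caps.setdefault("acceptDeposits", False)
    caps.setdefault("acceptArchiveFormat", ["application/zip"])
    caps.setdefault("acceptMetadata",
                    ["http://purl.org/net/sword/3.0/types/Metadata"])
    caps.setdefault("authentication", [])        # no authentication support
    caps.setdefault("byReferenceDeposit", False)
    caps.setdefault("onBehalfOf", False)
    # Size limits: None stands for "any size / any number" in this sketch.
    for field in ("maxUploadSize", "maxByReferenceSize", "maxSegments"):
        caps.setdefault(field, None)
    # Hypothetical convenience flag: segmented upload is only supported
    # when a staging URL is advertised.
    caps["segmentedUpload"] = "staging" in caps
    if caps["segmentedUpload"]:
        caps.setdefault("maxAssembledSize", None)
        caps.setdefault("maxSegmentSize", caps["maxUploadSize"])
        caps.setdefault("minSegmentSize", 1)
        caps.setdefault("stagingMaxIdle", None)  # held indefinitely
    return caps
```

Applying the defaults once, right after parsing, keeps the rest of a client free of scattered "is this field present?" checks.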
The default SWORD Metadata document allows the deposit of a standard, basic metadata document constructed using the DCMI terms [DCMI]. This Metadata document can be sent when creating an Object initially, when appending to the metadata, or in replacing the metadata or indeed the Object as a whole.
The format of the document is simple and extensible (see the Metadata Formats section). The dc and dcterms vocabularies are supported, and servers MUST support this metadata format.
The full JSON Schema [JSON-SCHEMA] can be downloaded here.
An example of the Metadata Document:
{
  "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",
  "@id" : "http://example.com/object/1/metadata",
  "@type" : "Metadata",
  "dc:title" : "The title",
  "dcterms:abstract" : "This is my abstract",
  "dc:contributor" : "A.N. Other"
}
The fields available are defined as follows:
Field | Type | Description |
---|---|---|
@context | string | The JSON-LD Context for this document. MUST be present. |
@id | string | The URL of the Metadata Document you are looking at. MUST be present. |
@type | string | JSON-LD identifier for the document type. This field is used to define the type of the document, and in this case should always be 'Metadata'. MUST be present. |
^dc:.+$ | string | Properties from the DC namespace. MAY be present. |
^dcterms:.+$ | string | Properties from the DCTERMS namespace. MAY be present. |
When sending this document, the client MUST provide a Content-Disposition header of the form:
Content-Disposition: attachment; metadata=true
Additionally, when sending this document the client SHOULD provide the Metadata-Format header with the identifier for the format: http://purl.org/net/sword/3.0/types/Metadata
Metadata-Format: http://purl.org/net/sword/3.0/types/Metadata
If the client omits the Metadata-Format header, the server MUST assume that it is the above format.
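Putting these header rules together, a client might prepare a Metadata deposit along the following lines. This is a sketch under assumptions, not a reference client: `build_metadata_request` is a hypothetical helper, the `application/json` Content-Type and the `SHA-256=<base64>` layout of the Digest value are assumptions not stated in this section, and nothing is actually sent over the network.

```python
import base64
import hashlib
import json

SWORD_CONTEXT = "https://swordapp.github.io/swordv3/swordv3.jsonld"
SWORD_METADATA_FORMAT = "http://purl.org/net/sword/3.0/types/Metadata"


def build_metadata_request(fields):
    """Return (headers, body_bytes) for a default-format Metadata deposit.

    `fields` holds the dc:/dcterms: properties; the caller can also pass
    "@id" there when the Metadata-URL is known.
    """
    document = {"@context": SWORD_CONTEXT, "@type": "Metadata"}
    document.update(fields)
    body = json.dumps(document).encode("utf-8")
    # Assumed Digest layout: algorithm name, "=", base64 of the raw hash.
    digest = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
    headers = {
        "Content-Type": "application/json",  # assumption, see lead-in
        "Content-Length": str(len(body)),
        "Content-Disposition": "attachment; metadata=true",
        "Metadata-Format": SWORD_METADATA_FORMAT,
        "Digest": "SHA-256=" + digest,
    }
    return headers, body


if __name__ == "__main__":
    headers, body = build_metadata_request({"dc:title": "The title"})
    for name, value in headers.items():
        print(name + ": " + value)
```

Sending the Metadata-Format header explicitly, even though the server would assume the same value, makes the request self-describing when it is logged or replayed.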
The By-Reference document allows the client to send a list of one or more files that the server will fetch asynchronously. The By-Reference document can be sent when creating an Object initially, or when appending to or replacing the FileSet in the Object, or replacing the Object as a whole.
The full JSON Schema [JSON-SCHEMA] can be downloaded here.
An example of the By-Reference Document:
{
  "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",
  "@type" : "ByReference",
  "byReferenceFiles" : [
    {
      "@id" : "http://www.otherorg.ac.uk/by-reference/file.zip",
      "contentType" : "application/zip",
      "contentLength" : 123456,
      "contentDisposition" : "attachment; filename=file.zip",
      "packaging" : "http://purl.org/net/sword/packaging/SimpleZip",
      "digest" : "SHA256=....",
      "ttl" : "2018-04-16T00:00:00Z",
      "dereference" : true
    }
  ]
}
The fields available are defined as follows:
Field | Type | Description |
---|---|---|
@context | string | The JSON-LD Context for this document. MUST be present. |
@type | string | JSON-LD identifier for the document type. This field is used to define the type of the document, and in this case should always be 'ByReference'. MUST be present. |
byReferenceFiles | array | List of files to deposit By-Reference. MUST be present and contain one or more entries. |
byReferenceFiles[].@id | string | The URL of the file to be retrieved and deposited. MUST be present. |
byReferenceFiles[].contentDisposition | string | Content-Disposition as it would have been supplied if this were a regular file deposit. MUST be present. |
byReferenceFiles[].contentLength | integer | Content-Length as it would have been supplied if this were a regular file deposit. SHOULD be present. |
byReferenceFiles[].contentType | string | The Content-Type of the file to be retrieved and deposited. MUST be present. |
byReferenceFiles[].dereference | boolean | Should the server dereference the file (i.e. download it and store it locally), or should it simply maintain a link to the external resource? MUST be present. Note that servers MAY choose to do both, irrespective of the value here, though if false, the server should make the external link available to users accessing the resource. |
byReferenceFiles[].digest | string | Digest as it would have been supplied if this were a regular file deposit. MUST be present. |
byReferenceFiles[].packaging | string | The packaging format of the file, or the Binary file identifier. SHOULD be present. If this is not provided, the server MUST assume this is the Binary format: http://purl.org/net/sword/3.0/package/Binary |
byReferenceFiles[].ttl | string | A timestamp which indicates when the file will no longer be available (Time To Live). MUST be formatted as UTC big-endian date as per [NOTE-datetime]. If no date is provided, the server MAY assume the file will be available indefinitely. |
When sending this document, the client MUST provide a Content-Disposition header of the form:
Content-Disposition: attachment; by-reference=true
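As an illustration of the field requirements above, the following is a minimal client-side sketch that assembles a By-Reference document and the matching Content-Disposition header. The function names are hypothetical and not part of SWORD; the sketch only checks the fields the table marks as MUST, and it leaves out other request headers (such as Digest) for brevity.

```python
import json

SWORD_CONTEXT = "https://swordapp.github.io/swordv3/swordv3.jsonld"
BINARY_PACKAGING = "http://purl.org/net/sword/3.0/package/Binary"

# Per-file fields that the table above marks as "MUST be present".
REQUIRED_FILE_FIELDS = ("@id", "contentDisposition", "contentType", "dereference", "digest")


def build_by_reference(files):
    """Return a By-Reference document (as a dict) for a list of file entries."""
    if not files:
        raise ValueError("byReferenceFiles must contain one or more entries")
    entries = []
    for f in files:
        missing = [k for k in REQUIRED_FILE_FIELDS if k not in f]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        entry = dict(f)
        # Illustrative choice: state the Binary default explicitly, since the
        # server assumes it anyway when packaging is omitted.
        entry.setdefault("packaging", BINARY_PACKAGING)
        entries.append(entry)
    return {"@context": SWORD_CONTEXT, "@type": "ByReference", "byReferenceFiles": entries}


def by_reference_request(files):
    """Return (headers, body) for a By-Reference deposit request."""
    headers = {"Content-Disposition": "attachment; by-reference=true"}
    return headers, json.dumps(build_by_reference(files))
```

A caller would pass one dict per remote file, using the same keys as the example document above, and send the returned body with the returned header.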
If the client wishes to send both Metadata and By-Reference content to the server, it can do so provided that the Metadata format is expressed as JSON, such as the default SWORD metadata format.
This operation is provided only as a convenience: it is not available if the client wishes to send a metadata format that is not, or cannot be, expressed as JSON. In that case, a separate Metadata Deposit and By-Reference Deposit should be carried out.
To send both together, the client may include the Metadata and By-Reference documents embedded in a single JSON document, structured as shown below.
The entire Metadata document (including its JSON-LD @context when using the default format) is embedded in a field entitled metadata, and the entire By-Reference document (again, with its JSON-LD @context) is embedded in a field entitled by-reference.
When a document of this form is sent, the client MUST set the Content-Disposition header appropriately, to alert the server of its required behaviour.
An example of the Metadata + By-Reference Document:
{
"metadata" : {
"@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",
"@type" : "Metadata",
"dcterms:abstract" : "....",
"dc:contributor" : "...",
"etc..." : "...."
},
"by-reference" : {
"@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",
"@type" : "ByReference",
"byReferenceFiles" : []
}
}
When sending this document, the client MUST provide a Content-Disposition header of the form:
Content-Disposition: attachment; metadata=true; by-reference=true
Additionally, when sending this document the client SHOULD provide the Metadata-Format header with the identifier for the format:
Metadata-Format: http://purl.org/net/sword/3.0/types/Metadata
If the client omits the Metadata-Format header, the server MUST assume that it is the default format: http://purl.org/net/sword/3.0/types/Metadata
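The embedding described above can be sketched as follows. This is a hedged illustration, assuming both documents are already available as Python dicts; the function name is hypothetical, and the Metadata-Format value defaults to the SWORD default format identifier.

```python
import json

DEFAULT_METADATA_FORMAT = "http://purl.org/net/sword/3.0/types/Metadata"


def metadata_and_by_reference_request(metadata_doc, by_reference_doc,
                                      metadata_format=DEFAULT_METADATA_FORMAT):
    """Return (headers, body) for a combined Metadata + By-Reference deposit."""
    # Each embedded document is carried whole, including its own JSON-LD @context.
    body = {"metadata": metadata_doc, "by-reference": by_reference_doc}
    headers = {
        "Content-Disposition": "attachment; metadata=true; by-reference=true",
        "Metadata-Format": metadata_format,
    }
    return headers, json.dumps(body)
```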
The Status document is provided in response to a deposit operation on a Service-URL, can be retrieved at any subsequent point by a GET on the Object-URL, and is returned each time the client takes action on the Object-URL. It gives the client detailed information about the content and current state of the item.
The full JSON Schema [JSON-SCHEMA] can be downloaded here.
An example of the Status Document:
{
"@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",
"@id" : "http://www.myorg.ac.uk/sword3/object/1",
"@type" : "Status",
"eTag" : "...",
"metadata" : {
"@id" : "http://www.myorg.ac.uk/sword3/object/1/metadata",
"eTag" : "..."
},
"fileSet" : {
"@id" : "http://www.myorg.ac.uk/sword3/object/1/fileset",
"eTag" : "..."
},
"service" : "http://www.myorg.ac.uk/sword3",
"state" : [
{
"@id" : "http://purl.org/net/sword/3.0/state/inProgress",
"description" : "the item is currently inProgress"
}
],
"actions" : {
"getMetadata" : true,
"getFiles" : true,
"appendMetadata" : true,
"appendFiles" : true,
"replaceMetadata" : true,
"replaceFiles" : true,
"deleteMetadata" : true,
"deleteFiles" : true,
"deleteObject" : true
},
"links" : [
{
"@id" : "http://www.myorg.ac.uk/col1/mydeposit.html",
"rel" : ["alternate"],
"contentType" : "text/html"
},
{
"@id" : "http://www.myorg.ac.uk/sword3/object/1/package.zip",
"rel" : ["http://purl.org/net/sword/3.0/terms/originalDeposit"],
"contentType" : "application/zip",
"packaging" : "http://purl.org/net/sword/3.0/package/SimpleZip",
"depositedOn" : "[timestamp]",
"depositedBy" : "[user identifier]",
"depositedOnBehalfOf" : "[user identifier]",
"byReference" : "http://www.otherorg.ac.uk/by-reference/file.zip",
"status" : "http://purl.org/net/sword/3.0/filestate/ingested",
"log" : "[any information associated with the deposit that the client should know]"
},
{
"@id" : "http://www.myorg.ac.uk/sword3/object/1/file1.pdf",
"rel" : [
"http://purl.org/net/sword/3.0/terms/fileSetFile",
"http://purl.org/net/sword/3.0/terms/derivedResource"
],
"contentType" : "application/pdf",
"derivedFrom" : "http://www.myorg.ac.uk/sword3/object/1/package.zip",
"dcterms:relation" : "http://www.myorg.ac.uk/repo/123456789/file1.pdf",
"dcterms:replaces" : "http://www.myorg.ac.uk/sword3/object/1/versions/file1.1.pdf",
"eTag" : "..."
},
{
"@id" : "http://www.myorg.ac.uk/sword3/object/1/package.1.zip",
"rel" : ["http://purl.org/net/sword/3.0/terms/packagedContent"],
"contentType" : "application/zip",
"packaging" : "http://purl.org/net/sword/3.0/package/SimpleZip"
},
{
"@id" : "http://www.swordserver.ac.uk/col1/mydeposit/metadata.mods.xml",
"rel" : ["http://purl.org/net/sword/3.0/terms/formattedMetadata"],
"contentType" : "application/xml",
"metadataFormat" : "http://www.loc.gov/mods/v3"
},
{
"@id" : "http://www.myorg.ac.uk/sword3/object/1/versions/file1.1.pdf",
"rel" : ["http://purl.org/net/sword/3.0/terms/derivedResource"],
"contentType" : "application/pdf",
"dcterms:isReplacedBy" : "http://www.myorg.ac.uk/sword3/object/1/file1.pdf",
"versionReplacedOn" : "[xsd:dateTime]"
},
{
"@id" : "http://www.myorg.ac.uk/sword3/object/1/reference.zip",
"rel" : [
"http://purl.org/net/sword/3.0/terms/byReferenceDeposit",
"http://purl.org/net/sword/3.0/terms/originalDeposit",
"http://purl.org/net/sword/3.0/terms/fileSetFile"
],
"byReference" : "http://www.otherorg.ac.uk/by-reference/file2.zip",
"log" : "Any information on the download, especially if it failed",
"eTag" : "...",
"status" : "http://purl.org/net/sword/3.0/filestate/ingested"
}
]
}
The fields available are defined as follows:
Field | Type | Description |
---|---|---|
@context | string | The JSON-LD Context for this document. MUST be present. |
@id | string | The Object-URL for this document. MUST be present. |
@type | string | JSON-LD identifier for the document type. This field is used to define the type of the document, and in this case should always be 'Status'. MUST be present. |
actions | object | Container for the list of actions that are available against the object for the client. MUST be present. |
actions.appendFiles | boolean | Whether the client can issue a request to append one or more files (individually or via a package) to the item. MUST be present. |
actions.appendMetadata | boolean | Whether the client can issue a request to append to the metadata of the item. MUST be present. |
actions.deleteFiles | boolean | Whether the client can issue a request to delete files in the item. This may be a single file or all files. MUST be present. |
actions.deleteMetadata | boolean | Whether the client can issue a request to delete all the item metadata. MUST be present. |
actions.deleteObject | boolean | Whether the client can issue a request to delete the entire object. MUST be present. |
actions.getFiles | boolean | Whether the client can issue a request to retrieve any/all files in the item (both Binary Files and Packaged Content). MUST be present. |
actions.getMetadata | boolean | Whether the client can issue a request to retrieve the item metadata. MUST be present. |
actions.replaceFiles | boolean | Whether the client can issue a request to replace files in an item. This may be a single file or all of the files. MUST be present. |
actions.replaceMetadata | boolean | Whether the client can issue a request to replace the item metadata. MUST be present. |
eTag | string | The current ETag for the Object. MUST be present if the repository enforces concurrency control. |
fileSet | object | Information about the identifier/version of the Object's FileSet. MUST be present. |
fileSet.@id | string | The FileSet-URL for this Object. MUST be present. |
fileSet.eTag | string | The ETag for the FileSet. MUST be present if the server supports concurrency control. |
links | array | List of link objects referring to the various files, both content and metadata, available on the object. MUST be present if there are one or more links available to the client. |
links[].@id | string | The URL of the resource. MUST be present. |
links[].byReference | string | The external URL of the location a By-Reference deposit was retrieved from. SHOULD be present if this is an Original Deposit that was deposited By-Reference, or is an active By-Reference deposit. |
links[].contentType | string | Content type of the resource. SHOULD be present. |
links[].dcterms:isReplacedBy | string | URL to a newer version of the file in the same Object, if this is present as a resource. SHOULD be present if a newer version is present. |
links[].dcterms:relation | string | URL to a non-SWORD access point to the file. MAY be present. For example, the URL from which an end-user would download the file via the website. This related URL does not need to support any of the SWORD protocol operations, and indeed may even be on a server or application which has no SWORD support. The primary use case is to redirect the user to the web front end for the repository. |
links[].dcterms:replaces | string | URL to an older version of the file in the same Object, if this is also present as a resource. SHOULD be present if an older version of the file is present. |
links[].depositedBy | string | Identifier for the user that deposited the item. SHOULD be present if this is an Original Deposit. |
links[].depositedOn | string | Timestamp of when the deposit happened. SHOULD be present if this is an Original Deposit. If present, MUST be formatted as UTC big-endian date as per [NOTE-datetime]. |
links[].depositedOnBehalfOf | string | Identifier for the user that the item was deposited on behalf of. SHOULD be present if this is an Original Deposit that was done On-Behalf-Of another user. |
links[].derivedFrom | string | Reference to the URL of the resource from which the current resource was derived, for example, if extracted from a package that was deposited. SHOULD be present if the resource is derived from another resource. |
links[].eTag | string | The ETag of the resource. MUST be present if the server supports concurrency control and the resource is available to the client to modify. |
links[].log | string | Any information associated with the deposit that the client should know. MAY be present. |
links[].packaging | string | The package format identifier if the resource is a package. SHOULD be present if the resource is a package. |
links[].rel | array | The relationship between the resource and the object. MUST be present. Note that multiple relationships are supported. |
links[].status | string | The status of the resource, with regard to ingest. SHOULD be present. For example, packaged resources which are still being unpacked and ingested may announce their status here. Likewise, by-reference deposits may do the same. MUST be one of the allowed status URIs. Any associated information to go along with the status, especially if the status is an error, SHOULD be in links[].log. If no value is provided, the client MUST assume that the item is in the status: http://purl.org/net/sword/3.0/filestate/ingested |
links[].versionReplacedOn | string | Date that the current resource was replaced by a newer resource. SHOULD be present if dcterms:isReplacedBy is present. |
metadata | object | Information about the identifier/version of the Object's Metadata. MUST be present if the server permits any operations on metadata. |
metadata.@id | string | The Metadata-URL for this Object. MUST be present if the server permits any operations on metadata. |
metadata.eTag | string | The ETag for the Metadata. MUST be present if the server supports concurrency control and the Metadata-URL is present. |
service | string | The URL for the service to which this item was deposited (the Service-URL). MUST be present. This is the URL from which the client can retrieve information about the settings for the server that are relevant to this item (e.g. max upload sizes, etc.). |
state | array | List of states that the item is in on the server. At least one state MUST be present, using the SWORD state vocabulary. Other states using server-specific vocabularies may also be used alongside. |
state[].@id | string | Identifier for the state. MUST be present. At least one such identifier MUST be from the SWORD state vocabulary. |
state[].description | string | Human readable description of the state. MAY be present. |
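To show how a client might read these fields, here is a minimal sketch that works on a Status document already parsed into a Python dict. The helper names are hypothetical. Treating a missing action as not permitted is a conservative assumption of this sketch, while the default file status follows the rule given for links[].status.

```python
DEFAULT_FILE_STATUS = "http://purl.org/net/sword/3.0/filestate/ingested"
ORIGINAL_DEPOSIT = "http://purl.org/net/sword/3.0/terms/originalDeposit"


def links_with_rel(status_doc, rel):
    """Return every link whose rel list contains the given relationship."""
    return [link for link in status_doc.get("links", []) if rel in link.get("rel", [])]


def file_status(link):
    """Return the ingest status of a link, applying the documented default."""
    return link.get("status", DEFAULT_FILE_STATUS)


def can(status_doc, action):
    """True only if the actions object explicitly allows the action."""
    return status_doc.get("actions", {}).get(action) is True
```

For example, `links_with_rel(doc, ORIGINAL_DEPOSIT)` would select the Original Deposits, whose `file_status` a client can then inspect while waiting for asynchronous processing.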
rel types and their meanings
alternate: An alternate, non-SWORD URL which will allow the user to access the same object. For example, this could be the URL of the landing page in the repository for the item.
originalDeposit: The resource (file or package) was explicitly deposited via some deposit operation.
The relevant properties of the link section for any resource with this rel are
derivedResource: A file which was unpacked or otherwise derived from another deposited resource, and which itself was not explicitly deposited through some deposit operation. The main usage would be to identify files which were extracted from a deposited zip file.
The relevant properties of the link section for any resource with this rel are
packagedContent: A resource which makes this object available packaged in the specified package format on HTTP GET. This is not a resource which has been deposited or derived (though it may be very similar to an originally deposited package), it is one which the server makes available as a service to the client. Packages may be pre-built or assembled on the fly - that responsibility rests with the server.
The relevant properties of the link section for any resource with this rel are
formattedMetadata: A resource which makes this object’s metadata available, serialised in the specified metadata format on HTTP GET. This is not a resource which has been deposited or derived (though it may be very similar to the originally deposited metadata), it is one which the server makes available as a service to the client. Metadata documents may be pre-built or assembled on the fly - that responsibility rests with the server.
The relevant properties of the link section for any resource with this rel are
byReferenceDeposit: A file which is currently being downloaded from an external reference. Often will also have the rel for originalDeposit, and once all segments have been uploaded the byReferenceDeposit rel can be removed.
The relevant properties of the link section for any resource with this rel are
fileSetFile: A File which can be considered by the client to be part of the FileSet. Files in this state are available for modification via the SWORD protocol, and should be considered to form the actual "content" of the Object.
state/@id
MUST contain one of:
The state field is a list, so it may also contain other states that are server-specific in addition to the SWORD values.
Some files, when deposited, may be processed asynchronously to the client’s request. For example, large files that require unpacking, by-reference deposits, etc. In these cases, the client will not receive feedback on the state or success of their deposit in the request/response exchange. Instead, the client may monitor the file(s) via the Status document, and for each appropriate file (Original Deposits), a “status” field will provide information on the current status of processing for that file.
The following statuses are permitted; servers SHOULD provide one of these for each relevant file:
A client may request information on an ongoing Segmented File Upload at any point via a GET to the Temporary-URL.
The full JSON Schema [JSON-SCHEMA] can be downloaded here.
An example of the Segmented File Upload Document:
{
"@context": "https://swordapp.github.io/swordv3/swordv3.jsonld",
"@id": "http://example.com/temporary/1",
"@type": "Temporary",
"received": [
1,
2,
4
],
"expecting": [
3,
5
],
"assembledSize": 10000000,
"segmentSize": 2000000
}
The fields available are defined as follows:
Field | Type | Description |
---|---|---|
@context | string | The JSON-LD Context for this document. MUST be present. |
@id | string | The Temporary-URL for this document. MUST be present. |
@type | string | JSON-LD identifier for the document type. This field is used to define the type of the document, and in this case should always be 'Temporary'. MUST be present. |
assembledSize | integer | The expected size in bytes of the final resulting assembled file. MUST be present. |
expecting | array | The list of integers identifying the segments which are expected and that have not yet been deposited. MUST be present if there are any segments remaining to be uploaded. |
received | array | The list of integers identifying the segments that have been successfully uploaded so far. MUST be present if one or more segments have been uploaded. |
segmentSize | integer | The expected size in bytes of the segments (except the final one) that will be uploaded. MUST be present. |
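The arithmetic implied by these fields can be sketched as below. This is an illustration only: it assumes that segments are numbered from 1 (as in the example document) and that received and expecting together cover every segment, neither of which is stated in the table above. The function names are hypothetical.

```python
def expected_segment_count(doc):
    """Number of segments needed: every segment is segmentSize bytes except the last."""
    size, seg = doc["assembledSize"], doc["segmentSize"]
    return (size + seg - 1) // seg  # integer ceiling division


def outstanding_segments(doc):
    """Segments still to upload; the field may be absent once nothing remains."""
    return sorted(doc.get("expecting", []))


def is_consistent(doc):
    """Check that received and expecting are disjoint and cover segments 1..n."""
    received = set(doc.get("received", []))
    expecting = set(doc.get("expecting", []))
    everything = set(range(1, expected_segment_count(doc) + 1))
    return received.isdisjoint(expecting) and (received | expecting) == everything
```

With the example document above, 10000000 bytes in segments of 2000000 bytes gives 5 segments, of which 3 and 5 are outstanding.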
An error document is returned at any point that a synchronous operation fails.
The full JSON Schema [JSON-SCHEMA] can be downloaded here.
An example of the Error Document:
{
"@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",
"@type" : "BadRequest",
"timestamp" : "[timestamp]",
"error" : "error summary",
"log" : "text log of any debug information for the client"
}
The fields available are defined as follows:
Field | Type | Description |
---|---|---|
@context | string | The JSON-LD Context for this document. MUST be present. |
@type | string | JSON-LD identifier for the document type. This field is used to define the type of the document, and in this case should be one of the allowed Error Document types. MUST be present. |
error | string | A short summary/title for the error. MUST be present. |
log | string | Some detail as to the error, with any information that might help resolve it. SHOULD be present. |
timestamp | string | When the error occurred. MUST be formatted as UTC big-endian date as per [NOTE-datetime]. MUST be present. |
See Error Types for details of what errors can be reported in the @type
field.
It is strongly RECOMMENDED that SWORD servers support authentication and authorisation for requests.
SWORD servers are not restricted in the forms of authentication that they employ, and there is no minimum requirement or default supported approach.
Servers SHOULD enumerate the authentication schemes that they support in the Service Document, in the field authentication, and MUST draw from the IANA registry of HTTP auth scheme names [IANA Auth] where one is available.
Where an authentication scheme is in use by the server which is not covered by the IANA registry (such as a custom API-token-based approach), the server MAY indicate this in whatever way seems most appropriate.
For example, a server which supports Basic, Digest and OAuth authentication, as well as a custom API-Key approach, could indicate this as follows:
{
"authentication": [
"Basic",
"OAuth",
"Digest",
"APIKey"
]
}
Servers MAY also choose to support On-Behalf-Of deposit, which means that the authenticating user is providing content to the server, as if another user were actually carrying out this request. A use case for this would be when a known third-party deposit tool is sending content to a server and has been authorised by another user to add content on their behalf.
If a server supports On-Behalf-Of deposit, it SHOULD indicate this in the Service Document with the field onBehalfOf set to true.
If this field is not present, clients MUST assume that the server does not support On-Behalf-Of deposit.
{
"onBehalfOf": true
}
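A client reading these two Service Document fields might apply the stated defaults as in the following sketch. It assumes the Service Document has already been parsed into a dict, and the function names are hypothetical.

```python
def auth_schemes(service_doc):
    """Return the authentication schemes the server advertises (possibly none)."""
    return list(service_doc.get("authentication", []))


def supports_on_behalf_of(service_doc):
    """An absent onBehalfOf field means On-Behalf-Of deposit is not supported."""
    return service_doc.get("onBehalfOf", False) is True
```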
When carrying out authenticated requests, Authorization headers MUST be sent with every request to the server - the server is not responsible for maintaining state for the client. The server is responsible for authenticating and authorising every request individually.
Clients may also choose to send Cookie headers, and servers may support these, but support for Cookies is explicitly outside this specification.
When an On-Behalf-Of deposit is received, the server MUST ensure that the user identified in that header is valid with respect to the associated Authorization header. For example, when using OAuth2, the On-Behalf-Of user MUST match the user for which the token in the Authorization header was granted.
There are two possible error responses to a request from the perspective of authentication:
If the request does not supply any credentials, and the server is expecting to authenticate requests, then a 401 (AuthenticationRequired) response MUST be returned.
If the request contains credentials and the server is unable to authenticate the client based on those credentials, then a 403 (AuthenticationFailed) response MUST be returned.
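The two cases can be expressed as a small decision function. This is a sketch for a server that expects to authenticate requests; the function name and its boolean inputs are hypothetical, and how credentials are actually validated is outside its scope.

```python
def authentication_error(credentials_supplied, credentials_valid):
    """Return (http_status, error_type) for a failed check, or None if the request may proceed."""
    if not credentials_supplied:
        return 401, "AuthenticationRequired"
    if not credentials_valid:
        return 403, "AuthenticationFailed"
    return None
```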
In all cases (On-Behalf-Of or not) where a user has authenticated to make a deposit, servers SHOULD preserve the user's identity in the depositedBy property of the Original Deposit in the Status document. In On-Behalf-Of deposit, the value given in the On-Behalf-Of header SHOULD be used for the value of the depositedOnBehalfOf property of the Original Deposit in the Status document.
Note that an identity recorded in this way does not have to contain enough information for the client to directly identify the user, and implementers should take note of privacy legislation when choosing what information to expose in these fields.
It is strongly RECOMMENDED that servers implement modern transport layer security, whether authenticating requests or not. If you are carrying out authenticated protocol operations you MUST implement TLS.
The following are the error types that are available (to place in @type in the Error Document), their associated HTTP Status Code, and the legitimate reasons for returning that error:
Error Type | Error Code | HTTP Name | Reason |
---|---|---|---|
AuthenticationFailed | 403 | Forbidden | The request supplied invalid credentials |
AuthenticationRequired | 401 | Unauthorized | The request supplied no credentials, when the server was expecting to authenticate the request. |
BadRequest | 400 | BadRequest | The request did not meet the standard specified by the SWORD protocol. This error can be used when no other error is appropriate |
ByReferenceFileSizeExceeded | 400 | BadRequest | The client supplied a By-Reference deposit file, which specified a file size which exceeded the server's limit |
ByReferenceNotAllowed | 412 | PreconditionFailed | The client attempted to carry out a By-Reference deposit on a server which does not support it |
ContentMalformed | 400 | BadRequest | The body content of the request was malformed in some way, such that the server cannot read it correctly. |
ContentTypeNotAcceptable | 415 | UnsupportedMediaType | The Content-Type header specifies a content type of the request which is in a format that the server cannot accept. |
DigestMismatch | 412 | PreconditionFailed | One or more of the Digests that the server checked did not match the deposited content |
ETagNotMatched | 412 | PreconditionFailed | The client supplied an If-Match header which did not match the current ETag for the resource being updated. |
ETagRequired | 412 | PreconditionFailed | The client did not supply an If-Match header, when one was required by the server |
Forbidden | 403 | Forbidden | The client requested an operation that is not permitted by the server in this context. |
FormatHeaderMismatch | 415 | UnsupportedMediaType | The Metadata-Format or Packaging header does not match what the server found when looking at the Metadata or Packaged Content supplied in a request. |
InvalidSegmentSize | 400 | BadRequest | The client sent a segment that was not the final segment, and was not the size that it indicated segments would be, or during segmented upload initialisation, the client specified a segment size which was not between minSegmentSize and maxSegmentSize . |
MaxAssembledSizeExceeded | 400 | BadRequest | During a segmented upload initialisation, the client specified a total file size which is larger than the maximum assembled file size supported by the server |
MaxUploadSizeExceeded | 413 | PayloadTooLarge | The request supplied body content which is larger than that supported by the server. |
MetadataFormatNotAcceptable | 415 | UnsupportedMediaType | The Metadata-Format header specifies a metadata format for the request which is in a format that the server cannot accept |
MethodNotAllowed | 405 | MethodNotAllowed | The request is for a method on a resource that is not permitted. This may be permanent, temporary, and may depend on the client’s credentials |
OnBehalfOfNotAllowed | 412 | PreconditionFailed | The request contained an On-Behalf-Of header, although the server indicates that it does not support this. |
PackagingFormatNotAcceptable | 415 | UnsupportedMediaType | The Packaging header specifies a packaging format for the request which is in a format that the server cannot accept |
SegmentedUploadTimedOut | 410 | Gone | The client's segmented upload URL has timed out. Servers MAY respond to this with a 404 and no explanation also. |
SegmentLimitExceeded | 400 | BadRequest | During a segmented upload initialisation, the client specified a total number of intended segments which is larger than the limit specified by the server |
UnexpectedSegment | 400 | BadRequest | The client sent a segment that the server was not expecting; in particular the server may have received all the segments it was expecting, and this is an extra one |
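A server could keep the table above as a lookup and build the Error Document from it. The following is a minimal sketch under that assumption; the function name is hypothetical, and the caller is expected to pass a timezone-aware datetime if it overrides the timestamp.

```python
import json
from datetime import datetime, timezone

SWORD_CONTEXT = "https://swordapp.github.io/swordv3/swordv3.jsonld"

# Error type -> HTTP status code, transcribed from the table above.
ERROR_STATUS = {
    "AuthenticationFailed": 403, "AuthenticationRequired": 401, "BadRequest": 400,
    "ByReferenceFileSizeExceeded": 400, "ByReferenceNotAllowed": 412,
    "ContentMalformed": 400, "ContentTypeNotAcceptable": 415, "DigestMismatch": 412,
    "ETagNotMatched": 412, "ETagRequired": 412, "Forbidden": 403,
    "FormatHeaderMismatch": 415, "InvalidSegmentSize": 400,
    "MaxAssembledSizeExceeded": 400, "MaxUploadSizeExceeded": 413,
    "MetadataFormatNotAcceptable": 415, "MethodNotAllowed": 405,
    "OnBehalfOfNotAllowed": 412, "PackagingFormatNotAcceptable": 415,
    "SegmentedUploadTimedOut": 410, "SegmentLimitExceeded": 400,
    "UnexpectedSegment": 400,
}


def error_response(error_type, summary, log=None, when=None):
    """Return (http_status, json_body) for the given SWORD error type."""
    status = ERROR_STATUS[error_type]  # KeyError for types not in the table
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    doc = {
        "@context": SWORD_CONTEXT,
        "@type": error_type,
        "timestamp": when.strftime("%Y-%m-%dT%H:%M:%SZ"),  # UTC, big-endian
        "error": summary,
    }
    if log is not None:
        doc["log"] = log  # SHOULD be present, so optional here
    return status, json.dumps(doc)
```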
SWORD uses the Content-Disposition header in client requests to indicate to the server information about the payload being delivered.
Traditionally Content-Disposition is an HTTP response header, but it makes sense in the PUSH context of SWORD to use this as a request header. We follow [RFC6266] for its usage.
Implementers should also note [RFC5987] if sending filenames which require characters outside the ISO-8859-1 character set.
The general format of a Content-Disposition header is as follows:
Content-Disposition: [disposition type]; [disposition param]=[value]; ...
The rules below define how to generate the correct Content-Disposition for a given set of Request Conditions. If you are implementing a SWORD client or server it is STRONGLY RECOMMENDED that you work from the SWORDv3 Behaviours Document, as this lays out the Content-Disposition requirements per-request, rather than in the form of the normalised requirements below.
There are three general deposit operations in SWORD:
Each of these has a different Content-Disposition, which makes it clear to the server what it should do with that content.
There are two aspects which control what the Content-Disposition should be:
The requirements below define what Disposition Type and Parameters are required for each kind of request. The requirements should be interpreted according to the following hierarchy for each of the above aspects:
The hierarchy for the Upload Type is:
The hierarchy for the Content is:
So, for example, if delivering a Metadata+By-Reference Document (MD+BR) as a Direct Deposit, you would take into account the following requirements:
The requirements are:
Request Conditions:
Disposition Type
Request Conditions:
Param
Request Conditions:
Param
Request Conditions:
Param
Request Conditions:
Param
filename\* instead.
Request Conditions:
Param
filename\* instead.
Request Conditions:
Disposition Type
Param
Request Conditions:
Disposition Type
Param
The following examples show a number of key cases:
A Metadata Deposit
Content-Disposition: attachment; metadata=true
A By-Reference Deposit
Content-Disposition: attachment; by-reference=true
A Metadata+By-Reference Deposit
Content-Disposition: attachment; metadata=true; by-reference=true
A Binary File Deposit
Content-Disposition: attachment; filename=[filename]
A Segmented Upload Initialisation
Content-Disposition: segment-init; size=[bytes]; digest=[digest]; segment_count=[n]; segment_size=[bytes]
A File Segment Upload
Content-Disposition: segment; segment_number=[n]
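The example headers above all follow the general format, so a client could generate them with one small helper. The sketch below uses hypothetical function names and does no quoting or [RFC5987] encoding, so it is only suitable for plain filenames and values.

```python
def content_disposition(disposition_type, params):
    """Join a disposition type and an ordered list of (name, value) parameters."""
    return "; ".join([disposition_type] + [f"{name}={value}" for name, value in params])


def metadata_deposit():
    return content_disposition("attachment", [("metadata", "true")])


def by_reference_deposit():
    return content_disposition("attachment", [("by-reference", "true")])


def metadata_and_by_reference_deposit():
    return content_disposition("attachment", [("metadata", "true"), ("by-reference", "true")])


def binary_file_deposit(filename):
    return content_disposition("attachment", [("filename", filename)])


def segment_init(size, digest, segment_count, segment_size):
    return content_disposition("segment-init", [
        ("size", size), ("digest", digest),
        ("segment_count", segment_count), ("segment_size", segment_size),
    ])


def file_segment(segment_number):
    return content_disposition("segment", [("segment_number", segment_number)])
```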
In order to ensure that the content transmitted via SWORD is correct when it arrives at its destination, clients MUST provide Digests that servers MUST check against incoming content.
Servers can announce the Digest formats that they support in the Service Document as follows:
{
"digest": [
"SHA-256",
"SHA",
"MD5"
]
}
The Server SHOULD list all the digest formats that it supports. Servers MUST support at least SHA-256 and MAY support any other digest formats.
The Digest formats MUST be identified as per the IANA HTTP Digest Algorithm values: [IANA Digest]
SWORD uses the recommendations of [RFC3230] for transmitting base64 encoded Digests of request bodies.
For every request where there is a request body, the client MUST attach the Digest header with the appropriate content:
Digest: SHA-256=MzA1ZmIzMDJiZjA4MzUzYTg5ZGY4NDIxMjcyY2JmZTEwNzM5ODdmMjJhY2Y1ZDc5NzFhOTY3MmM1MGNkN2ZlMA==
Note that the client MAY send multiple digests from different algorithms, separated by commas in the header:
Digest: SHA-256=MzA1ZmIzMDJiZjA4MzUzYTg5ZGY4NDIxMjcyY2JmZTEwNzM5ODdmMjJhY2Y1ZDc5NzFhOTY3MmM1MGNkN2ZlMA==, MD5=ZjQxNjA3N2M3MDdhODJkZGJlMGE0YTk2NGRjZWEyNWE=
The server MUST validate at least one digest and SHOULD validate all of them, though it MAY choose its preferred format to validate against.
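Generating and checking the Digest header can be sketched with the Python standard library. The assumptions here are that the base64 value encodes the raw digest bytes, as in [RFC3230], that algorithm names are compared exactly as written, and that the IANA name SHA refers to SHA-1. The function names are hypothetical, and this version validates every supplied digest it understands.

```python
import base64
import hashlib

# IANA digest algorithm name -> hashlib constructor (assumed mapping).
ALGORITHMS = {"SHA-256": hashlib.sha256, "SHA": hashlib.sha1, "MD5": hashlib.md5}


def compute_digest(body, algorithm):
    """Return the base64-encoded digest of the request body."""
    return base64.b64encode(ALGORITHMS[algorithm](body).digest()).decode("ascii")


def digest_header(body, algorithms=("SHA-256",)):
    """Build a Digest header value, with multiple digests separated by commas."""
    return ", ".join(f"{name}={compute_digest(body, name)}" for name in algorithms)


def parse_digest_header(value):
    """Split a Digest header value into {algorithm: base64 digest}."""
    digests = {}
    for item in value.split(","):
        # Split on the first "=" only, because base64 padding also uses "=".
        name, _, encoded = item.strip().partition("=")
        digests[name] = encoded
    return digests


def verify_digest(body, header_value):
    """True if at least one supported digest was checked and none mismatched."""
    checked = 0
    for name, encoded in parse_digest_header(header_value).items():
        if name not in ALGORITHMS:
            continue  # unsupported algorithm: skip it rather than fail
        if compute_digest(body, name) != encoded:
            return False
        checked += 1
    return checked > 0
```

A server that receives a body failing this check would respond with the DigestMismatch error.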
Servers MAY choose to implement concurrency control, in order to ensure that clients do not accidentally overwrite or make changes that conflict with other changes which may have happened to the Object since it was first deposited. Note that this does not prevent clients causing damage to Objects, only that it cannot be so easily done by accident.
Objects may change for a number of reasons after their initial creation, such as:
In order to provide concurrency control, SWORD follows [RFC7232], and specifically uses the ETag and If-Match headers.
On each request for a resource, or when the Status document is retrieved, the ETag for the resource MUST be returned. The ETag gives the client an opaque identifier for the current version of that resource. When the resource is being updated by the client (for example, it is replacing a File), the ETag that the client expects to be the current one MUST be sent in the If-Match header. The server MUST then compare that with its actual current ETag for the resource. If they match, the request can go ahead, otherwise the Server MUST respond with an error (412).
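The server-side comparison can be sketched as follows, for a server that has chosen to require If-Match on updates. The function name is hypothetical; the error types come from the error table of this specification, and the sketch compares ETags as opaque strings without handling quoting or weak validators.

```python
def check_if_match(if_match, current_etag):
    """Return None if the update may proceed, else (http_status, error_type)."""
    if if_match is None:
        return 412, "ETagRequired"
    if if_match != current_etag:
        return 412, "ETagNotMatched"
    return None
```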
Note that ETags, and Concurrency Control in general, are only applicable from the Object downwards. There are no requirements for use of ETag or If-Match headers on Service-URLs.
The server does not have to announce support for concurrency control in the Service Document. Clients MUST check response headers for the presence of an ETag. Presence of the ETag indicates that the server requires the client to pay attention to its concurrency control procedures, and to carry out later requests with an If-Match header.
If supporting concurrency control, Servers MUST provide an ETag on all responses to requests (GET, POST, PUT) against resources from the Object and below.
If a server supports Concurrency Control, it MUST behave in accordance with the following rules.
An ETag MUST be provided for each SWORD resource: the Object, the Metadata, the FileSet and any Files.
The ETag is a resource-level version identifier; it MUST be the same for all expressions of the resource. For example, all serialised Metadata documents (such as in JSON, or in XML) MUST have the same ETag as the Metadata resource, and each other.
The client MUST supply the ETag that it expects to represent the current version with every request to change the resource (POST, PUT, DELETE) by placing it in the If-Match header.
If the ETag supplied by the client in the If-Match header does not match the current ETag for the resource, the Server MUST respond with a 412 (Precondition Failed) error.
If the ETag supplied by the client in the If-Match header does match the current ETag for the resource, the request MUST go ahead as normal.
The server MUST provide the ETag in the HTTP headers of the response to every GET request for a resource.
The server MUST provide the ETags for the resources in the appropriate places in the Status document.
When a resource changes, its ETag MUST change.
When a resource changes, the ETags of all resources within which it is contained MUST change.
[...] ETags, at its discretion
[...] ETag values for a resource.
If the ETag of a resource changes, the ETags of the resources above it (up to the level of the Object) MUST also change. This is to prevent a change at a higher level (e.g. an Object replacement) overwriting a change at a lower level (e.g. addition of a single file).
The Object hierarchy is as follows:
So, for example, if the Metadata is updated, then the Metadata and Object ETags MUST change, but the FileSet and File ETags need not.
Similarly, if a File ETag changes, then the FileSet and Object ETags MUST also change, while the Metadata ETag need not.
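The propagation and precondition rules can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the `Resource` class, the `update` function and the counter-based ETag values are all hypothetical, the containment (Object contains Metadata and FileSet, FileSet contains Files) is inferred from the examples above, and `200` merely stands in for whatever success status the real operation defines.

```python
import itertools

class Resource:
    """A SWORD resource with an opaque ETag and an optional containing resource."""
    _counter = itertools.count(1)

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.etag = self._next_etag()

    @classmethod
    def _next_etag(cls):
        # Any opaque, changing value would do; a counter keeps the sketch simple.
        return '"v{}"'.format(next(cls._counter))

    def touch(self):
        """Change this resource's ETag and that of every resource containing it."""
        node = self
        while node is not None:
            node.etag = node._next_etag()
            node = node.parent

def update(resource, if_match):
    """Return 412 when If-Match is stale, otherwise apply the change and return 200."""
    if if_match != resource.etag:
        return 412
    resource.touch()
    return 200
```

Updating the Metadata through `update` changes the Metadata and Object ETags only; sibling resources such as the FileSet keep theirs.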
Some systems may wish to give the client more control over the ingest process, and SWORD uses the In-Progress HTTP header to allow the client to indicate that a deposit should not yet be injected into any post-submission or pre-ingest workflow. The In-Progress header MUST take the value true or false, and if it is not present the server MUST assume that it is false and behave as described below.
An example use case for this is that the client may be embedded into a system which uses the SWORD server as a storage layer, but which cannot acquire all of the content for a "finished" item in one deposit operation. Consider a user-facing system which encourages users to upload files one at a time through some web interface, which causes each file to be directly deposited onto the SWORD server. At the start of the deposit the client asserts that the deposit is In-Progress: true, and then proceeds to upload files. If uploading them to the Object-URL the client continues to assert In-Progress: true on each request (if depositing to other URLs this is not necessary). This goes on until the user confirms that they have uploaded all the relevant files, or navigates away from the page. At that stage, the client can issue a blank HTTP POST request to the SWORD server, with In-Progress: false to complete the deposit.
Note that the In-Progress header is intended to indicate to the server that further content will be coming in which is associated with the existing content, before it can be considered "complete". It is not intended to provide workflow control, and clients MUST NOT assume that asserting In-Progress: true will have any specific effect on the state of the item.
If In-Progress is false, the server MAY assume that it can carry on processing the deposit as it sees fit.
If In-Progress is true, the server SHOULD expect the client to provide further updates to the item at some undetermined time in the future.
The details of how this is implemented depend on the server's purpose. For example, a repository system may hold items which are marked In-Progress in a workspace until such time as a client request indicates that the deposit is complete.
The client can assert that a deposit process has completed by issuing an HTTP POST to the Object-URL with a blank request body and with the In-Progress header set to false (it may also omit the header altogether, as the server treats this as In-Progress: false). The client MAY specify a Content-Length: 0 HTTP header, and MUST NOT include any body content.
Once the server has processed the request it MUST respond with status code 204 (No Content), or a suitable error.
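Both sides of the In-Progress header can be sketched briefly. The function names are hypothetical, the header lookup assumes a plain dictionary with the exact header name as key, and the request is only constructed, never sent.

```python
import urllib.request

def in_progress(headers):
    """Interpret the In-Progress header; a missing header is treated as false."""
    value = headers.get("In-Progress")
    if value is None:
        return False
    if value not in ("true", "false"):
        raise ValueError("In-Progress must be 'true' or 'false'")
    return value == "true"

def completion_request(object_url):
    """Build (but do not send) the empty POST that marks a deposit as complete."""
    return urllib.request.Request(
        object_url,
        data=b"",  # blank body: no content may be included
        method="POST",
        headers={"In-Progress": "false", "Content-Length": "0"},
    )
```

A client would pass the result of `completion_request` to `urllib.request.urlopen` and expect a 204 (No Content) response, or a suitable error.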
If a client has a very large file that it wishes to transfer to the server by value, then it may be beneficial to do this in several small operations, rather than as a single large operation. Large uploads are at higher risk of failure, depending on a variety of factors, and there is no guarantee that a SWORD server will be able to resume a partial upload.
In order to transfer a large file, the client can break it down into a number of equally sized segments of binary data (the final segment may be a different size to the rest). It can then initialise a Segmented File Upload with the server, and then transfer the segments. The server will reconstitute these segments into a single file, and then the client may deposit this file by-reference.
Segments can be uploaded in any order, and can be uploaded one at a time or in parallel.
Servers MAY support Segmented File Upload. To do so, a server must provide a staging area where file segments can be uploaded prior to the client requesting a specific deposit operation. The server MUST include a staging field in the Service Document with a URL for where the client can initialise its Segmented File Upload. It SHOULD also specify how long it will retain an unfinished Segmented File Upload, before assuming that the client will not complete it, with the stagingMaxIdle field. In addition, the server SHOULD specify the size parameters of the segments using maxSegmentSize, minSegmentSize, maxAssembledSize and maxSegments:
{
"maxAssembledSize": 30000000000000,
"maxSegmentSize": 16777216000,
"maxSegments": 1000,
"minSegmentSize": 1,
"staging": "http://example.com/staging",
"stagingMaxIdle": 3600
}
Obtain the Staging-URL from the Service, from which to request a Temporary-URL
If the client is creating a new Object, the Staging-URL can be found in the staging field in the Service Document. If an Object already exists, the client should find the Service-URL from the service field in the Service Document, then GET this URL to obtain the appropriate Service Document, and subsequently get the Staging-URL from the staging field.
Request a Temporary-URL from the Service, via a Segmented Upload Initialisation request.
Send a POST request to the Staging-URL, as per POST Staging-URL, with the appropriate Content-Disposition (see below). The server will respond with a Temporary-URL in the Location header.
Upload all the file segments to the Temporary-URL
Send one or more POST requests to the Temporary-URL as per POST Temporary-URL, with the appropriate Content-Disposition (see below), until all file segments have been uploaded.
Carry out the desired deposit operation as a By-Reference deposit, using the Temporary-URL as the by-reference file.
Refer to the section By-Reference Deposit for more information on this approach. Deposits of content hosted at Temporary-URLs SHOULD NOT contain the ttl or dereference fields in the By-Reference Document, and if they are included, the server MUST ignore them.
Before sending any segments to the server, the client must initialise the process. This is done by sending a POST request to the Staging-URL as per POST Staging-URL.
The requirements of the protocol for a Segmented Upload Initialisation are:
Protocol Operation
Request Requirements
Authorization and On-Behalf-Of headers (i.e. if authenticating this request)
Content-Disposition header, with the appropriate value for the request
Content-Length header with value 0
Server Requirements
If Authorization (and optionally On-Behalf-Of) headers are provided, MUST authenticate the request
Response Requirements
201 to indicate that the Segmented Upload has been initialised, or raise an error.
Location header containing the Temporary-URL where the client can upload file segments
See the section Content Disposition for detailed information on the Content-Disposition header. Based on that section, the supplied Content-Disposition would be:
Content-Disposition: segment-init; size=[bytes]; digest=[digest]; segment_count=[n]; segment_size=[bytes]
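As an illustration, the header value can be assembled from its four parameters. The function name is hypothetical, and the digest is passed through verbatim because its exact format is defined in the Content Disposition section rather than here.

```python
def segment_init_disposition(size, digest, segment_count, segment_size):
    """Format the Content-Disposition value for a Segmented Upload Initialisation."""
    numeric = (("size", size), ("segment_count", segment_count),
               ("segment_size", segment_size))
    for name, value in numeric:
        if not isinstance(value, int) or value < 1:
            raise ValueError(name + " must be a positive integer")
    # The digest string is not interpreted; the caller supplies it ready-made.
    return ("segment-init; size={}; digest={}; segment_count={}; "
            "segment_size={}").format(size, digest, segment_count, segment_size)
```

The resulting string is sent as the Content-Disposition header of the POST to the Staging-URL, alongside Content-Length: 0.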
The server MAY choose to reject the Segmented Upload Initialisation request at this stage, for a variety of reasons - for example, it may have a limit on the total number of segments it will accept, or the total size may exceed a maximum file size for assembled files. In these cases, the server MUST respond with one of the appropriate Error Types.
If the request is successful, the server will respond with a Temporary-URL in the Location header, and the segments themselves can be uploaded to that URL.
Segments may be uploaded in any order and may also be parallelised. Segments MUST all be the same size, with the exception of the final segment, which MUST be the same size as or smaller than the other segments. Segment size MUST be smaller than the maxSegmentSize if specified, and if not then smaller than the maxUploadSize specified in the Service Document. Segments MUST be larger than the minSegmentSize also specified in the Service Document.
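A client might plan its segments against the advertised limits along these lines. This is a sketch: `plan_segments` is a hypothetical helper, `limits` is assumed to be the parsed Service Document, the size bounds are treated as inclusive, and numbering segments from 1 is an assumption rather than something stated above.

```python
def plan_segments(total_size, segment_size, limits):
    """Return (segment_number, offset, length) tuples, or raise ValueError."""
    if total_size < 1 or segment_size < 1:
        raise ValueError("sizes must be positive")
    # Fall back to maxUploadSize when maxSegmentSize is not advertised.
    max_size = limits.get("maxSegmentSize", limits.get("maxUploadSize"))
    if max_size is not None and segment_size > max_size:
        raise ValueError("segment size exceeds the server maximum")
    if segment_size < limits.get("minSegmentSize", 1):
        raise ValueError("segment size is below the server minimum")
    if "maxAssembledSize" in limits and total_size > limits["maxAssembledSize"]:
        raise ValueError("assembled file would be too large")
    count = -(-total_size // segment_size)  # integer ceiling division
    if "maxSegments" in limits and count > limits["maxSegments"]:
        raise ValueError("too many segments")
    # All segments are equal in size except possibly the last, which is smaller.
    return [(n + 1, n * segment_size,
             min(segment_size, total_size - n * segment_size))
            for n in range(count)]
```

Each tuple gives the byte range to read for one segment, so the segments can be uploaded in any order or in parallel.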
The requirements of the protocol for File Segment Upload are:
Protocol Operation
Request Requirements
Authorization and On-Behalf-Of headers (i.e. if authenticating this request)
Content-Disposition header, with the appropriate value for the request
Digest header
Content-Length
Server Requirements
If Authorization (and optionally On-Behalf-Of) headers are provided, MUST authenticate the request
Digest header
Content-Length if this is provided
Response Requirements
204 or a suitable error
Error Responses
401 (AuthenticationRequired)
403 (AuthenticationFailed)
405 (MethodNotAllowed)
If an On-Behalf-Of header has been provided, MAY respond with a 412 (OnBehalfOfNotAllowed)
412 (DigestMismatch). Note that servers are not required to check digests in real time.
400 (ContentMalformed)
400 (InvalidSegmentSize)
410 (SegmentedUploadTimedOut). Servers may also return 404 and no further explanation.
400 (UnexpectedSegment)
See the section Content Disposition for detailed information on the Content-Disposition header. Based on that section, the supplied Content-Disposition would be:
Content-Disposition: segment; segment_number=[n]
The Content-Type header MUST be application/octet-stream.
The Digest header MUST contain the Digest for the File Segment itself, so the server can confirm successful transfer of the segment.
At any point after creating a Temporary-URL, the client may request information on the state of their Segmented File Upload. This can be done via a GET to the Temporary-URL.
This returns a document as described in Segmented File Upload Document.
The requirements for this operation are:
Protocol Operation
Request Requirements
Authorization and On-Behalf-Of headers (i.e. if authenticating this request)
Server Requirements
If Authorization (and optionally On-Behalf-Of) headers are provided, MUST authenticate the request
Response Requirements
200 or a suitable error
Error Responses
401 (AuthenticationRequired)
403 (AuthenticationFailed)
405 (MethodNotAllowed)
If an On-Behalf-Of header has been provided, MAY respond with a 412 (OnBehalfOfNotAllowed)
Note that you cannot retrieve an actual copy of the full or partially uploaded Segmented File Upload from the Temporary-URL at any point.
If, part way through a segmented upload (or even after completion), the client wishes to abort, it can send a DELETE request to the Temporary-URL, with the following requirements:
Protocol Operation
Request Requirements
Authorization and On-Behalf-Of headers (i.e. if authenticating this request)
Server Requirements
If Authorization (and optionally On-Behalf-Of) headers are provided, MUST authenticate the request
Response Requirements
204 if the delete is successful, 202 if the delete is queued for processing, or raise an error
Error Responses
401 (AuthenticationRequired)
403 (AuthenticationFailed)
405 (MethodNotAllowed)
If an On-Behalf-Of header has been provided, MAY respond with a 412 (OnBehalfOfNotAllowed)
If a client submits the Temporary-URL as a By-Reference deposit to the server after completing the upload, the client SHOULD NOT delete the Temporary-URL itself; the server SHOULD take responsibility for this. If the client deletes the resource before the By-Reference deposit has completed, the server SHOULD record an error against the ingest.
Servers SHOULD delete incomplete Segmented File Uploads after a specified amount of time (in the Service Document), if they are not finalised with all segments.
Servers SHOULD delete completed Segmented File Uploads after a specified amount of time (in the Service Document). Servers MUST be able to tell when they have been given one of their own Temporary-URLs as a By-Reference deposit, and not delete that resource until after it has been ingested.
If a Temporary-URL is used in a By-Reference deposit, this should reset the idle counter on the server for that file, and the server SHOULD NOT delete the file until after the idle period has expired. This allows clients to be able to reference the file in multiple deposits should that be necessary.
Servers MUST respond with Error documents under the following circumstances (in addition to the standard errors that may arise through using the protocol):
The server MAY respond with an Error document under the following circumstances:
If any other errors occur asynchronously, such as in reassembling or unpacking the resulting file, servers MUST provide an error status field and suitable log information in the link record in the Status document.
By-Reference Deposit is when the client provides the server with URLs for Files which it would like the server to retrieve asynchronously to the deposit request itself. This could be useful in a number of contexts, such as when the files are very large, and are stored on specialist staging hardware, or where the files are already readily available elsewhere, and there is no need to push them through a by-value deposit.
Servers MAY support By-Reference deposit. If a server supports By-Reference it SHOULD indicate this in the Service Document using the field byReferenceDeposit:
{
"byReferenceDeposit": true
}
Clients may use a By-Reference Deposit anywhere a by-value deposit could be carried out. Instead of sending any Binary content, the client sends the By-Reference Document containing one or more (depending on context) URLs to files which the server can retrieve.
See the document SWORDv3 Behaviours for an expansion of the Protocol Requirements for requests to deposit By-Reference.
The Content Disposition for a By-Reference deposit is:
Content-Disposition: attachment; by-reference=true
If carrying out a Segmented File Upload, the final deposit stage is to send the Temporary-URL to the server as part of a By-Reference deposit. In this case the client SHOULD omit the ttl and dereference fields from the By-Reference Document, thus:
{
"@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",
"@type" : "ByReference",
"byReferenceFiles" : [
{
"@id" : "[Temporary-URL]",
"contentType" : "application/zip",
"contentLength" : 123456,
"contentDisposition" : "attachment; filename=file.zip",
"packaging" : "http://purl.org/net/sword/packaging/SimpleZip",
"digest" : "SHA256=...."
}
]
}
The server MUST recognise one of its own Temporary-URLs, and should implement ingest in the most efficient way possible. Because a copy of the actual Segmented File Upload cannot be retrieved from the Temporary-URL via GET, the server MUST have another way to retrieve the content from those uploads. The server MUST NOT delete the resource until after it has been successfully ingested (i.e. the stagingMaxIdle time should be ignored when the server has received the resource as a By-Reference deposit).
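A client could assemble the By-Reference Document for an assembled upload as in the following sketch. The function name and its parameters are hypothetical, and treating packaging as optional is an assumption; the field names and the `@context` value are taken from the example above.

```python
import json

def temporary_url_reference(temporary_url, content_type, content_length,
                            content_disposition, digest, packaging=None):
    """Build a By-Reference Document for one assembled Segmented File Upload."""
    entry = {
        "@id": temporary_url,
        "contentType": content_type,
        "contentLength": content_length,
        "contentDisposition": content_disposition,
        "digest": digest,
    }
    if packaging is not None:
        entry["packaging"] = packaging
    # ttl and dereference are deliberately omitted for Temporary-URLs.
    return {
        "@context": "https://swordapp.github.io/swordv3/swordv3.jsonld",
        "@type": "ByReference",
        "byReferenceFiles": [entry],
    }
```

Serialising the result with `json.dumps` gives the request body for the By-Reference deposit.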
The following is the procedure that MUST be followed by servers implementing By-Reference deposit.
The server receives a By-Reference Document with one or more files listed
The server creates records for each of these files that it plans to dereference, which then become visible in the Status Document. Files marked by the client not to be dereferenced are considered metadata, and might not appear in the Status Document. All other supplied Files MUST have the status pending in the Status Document.
The server responds to the client with the appropriate response for the action (See Protocol Operations and Protocol Requirements)
At its own pace, taking into account the ttl of the Files, the server obtains all the files that are marked for dereference and validates them against their Digest and any other supporting information such as contentType, contentLength, and packaging. During the download the server SHOULD set the status to downloading. The server SHOULD be able to resume an interrupted download.
Once the Files are downloaded and processed, the server MUST set the status to ingested. If the Files need unpacking first, the server SHOULD first set the status to unpacking and then ingested when this operation is complete. The server MUST also remove the byReferenceDeposit rel.
If there is an error in downloading or otherwise processing the file, the server MUST set the status to error and SHOULD provide a meaningful log message.
The server MAY continue to record the original URL of the file if desired.
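The status changes in this procedure can be modelled as a small state machine over a Status Document link record. This is only a sketch: the `ALLOWED` transition table and the `advance` function are hypothetical, the point at which the byReferenceDeposit rel is dropped follows the example records in this section, and ETag changes are not modelled.

```python
FILESTATE = "http://purl.org/net/sword/3.0/filestate/"
BY_REFERENCE_REL = "http://purl.org/net/sword/3.0/terms/byReferenceDeposit"

# Assumed transitions, derived from the procedure described above.
ALLOWED = {
    "pending": {"downloading", "error"},
    "downloading": {"unpacking", "ingested", "error"},
    "unpacking": {"ingested", "error"},
}

def advance(record, new_state, log=None):
    """Return a copy of a link record moved to new_state."""
    current = record["status"][len(FILESTATE):]
    if new_state not in ALLOWED.get(current, set()):
        raise ValueError("cannot move from {} to {}".format(current, new_state))
    updated = dict(record, status=FILESTATE + new_state)
    if new_state in ("unpacking", "ingested", "error"):
        # The download phase is over, so the by-reference rel is removed.
        updated["rel"] = [r for r in record["rel"] if r != BY_REFERENCE_REL]
    if log is not None:
        updated["log"] = log
    return updated
```

The input record is left untouched, so a server could keep earlier states for audit purposes if it wished.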
While a By-Reference File is being processed, it MUST be represented in the Status Document under the link field. The following sections show how it is represented.
On Initial Deposit
{
"@id": "http://www.myorg.ac.uk/sword3/object1/reference.zip",
"byReference": "http://www.otherorg.ac.uk/by-reference/file2.zip",
"eTag": "1",
"rel": [
"http://purl.org/net/sword/3.0/terms/byReferenceDeposit",
"http://purl.org/net/sword/3.0/terms/originalDeposit",
"http://purl.org/net/sword/3.0/terms/fileSetFile"
],
"status": "http://purl.org/net/sword/3.0/filestate/pending"
}
During Download
{
"@id": "http://www.myorg.ac.uk/sword3/object1/reference.zip",
"byReference": "http://www.otherorg.ac.uk/by-reference/file2.zip",
"eTag": "1",
"rel": [
"http://purl.org/net/sword/3.0/terms/byReferenceDeposit",
"http://purl.org/net/sword/3.0/terms/originalDeposit",
"http://purl.org/net/sword/3.0/terms/fileSetFile"
],
"status": "http://purl.org/net/sword/3.0/filestate/downloading"
}
During Unpacking
{
"@id": "http://www.myorg.ac.uk/sword3/object1/reference.zip",
"byReference": "http://www.otherorg.ac.uk/by-reference/file2.zip",
"eTag": "2",
"rel": [
"http://purl.org/net/sword/3.0/terms/originalDeposit",
"http://purl.org/net/sword/3.0/terms/fileSetFile"
],
"status": "http://purl.org/net/sword/3.0/filestate/unpacking"
}
After Completion
{
"@id": "http://www.myorg.ac.uk/sword3/object1/reference.zip",
"byReference": "http://www.otherorg.ac.uk/by-reference/file2.zip",
"eTag": "2",
"rel": [
"http://purl.org/net/sword/3.0/terms/originalDeposit",
"http://purl.org/net/sword/3.0/terms/fileSetFile"
],
"status": "http://purl.org/net/sword/3.0/filestate/ingested"
}
In Case of Error
{
"@id": "http://www.myorg.ac.uk/sword3/object1/reference.zip",
"byReference": "http://www.otherorg.ac.uk/by-reference/file2.zip",
"eTag": "2",
"log": "There was an error ingesting your file",
"rel": [
"http://purl.org/net/sword/3.0/terms/originalDeposit",
"http://purl.org/net/sword/3.0/terms/fileSetFile"
],
"status": "http://purl.org/net/sword/3.0/filestate/error"
}
To provide deposit By-Reference, the reference server, where the file is initially hosted, SHOULD:
To use By-Reference deposit, the client SHOULD:
SWORD allows the client to deposit arbitrary metadata onto the server through agnostic support for metadata formats. A metadata format is any document which expresses metadata in a given serialisation. SWORD has a default format which MUST be supported by the server, which consists of the set of DCMI Terms [DCMI] expressed as JSON (see Metadata Document).
In general, the form of metadata consists of several aspects:
The serialisation, such as to JSON or XML
The vocabulary of the metadata, such as Dublin Core, or MODS (sometimes the vocabulary and the serialisation will be conflated here)
The profile of the metadata, such as the RIOXX profile for DC (+extensions)
Any format (combining the 3 aspects above) may be represented by an IRI in the protocol, or an opaque string if no IRI exists or can be minted.
SWORD does not require that the server be able to disseminate any metadata in a format other than the default format. Metadata in the default format can be obtained from GET Metadata-URL. If the server chooses to make other metadata formats available, these SHOULD be listed in the links section of the Status Document. See Representing Other Formats in the Service Document for details.
The server can list Metadata formats that it will accept in the acceptMetadata field of the Service Document.
If no acceptMetadata field is present, the client MUST assume the server only supports the default SWORD metadata format (http://purl.org/net/sword/3.0/types/Metadata).
{
"acceptMetadata": [
"http://purl.org/net/sword/3.0/types/Metadata"
]
}
During deposit, the client SHOULD specify a Metadata-Format header which contains the identifier for the format. For example, if supplying the default SWORD metadata format:
Metadata-Format: http://purl.org/net/sword/3.0/types/Metadata
If this header is not present the server MUST assume it has the above value.
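The defaulting rules for Metadata-Format and acceptMetadata can be sketched as below. The function names are hypothetical, and always treating the default format as accepted is an interpretation of the requirement that servers MUST support it.

```python
DEFAULT_FORMAT = "http://purl.org/net/sword/3.0/types/Metadata"

def accepted_formats(service_document):
    """Formats a server accepts; the default format is always included."""
    declared = service_document.get("acceptMetadata", [])
    return set(declared) | {DEFAULT_FORMAT}

def resolve_metadata_format(request_headers, service_document):
    """Return the format of a metadata deposit, or None if it is not accepted."""
    # A missing header is treated as the default SWORD metadata format.
    declared = request_headers.get("Metadata-Format", DEFAULT_FORMAT)
    if declared in accepted_formats(service_document):
        return declared
    return None
```

What a server does when the format is not accepted (the `None` case) is left to the error handling defined elsewhere in the specification.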
In order to provide a baseline of interoperability, SWORD provides a default metadata format which MUST be supported by the server. This document has the following aspects (as per Metadata Deposit):
It is serialised as JSON and with a JSON-LD @context
It contains dc and dcterms vocabulary elements, and any other arbitrary elements added by the client
It does not pre-suppose any particular profile of usage of these vocabulary elements.
Clients MAY choose to extend this document with their own metadata fields, though the server might not understand them, and MAY ignore them.
When using this Metadata Format, the client should identify it in the Metadata-Format header with the following IRI:
http://purl.org/net/sword/3.0/types/Metadata
In addition to the standard SWORD metadata format described above, SWORD can support the deposit of arbitrary metadata schemas and serialisations.
Clients who wish to ensure that their servers support all the metadata they send them should consider minting a new identifier for their format, and looking for servers to declare explicit support for it.
Clients should not expect that servers will keep their metadata in the format it is provided. Servers can and will store the metadata in their internal formats as needed.
The following is a minimal example of the deposit of a MODS XML metadata file while creating a new Object:
POST Service-URL
Content-Type: application/xml
Content-Disposition: attachment; metadata=true
Digest: SHA-256=74b2851bd2760785b0987ba219debea69c228353f7ccc67a2bdcd9819f97fc71
Metadata-Format: http://www.loc.gov/mods/v3
<mods xmlns="http://www.loc.gov/mods/v3">
<originInfo>
<place>
<placeTerm type="code" authority="marccountry">nyu</placeTerm>
<placeTerm type="text">Ithaca, NY</placeTerm>
</place>
<publisher>Cornell University Press</publisher>
<copyrightDate>1999</copyrightDate>
</originInfo>
</mods>
If the server supports the MODS Metadata-Format, identified with the IRI http://www.loc.gov/mods/v3, then it will be able to create a new Object from this XML document, and populate the Metadata from the data therein.
A server is not required to retain or be able to disseminate the metadata delivered to it by the client in the format it is provided. Alternative metadata formats to the default format MAY be accepted (as defined by the acceptMetadata field in the Service Document), but the server is not required to be able to serve that metadata format as well.
If the server chooses to expose metadata in alternative formats to the default, it may do so by providing them as links in the links section of the Status Document. To do this:
rel type of http://purl.org/net/sword/3.0/terms/formattedMetadata
contentType as needed
metadataFormat as the format identifier for the metadata schema.
For example, to reflect the metadata from the previous section back to the client:
{
"@id": "http://www.swordserver.ac.uk/col1/mydeposit/metadata.mods.xml",
"contentType": "application/xml",
"metadataFormat": "http://www.loc.gov/mods/v3",
"rel": [
"http://purl.org/net/sword/3.0/terms/formattedMetadata"
]
}
SWORD allows you to deposit both Files and Metadata simultaneously through support of Packaged Content. SWORD does not place any limitations on the number or type of packaging formats that the client/server support, though see the section Packaging Formats for the packages that MUST be supported by the server.
The Service Document uses the acceptPackaging field to indicate that a Service will accept deposits of a particular packaging format, and the acceptArchiveFormat field to indicate the serialisation/compression formats that it understands.
Clients should refer to the treatment description in the Service Document to find out the treatment for a particular packaging type.
Package formats SHOULD be identified by an IRI, but MAY be identified by an arbitrary string.
If no acceptPackaging field is supplied the client MUST assume that the server does not formally support any package formats, and should expect everything to be treated as per the server's policies with regard to the mimetype as per the accept element.
If no acceptArchiveFormat field is supplied the client MUST assume that the server supports application/zip only.
{
"accept": [
"*/*"
],
"acceptArchiveFormat": [
"application/zip"
],
"acceptPackaging": [
"*"
]
}
When depositing Packaged Content, the client SHOULD indicate the archive file MIME type using the Content-Type header, and SHOULD also give information about content packaging using the Packaging header.
The value of the Packaging header SHOULD match one of the values the server has advertised as acceptable for the service.
If a server receives a POST with an unacceptable Packaging header value, it MUST either reject the POST by returning an HTTP response with a status code of 415 (Unsupported Media Type) and a SWORD Error document with URI http://purl.org/net/sword/3.0/error/PackagingFormatNotAcceptable, or store the content without further processing.
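The acceptance check might look like the sketch below. The function names are hypothetical, the service document is assumed to be an already parsed dictionary, `*` is read as a wildcard as in the example above, and the sketch takes the rejecting branch rather than storing the content unprocessed.

```python
def accepted_archive_formats(service_document):
    """Archive formats the server understands; application/zip when unspecified."""
    return service_document.get("acceptArchiveFormat", ["application/zip"])

def packaging_error(packaging, service_document):
    """Return None if the Packaging value is acceptable, else (status, error name)."""
    if packaging is None:
        # The header is only a SHOULD, so its absence is not rejected here.
        return None
    accepted = service_document.get("acceptPackaging", [])
    if "*" in accepted or packaging in accepted:
        return None
    return 415, "PackagingFormatNotAcceptable"
```

A server that prefers the other permitted behaviour would store the content as-is instead of returning the 415 tuple.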
Status documents can speak about packaging in two distinct ways, depending on whether an element in the links list refers to a file that was deposited, or a file that is available for retrieval by the client (or both).
When a package has been deposited as the Original Deposit, the server SHOULD record the packaging format and content type alongside it in the record.
{
"@id": "http://www.myorg.ac.uk/sword3/object1/package.zip",
"contentType": "application/zip",
"packaging": "http://purl.org/net/sword/3.0/package/SimpleZip",
"rel": [
"http://purl.org/net/sword/3.0/terms/originalDeposit"
]
}
Similarly, when a package has been created by the server from the Object’s content and made available to the client as a service, the packaging format and content type MUST be presented alongside it:
{
"@id": "http://www.myorg.ac.uk/sword3/object1/package.1.zip",
"contentType": "application/zip",
"packaging": "http://purl.org/net/sword/3.0/package/SimpleZip",
"rel": [
"http://purl.org/net/sword/3.0/terms/packagedContent"
]
}
There are 3 packaging formats that all SWORD implementations MUST support.
URI: http://purl.org/net/sword/3.0/package/Binary
This format indicates that the package should be interpreted as an opaque blob, and the server SHOULD NOT attempt to extract any content from it. This is typically for use when depositing single files, which do not need unpacking of any kind.
Servers MAY choose, nonetheless, to extract content from Binary packages, if they have the capabilities, such as metadata from images, structural information from text documents, etc.
URI: http://purl.org/net/sword/3.0/package/SimpleZip
This format indicates that the package is a compressed set of one or more files in an arbitrary directory structure. The nature of the compression and the structure of the compressed content is not specified.
Servers MAY choose to extract the content from SimpleZip packages, and present the individual file components as derivedResources, if desired.
URI: http://purl.org/net/sword/3.0/package/SWORDBagIt
This format is a profile of the BagIt directory structure, which has in turn been serialised (which may include compression). The nature of the serialisation/compression is not specified, though if the client wishes the server to extract the content, it SHOULD use one of the formats specified in the Service Document field acceptArchiveFormat.
A SWORD BagIt Profile is available which describes the outline structure of the bag.
SwordBagIt
   |-- bag-info.txt
   |-- bagit.txt
   |-- data
   |   |-- bitstreams ...
   |   \-- directories ...
   |       \-- bitstreams ...
   |-- manifest-sha-256.txt
   |-- metadata
   |   \-- sword.json
   \-- tagmanifest-sha-256.txt
This allows us to represent the item as a combination of an arbitrary structure of bitstreams in the data directory (similar to SimpleZip), and the metadata in the sword default format in metadata/sword.json. A manifest (and tagmanifest) of sha-256 checksums is required, as well as the bagit.txt file and a bag-info.txt file. Note that although listed, the bag-info.txt is not used by SWORD to transfer metadata. All metadata MUST appear in metadata/sword.json.
The content of sword.json is exactly as defined in the SWORD default Metadata. Note that use of fetch.txt is not supported here.
The server SHOULD unpack this file, and action at least the Metadata. The contents of the data directory MAY be unpackaged into derivedResources if the server desires. It is RECOMMENDED that the contents of the data directory be a flat file structure, to aid mutual comprehension by servers/clients.
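A structural check of an unpacked bag can be sketched as follows. The function name and the `REQUIRED` tuple are hypothetical; the file names are taken from the layout above, and the check looks only at which paths are present, not at checksum contents.

```python
# Files named as required in the SWORDBagIt description above.
REQUIRED = ("bagit.txt", "bag-info.txt", "manifest-sha-256.txt",
            "tagmanifest-sha-256.txt", "metadata/sword.json")

def check_sword_bagit(paths):
    """Return a list of problems, given file paths relative to the bag root."""
    present = set(paths)
    problems = ["missing " + name for name in REQUIRED if name not in present]
    if "fetch.txt" in present:
        problems.append("fetch.txt is not supported")
    return problems
```

An empty list means the bag has the expected outline; verifying the manifests against the payload would be a separate step.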
To help potential clients discover a server's capabilities, SWORD RECOMMENDS that the following auto-discovery features be embedded in any web interfaces associated with the service provider.
Embed an html link with a rel value of http://purl.org/net/sword/3.0/discovery/Service in any page which represents a deposit Service.
<html:link rel="http://purl.org/net/sword/3.0/discovery/Service" href="[Service-URL]"/>
Embed an html link with a rel value of http://purl.org/net/sword/3.0/discovery/Object in any page which represents a deposited resource.
<html:link rel="http://purl.org/net/sword/3.0/discovery/Object" href="[Object-URL]"/>
For any server which wishes to expose its main or root Service-URL via Well-Known URIs [RFC8615], provide a redirect (307) from /.well-known/swordv3 (PROVISIONAL) to your root Service-URL.
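A client could look for the embedded links with the standard library HTML parser, as in this sketch. The class and function names are hypothetical, and accepting both `link` and `html:link` tag names is an assumption made to cover the prefixed form shown in the examples above.

```python
from html.parser import HTMLParser

SERVICE_REL = "http://purl.org/net/sword/3.0/discovery/Service"
OBJECT_REL = "http://purl.org/net/sword/3.0/discovery/Object"

class SwordLinkFinder(HTMLParser):
    """Collect href values of SWORD discovery links, grouped by rel."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags also arrive here via the default handle_startendtag.
        if tag not in ("link", "html:link"):
            return
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel in (SERVICE_REL, OBJECT_REL) and href:
            self.links.setdefault(rel, []).append(href)

def discover(html_text):
    """Return a dict mapping discovery rel values to lists of URLs."""
    finder = SwordLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.links
```

Calling `discover` on a repository landing page would yield any advertised Service-URL or Object-URL without prior knowledge of the server.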
AtomPub Gregorio, J. and B. de hOra, "The Atom Publishing Protocol", RFC 5023, October 2007. http://www.ietf.org/rfc/rfc5023.txt
DCMI DCMI Metadata Terms, 2012-06-14 http://dublincore.org/documents/dcmi-terms/
IANA Auth Hypertext Transfer Protocol (HTTP) Authentication Scheme Registry https://www.iana.org/assignments/http-authschemes/http-authschemes.xhtml
IANA Digest Hypertext Transfer Protocol (HTTP) Digest Algorithm Values https://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml
JSON-LD JSON-LD 1.1, A JSON-based Serialization for Linked Data, 28 March 2018 https://json-ld.org/spec/latest/json-ld/
JSON-SCHEMA JSON Schema: A Media Type for Describing JSON Documents http://json-schema.org/latest/json-schema-core.html
NOTE-datetime Wolf, M. and Wicksteed, C. "Date and Time Formats", 1997 https://www.w3.org/TR/NOTE-datetime
OpenAPI OpenAPI Specification, Version 3.0.0 https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.0.md
RFC2119 Bradner, S. "Key words for use in RFCs to Indicate Requirement Levels", March 1997. http://www.ietf.org/rfc/rfc2119.txt
RFC3230 J. Mogul et al "Instance Digests in HTTP" https://www.ietf.org/rfc/rfc3230.txt
RFC5987 J. Reschke. "Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters" https://tools.ietf.org/html/rfc5987
RFC6266 J. Reschke. "Use of the Content-Disposition Header Field in the Hypertext Transfer Protocol (HTTP)", 2011 https://tools.ietf.org/html/rfc6266
RFC7232 R. Fielding and J. Reschke, "Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests", June 2014 https://tools.ietf.org/html/rfc7232
RFC8615 M Nottingham. "Well-Known Uniform Resource Identifiers (URIs)", 2019 https://tools.ietf.org/html/rfc8615
SWORD 1.3 Downing, J. "SWORD AtomPub Profile version 1.3", 2008. http://www.swordapp.org/docs/sword-profile-1.3.html
SWORD 2.0 Jones, R. and Lewis, S. "SWORD 2.0 Profile", 2011 http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html