SWORDv3 Specification

DRAFT 4

Last modified: 2018-06-04

1. Credits

Technical Lead: Richard Jones, Cottage Labs

Community Lead: Neil Jefferies, University of Oxford

Funder Liaison: Dom Fripp, Jisc

Technical Advisory Group: Adam Rehin, Adrian Stevenson, Alan Stiles, Catherine Jones, Claire Knowles, David Moles, David Wilcox, Eoghan Ó Carragáin, Erick Peirson, Gertjan Filarski, Goosyara Kovbasniy, Graham Triggs, Hideaki Takeda, Jan van Mansum, Jauco Noordzij, Jochen Schirrwagen, John Chodacki, Justin Simpson, Lars Holm Nielsen, Marisa Strong, Martin Wrigley, Masaharu Hayashi, Masud Khokhar, Mike Jackson, Mike Jackson, Morane GRUENPETER, Neil Chue Hong, Paul Walk, Peter Sefton, Ralf Claussnitzer, Ricardo Otelo Santos Saraiva Cruz, Richard Rodgers, Scott Wilson, Shannon Searle, Stephanie Taylor, Stuart Lewis, Tomasz Parkola, Vitali Peil

2. Introduction

SWORD 3.0 is a protocol enabling clients and servers to communicate around complex digital objects, especially with regard to supporting the deposit of these objects into a service like a digital repository. Complex digital objects consist of both Metadata and File content, where the Files may be in a variety of formats, there may be many files, and some may be very large. The protocol defines semantics for creating, appending, replacing, deleting, and retrieving information about these complex resources. It also enables servers to communicate regarding the status of treatment of deposited content, such as exposing ingest workflow information.

The first major version of SWORD [SWORD 1.3] built upon the Resource creation aspects of AtomPub [AtomPub] to enable fire-and-forget package deposit onto a server.

This approach, where the depositor has no further interaction with the server, is of significant value in certain use cases, but in others it is insufficient. Consider, for example, a depositor who wishes to construct a digital artifact file by file over a period of time before deciding that it is time to archive it. In these cases, a higher level of interactivity between the participating systems is required, and this is the role that SWORD 2.0 [SWORD 2.0] was subsequently developed to fulfil.

As the use cases for SWORD have developed further, it has become clear that the increasing size of the files that repositories are asked to handle is an issue. As a result of this, and because the technological approach of SWORD 2.0 was starting to show its age, a new version, SWORD 3.0, has been developed. This is a radical departure from SWORD 2.0: it eliminates ties with AtomPub and moves to a much stricter REST+JSON approach, using JSON-LD for alignment with Linked Data. Its key functional differences from SWORD 2.0 are:

3. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

4. Terminology

4.1. URLs

File-URL
A single Binary File within the Object
FileSet-URL
The aggregate of all Binary Files associated with the Object which are available for SWORD protocol operations to be carried out on them
Metadata-URL
The Metadata resource associated with the Object
Object-URL
An Object that exists on the server, typically as a result of a deposit operation, which is a container for Metadata and zero or more Files.
Service-URL
The location of the document which describes the server's capabilities for the user, and which may accept initial deposits
Staging-URL
A URL provided by the Service where clients can initialise segmented file upload requests
Temporary-URL
A staging area where file segments can be uploaded to the server prior to a deposit operation, obtained from the Staging-URL

4.2. Document Types

Service Document
Describes the capabilities of the server with respect to the user
Metadata Document
A format for depositing and retrieving object Metadata
By-Reference Document
A format for describing one or more files to be deposited By-Reference.
Metadata+By-Reference Document
A single expression of both the Metadata and the By-Reference file deposits.
Status Document
A document describing the current status of the Object and its components
Binary File
An opaque binary file
Packaged Content
A serialisation of the entire Object, consisting of its Metadata and Binary Files.
Error Document
Describes an error that occurred while processing a request.
Segmented File Upload Document
A document describing the current status of a Segmented File Upload

4.3. Namespaces

http://purl.org/net/sword/3.0/
All SWORDv3 extensions are defined within this namespace. This Namespace also serves to identify the SWORD version for a given Service.
http://purl.org/net/sword/3.0/terms/
All predicates associated with SWORDv3
http://purl.org/net/sword/3.0/package/
All packaged formats defined by SWORDv3
http://purl.org/net/sword/3.0/error/
All error documents defined by SWORDv3
http://purl.org/net/sword/3.0/state/
Terms used to describe the state of Objects in SWORDv3
http://purl.org/net/sword/3.0/types/
Namespace for all document types used in SWORDv3
http://purl.org/net/sword/3.0/filestate/
Terms used to describe the state of Files in SWORDv3
http://purl.org/net/sword/3.0/discovery/
Terms used for auto-discovery of SWORDv3 services
http://purl.org/dc/elements/1.1/
The Simple Dublin Core elements. This document uses the prefix dc for this namespace name; for example dc:title
http://purl.org/dc/terms/
The Extended Dublin Core terms. This document uses the prefix dcterms for the namespace name; for example dcterms:abstract

5. Structure of SWORD Objects

Objects, as represented by SWORD, have the following structure:

Figure 1: Structure of a SWORD Object

The SWORD Object is expressed as JSON via the Status Document, along with all its supporting metadata and workflow information.

Each of the three primary File categories can be identified by their rel values, as they appear in the Status Document:

6. HTTP Headers

These are the HTTP headers used by SWORD, and their meanings within the context of the protocol. Where a Default Value is specified, the client or server MUST assume that value if the header is not provided explicitly in a request or response.

Header Usage
Authorization Passes HTTP authorization credentials, such as those used for Basic authentication
Content-Disposition Used to tell the server the nature of the deposit, and any associated parameters
Content-Length Length of the content in the current payload
Content-Type Mimetype of the content being delivered
Digest Checksum for the deposited content. MUST include SHA-256, and allows for other algorithms, such as MD5 and SHA (SHA-1), if the server still needs them.
ETag Object version identifier, as provided by the server in response to GET requests and to any request which modifies the Object.
If-Match Used to provide the server’s Object version identifier (ETag) for the version on which this request is intended to act. If the supplied ETag does not match, the version on the server has changed since the client’s last operation, and the server MUST reject the update. The client will then need to retrieve the latest ETag and re-issue the request, taking into account any changes.
In-Progress Indicates whether this operation is part of a larger deposit operation, in which case the server should expect subsequent related requests before injecting the item into any ingest workflows.

Default Value: false
Location URI for the location where the requested or deposited content can be found
On-Behalf-Of Username of any user the action is being carried out on behalf of
Packaging URI unambiguously identifying the packaging profile

Default Value: http://purl.org/net/sword/3.0/package/Binary
Slug Suggested identifier for the item
Metadata-Format URI unambiguously identifying the metadata format/schema/profile

Default Value: http://purl.org/net/sword/3.0/types/Metadata
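The header Default Values and the Digest rule above can be illustrated with a short Python sketch. It assumes that header names arrive already normalised to the capitalisation used in this table (real HTTP header names are case-insensitive), and that the Digest value uses the RFC 3230 syntax `<algorithm>=<base64 digest>`, with several digests separated by commas; this section names the algorithms but does not spell out the encoding. The names `effective_headers` and `digest_header` are illustrative only.

```python
import base64
import hashlib
from typing import Dict, Iterable

# Default Values from the header table, used when a header is absent.
HEADER_DEFAULTS: Dict[str, str] = {
    "In-Progress": "false",
    "Packaging": "http://purl.org/net/sword/3.0/package/Binary",
    "Metadata-Format": "http://purl.org/net/sword/3.0/types/Metadata",
}

# Assumed mapping from Digest algorithm tokens to hashlib algorithm names.
HASHLIB_NAMES: Dict[str, str] = {"SHA-256": "sha256", "SHA": "sha1", "MD5": "md5"}


def effective_headers(received: Dict[str, str]) -> Dict[str, str]:
    """Return the received headers with SWORD Default Values filled in."""
    merged = dict(HEADER_DEFAULTS)
    merged.update(received)  # explicitly supplied headers always win over defaults
    return merged


def digest_header(body: bytes, algorithms: Iterable[str] = ("SHA-256",)) -> str:
    """Build a Digest header value that always includes SHA-256."""
    names = list(algorithms)
    if "SHA-256" not in names:
        names.insert(0, "SHA-256")  # SHA-256 is mandatory
    parts = []
    for name in names:
        raw = hashlib.new(HASHLIB_NAMES[name], body).digest()
        parts.append(f"{name}={base64.b64encode(raw).decode('ascii')}")
    return ", ".join(parts)
```

For example, `digest_header(body, ("MD5",))` yields a value with the SHA-256 digest first, followed by the MD5 digest, so the mandatory algorithm is never omitted.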

7. Protocol Operations

This section lists the actual on-the-wire protocol operations that are part of SWORDv3. Actual usage of each of these operations is dependent on the action that you wish to take. See Protocol Requirements for the rules which govern how to use these Protocol Operations.

The full set of protocol operations is available as an OpenAPI definition [OpenAPI], available as JSON and YAML.

7.1. Error Responses

The following error responses are possible against some or all of the HTTP Requests. In each case an Error Document MUST be returned by the server with details as to the root cause of the error.

7.2. Redirects

Some requests may result in redirect codes being sent to the client; the server MAY respond to any request with a suitable redirect. These are the redirect codes that are used, and what they mean:

7.3. HTTP Requests

These are the HTTP requests that are covered by the SWORD protocol.

Each request MAY be responded to by the server with a redirect code (see above). Each request MAY also generate an error; the possible errors are listed for each request, and their meanings are described in Error Responses above.

7.3.1. GET Service-URL

Retrieve the Service Document

Headers

Responses

Code Description
200 Service Document

Body
  • application/json
401
403
404

7.3.2. POST Service-URL

Make a new Object

Headers

Body

Content used to create new Object. This can be one of: Metadata, By-Reference, Metadata+By-Reference, Binary File, Packaged Content, Empty Body

Responses

Code Description
201 Resource created, responds with Status Document

Headers
  • ETag - version identifier
  • Location - Object-URL
Body
  • application/json
202 Resource accepted for processing, responds with Status Document

Headers
  • ETag - version identifier
  • Location - Object-URL
Body
  • application/json
400
401
403
404
405
412
413
415

7.3.3. GET Object-URL

Retrieve the Status information for the Object

Headers

Responses

Code Description
200 Status Document

Headers
  • ETag - version identifier
Body
  • application/json
400
401
403
404
412

7.3.4. POST Object-URL

Append data to an Object

Headers

Body

Content to be appended to the Object. This can be one of: Metadata, By-Reference, Metadata+By-Reference, Binary File, Packaged Content, Empty Body

Responses

Code Description
200 Content appended, responds with Status Document

Headers
  • ETag - version identifier
  • Location - The File-URL
Body
  • application/json
202 Content accepted for append, responds with Status Document

Headers
  • ETag - version identifier
  • Location - The File-URL
Body
  • application/json
400
401
403
404
412
413
415

7.3.5. PUT Object-URL

Replace the Object

Headers

Body

Content to replace the Object. This can be one of: Metadata, By-Reference, Metadata+By-Reference, Binary File, Packaged Content, Empty Body

Responses

Code Description
200 Replace carried out, responds with Status Document

Headers
  • ETag - version identifier
Body
  • application/json
202 Replace accepted for action, responds with Status Document

Headers
  • ETag - version identifier
Body
  • application/json
400
401
403
404
405
412
413
415

7.3.6. DELETE Object-URL

Delete the Object

Headers

Responses

Code Description
204 Object Deleted

Body
  • None
400
401
403
404
405
412

7.3.7. GET Metadata-URL

Retrieve the Metadata

Headers

Responses

Code Description
200 Metadata Document

Headers
  • ETag - version identifier
Body
  • application/json
400
401
403
404
405
412

7.3.8. PUT Metadata-URL

Replace the Metadata

Headers

Body

Content to replace the Metadata. This must be a Metadata Document.

Responses

Code Description
204 Metadata Replaced, no response body

Headers
  • ETag - version identifier
Body
  • None
400
401
403
404
405
412
413
415

7.3.9. PUT FileSet-URL

Replace the FileSet

Headers

Body

Content to replace the FileSet. This can be one of: By-Reference, Binary File, Empty Body

Responses

Code Description
202 FileSet replacement accepted for processing, no response body

Headers
  • ETag - version identifier
Body
  • None
204 FileSet Replaced, no response body

Headers
  • ETag - version identifier
Body
  • None
400
401
403
404
405
412
413

7.3.10. DELETE FileSet-URL

Delete the FileSet

Headers

Responses

Code Description
204 FileSet Deleted

Body
  • None
400
401
403
404
405
412

7.3.11. GET File-URL

Retrieve an individual File

Headers

Responses

Code Description
200 Binary File

Headers
  • ETag - version identifier
Body
  • */*
400
401
403
404
405
412

7.3.12. PUT File-URL

Replace an individual File

Headers

Body

Content to replace the File. This can be one of: By-Reference, Binary File, Empty Body

Responses

Code Description
204 Binary File replaced, no response body

Headers
  • ETag - version identifier
Body
  • None
400
401
403
404
405
412
413

7.3.13. DELETE File-URL

Delete an individual File

Headers

Responses

Code Description
204 Binary File Deleted

Body
  • None
400
401
403
404
405
412

7.3.14. POST Staging-URL

Create a Temporary-URL for Segmented File Upload

Headers

Responses

Code Description
201 Temporary-URL created

Headers
  • Location - The Temporary-URL to which Segmented File Upload requests can be sent
Body
  • None
400
401
403
404
412
413

7.3.15. GET Temporary-URL

Retrieve Information on a Segmented File Upload

Headers

Responses

Code Description
200 Segmented File Upload Document

Body
  • application/json
400
401
403
404

7.3.16. POST Temporary-URL

Upload a File Segment

Headers

Body

Segment to be added to the Resource.

Responses

Code Description
204 Segment Received

Body
  • None
400
401
403
404
405
412

7.3.17. DELETE Temporary-URL

Abort a Segmented File Upload

Headers

Responses

Code Description
204 Temporary File Deleted

Body
  • None
400
401
403
404
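The success codes listed in 7.3.1 to 7.3.17 can be collected into a lookup table, which a client might use to decide how to treat a response. This is a minimal Python sketch; the names `SUCCESS_CODES` and `classify` are hypothetical, and it treats any 3xx as a redirect and any 4xx or 5xx as an error that should carry an Error Document, following general HTTP semantics rather than the specific code lists above.

```python
from typing import Dict, Set, Tuple

# Success codes per (method, resource), as listed in Sections 7.3.1 to 7.3.17.
SUCCESS_CODES: Dict[Tuple[str, str], Set[int]] = {
    ("GET", "Service-URL"): {200},
    ("POST", "Service-URL"): {201, 202},
    ("GET", "Object-URL"): {200},
    ("POST", "Object-URL"): {200, 202},
    ("PUT", "Object-URL"): {200, 202},
    ("DELETE", "Object-URL"): {204},
    ("GET", "Metadata-URL"): {200},
    ("PUT", "Metadata-URL"): {204},
    ("PUT", "FileSet-URL"): {202, 204},
    ("DELETE", "FileSet-URL"): {204},
    ("GET", "File-URL"): {200},
    ("PUT", "File-URL"): {204},
    ("DELETE", "File-URL"): {204},
    ("POST", "Staging-URL"): {201},
    ("GET", "Temporary-URL"): {200},
    ("POST", "Temporary-URL"): {204},
    ("DELETE", "Temporary-URL"): {204},
}


def classify(method: str, resource: str, status: int) -> str:
    """Return 'success', 'redirect', 'error' or 'unexpected' for a response."""
    key = (method.upper(), resource)
    if key not in SUCCESS_CODES:
        raise KeyError(f"{method} {resource} is not a SWORDv3 request")
    if status in SUCCESS_CODES[key]:
        return "success"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 600:
        return "error"  # the body should be an Error Document
    return "unexpected"
```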

8. Protocol Requirements

This section describes the requirements of every kind of operation that you can do with SWORDv3.

There are 3 key aspects of the specification where requirements can be applied, and these are:

  1. Request: The operations that you can perform on the resources
  2. Content: The body content of the request, such as Metadata, By-Reference, Metadata+By-Reference, Binary File, Packaged Content, Empty Body
  3. Resource: Service-URL, Object-URL, Metadata-URL, FileSet-URL, File-URL, Staging-URL, Temporary-URL

When combined for a specific request, these aspects tell you the exact requirements. For example: creating (Request) a new Object by sending a request to the Service-URL (Resource) with Packaged Content (Content).

The requirements below are presented using a hierarchy; for any given combination of Request, Content and Resource, all requirements above the relevant node should be imported when determining the actual requirements for an operation. See the SWORDv3 Behaviours document for each of the behaviours SWORDv3 supports, with its requirements fully expanded.
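Importing the requirements of every node above an operation amounts to selecting each requirement group whose Request, Content and Resource values are ancestors of (or equal to) the operation's own values, with * as the root of each tree. The Python sketch below shows that lookup. The parent maps are hypothetical fragments inferred from the group names in 8.2, not the authoritative hierarchies of 8.1, and ordering the result from most general to most specific is a presentation choice.

```python
from typing import Dict, List, Tuple

Group = Tuple[str, str, str]  # (Request, Content, Resource)

# Hypothetical fragments of the three hierarchies, child -> parent.
# "*" is the root of each tree; Section 8.1 defines the real trees.
REQUEST_PARENT: Dict[str, str] = {
    "Retrieve": "*", "Modify": "*", "Create": "Modify", "Update": "Modify",
}
CONTENT_PARENT: Dict[str, str] = {
    "Body": "*", "Empty Body": "*", "Metadata": "Body", "Packaged Content": "Body",
}
RESOURCE_PARENT: Dict[str, str] = {
    "Deposit": "*", "Service-URL": "Deposit", "Object-URL": "Deposit",
}


def lineage(node: str, parents: Dict[str, str]) -> List[str]:
    """Return the path [root, ..., node] by walking the parent map upwards."""
    chain = [node]
    while node != "*":
        node = parents[node]
        chain.append(node)
    return list(reversed(chain))


def applicable_groups(request: str, content: str, resource: str,
                      groups: List[Group]) -> List[Group]:
    """Select the groups whose three keys are ancestors-or-self of the operation."""
    req = lineage(request, REQUEST_PARENT)
    con = lineage(content, CONTENT_PARENT)
    res = lineage(resource, RESOURCE_PARENT)
    matches = [g for g in groups if g[0] in req and g[1] in con and g[2] in res]
    # Order from most general to most specific by total depth in the three trees.
    return sorted(matches, key=lambda g: req.index(g[0]) + con.index(g[1]) + res.index(g[2]))
```

For instance, given groups keyed `("*", "*", "*")`, `("Modify", "Body", "*")` and `("Retrieve", "*", "Service-URL")`, a Create request with Packaged Content to the Service-URL matches the first two but not the Retrieve group.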

8.1. Requirement Hierarchies

The hierarchy for the Request is:

The hierarchy for the Content is:

The hierarchy for the Resource is:

So, for example, an operation such as "Creating Objects with Packaged Content" would take requirements as follows:

8.2. Requirement Groups

Request / Content / Resource
* / * / *
Protocol Operation / Request Requirements / Server Requirements / Response Requirements
  • MAY specify Authorization and On-Behalf-Of headers (i.e. if authenticating this request)
  • If Authorization (and optionally On-Behalf-Of) headers are provided, MUST authenticate the request
  • If authentication fails, MUST respond with a 403
Retrieve / * / Service-URL
  • GET Service-URL
  • If Authorization (and optionally On-Behalf-Of) headers are provided, MUST only list Service-URLs in the Service Document for which a deposit request would be permitted
  • MUST respond with a valid Service Document or a suitable error response
Retrieve / * / Object-URL
  • GET Object-URL
  • MUST respond with a valid Status document or a suitable error response
  • MUST include ETag header if implementing concurrency control
Retrieve / * / Components
  • MUST include ETag header if implementing concurrency control
Retrieve / * / Metadata-URL
  • GET Metadata-URL
  • MUST respond with a valid Metadata document (see definition below) or a suitable error response
Retrieve / * / File-URL
  • GET File-URL
  • MUST respond with a File (which may be Packaged Content, a Binary File, a Metadata document, or any other file that the server exposes) or a suitable error response
Modify / * / *
  • MUST provide the Content-Disposition header, with the appropriate value for the request
Modify / * / Deposit
  • MUST include ETag header if implementing concurrency control
Modify / Body / Deposit
  • If all preconditions are met, MUST either accept the deposit request immediately, queue the request for processing, or respond with an error
  • MUST include one or more File-URLs for the deposited content in the Status document. The behaviour of these File-URLs may vary depending on the type of content deposited (e.g. By-Reference and Segmented Uploads do not need to be immediately retrievable)
Modify / Body / *
  • MUST provide the Content-Type and Digest headers
  • SHOULD provide the Content-Length
  • MUST verify that the content matches the Digest header
  • MUST verify that the supplied content matches the Content-Length if this is provided
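A server's checks for Body content could look like the Python sketch below: it verifies the mandatory SHA-256 entry of the Digest header and, when present, the Content-Length. It assumes the RFC 3230 `<algorithm>=<base64 digest>` encoding for the Digest header and header names already normalised to the capitalisation used in the header table; `parse_digest` and `verify_body` are hypothetical names, and mapping a failure to a particular error response is left to the caller.

```python
import base64
import hashlib
from typing import Dict, Mapping


def parse_digest(value: str) -> Dict[str, str]:
    """Split 'SHA-256=..., MD5=...' into {'SHA-256': '...', 'MD5': '...'}."""
    digests = {}
    for item in value.split(","):
        # Partition on the first '=' only, so base64 padding stays in the value.
        algorithm, sep, encoded = item.strip().partition("=")
        if sep:
            digests[algorithm.strip().upper()] = encoded.strip()
    return digests


def verify_body(body: bytes, headers: Mapping[str, str]) -> None:
    """Raise ValueError if the body fails the Digest or Content-Length checks."""
    digests = parse_digest(headers.get("Digest", ""))
    if "SHA-256" not in digests:
        raise ValueError("Digest header does not include SHA-256")
    expected = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
    if digests["SHA-256"] != expected:
        raise ValueError("content does not match the SHA-256 digest")
    length = headers.get("Content-Length")
    if length is not None and int(length) != len(body):
        raise ValueError("content does not match the Content-Length")
```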
Modify / Metadata / *
  • SHOULD provide the Metadata-Format header
  • MUST provide only the Metadata document
  • If no Metadata-Format header is provided, MUST assume this is the standard SWORD format: http://purl.org/net/sword/3.0/types/Metadata
Modify / By-Reference / *
  • MUST provide the By-Reference document
  • If downloading copies of the files in the By-Reference document, MUST do this asynchronously to the deposit request
Modify / By-Reference / File-URL
  • MUST only include a single By-Reference File in the By-Reference document
  • If more than one By-Reference File is present, MUST reject the request.
  • If rejecting the request due to the presence of more than one By-Reference File in the By-Reference Document, MUST respond with a 400
Modify / MD+BR / *
  • SHOULD provide the Metadata-Format header
  • MUST provide the Metadata+By-Reference document
  • If no Metadata-Format header is provided, MUST assume this is the standard SWORD format: http://purl.org/net/sword/3.0/types/Metadata
  • If downloading copies of the files in the By-Reference document, MUST do this asynchronously to the deposit request
Modify / Binary / Deposit
  • If accepting the request MUST attach the supplied file to the Object as an originalDeposit
Modify / Binary File / *
  • MAY provide the Packaging header, and if so, its value MUST be the Binary format identifier, http://purl.org/net/sword/3.0/package/Binary
  • MUST provide Binary File body content
  • The server SHOULD NOT attempt to unpack the file
Modify / Packaged Content / *
  • MUST provide the Packaging header
  • MUST provide Packaged Content in the request body
  • The server MAY attempt to unpack the file, and create derivedResources from it.
Modify / Empty Body / *
  • MAY provide the Content-Length header with value 0
  • MUST NOT include any body content
Create / * / Service-URL
  • POST Service-URL
  • MAY provide the Slug header
  • If a Slug header is provided, MAY use this as the identifier for the newly created Object.
  • If accepting the request MUST create a new Object
  • MUST respond with a Location header, containing the Object-URL
  • MUST respond with a valid Status document or a suitable error response
  • Status document MUST be available on GET to the Object-URL in the Location header immediately (irrespective of whether this is a 201 or 202 response)
  • MUST respond with a 201 if the item was created immediately, a 202 if the item was queued for import, or raise an error.
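On the client side, the Create requirements above translate into a small response handler: 201 and 202 are both successes, 202 meaning the Object was queued, and in either case the Location header carries the Object-URL whose Status Document is available immediately. This Python sketch uses hypothetical names (`CreateResult`, `interpret_create_response`) and treats any other status as a failure to be explained by the Error Document.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class CreateResult:
    object_url: str       # from the Location header (the Object-URL)
    queued: bool          # True for 202 (queued for import), False for 201
    etag: Optional[str]   # present if the server implements concurrency control


def interpret_create_response(status: int, headers: Mapping[str, str]) -> CreateResult:
    """Interpret the server's reply to POST Service-URL."""
    if status not in (201, 202):
        raise RuntimeError(f"deposit failed with HTTP {status}; see the Error Document")
    # HTTP header names are case-insensitive, so normalise before lookup.
    lower = {name.lower(): value for name, value in headers.items()}
    location = lower.get("location")
    if not location:
        raise RuntimeError("server did not return the required Location header")
    return CreateResult(object_url=location, queued=(status == 202), etag=lower.get("etag"))
```

Because the Status Document is available at `object_url` in both cases, a client that receives a 202 can poll that URL to follow the queued import.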
Create / Body / Service-URL
  • MAY provide the In-Progress header
  • If no In-Progress header is provided, MUST assume that it is false
  • If In-Progress is true, SHOULD expect further updates to the item, and not progress it through any ingest workflows yet.
Create / Metadata / Service-URL
  • If accepting the request MUST populate the Object with the supplied Metadata
Create / By-Reference / Service-URL
  • If accepting the request MUST attach the By-Reference files to the Object.
Create / MD+BR / Service-URL
  • If accepting the request MUST populate the Object with the supplied Metadata, and attach the By-Reference files to it.
Create / Empty Body / Staging-URL
  • POST Staging-URL
  • If all preconditions are met, MUST create a resource to which the client can upload file segments
  • MUST reject the request if the conditions of the upload are not acceptable
  • MUST respond with a 201 to indicate that the Segmented Upload has been initialised, or raise an error.
  • MUST respond with a Location header containing the Temporary-URL where the client can upload file segments
Update / * / Deposit
  • MUST include the If-Match header, if the server implements concurrency control
  • MUST reject the request if the If-Match header does not match the current ETag of the resource
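The If-Match check performed by a server that implements concurrency control can be sketched in Python as follows. It follows the general HTTP If-Match rules (a `*` wildcard, a comma-separated list of entity tags, strong comparison), and it assumes, since this group does not name a status code, that a failed check is reported as 412, which appears in the error lists of the update requests. Treating a missing If-Match header as a failure is also an assumption, and both function names are hypothetical.

```python
from typing import Optional


def if_match_ok(if_match: Optional[str], current_etag: str) -> bool:
    """Return True if the If-Match header value matches the current ETag."""
    if if_match is None:
        return False  # assumption: a missing If-Match fails the precondition
    if if_match.strip() == "*":
        return True  # "*" matches any existing version of the resource
    candidates = [tag.strip() for tag in if_match.split(",")]
    # If-Match uses strong comparison, so weak tags (W/"...") never match.
    return any(tag == current_etag and not tag.startswith("W/") for tag in candidates)


def precondition_status(if_match: Optional[str], current_etag: str) -> Optional[int]:
    """Return None if the update may proceed, or the (assumed) rejection status."""
    return None if if_match_ok(if_match, current_etag) else 412
```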
Update / Body / Deposit
  • MUST respond with a 200 if the request was accepted immediately, a 202 if the request was queued for processing, or raise an error.
Update / Body / Object-URL
  • MAY provide the In-Progress header
  • If no In-Progress header is provided, MUST assume that it is false
  • MUST respond with a valid Status document or a suitable error response
  • MUST include ETag header if implementing concurrency control
Append / * / Object-URL
  • POST Object-URL
Append / Binary / Object-URL
  • MUST respond with a Location header, containing the File-URL of the Original Deposit File
Append / Metadata / Object-URL
  • If accepting the new Metadata MUST add the Metadata to the item, and only treat this as an extension to existing Metadata. The server MUST NOT overwrite or otherwise remove existing Metadata.
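The append-only rule for Metadata can be illustrated with a Python sketch that models Metadata as a mapping from field names (such as `dc:title`) to lists of values; that model is an assumption for illustration, not the SWORD Metadata Document format. New values are added after existing ones and nothing is removed or overwritten; skipping values that are already present is a design choice of this sketch, and `append_metadata` is a hypothetical name.

```python
from typing import Dict, List

Metadata = Dict[str, List[str]]  # assumed model: field name -> list of values


def append_metadata(existing: Metadata, incoming: Metadata) -> Metadata:
    """Return existing metadata extended with incoming values; nothing is removed."""
    merged = {field: list(values) for field, values in existing.items()}
    for field, values in incoming.items():
        current = merged.setdefault(field, [])
        for value in values:
            if value not in current:  # skip exact duplicates (a design choice)
                current.append(value)
    return merged
```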
Replace / * / Object-URL
  • PUT Object-URL
Replace / Binary / Object-URL
  • If accepting the new File, MUST remove all existing Files from the Object and replace with the new File. The new File should be marked as an originalDeposit. The server MUST also remove all Metadata, so the Metadata Resource contains no fields.
Replace / Metadata / Object-URL
  • If accepting the new Metadata, MUST remove all existing Files from the Object, and MUST replace the existing Metadata with the new.
Replace / By-Reference / Object-URL
  • If accepting the new By-Reference files, MUST remove all existing Files from the Object and replace them with the By-Reference files. The server MUST remove the existing Files immediately, even before the By-Reference files have been dereferenced. The new files MUST be marked as originalDeposits. The server MUST also remove all Metadata, so the Metadata Resource contains no fields.
Replace / MD+BR / Object-URL
  • If accepting the new Metadata and By-Reference files, MUST remove all existing Files from the Object and replace them with the By-Reference files. The server MUST remove the existing Files immediately, even before the By-Reference files have been dereferenced. The server MUST also replace all existing Metadata with the new Metadata.
Replace / Metadata / Metadata-URL
  • If accepting the new Metadata MUST entirely replace the existing Metadata with the new.
Replace / By-Reference / FileSet-URL
  • If accepting the new By-Reference Files, MUST replace the existing FileSet with the new files. The server MUST remove all the old files immediately, even before the new By-Reference files have been dereferenced. The new Files MUST be marked as originalDeposits
Replace / Binary File / FileSet-URL
  • If accepting the new File, MUST replace the existing FileSet with a single new File. The File MUST be marked as an originalDeposit
Replace / By-Reference / File-URL
  • If accepting the new By-Reference File, MUST replace the existing File. The server MAY keep the previous file as an older version. The new file MUST be marked as an originalDeposit
Replace / Binary File / File-URL
  • If accepting the new File, MUST replace the existing File. The server MAY keep the previous file as an older version. The new File MUST be marked as an originalDeposit
Update / Body / Components
  • MUST respond with a 204 if the replacement was deposited immediately, a 202 if the replacement was queued for import, or raise an error.
Append / By-Reference / Object-URL
  • If accepting the request, MUST attach all the By-Reference files to the Object as originalDeposits
Append / MD+BR / Object-URL
  • If accepting the request, MUST attach all the By-Reference files to the Object as originalDeposits, and MUST add the Metadata to the item, and only treat this as an extension to existing Metadata. The server MUST NOT overwrite or otherwise remove existing Metadata.
Replace / * / Metadata-URL
  • PUT Metadata-URL
Replace / * / FileSet-URL
  • PUT FileSet-URL
Replace / * / File-URL
  • PUT File-URL
Delete / * / *
  • MUST respond with a 204 if the delete is successful, or raise an error
Delete / * / Object-URL
  • DELETE Object-URL
Delete / * / FileSet-URL
  • DELETE FileSet-URL
Delete / * / File-URL
  • DELETE File-URL
Delete / * / Metadata-URL
  • DELETE Metadata-URL
Complete / Empty Body / Object-URL
  • POST Object-URL
  • MUST provide the header In-Progress: false
  • MAY provide the Content-Length header with value 0
  • MUST NOT include any body content
  • MAY inject the content into any ingest workflows
  • MUST respond with a 204 or a suitable error
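A client signals that a deposit is complete with an empty POST to the Object-URL carrying `In-Progress: false`. The Python sketch below builds such a request with the standard `urllib.request` module without sending it; the function name is hypothetical, the Object-URL in the usage comment is a placeholder, and authentication headers are omitted.

```python
import urllib.request


def build_complete_request(object_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST that marks a deposit as complete."""
    headers = {
        "In-Progress": "false",  # required for the Complete operation
        "Content-Length": "0",   # optional, but makes the empty body explicit
    }
    # data=b"" gives an empty body; the method is stated explicitly for clarity.
    return urllib.request.Request(object_url, data=b"", headers=headers, method="POST")


# Usage (placeholder Object-URL); the server should answer 204 on success:
# with urllib.request.urlopen(build_complete_request("http://example.com/objects/1")) as resp:
#     assert resp.status == 204
```

`urllib.request` stores header names via `str.capitalize()`, so the header is sent as `In-progress`; this is harmless because HTTP header names are case-insensitive.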
Append / Body / Temporary-URL
  • POST Temporary-URL
  • MUST reject the request if the segment is incorrect or unexpected: for example, all segments were already received, or the segment is a different size than expected.
  • MUST respond with a 204 or a suitable error
Append / File Segment / Temporary-URL
  • If all preconditions are met, MUST accept the file segment, and record the receipt of it
  • MUST be prepared to accept file segments in any order, and in parallel
  • MUST be able to store the incoming file segments as they arrive, and then reconstitute them into a single file when all segments have been received.
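The Temporary-URL requirements (reject incorrect or unexpected segments, accept segments in any order and in parallel, and reassemble them once all have arrived) can be sketched as an in-memory Python class. The segment numbering from 1, the fixed segment size with a possibly shorter final segment, and the class name `SegmentedUpload` are all assumptions made for illustration; a real server would persist segments to storage rather than hold them in memory.

```python
import threading
from typing import Dict


class SegmentedUpload:
    """In-memory sketch of a Temporary-URL resource collecting file segments."""

    def __init__(self, segment_count: int, segment_size: int, total_size: int):
        self.segment_count = segment_count
        self.segment_size = segment_size
        self.total_size = total_size
        self._received: Dict[int, bytes] = {}  # segment number (1-based) -> data
        self._lock = threading.Lock()          # segments may arrive in parallel

    def expected_size(self, number: int) -> int:
        """All segments are full-sized except possibly the last one."""
        if number < self.segment_count:
            return self.segment_size
        return self.total_size - self.segment_size * (self.segment_count - 1)

    def accept(self, number: int, data: bytes) -> None:
        """Record one segment, raising ValueError if it is incorrect or unexpected."""
        if not 1 <= number <= self.segment_count:
            raise ValueError(f"segment {number} is out of range")
        if len(data) != self.expected_size(number):
            raise ValueError(f"segment {number} has an unexpected size")
        with self._lock:
            if number in self._received:
                raise ValueError(f"segment {number} was already received")
            self._received[number] = data

    @property
    def complete(self) -> bool:
        with self._lock:
            return len(self._received) == self.segment_count

    def assemble(self) -> bytes:
        """Reconstitute the file in segment order once every segment has arrived."""
        with self._lock:
            if len(self._received) != self.segment_count:
                raise ValueError("not all segments have been received")
            return b"".join(self._received[n] for n in range(1, self.segment_count + 1))
```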
Retrieve / Empty Body / Temporary-URL
  • GET Temporary-URL
  • MUST respond with a 200 or a suitable error
  • If successful, MUST respond with a Segmented File Upload Document describing the current state of the upload.

9. Documents

9.1. JSON-LD Context

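SWORD defines the semantics of its documents using JSON-LD [JSON-LD]. The full JSON-LD Context is published at https://swordapp.github.io/swordv3/swordv3.jsonld, which is the value used for @context in the example documents.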

9.2. Service Document

The Service Document defines the capabilities and operational parameters of the server as a whole, or of a particular Service-URL.

The Service Document consists of a set of properties at the root, and a list of "services". Each service may define a Service-URL and/or additional properties and further nested "services". For the purposes of normalising the data held in the Service Document (for brevity of the serialised document), the Service Document MAY specify properties at the root which MUST be taken to hold true for all nested "services" (at any level below) unless a lower service definition overrides them. A service which sits beneath the root of the Service Document and above another Service MAY also redefine properties, and those overrides MUST be considered to cascade down to the Services beneath it.

A Service Document can be retrieved either for the root of the service, or from any Service within the hierarchy of Services available. If the root Service Document is requested, the full list of Services, including all their children, MUST be provided. If the URL of a Service is requested, it MUST only provide information about itself and its children.
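The cascading rule for Service Document properties can be sketched in Python: each service's effective properties are its ancestors' properties overlaid with its own. Which keys are excluded from cascading (here the JSON-LD keywords, the structural `services`, `root` and `parent` fields, and the per-service `dc:title` and `dcterms:abstract`) is an assumption of this sketch, as is the function name `effective_services`.

```python
from typing import Any, Dict, Optional

# Keys assumed not to cascade: JSON-LD keywords, tree structure, per-service labels.
NON_CASCADING = {"@context", "@id", "@type", "services", "root", "parent",
                 "dc:title", "dcterms:abstract"}


def effective_services(node: Dict[str, Any],
                       inherited: Optional[Dict[str, Any]] = None) -> Dict[str, Dict[str, Any]]:
    """Map the @id of this service and every nested service to its effective properties."""
    settings = dict(inherited or {})
    # A service's own properties override anything inherited from above.
    settings.update({k: v for k, v in node.items() if k not in NON_CASCADING})
    result = {node["@id"]: settings}
    for child in node.get("services", []):
        result.update(effective_services(child, settings))
    return result
```

Applied to the example Service Document in this section, the nested service http://swordapp.org/deposit/43 would, for instance, inherit a `maxUploadSize` of 16777216000 from the root, since it does not redefine that property.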

The full JSON Schema [JSON-SCHEMA] can be downloaded here.

An example of the Service Document:

{
  "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",

  "@id" : "http://example.com/service-document",
  "@type" : "ServiceDocument",

  "dc:title" : "Site Name",
  "dcterms:abstract" : "Site Description",

  "root" : "http://example.com/service-document",
  "acceptDeposits": true,

  "version": "http://purl.org/net/sword/3.0",
  "maxUploadSize" : 16777216000,
  "maxByReferenceSize" : 30000000000000000,
  "maxAssembledSize" : 30000000000000,
  "maxSegments" : 1000,

  "accept" : ["*/*"],
  "acceptArchiveFormat" : ["application/zip"],
  "acceptPackaging" : ["*"],
  "acceptMetadata" : ["http://purl.org/net/sword/3.0/types/Metadata"],

  "collectionPolicy" : {
    "@id" : "http://www.myorg.ac.uk/collectionpolicy",
    "description" : "...."
  },
  "treatment" : {
    "@id" : "http://www.myorg.ac.uk/treatment",
    "description" : "..."
  },

  "staging" : "http://example.com/staging",
  "stagingMaxIdle" : 3600,

  "byReferenceDeposit" : true,
  "onBehalfOf" : true,

  "digest" : ["SHA-256", "SHA", "MD5"],
  "authentication": ["Basic", "OAuth", "Digest", "APIKey"],

  "services" : [
    {
      "@id": "http://swordapp.org/deposit/43",

      "dc:title" : "Deposit Service Name",
      "dcterms:abstract" : "Deposit Service Description",

      "root" : "http://example.com/service-document",
      "parent" : "http://example.com/service-document",
      "acceptDeposits": true,

      "services" : []
    }
  ]
}

The fields available are defined as follows:

Field Type Description
@context string The JSON-LD Context for this document

MUST be present.
@id string The URL of the service document you are looking at

MUST be present.
@type string JSON-LD identifier for the document type

This field is used to define the type of the document, and in this case should always be 'ServiceDocument'. MUST be present.
accept array List of Content Types which are acceptable to the server.

MUST be present. '*/*' for any content type, or a list of acceptable content types
acceptArchiveFormat array List of Archive Formats that the server can unpack. If the client sends a package using a different format, the server MAY treat it as a Binary File.

SHOULD be present. '*' for any archive format (not recommended), or a list of acceptable formats. If this is omitted, the client MUST assume the server only supports application/zip
acceptDeposits boolean Does the Service accept deposits?

SHOULD be present. If omitted, the client MUST assume that the service does not accept deposits.
acceptMetadata array List of Metadata Formats which are acceptable to the server.

SHOULD be present. '*' for any metadata format, or a list of acceptable metadata formats. Acceptable metadata formats SHOULD be an IRI for a known format, or any other identifying string if no IRI exists. If this is omitted, the client MUST assume the server only supports the standard SWORD metadata format: http://purl.org/net/sword/3.0/types/Metadata
acceptPackaging array List of Packaging Formats which are acceptable to the server.

SHOULD be present. '*' for any packaging format, or a list of acceptable packaging formats. Acceptable packaging formats SHOULD be an IRI for a known format, or any other identifying string if no IRI exists. If this is omitted, the client MUST assume the server only supports the 3 required SWORD packaging formats (see the section Packaging Formats)
authentication array List of authentication schemes supported by the server.

SHOULD be present. If omitted, the client MUST assume the server does not support authentication.
byReferenceDeposit boolean Does the server support By-Reference deposit?

SHOULD be present. If omitted, the client MUST assume the server does not support By-Reference deposit.
collectionPolicy object URL and description of the server’s collection policy.

MAY be present.
collectionPolicy.@id string Collection Policy URL
collectionPolicy.description string Collection Policy Description
dc:title string The title or name of the Service

MUST be present.
dcterms:abstract string A description of the service

MAY be present.
digest array The list of digest formats that the server will accept.

MUST be present, and MUST include SHA-256, MAY include any others.
maxAssembledSize integer Maximum size in bytes as an integer for the total size of an assembled segmented upload

SHOULD be present. If omitted and segmented upload is supported, the client MUST assume the server will accept a file of any size.
maxByReferenceSize integer Maximum size in bytes as an integer for files uploaded by reference.

SHOULD be present. If omitted, the client MUST assume the server will accept a file of any size.
maxSegments integer Maximum number of segments that the server will accept for a single segmented upload, if segmented upload is supported.

SHOULD be present. If omitted, the client MUST assume the server will accept any number of segments.
maxUploadSize integer Maximum size in bytes as an integer for files being uploaded.

SHOULD be present. If omitted, the client MUST assume the server will accept an upload of any size.
onBehalfOf boolean Does the server support deposit on behalf of other users (mediation)?

SHOULD be present. If omitted, the client MUST assume the server does not support On-Behalf-Of deposit.
root string The URL for the root Service Document.

MUST be present.
services array List of Services contained within the parent service

MAY be present.
staging string The URL where clients may stage content prior to deposit, in particular for segmented upload

MAY be present. If omitted, the client MUST assume the server does not support Segmented Upload.
stagingMaxIdle integer The minimum time a server will hold on to an incomplete Segmented File Upload, measured from when it last received any content, before deleting it.

SHOULD be present. If omitted, the client MUST assume that the server will hold on to the incomplete file indefinitely. Servers MAY delete the unfinished upload at any time after the minimum time stated here has elapsed.
treatment object URL and description of the treatment content can expect during deposit.

MAY be present.
treatment.@id string Treatment URL
treatment.description string Treatment Description
version string The version of the SWORD protocol this server supports

MUST be present.
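
Because so many fields carry an "if omitted, the client MUST assume" rule, a client may find it useful to apply those defaults in one place. The Python sketch below does that and then uses the size limits to choose between a single upload and a Segmented Upload. The function names and the client-chosen segment_size are hypothetical, and the check that each segment must also fit within maxUploadSize is an assumption not stated in this section.

```python
from typing import Any, Dict

# What a client MUST assume when a field is omitted, per the table above.
# None means "no limit"; for staging it means "no Segmented Upload", and for
# stagingMaxIdle "held indefinitely". acceptPackaging is left out because its
# default is defined in the Packaging Formats section.
OMITTED_DEFAULTS: Dict[str, Any] = {
    "acceptArchiveFormat": ["application/zip"],
    "acceptDeposits": False,
    "acceptMetadata": ["http://purl.org/net/sword/3.0/types/Metadata"],
    "authentication": [],
    "byReferenceDeposit": False,
    "maxAssembledSize": None,
    "maxByReferenceSize": None,
    "maxSegments": None,
    "maxUploadSize": None,
    "onBehalfOf": False,
    "staging": None,
    "stagingMaxIdle": None,
}


def with_defaults(service: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay the fields the server did send on top of the defaults."""
    return {**OMITTED_DEFAULTS, **service}


def plan_upload(service: Dict[str, Any], file_size: int, segment_size: int) -> str:
    """Return "single" or "segmented"; raise ValueError if neither is allowed."""
    view = with_defaults(service)
    max_upload = view["maxUploadSize"]
    if max_upload is None or file_size <= max_upload:
        return "single"
    if view["staging"] is None:
        raise ValueError("too large for one upload and no Segmented Upload support")
    if view["maxAssembledSize"] is not None and file_size > view["maxAssembledSize"]:
        raise ValueError("file exceeds maxAssembledSize")
    # Assumption: each individual segment must also respect maxUploadSize.
    if segment_size > max_upload:
        raise ValueError("segment_size exceeds maxUploadSize")
    segments = (file_size + segment_size - 1) // segment_size  # ceiling division
    if view["maxSegments"] is not None and segments > view["maxSegments"]:
        raise ValueError(f"{segments} segments exceeds maxSegments")
    return "segmented"
```

Integer ceiling division is used rather than floating point so that very large limits, such as the example's maxByReferenceSize, are compared exactly.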

9.3. Metadata Document

The default SWORD Metadata document allows the deposit of a standard, basic metadata document constructed using the DCMI terms [DCMI]. This Metadata document can be sent when creating an Object initially, when appending to the metadata, or when replacing either the metadata or the Object as a whole.

The format of the document is simple and extensible (see the Metadata Formats section). The dc and dcterms vocabularies are supported, and servers MUST support this metadata format.

The full JSON Schema [JSON-SCHEMA] can be downloaded here.

An example of the Metadata Document:

{
  "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",

  "@id" : "http://example.com/object/1/metadata",
  "@type" : "Metadata",

  "dc:title" : "The title",
  "dcterms:abstract" : "This is my abstract",
  "dc:contributor" : "A.N. Other"
}

The fields available are defined as follows:

Field Type Description
@context string The JSON-LD Context for this document

MUST be present.
@id string The URL of the Metadata Document you are looking at

MUST be present.
@type string JSON-LD identifier for the document type

This field is used to define the type of the document, and in this case should always be 'Metadata'. MUST be present.
^dc:.+$ string Properties from the DC namespace

MAY be present.
^dcterms:.+$ string Properties from the DCTERMS namespace

MAY be present.

When sending this document, the client MUST provide a Content-Disposition header of the form:

Content-Disposition: attachment; metadata=true

Additionally, when sending this document the client SHOULD provide the Metadata-Format header with the identifier for the format: http://purl.org/net/sword/3.0/types/Metadata

Metadata-Format: http://purl.org/net/sword/3.0/types/Metadata

If the client omits the Metadata-Format header, the server MUST assume that it is the above format.
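
As an illustration of the header requirements above, this Python sketch builds (but does not send) a request carrying a Metadata Document, using only the standard library. The HTTP method depends on the operation being performed, so POST here is an assumption, as is the Content-Type, which this section does not specify. The sketch also leaves out @id on the assumption that the Metadata-URL is assigned by the server; the function name is hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

SWORD_CONTEXT = "https://swordapp.github.io/swordv3/swordv3.jsonld"
SWORD_METADATA = "http://purl.org/net/sword/3.0/types/Metadata"


def metadata_deposit(url: str, fields: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a request carrying a SWORD Metadata Document."""
    body = {"@context": SWORD_CONTEXT, "@type": "Metadata", **fields}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",  # assumption: the method depends on the operation
        headers={
            "Content-Disposition": "attachment; metadata=true",  # MUST
            "Metadata-Format": SWORD_METADATA,  # SHOULD; also the default
            "Content-Type": "application/ld+json",  # assumption, not stated here
        },
    )
```

For example, `metadata_deposit("http://swordapp.org/deposit/43", {"dc:title": "The title"})` produces a request that could then be passed to `urllib.request.urlopen`, together with whatever authentication the server requires.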

9.4. By-Reference Document

The By-Reference document allows the client to send a list of one or more files that the server will fetch asynchronously. The By-Reference document can be sent when creating an Object initially, or when appending to or replacing the FileSet in the Object, or replacing the Object as a whole.

The full JSON Schema [JSON-SCHEMA] can be downloaded here.

An example of the By-Reference Document:

{
  "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",

  "@type" : "ByReference",

  "byReferenceFiles" : [
    {
      "@id" : "http://www.otherorg.ac.uk/by-reference/file.zip",
      "contentType" : "application/zip",
      "contentLength" : 123456,
      "contentDisposition" : "attachment; filename=file.zip",
      "packaging" : "http://purl.org/net/sword/3.0/package/SimpleZip",
      "digest" : "SHA-256=....",
      "ttl" : "2018-04-16T00:00:00Z",
      "dereference" : true
    }
  ]
}

The fields available are defined as follows:

Field Type Description
@context string The JSON-LD Context for this document

MUST be present.
@type string JSON-LD identifier for the document type

This field is used to define the type of the document, and in this case should always be 'ByReference'. MUST be present.
byReferenceFiles array List of files to deposit By-Reference

MUST be present and contain one or more entries
byReferenceFiles[].@id string The URL of the file to be retrieved and deposited

MUST be present
byReferenceFiles[].contentDisposition string Content-Disposition as it would have been supplied if this were a regular file deposit.

MUST be present
byReferenceFiles[].contentLength integer Content-Length as it would have been supplied if this were a regular file deposit.

SHOULD be present
byReferenceFiles[].contentType string The Content-Type of the file to be retrieved and deposited

MUST be present
byReferenceFiles[].dereference boolean Whether the server should dereference the file (i.e. download it and store it locally) or simply maintain a link to the external resource.

MUST be present. Note that servers MAY choose to do both, irrespective of the value here, though if false, the server should make the external link available to users accessing the resource.
byReferenceFiles[].digest string Digest as it would have been supplied if this were a regular file deposit.

MUST be present
byReferenceFiles[].packaging string The packaging format of the file, or the Binary file identifier

SHOULD be present. If this is not provided, the server MUST assume this is the Binary format: http://purl.org/net/sword/3.0/package/Binary
byReferenceFiles[].ttl string A timestamp which indicates when the file will no longer be available (Time To Live)

If no date is provided, the server MAY assume the file will be available indefinitely.

When sending this document, the client MUST provide a Content-Disposition header of the form:

Content-Disposition: attachment; by-reference=true
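
To show how a server might combine the field rules above with the Service Document's byReferenceDeposit and maxByReferenceSize settings, here is a Python sketch of a validator that returns the name of an error type from section 9.8.1, or None if the document is acceptable. Mapping missing required fields to BadRequest, and checking only the declared contentLength (the server cannot confirm the real size until it fetches the file), are choices of this sketch; the function names are hypothetical.

```python
from typing import Any, Dict, Optional

REQUIRED = ("@id", "contentDisposition", "contentType", "digest", "dereference")
BINARY = "http://purl.org/net/sword/3.0/package/Binary"


def check_by_reference(doc: Dict[str, Any], service: Dict[str, Any]) -> Optional[str]:
    """Return an error type name from 9.8.1, or None if the document passes."""
    # An omitted byReferenceDeposit means the server does not support it.
    if not service.get("byReferenceDeposit", False):
        return "ByReferenceNotAllowed"
    files = doc.get("byReferenceFiles")
    if doc.get("@type") != "ByReference" or not isinstance(files, list) or not files:
        return "BadRequest"
    limit = service.get("maxByReferenceSize")  # omitted means any size
    for entry in files:
        if any(key not in entry for key in REQUIRED):
            return "BadRequest"
        length = entry.get("contentLength")  # only SHOULD, so may be absent
        if limit is not None and length is not None and length > limit:
            return "ByReferenceFileSizeExceeded"
    return None


def packaging_of(entry: Dict[str, Any]) -> str:
    """A missing packaging value MUST be treated as the Binary format."""
    return entry.get("packaging", BINARY)
```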

9.5. Metadata + By-Reference Document

In some cases it is convenient to be able to send both Metadata and By-Reference files in a single request. This is possible because both Metadata and By-Reference documents are simply JSON documents; contrast this with sending Metadata and Binary Files, where a package is required.

To do this, the client may include the Metadata and By-Reference documents embedded in a single JSON document, structured as shown below. The entire Metadata document (including its JSON-LD @context) is embedded in a field entitled metadata, and the entire By-Reference document (again, with its JSON-LD @context) is embedded in a field entitled by-reference.

When a document of this form is sent, the client MUST set the Content-Disposition header appropriately, to alert the server of its required behaviour.

An example of the Metadata + By-Reference Document:

{
  "metadata" : {
    "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",
    "@type" : "Metadata",

    "dcterms:abstract" : "....",
    "dc:contributor" : "...",
    "etc..." : "...."
  },

  "by-reference" : {
    "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",
    "@type" : "ByReference",

    "byReferenceFiles" : []
  }
}

When sending this document, the client MUST provide a Content-Disposition header of the form:

Content-Disposition: attachment; metadata=true; by-reference=true

Additionally, when sending this document the client SHOULD provide the Metadata-Format header with the identifier for the format: http://purl.org/net/sword/3.0/types/Metadata

Metadata-Format: http://purl.org/net/sword/3.0/types/Metadata

If the client omits the Metadata-Format header, the server MUST assume that it is the above format.
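
On the receiving side, the Content-Disposition parameters are what tell the server which of the three JSON forms it has been sent. The following Python sketch, with hypothetical function names, reads the metadata and by-reference flags and unpacks the body accordingly. Its header parser is intentionally minimal and does not handle quoted values that contain semicolons.

```python
import json
from typing import Dict, Optional, Tuple


def disposition_params(header: str) -> Dict[str, str]:
    """Parse the parameters after the disposition type, e.g. metadata=true."""
    params: Dict[str, str] = {}
    for part in header.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        params[name.strip().lower()] = value.strip().strip('"')
    return params


def split_deposit(header: str, body: bytes) -> Tuple[Optional[dict], Optional[dict]]:
    """Return (Metadata Document, By-Reference Document) from a JSON deposit."""
    params = disposition_params(header)
    is_metadata = params.get("metadata", "").lower() == "true"
    is_by_ref = params.get("by-reference", "").lower() == "true"
    if not (is_metadata or is_by_ref):
        raise ValueError("neither metadata=true nor by-reference=true was given")
    doc = json.loads(body)
    if is_metadata and is_by_ref:
        # Combined form: each document is embedded under its own field.
        return doc.get("metadata"), doc.get("by-reference")
    return (doc, None) if is_metadata else (None, doc)
```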

9.6. Status Document

The Status Document is provided in response to a deposit operation on a Service-URL. It can be retrieved at any subsequent point by a GET on the Object-URL, and is also returned each time the client takes action on the Object-URL. It gives the client detailed information about the content and current state of the item.

The full JSON Schema [JSON-SCHEMA] can be downloaded here.

An example of the Status Document:

{
  "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",

  "@id" : "http://example.com/object/1",
  "@type" : "Status",
  "eTag" : "...",

  "metadata" : {
    "@id" : "http://www.myorg.ac.uk/sword3/object1/metadata",
    "eTag" : "..."
  },
  "fileSet" : {
    "@id" : "http://www.myorg.ac.uk/sword3/object1/fileset",
    "eTag" : "..."
  },

  "service" : "http://swordapp.org/deposit/43",

  "state" : [
    {
      "@id" : "http://purl.org/net/sword/3.0/state/inProgress",
      "description" : "the item is currently inProgress"
    }
  ],

  "actions" : {
    "getMetadata" : true,
    "getFiles" : true,
    "appendMetadata" : true,
    "appendFiles" : true,
    "replaceMetadata" : true,
    "replaceFiles" : true,
    "deleteMetadata" : true,
    "deleteFiles" : true,
    "deleteObject" : true
  },

  "lastAction" : {
    "timestamp" : "[xsd:dateTime]",
    "log" : "description of the event that occurred, with any verbose information",
    "treatment" : {
      "@id" : "http://www.myorg.ac.uk/treatment",
      "description" : "treatment description"
    }
  },

  "links" : [
    {
      "@id" : "http://www.myorg.ac.uk/col1/mydeposit.html",
      "rel" : ["alternate"],
      "contentType" : "text/html"
    },
    {
      "@id" : "http://www.myorg.ac.uk/sword3/object1/package.zip",
      "rel" : ["http://purl.org/net/sword/3.0/terms/originalDeposit"],
      "contentType" : "application/zip",
      "packaging" : "http://purl.org/net/sword/3.0/package/SimpleZip",
      "depositedOn" : "[timestamp]",
      "depositedBy" : "[user identifier]",
      "depositedOnBehalfOf" : "[user identifier]",
      "byReference" : "http://www.otherorg.ac.uk/by-reference/file.zip",
      "status" : "http://purl.org/net/sword/3.0/filestate/ingested",
      "log" : "[any information associated with the deposit that the client should know]"
    },
    {
      "@id" : "http://www.myorg.ac.uk/sword3/object1/file1.pdf",
      "rel" : [
        "http://purl.org/net/sword/3.0/terms/fileSetFile",
        "http://purl.org/net/sword/3.0/terms/derivedResource"
      ],
      "contentType" : "application/pdf",
      "derivedFrom" : "http://www.myorg.ac.uk/sword3/object1/package.zip",
      "dcterms:relation" : "http://www.myorg.ac.uk/repo/123456789/file1.pdf",
      "dcterms:replaces" : "http://www.myorg.ac.uk/sword3/object1/versions/file1.1.pdf",
      "eTag" : "..."
    },
    {
      "@id" : "http://www.myorg.ac.uk/sword3/object1/package.1.zip",
      "rel" : ["http://purl.org/net/sword/terms/packagedContent"],
      "contentType" : "application/zip",
      "packaging" : "http://purl.org/net/sword/3.0/package/SimpleZip"
    },
    {
      "@id" : "http://www.swordserver.ac.uk/col1/mydeposit/metadata.xml",
      "rel" : ["http://purl.org/net/sword/3.0/terms/formattedMetadata"],
      "contentType" : "application/json",
      "metadataFormat" : "http://purl.org/net/sword/3.0/types/Metadata"
    },
    {
      "@id" : "http://www.myorg.ac.uk/sword3/object1/versions/file1.1.pdf",
      "rel" : ["http://purl.org/net/sword/3.0/terms/derivedResource"],
      "contentType" : "application/pdf",
      "dcterms:isReplacedBy" : "http://www.myorg.ac.uk/sword3/object1/file1.pdf",
      "versionReplacedOn" : "[xsd:dateTime]"
    },
    {
      "@id" : "http://www.myorg.ac.uk/sword3/object1/reference.zip",
      "rel" : [
        "http://purl.org/net/sword/3.0/terms/byReferenceDeposit",
        "http://purl.org/net/sword/3.0/terms/originalDeposit",
        "http://purl.org/net/sword/3.0/terms/fileSetFile"
      ],
      "byReference" : "http://www.otherorg.ac.uk/by-reference/file2.zip",
      "log" : "Any information on the download, especially if it failed",
      "eTag" : "...",
      "status" : "http://purl.org/net/sword/3.0/filestate/ingested"
    }
  ],

  "forwarding" : [
    {
      "@id" : "http://www.otherorg.ac.uk/sword3/object12",

      "links" : [
        {
          "@id" : "http://www.otherorg.ac.uk/col2/yourdeposit.html",
          "rel" : ["alternate"],
          "contentType" : "text/html"
        }
      ]
    }
  ]
}

The fields available are defined as follows:

Field Type Description
@context string The JSON-LD Context for this document

MUST be present.
@id string The Object-URL for this document

MUST be present
@type string JSON-LD identifier for the document type

This field is used to define the type of the document, and in this case should always be 'Status'. MUST be present.
actions object Container for the list of actions that are available against the object for the client.

MUST be present
actions.appendFiles boolean Whether the client can issue a request to append one or more files (individually or via a package) to the item

MUST be present
actions.appendMetadata boolean Whether the client can issue a request to append the metadata of the item

MUST be present
actions.deleteFiles boolean Whether the client can issue a request to delete files in the item. This may be a single file or all files.

MUST be present
actions.deleteMetadata boolean Whether the client can issue a request to delete all the item metadata.

MUST be present
actions.deleteObject boolean Whether the client can issue a request to delete the entire object.

MUST be present.
actions.getFiles boolean Whether the client can issue a request to retrieve any/all files in the item (both Binary Files and Packaged Content)

MUST be present
actions.getMetadata boolean Whether the client can issue a request to retrieve the item metadata

MUST be present
actions.replaceFiles boolean Whether the client can issue a request to replace files in an item. This may be a single file or all of the files.

MUST be present
actions.replaceMetadata boolean Whether the client can issue a request to replace the item metadata.

MUST be present
eTag string The current ETag for the Object

MUST be present if the server supports concurrency control
fileSet object Information about the identifier/version of the Object's FileSet

MUST be present.
fileSet.@id string The FileSet-URL for this Object

MUST be present.
fileSet.eTag string The ETag for the FileSet

MUST be present if the server supports concurrency control
forwarding array List of other locations where the object is available.

MAY be present
forwarding[].@id string The SWORD identifier for the Object in the other system

MAY be present
forwarding[].links array List of links to the Object as it appears in the other system

MAY be present
forwarding[].links[].@id string The URL of a representation of the Object in the other system

MUST be present
forwarding[].links[].contentType string The Content Type of the resource

MAY be present
forwarding[].links[].rel array The relationship to the Object that this URL has

MAY be present
lastAction object Container for information about the last action taken on the object by a client (not necessarily the current client).

SHOULD be present, if appropriate
lastAction.log string Detailed log information about the last action

MAY be present
lastAction.timestamp string When the last action was taken by the client

SHOULD be present
lastAction.treatment object Container for information about the treatment the item received in the last action

MAY be present
lastAction.treatment.@id string URL for information about the treatment the item received

MAY be present
lastAction.treatment.description string Description of the treatment the item received

MAY be present
links array List of link objects referring to the various files, both content and metadata, available on the object

MUST be present if there is one or more links available to the client
links[].@id string The URL of the resource

MUST be present
links[].byReference string The external URL of the location a By-Reference deposit was retrieved from

SHOULD be present if this is an Original Deposit that was deposited By-Reference, or is an active By-Reference deposit
links[].contentType string Content type of the resource

SHOULD be present
links[].dcterms:isReplacedBy string URL to a newer version of the file in the same Object, if this is present as a resource

SHOULD be present, if newer version is present
links[].dcterms:relation string URL to a non-SWORD access point to the file

MAY be present. For example, the URL from which an end-user would download the file via the website. This related URL does not need to support any of the SWORD protocol operations, and may even be on a server or application which has no SWORD support. The primary use case is to redirect the user to the web front end for the repository.
links[].dcterms:replaces string URL to an older version of the file in the same Object, if this is also present as a resource.

SHOULD be present, if an older version of the file is present
links[].depositedBy string Identifier for the user that deposited the item

SHOULD be present if this is an Original Deposit
links[].depositedOn string Timestamp of when the deposit happened

SHOULD be present if this is an Original Deposit
links[].depositedOnBehalfOf string Identifier for the user that the item was deposited on behalf of.

SHOULD be present if this is an Original Deposit that was done On-Behalf-Of another user
links[].derivedFrom string Reference to URL of resource from which the current resource was derived, for example, if extracted from a package that was deposited.

SHOULD be present, if the resource is derived from another resource
links[].eTag string The ETag of the resource

MUST be present if the server supports concurrency control and the resource is available to the client to modify
links[].log string Any information associated with the deposit that the client should know.

MAY be present
links[].packaging string The package format identifier if the resource is a package.

SHOULD be present if the resource is a package
links[].rel array The relationship between the resource and the object.

MUST be present. Note that multiple relationships are supported.
links[].status string The status of the resource, with regard to ingest.

SHOULD be present. For example, packaged resources which are still being unpacked and ingested may announce their status here. Likewise, by-reference deposits may do the same. MUST be one of the allowed status URIs. Any associated information to go along with the status, especially if the status is an error, SHOULD be in links[].log. If no value is provided, the client MUST assume that the item is in the status: http://purl.org/net/sword/3.0/filestate/ingested
links[].versionReplacedOn string Date that the current resource was replaced by a newer resource

SHOULD be present if dcterms:isReplacedBy is present
metadata object Information about the identifier/version of the Object's Metadata

MUST be present.
metadata.@id string The Metadata-URL for this Object

MUST be present
metadata.eTag string The ETag for the Metadata

MUST be present if the server supports concurrency control
service string The URL for the service to which this item was deposited (the Service-URL)

MUST be present. This is the URL from which the client can retrieve information about the settings for the server that are relevant to this item (e.g. max upload sizes, etc)
state array List of states that the item is in on the server.

At least one state MUST be present, using the SWORD state vocabulary. Other states using server-specific vocabularies may also be used alongside.
state[].@id string Identifier for the state.

MUST be present. At least one such identifier MUST be from the SWORD state vocabulary.
state[].description string Human readable description of the state

MAY be present
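
When the server supports concurrency control, the eTag values above are what a client would echo back in an If-Match header (the header named by the ETagNotMatched and ETagRequired errors in section 9.8.1). The following Python sketch picks the right value out of a Status Document; the function names are hypothetical, and passing the value through verbatim, without adding HTTP quoting, is an assumption.

```python
from typing import Any, Dict, Optional


def if_match(status: Dict[str, Any], part: Optional[str] = None) -> Dict[str, str]:
    """If-Match header for the Object (part=None), "metadata" or "fileSet".

    Returns no header when the server publishes no ETag, i.e. when it does
    not support concurrency control.
    """
    source = status if part is None else status.get(part, {})
    etag = source.get("eTag")
    return {"If-Match": etag} if etag else {}


def if_match_for_link(status: Dict[str, Any], url: str) -> Dict[str, str]:
    """If-Match header for an individual resource listed in links[]."""
    for link in status.get("links", []):
        if link.get("@id") == url and link.get("eTag"):
            return {"If-Match": link["eTag"]}
    return {}
```

A client replacing the metadata would send `if_match(status, "metadata")`; a stale value is what ETagNotMatched reports, and omitting the header where the server requires it yields ETagRequired.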

9.6.1. Available rel types and their meanings

alternate

An alternate, non-SWORD URL which will allow the user to access the same object. For example, this could be the URL of the landing page in the repository for the item.

http://purl.org/net/sword/3.0/terms/originalDeposit

The resource (file or package) was explicitly deposited via some deposit operation.

The relevant properties of the link section for any resource with this rel are

http://purl.org/net/sword/3.0/terms/derivedResource

A file which was unpacked or otherwise derived from another deposited resource, and which itself was not explicitly deposited through some deposit operation. The main usage would be to identify files which were extracted from a deposited zip file.

The relevant properties of the link section for any resource with this rel are

http://purl.org/net/sword/terms/packagedContent

A resource which makes this object available packaged in the specified package format on HTTP GET. This is not a resource which has been deposited or derived (though it may be very similar to an originally deposited package); it is one which the server makes available as a service to the client. Packages may be pre-built or assembled on the fly; that responsibility rests with the server.

The relevant properties of the link section for any resource with this rel are

http://purl.org/net/sword/3.0/terms/formattedMetadata

A resource which makes this object’s metadata available, serialised in the specified metadata format on HTTP GET. This is not a resource which has been deposited or derived (though it may be very similar to the originally deposited metadata); it is one which the server makes available as a service to the client. Metadata documents may be pre-built or assembled on the fly; that responsibility rests with the server.

The relevant properties of the link section for any resource with this rel are

http://purl.org/net/sword/3.0/terms/byReferenceDeposit

A file which is currently being downloaded from an external reference. It will often also have the originalDeposit rel, and once the download has completed the byReferenceDeposit rel can be removed.

The relevant properties of the link section for any resource with this rel are

http://purl.org/net/sword/3.0/terms/fileSetFile

A File which can be considered by the client to be part of the FileSet. Files in this state are available for modification via the SWORD protocol, and should be considered to form the actual "content" of the Object.
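
Since a single link may carry several rels, clients typically select links by membership in the rel list rather than by equality. The Python sketch below (hypothetical helper names) shows that pattern for finding the FileSet files and the landing page of an Object.

```python
from typing import Any, Dict, List

TERMS = "http://purl.org/net/sword/3.0/terms/"


def links_with_rel(status: Dict[str, Any], rel: str) -> List[Dict[str, Any]]:
    """All links whose rel list contains `rel` (a link may carry several)."""
    return [link for link in status.get("links", []) if rel in link.get("rel", [])]


def fileset_urls(status: Dict[str, Any]) -> List[str]:
    """URLs of the files that form the actual content of the Object."""
    return [link["@id"] for link in links_with_rel(status, TERMS + "fileSetFile")]


def landing_pages(status: Dict[str, Any]) -> List[str]:
    """Non-SWORD pages for the same object, such as the repository landing page."""
    return [link["@id"] for link in links_with_rel(status, "alternate")]
```

Matching on the full rel string matters: in this draft the packagedContent rel is http://purl.org/net/sword/terms/packagedContent, without the 3.0 path segment used by the other SWORD terms, so a prefix such as TERMS above would not find it.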

9.6.2. Required SWORD State Information

state/@id MUST contain one of:

http://purl.org/net/sword/3.0/state/accepted
for records accepted for processing but not yet created
http://purl.org/net/sword/3.0/state/inProgress
for records that have been deposited, but for which the deposit has not yet completed
http://purl.org/net/sword/3.0/state/inWorkflow
for records that are in the server’s ingest workflow
http://purl.org/net/sword/3.0/state/ingested
for records that are in the server’s archive state, whatever that might mean (e.g. published to the web)
http://purl.org/net/sword/3.0/state/rejected
for records that have been rejected from the server’s workflow
http://purl.org/net/sword/3.0/state/deleted
for tombstone records

The state field is a list, so it may also contain other states that are server-specific in addition to the SWORD values.

9.6.3. Ingest Statuses for Individual Files

Some files, when deposited, may be processed asynchronously to the client’s request: for example, large files that require unpacking, or By-Reference deposits. In these cases, the client will not receive feedback on the state or success of the deposit in the request/response exchange. Instead, the client may monitor the file(s) via the Status Document, where, for each appropriate file (Original Deposits), a “status” field provides information on the current status of processing for that file.

The following statuses are permitted; servers SHOULD provide one of these for each relevant file:

http://purl.org/net/sword/3.0/filestate/pending
the server has not yet started to process this file. It may be in a queue, or it may still be in the process of deposit via a Segmented Upload.
http://purl.org/net/sword/3.0/filestate/downloading
the server has started to download your By-Reference file, but the download is not yet complete
http://purl.org/net/sword/3.0/filestate/unpacking
the server has started unpacking your Packaged Content, and is not yet finished
http://purl.org/net/sword/3.0/filestate/error
there was an error either downloading or unpacking your file; information should be available in the “log” field to aid the client in understanding what went wrong.
http://purl.org/net/sword/3.0/filestate/ingested
the file has been successfully ingested
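
Putting the object-level states and the per-file statuses together, a client monitoring an asynchronous deposit might summarise a Status Document as in this Python sketch. The helper names are hypothetical; the treatment of a missing status as ingested follows the links[].status rule, and restricting the check to Original Deposits follows the paragraph above.

```python
from typing import Any, Dict, List, Tuple

STATE = "http://purl.org/net/sword/3.0/state/"
FILESTATE = "http://purl.org/net/sword/3.0/filestate/"
ORIGINAL_DEPOSIT = "http://purl.org/net/sword/3.0/terms/originalDeposit"


def sword_states(status: Dict[str, Any]) -> List[str]:
    """SWORD-vocabulary states only; server-specific states are skipped."""
    return [s["@id"] for s in status.get("state", []) if s.get("@id", "").startswith(STATE)]


def deposit_progress(status: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise which Original Deposits are still being processed or failed."""
    busy: List[str] = []
    failed: List[Tuple[str, str]] = []
    for link in status.get("links", []):
        if ORIGINAL_DEPOSIT not in link.get("rel", []):
            continue
        # An absent status MUST be read as ingested (see links[].status).
        file_state = link.get("status", FILESTATE + "ingested")
        if file_state == FILESTATE + "error":
            failed.append((link["@id"], link.get("log", "")))
        elif file_state != FILESTATE + "ingested":
            busy.append(link["@id"])  # pending, downloading or unpacking
    return {"states": sword_states(status), "busy": busy, "failed": failed}
```

A client could re-fetch the Status Document while "busy" is non-empty, at an interval of its own choosing, and report the "failed" entries together with their logs.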

9.7. Segmented File Upload Document

A client may request information on an ongoing Segmented File Upload at any point via a GET to the Temporary-URL.

The full JSON Schema [JSON-SCHEMA] can be downloaded here.

An example of the Segmented File Upload Document:

{
    "@context": "https://swordapp.github.io/swordv3/swordv3.jsonld",
    "@id": "http://example.com/temporary/1",
    "@type": "Temporary",

    "segments": {
        "received": [
            1,
            2,
            4
        ],
        "expecting": [
            3,
            5
        ],
        "size": 10000000,
        "segment_size": 2000000
    }
}

The fields available are defined as follows:

Field Type Description
@context string The JSON-LD Context for this document

MUST be present.
@id string The Temporary-URL for this document

MUST be present
@type string JSON-LD identifier for the document type

This field is used to define the type of the document, and in this case should always be 'Temporary'. MUST be present.
segments object Container for information on file segments

MUST be present
segments.expecting array The list of integers identifying the segments which are expected but have not yet been deposited

MUST be present if there are any segments remaining to be uploaded
segments.received array The list of integers identifying the segments that have been successfully uploaded so far.

MUST be present if one or more segments have been uploaded
segments.segment_size integer The expected size in bytes of the segments (except the final one) that will be uploaded.

MUST be present.
segments.size integer The expected size in bytes of the final resulting assembled file.

MUST be present.
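
The size and segment_size fields determine how many segments there are and how large each should be. The following Python sketch works through that arithmetic, assuming (as the example's received and expecting lists suggest) that segments are numbered from 1. For the example document, 10000000 bytes in 2000000-byte segments gives five segments, and with 1, 2 and 4 received the server would still be expecting 3 and 5. The function names are hypothetical.

```python
from typing import List, Optional


def segment_count(size: int, segment_size: int) -> int:
    return (size + segment_size - 1) // segment_size  # ceiling division


def expected_length(index: int, size: int, segment_size: int) -> int:
    """Byte length of segment `index`; every segment but the last is full-sized."""
    count = segment_count(size, segment_size)
    if not 1 <= index <= count:
        raise ValueError(f"segment {index} is outside 1..{count}")
    return segment_size if index < count else size - segment_size * (count - 1)


def still_expecting(size: int, segment_size: int, received: List[int]) -> List[int]:
    """What a server might report in segments.expecting."""
    got = set(received)
    return [i for i in range(1, segment_count(size, segment_size) + 1) if i not in got]


def check_segment(index: int, length: int, size: int, segment_size: int) -> Optional[str]:
    """The 9.8.1 error type for a wrongly sized non-final segment, else None."""
    if index < segment_count(size, segment_size) and length != segment_size:
        return "InvalidSegmentSize"
    return None
```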

9.8. Error Document

An error document is returned at any point that a synchronous operation fails.

The full JSON Schema [JSON-SCHEMA] can be downloaded here.

An example of the Error Document:

{
  "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",

  "@type" : "BadRequest",

  "timestamp" : "[xsd:dateTime]",
  "error" : "error summary",
  "log" : "text log of any debug information for the client"
}

The fields available are defined as follows:

Field Type Description
@context string The JSON-LD Context for this document

MUST be present
@type string JSON-LD identifier for the document type

This field is used to define the type of the document, and in this case should be one of the allowed Error Document types. MUST be present.
error string A short summary/title for the error

MUST be present
log string Some detail as to the error, with any information that might help resolve it.

SHOULD be present
timestamp string When the error occurred.

MUST be present

9.8.1. Error Types

The following are the error types that are available (to place in @type), their associated HTTP Status Code, and the legitimate reasons for returning that error:

Error Type Error Code Reason
AuthenticationFailed 403 The request supplied invalid credentials, or no credentials, when the server was expecting to authenticate the request.
BadRequest 400 The request did not meet the standard specified by the SWORD protocol. This error can be used when no other error is appropriate
ByReferenceFileSizeExceeded 400 The client supplied a By-Reference deposit file, which specified a file size which exceeded the server's limit
ByReferenceNotAllowed 412 The client attempted to carry out a By-Reference deposit on a server which does not support it
ContentMalformed 400 The body content of the request was malformed in some way, such that the server cannot read it correctly.
ContentTypeNotAcceptable 415 The Content-Type header specifies a content type of the request which is in a format that the server cannot accept.
DigestMismatch 412 One or more of the Digests that the server checked did not match the deposited content
ETagNotMatched 412 The client supplied an If-Match header which did not match the current ETag for the resource being updated.
ETagRequired 412 The client did not supply an If-Match header, when one was required by the server
FormatHeaderMismatch 415 The Metadata-Format or Packaging header does not match what the server found when looking at the Metadata or Packaged Content supplied in a request.
InvalidSegmentSize 400 The client sent a segment that was not the final segment, and whose size differed from the segment_size it declared during Segmented Upload Initialisation
MaxAssembledSizeExceeded 400 During a segmented upload initialisation, the client specified a total file size which is larger than the maximum assembled file size supported by the server
MaxUploadSizeExceeded 413 The request supplied body content which is larger than that supported by the server.
MetadataFormatNotAcceptable 415 The Metadata-Format header specifies a metadata format for the request which is in a format that the server cannot accept
MethodNotAllowed 405 The request is for a method on a resource that is not permitted. This may be permanent or temporary, and may depend on the client's credentials
OnBehalfOfNotAllowed 412 The request contained an On-Behalf-Of header, although the server indicates that it does not support this.
PackagingFormatNotAcceptable 415 The Packaging header specifies a packaging format for the request which is in a format that the server cannot accept
SegmentedUploadNotAllowed 412 The client attempted to carry out a Segmented Upload on a server which does not support it
SegmentedUploadTimedOut 405 The client's segmented upload URL has timed out. Servers MAY respond to this with a 404 and no explanation also.
SegmentLimitExceeded 400 During a segmented upload initialisation, the client specified a total number of intended segments which is larger than the limit specified by the server
UnexpectedSegment 400 The client sent a segment that the server was not expecting; in particular, the server may have received all the segments it was expecting, and this is an extra one
ValidationFailed 400 The server could not validate the structure of the incoming content against its expected schema. This may include the JSON schema of the SWORD documents, the metadata held within those documents, or the expected structure of packaged content.
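As a sketch of how a server might use the table above, the following Python code maps each Error Type to its HTTP status code and builds a matching Error Document. The function name error_response is hypothetical; the status codes and field names are taken directly from this section.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

CONTEXT = "https://swordapp.github.io/swordv3/swordv3.jsonld"

# HTTP status code for each Error Type, as listed in the table above
ERROR_STATUS = {
    "AuthenticationFailed": 403,
    "BadRequest": 400,
    "ByReferenceFileSizeExceeded": 400,
    "ByReferenceNotAllowed": 412,
    "ContentMalformed": 400,
    "ContentTypeNotAcceptable": 415,
    "DigestMismatch": 412,
    "ETagNotMatched": 412,
    "ETagRequired": 412,
    "FormatHeaderMismatch": 415,
    "InvalidSegmentSize": 400,
    "MaxAssembledSizeExceeded": 400,
    "MaxUploadSizeExceeded": 413,
    "MetadataFormatNotAcceptable": 415,
    "MethodNotAllowed": 405,
    "OnBehalfOfNotAllowed": 412,
    "PackagingFormatNotAcceptable": 415,
    "SegmentedUploadNotAllowed": 412,
    "SegmentedUploadTimedOut": 405,
    "SegmentLimitExceeded": 400,
    "UnexpectedSegment": 400,
    "ValidationFailed": 400,
}


def error_response(error_type: str, summary: str,
                   log: Optional[str] = None) -> Tuple[int, dict]:
    """Return (HTTP status code, Error Document) for a SWORD Error Type."""
    if error_type not in ERROR_STATUS:
        raise ValueError(f"unknown SWORD error type: {error_type}")
    doc = {
        "@context": CONTEXT,
        "@type": error_type,
        # ISO 8601 timestamp in UTC, compatible with xsd:dateTime
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "error": summary,
    }
    if log is not None:  # 'log' SHOULD be present, so callers should supply it
        doc["log"] = log
    return ERROR_STATUS[error_type], doc
```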

10. Authentication and Authorisation

It is strongly RECOMMENDED that SWORD servers support authentication and authorisation for requests.

SWORD servers are not restricted in the forms of authentication that they employ, and there is no minimum requirement or default supported approach.

10.1. Announcing Support for Authentication Schemes

Servers SHOULD enumerate the authentication schemes that they support in the Service Document, in the field authentication, and MUST draw from the IANA registry of HTTP auth scheme names [IANA Auth] where one is available.

Where the server uses an authentication scheme which is not covered by the IANA registry (such as a custom API-token-based approach), it MAY indicate this in whatever way seems most appropriate.

For example, a Server which supports Basic, Digest and OAuth authentication, as well as a custom API-Key approach could indicate as follows:

{
  "authentication": [
    "Basic", 
    "OAuth", 
    "Digest", 
    "APIKey"
  ]
}

Servers MAY also choose to support On-Behalf-Of deposit, in which the authenticating user provides content to the server as if another user were carrying out the request. A use case for this is a known third-party deposit tool which sends content to a server and has been authorised by another user to add content on their behalf.

If a server supports On-Behalf-Of deposit, it SHOULD indicate this in the Service Document with the field onBehalfOf set to true. If this field is not present clients MUST assume that the server does not support On-Behalf-Of deposit.

{
  "onBehalfOf": true
}
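The following Python sketch shows how a client might read the authentication and onBehalfOf fields of a Service Document, treating a missing onBehalfOf field as false as required above. The function names and the preferred scheme order are hypothetical choices for illustration.

```python
from typing import Optional, Sequence


def client_capabilities(service_doc: dict,
                        preferred: Sequence[str] = ("Basic", "Digest")) -> dict:
    """Pick a supported auth scheme and check On-Behalf-Of support."""
    schemes = service_doc.get("authentication", [])
    chosen: Optional[str] = next((s for s in preferred if s in schemes), None)
    return {
        "auth_scheme": chosen,
        # Absent field means the server does not support On-Behalf-Of deposit
        "on_behalf_of": service_doc.get("onBehalfOf") is True,
    }


def deposit_headers(service_doc: dict, authorization: str,
                    on_behalf_of: Optional[str] = None) -> dict:
    """Headers for a deposit request (hypothetical client helper)."""
    # Authorization is sent on every request; the server keeps no session
    headers = {"Authorization": authorization}
    if on_behalf_of is not None:
        if service_doc.get("onBehalfOf") is not True:
            # Sending it anyway would earn an OnBehalfOfNotAllowed (412) error
            raise ValueError("server does not support On-Behalf-Of deposit")
        headers["On-Behalf-Of"] = on_behalf_of
    return headers
```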

10.2. Authentication and Authorisation in requests

When carrying out authenticated requests, clients MUST send Authorization headers with every request to the server; the server is not responsible for maintaining state for the client, and is responsible for authenticating and authorising every request individually. Clients may also choose to send Cookie headers, and servers may support these, but support for Cookies is explicitly outside the scope of this specification.

When an On-Behalf-Of deposit is received, the server MUST ensure that the user identified in the On-Behalf-Of header is valid with respect to the associated Authorization header. For example, when using OAuth2, the On-Behalf-Of user MUST match the user for which the token in the Authorization header was granted.

10.3. Recording Depositing Users

In all cases (On-Behalf-Of or not) where a user has authenticated to make a deposit, servers SHOULD preserve the user's identity in the depositedBy property of the Original Deposit in the Status document. In On-Behalf-Of deposit, the value given in the On-Behalf-Of header SHOULD be used for the value of the depositedOnBehalfOf property of the Original Deposit in the Status document.

Note that recording a user's identity in this way does not have to contain enough information for the client to directly identify the user, and implementers should take note of privacy legislation when choosing what information to expose in these fields.

11. Transport Security

It is strongly RECOMMENDED that servers implement modern transport layer security, whether authenticating requests or not. If you are carrying out authenticated protocol operations you MUST implement TLS.

12. Content Disposition

SWORD uses the Content-Disposition header in client requests to indicate to the server information about the payload being delivered. Traditionally Content-Disposition is an HTTP response header, but it makes sense in the PUSH context of SWORD to use this as a request header. We follow [RFC6266] for its usage.

Implementers should also note [RFC5987] if sending filenames which require characters outside the ISO-8859-1 character set.

The general format of a Content-Disposition header is as follows:

Content-Disposition: [disposition type]; [disposition param]=[value]; ...

There are three general deposit operations in SWORD:

  1. A direct upload of some content, which may be Metadata, a By-Reference document, or a Binary File (which may itself be Packaged Content)
  2. A Segmented Upload Initialisation
  3. A File Segment for a Segmented Upload

Each of these has a different Content-Disposition, which makes it clear to the server what it should do with that content.

There are two aspects which control what the Content-Disposition should be: the Upload Type and the Content being delivered.

The requirements below define what Disposition Type and Parameters are required for each kind of request. The requirements should be interpreted according to the following hierarchy for each of the above aspects:

The hierarchy for the Upload Type is:

The hierarchy for the Content is:

So, for example, if delivering a Metadata+By-Reference Document (MD+BR) as a Direct Deposit, you would take into account the following requirements:

The requirements are:

Direct Deposit (any Content, *)
Disposition Type: attachment

Direct Deposit (Metadata)
Parameters:
  • metadata=true - Indicates that the body content of the request contains Metadata. A Direct Deposit containing Metadata MUST contain this parameter.

Direct Deposit (By-Reference)
Parameters:
  • by-reference=true - Indicates that the body content of the request contains By-Reference files. A Direct Deposit containing By-Reference files MUST contain this parameter.

Direct Deposit (MD+BR)
Parameters:
  • metadata=true - Indicates that the body content of the request contains Metadata. A Direct Deposit containing Metadata MUST contain this parameter.
  • by-reference=true - Indicates that the body content of the request contains By-Reference files. A Direct Deposit containing By-Reference files MUST contain this parameter.

Direct Deposit (Binary File)
Parameters:
  • filename=[filename] - Indicates the intended filename of the deposited file. MAY be present, and if present the server SHOULD respect it. If using a character set outside of ISO-8859-1, you MUST use filename* instead.

Direct Deposit (Packaged Content)
Parameters:
  • filename=[filename] - Indicates the intended filename of the deposited file. MAY be present, and if present the server SHOULD respect it. If using a character set outside of ISO-8859-1, you MUST use filename* instead.

Segmented Upload Initialisation (Empty Body)
Disposition Type: segment-init
Parameters:
  • size=[bytes] - The total size of the final file. This MUST be sent so that the server can determine when all the bytes of the file have been uploaded.
  • digest=[digest] - The Digest information for the resulting file as a whole, after assembly. This MUST be present, and MUST be in the same form as if it were the HTTP header you would use if depositing this file as a whole.
  • segment_count=[n] - The total number of segments that will be sent to the Temporary-URL. This MUST be present. Later, any segment uploads with segment_number greater than this number MUST be rejected by the server.
  • segment_size=[bytes] - The size of each segment (except the final segment) that the client will be sending. This MUST be present. If a non-final segment is sent with a different size, this MUST be rejected by the server.

File Segment Upload (File Segment)
Disposition Type: segment
Parameters:
  • segment_number=[n] - The position in the full sequence of this segment. This MUST be present. It MUST be an integer, and MUST start counting at 1. The full list of segments MUST be a sequential list of integers.

The following examples show a number of key cases:

A Metadata Deposit

Content-Disposition: attachment; metadata=true

A By-Reference Deposit

Content-Disposition: attachment; by-reference=true

A Metadata+By-Reference Deposit

Content-Disposition: attachment; metadata=true; by-reference=true

A Binary File Deposit

Content-Disposition: attachment; filename=[filename]

A Segmented Upload Initialisation

Content-Disposition: segment-init; size=[bytes]; digest=[digest]; segment_count=[n]; segment_size=[bytes]

A File Segment Upload

Content-Disposition: segment; segment_number=[n]
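The examples above can be generated programmatically. The following Python sketch builds each Content-Disposition value; the function name and the kind labels are hypothetical. For filenames it uses a quoted-string (permitted by RFC 6266) and falls back to the RFC 5987 filename* form when the name cannot be encoded as ISO-8859-1. Other parameters are emitted unquoted, exactly as in the examples.

```python
from urllib.parse import quote


def _filename_param(name: str) -> str:
    try:
        name.encode("iso-8859-1")
    except UnicodeEncodeError:
        # RFC 5987 extended notation: UTF-8, percent-encoded
        return "filename*=UTF-8''" + quote(name, safe="")
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'filename="{escaped}"'


def content_disposition(kind: str, **p) -> str:
    """Build a Content-Disposition value for a given (hypothetical) upload kind."""
    if kind == "metadata":
        return "attachment; metadata=true"
    if kind == "by-reference":
        return "attachment; by-reference=true"
    if kind == "md+br":
        return "attachment; metadata=true; by-reference=true"
    if kind == "binary":  # Binary File or Packaged Content; filename is optional
        fn = p.get("filename")
        return "attachment" + (f"; {_filename_param(fn)}" if fn else "")
    if kind == "segment-init":  # all four parameters MUST be present
        return ("segment-init; size={size}; digest={digest}; "
                "segment_count={segment_count}; "
                "segment_size={segment_size}").format(**p)
    if kind == "segment":
        return f"segment; segment_number={p['segment_number']}"
    raise ValueError(f"unknown upload kind: {kind}")
```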

13. Content Digests

In order to ensure that the content transmitted via SWORD is correct when it arrives at its destination, clients MUST provide Digests that servers MUST check against incoming content.

13.1. Announcing Support For Digests

Servers can announce the Digest formats that they support in the Service Document as follows:

{
  "digest": [
    "SHA-256", 
    "SHA", 
    "MD5"
  ]
}

The Server SHOULD list all the digest formats that it supports. Servers MUST support at least SHA-256 and MAY support any other digest formats.

The Digest formats MUST be identified as per the IANA HTTP Digest Algorithm values: [IANA Digest]

13.2. Transmitting Digests

SWORD uses the recommendations of [RFC3230] for transmitting base64 encoded Digests of request bodies.

For every request where there is a request body, the client MUST attach the Digest header with the appropriate content:

Digest: SHA-256=MzA1ZmIzMDJiZjA4MzUzYTg5ZGY4NDIxMjcyY2JmZTEwNzM5ODdmMjJhY2Y1ZDc5NzFhOTY3MmM1MGNkN2ZlMA==

Note that the client MAY send multiple digests from different algorithms, separated by commas in the header:

Digest: SHA-256=MzA1ZmIzMDJiZjA4MzUzYTg5ZGY4NDIxMjcyY2JmZTEwNzM5ODdmMjJhY2Y1ZDc5NzFhOTY3MmM1MGNkN2ZlMA==, MD5=ZjQxNjA3N2M3MDdhODJkZGJlMGE0YTk2NGRjZWEyNWE=

The server MUST validate at least one digest and SHOULD validate all of them. Where it does not validate them all, it MAY choose which of the supplied formats to validate against.
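A minimal Python sketch of both sides of this exchange follows. Per RFC 3230, each value is the base64 encoding of the raw digest bytes (not of a hexadecimal string), so a SHA-256 value is 44 characters long; "SHA" is SHA-1 in the IANA registry. The function names are hypothetical, and this server-side check validates every algorithm it understands while skipping unknown ones.

```python
import base64
import hashlib

# IANA Digest algorithm names mapped to hashlib constructors
ALGORITHMS = {"SHA-256": hashlib.sha256, "SHA": hashlib.sha1, "MD5": hashlib.md5}


def _b64_digest(name: str, body: bytes) -> str:
    raw = ALGORITHMS[name](body).digest()  # raw bytes, not hex
    return base64.b64encode(raw).decode("ascii")


def digest_header(body: bytes, algorithms=("SHA-256",)) -> str:
    """Client side: build a Digest header value, comma-separating algorithms."""
    return ", ".join(f"{a}={_b64_digest(a.upper(), body)}" for a in algorithms)


def verify_digest_header(body: bytes, header: str) -> bool:
    """Server side: check every supported digest; require at least one."""
    checked = 0
    for item in header.split(","):
        # Partition on the first '=' only: base64 padding also uses '='
        name, _, value = item.strip().partition("=")
        name = name.strip().upper()  # algorithm names are case-insensitive
        if name not in ALGORITHMS:
            continue  # unsupported algorithm: skip it
        if value.strip() != _b64_digest(name, body):
            return False  # would map to a DigestMismatch (412) error
        checked += 1
    return checked > 0
```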

14. Concurrency Control

Servers MAY choose to implement concurrency control, to ensure that clients do not accidentally overwrite, or make changes that conflict with, other changes which may have happened to the Object since it was first deposited. Note that this does not prevent clients from damaging Objects; it only makes it harder to do so by accident.

Objects may change for a number of reasons after their initial creation, such as:

In order to provide concurrency control, SWORD follows [RFC7232], and specifically uses the ETag and If-Match headers.

On each request for a resource, or when the Status document is retrieved, the ETag for the resource MUST be returned. The ETag gives the client an opaque identifier for the current version of that resource. When the resource is being updated by the client (for example, it is replacing a File), the ETag that the client expects to be the current one MUST be sent in the If-Match header. The server MUST then compare that with its actual current ETag for the resource. If they match, the request can go ahead; otherwise the Server MUST respond with an ETagNotMatched error (412).

Note that ETags, and Concurrency Control in general, are only applicable from the Object downwards. There are no requirements for use of ETag or If-Match headers on Service-URLs.
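The comparison described above can be sketched in Python as follows. This is a minimal illustration assuming the server stores each ETag exactly as it sends it in the ETag header (including quotes) and uses the strong comparison that If-Match requires. The function name is hypothetical, and the naive comma split does not handle ETags that themselves contain commas.

```python
from typing import Optional


def check_if_match(current_etag: str, if_match: Optional[str],
                   required: bool = True) -> Optional[str]:
    """Return None if the update may proceed, otherwise a SWORD Error Type."""
    if if_match is None:
        return "ETagRequired" if required else None
    # If-Match may carry "*" or a comma-separated list of entity tags
    candidates = [t.strip() for t in if_match.split(",")]
    if "*" in candidates or current_etag in candidates:
        return None
    # Weak tags such as W/"2" never match under strong comparison
    return "ETagNotMatched"
```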

14.1. Announcing Support for Concurrency Control

The server does not have to announce support for concurrency control in the Service Document. Clients MUST check response headers for the presence of an ETag. Presence of the ETag indicates that the server requires the client to pay attention to its concurrency control procedures, and to carry out later requests with an If-Match header.

If supporting concurrency control, Servers MUST provide an ETag on all responses to requests (GET, POST, PUT) against resources from the Object and below.

14.2. Procedures around Concurrency Control

If a server supports Concurrency Control, it MUST behave in accordance with the following rules.

14.3. Resource Hierarchy for ETag Regeneration

If an ETag of a resource changes, the resources above it (up to the level of the Object) MUST also change. This is to prevent a change at a higher level (e.g. an Object replacement) overwriting a change at a lower level (e.g. addition of a single file).

The Object hierarchy is as follows:

  • Object
    • Metadata
    • FileSet
      • File

So, for example, if the Metadata is updated, then the Metadata and Object ETags MUST change, but the FileSet and File ETags need not. Similarly, if a File ETag changes, then the FileSet and Object ETags MUST also change, while the Metadata ETag need not.
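The regeneration rule can be sketched in Python by walking from the changed resource up to the Object. This sketch models one resource per level (a real server would track an ETag for each File), and uses random UUIDs because any opaque, unique value will do; both the function name and the UUID choice are assumptions.

```python
import uuid

# Parent of each level in the Object hierarchy (Object is the root)
PARENT = {"File": "FileSet", "FileSet": "Object", "Metadata": "Object", "Object": None}


def regenerate_etags(etags: dict, changed: str) -> dict:
    """Return a new ETag map in which `changed` and all its ancestors are fresh."""
    updated = dict(etags)
    level = changed
    while level is not None:
        updated[level] = uuid.uuid4().hex  # opaque, effectively unique value
        level = PARENT[level]
    return updated
```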

15. Continued Deposit

Some systems may wish to give the client more control over the ingest process, and SWORD uses the In-Progress HTTP header to allow the client to indicate that a deposit should not yet be injected into any post-submission or pre-ingest workflow. The In-Progress header MUST take the value true or false, and if it is not present the server MUST assume that it is false and behave as described below.

An example use case for this is a client embedded in a system which uses the SWORD server as a storage layer, but which cannot acquire all of the content for a "finished" item in one deposit operation. Consider a user-facing system which encourages users to upload files one at a time through a web interface, with each file deposited directly onto the SWORD server. At the start of the deposit the client asserts that the deposit is In-Progress: true, and then proceeds to upload files. If uploading them to the Object-URL, the client continues to assert In-Progress: true on each request (this is not necessary when depositing to other URLs). This continues until the user confirms that they have uploaded all the relevant files, or navigates away from the page. At that stage, the client can issue a blank HTTP POST request to the SWORD server with In-Progress: false to complete the deposit.

Note that the In-Progress header is intended to tell the server that further content associated with the existing content will follow before the item can be considered "complete". It is not intended to provide workflow control, and clients MUST NOT assume that asserting In-Progress: true will have any specific effect on the state of the item.

15.1. Deposit Complete

If In-Progress is false, the server MAY assume that it can carry on processing the deposit as it sees fit.

15.2. Deposit Incomplete

If In-Progress is true, the server SHOULD expect the client to provide further updates to the item at some undetermined time in the future. Details of how this is implemented depend on the server's purpose. For example, a repository system may hold items which are marked In-Progress in a workspace until such time as a client request indicates that the deposit is complete.

15.3. Completing a Previously Incomplete Deposit

The client can assert that a deposit process has completed by issuing an HTTP POST to the Object-URL with a blank request body and with the In-Progress header set to false (or by omitting the header altogether, since the server treats a missing header as In-Progress: false). The client MAY specify a Content-Length: 0 HTTP header, and MUST NOT include any body content.

Once the server has processed the request it MUST respond with status code 204 (No Content), or a suitable error.
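The following Python sketch builds, without sending, the completion request described above, and shows how a server might interpret the In-Progress header with its false default. The Object-URL and credential values are placeholders, and the function names are hypothetical; accepting any letter case for true/false is a lenient choice of this sketch.

```python
import urllib.request
from typing import Optional


def completion_request(object_url: str, authorization: str) -> urllib.request.Request:
    """Build (but do not send) the POST that completes an In-Progress deposit."""
    return urllib.request.Request(
        object_url,
        data=b"",  # blank body: the request MUST NOT include any content
        method="POST",
        headers={
            "In-Progress": "false",
            "Content-Length": "0",
            "Authorization": authorization,
        },
    )


def parse_in_progress(value: Optional[str]) -> bool:
    """Server side: a missing header means false; only true/false are valid."""
    if value is None:
        return False
    v = value.strip().lower()
    if v not in ("true", "false"):
        raise ValueError("In-Progress must be true or false")  # e.g. BadRequest
    return v == "true"
```

A client would then send the request with urllib.request.urlopen(req) and expect a 204 (No Content) response.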

16. Segmented File Upload

If a client has a very large file that it wishes to transfer to the server by value, it may be beneficial to do this in several small operations rather than as a single large one. Large uploads are at higher risk of failure, depending on a variety of factors, and there is no guarantee that a SWORD server will be able to resume a partial upload.

In order to transfer a large file, the client can break it down into a number of equally sized segments of binary data (the final segment may be smaller than the rest). It can then initialise a Segmented File Upload with the server, and then transfer the segments. The server will reconstitute these segments into a single file, and then the client may deposit this file by-reference.

Segments can be uploaded in any order, and can be uploaded one at a time or in parallel.
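As an illustration of how a client might split a file, the following Python sketch computes the (segment_number, offset, length) triple for each segment. Numbering starts at 1 and every segment except possibly the last is segment_size bytes, as described in this section. The function name is hypothetical, and because each triple is independent, segments could be read and sent in any order or in parallel.

```python
from typing import Iterator, Tuple


def plan_segments(total_size: int, segment_size: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (segment_number, offset, length) for a Segmented File Upload."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    # Integer ceiling division avoids float rounding for very large files
    count = -(-total_size // segment_size)
    for n in range(1, count + 1):
        offset = (n - 1) * segment_size
        # Only the final segment can be shorter than segment_size
        yield n, offset, min(segment_size, total_size - offset)
```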

16.1. Announcing Support for Segmented File Upload

Servers MAY support Segmented File Upload. To do so, a server must provide a staging area where file segments can be uploaded before the client requests a specific deposit operation. The server MUST include a staging field in the Service Document with a URL for where the client can initialise its Segmented File Upload. It SHOULD also specify, with the stagingMaxIdle field, how long it will retain an unfinished Segmented File Upload before assuming that the client will not complete it:

{
  "staging": "http://example.com/staging", 
  "stagingMaxIdle": 3600
}

16.2. Outline of Process for Segmented File Upload

  1. Obtain the Staging-URL from the Service, from which to request a Temporary-URL

    If the client is creating a new Object, the Staging-URL can be found in the staging field in the Service Document. If an Object already exists, the client should find the Service-URL from the service field in the Status Document, then GET this URL to obtain the appropriate Service Document, and subsequently get the Staging-URL from the staging field.

  2. Request a Temporary-URL from the Service, via a Segmented Upload Initialisation request.

    Send a POST request to the Staging-URL, as per POST Staging-URL, with the appropriate Content-Disposition (see below). The server will respond with a Temporary-URL in the Location header.

  3. Upload all the file segments to the Temporary-URL

    Send one or more POST requests to the Temporary-URL as per POST Temporary-URL, with the appropriate Content-Disposition (see below), until all file segments have been uploaded.

  4. Carry out the desired deposit operation as a By-Reference deposit, using the Temporary-URL as the by-reference file.

    Refer to the section [By-Reference Deposit] for more information on this approach. Deposits of content hosted at Temporary-URLs SHOULD NOT contain the ttl or dereference fields in the By-Reference Document, and if they are included, the server MUST ignore them.

16.3. Segmented Upload Initialisation

Before sending any segments to the server, the client must initialise the process. This is done by sending a POST request to the Staging-URL as per POST Staging-URL.

The requirements of the protocol for a Segmented Upload Initialisation are:

Protocol Operation

Request Requirements

Server Requirements

Response Requirements

See the section Content Disposition for detailed information on the Content-Disposition header. Based on that section, the supplied Content-Disposition would be:

Content-Disposition: segment-init; size=[bytes]; digest=[digest]; segment_count=[n]; segment_size=[bytes]

The server MAY choose to reject the Segmented Upload Initialisation request at this stage for a variety of reasons: for example, the requested number of segments may exceed the server's limit (SegmentLimitExceeded), or the total size may exceed its maximum size for assembled files (MaxAssembledSizeExceeded). In these cases, the server MUST respond with the appropriate Error Type.
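A server-side sketch of this step in Python follows: it parses the segment-init Content-Disposition and applies the two limit checks named above. The limits are hypothetical server configuration values. The parser is deliberately naive (it does not handle quoted parameter values) and splits each parameter on its first "=", so digest values containing "=" survive intact. Returning BadRequest for missing or non-integer parameters is an assumption based on that error's catch-all role.

```python
from typing import Optional, Tuple


def parse_disposition(header: str) -> Tuple[str, dict]:
    """Split a Content-Disposition value into (type, {param: value})."""
    disp_type, *rest = [part.strip() for part in header.split(";")]
    params = {}
    for item in rest:
        key, _, value = item.partition("=")  # first '=' only
        params[key.strip().lower()] = value.strip()
    return disp_type.lower(), params


def check_init(params: dict, max_segments: int,
               max_assembled_size: int) -> Optional[str]:
    """Return a SWORD Error Type if the initialisation should be rejected."""
    required = ("size", "digest", "segment_count", "segment_size")
    if any(k not in params for k in required):
        return "BadRequest"  # all four parameters MUST be present
    try:
        size = int(params["size"])
        count = int(params["segment_count"])
        int(params["segment_size"])
    except ValueError:
        return "BadRequest"
    if count > max_segments:
        return "SegmentLimitExceeded"
    if size > max_assembled_size:
        return "MaxAssembledSizeExceeded"
    return None
```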

If the request is successful, the server will respond with a Temporary-URL in the Location header, and the segments themselves can be uploaded to that URL.

16.4. Uploading File Segments

Segments may be uploaded in any order and may also be uploaded in parallel. All segments MUST be the same size, with the exception of the final segment, which MUST be the same size as or smaller than the other segments. The segment size MUST be smaller than the maxUploadSize specified in the Service Document.
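The following Python sketch shows how a server might check an incoming segment against the values declared at initialisation. The function name is hypothetical. Rejecting segment numbers outside 1..segment_count and wrongly sized non-final segments follows this specification; treating a duplicate upload as UnexpectedSegment, and using InvalidSegmentSize for an oversized final segment, are assumptions of this sketch.

```python
from typing import Optional, Set


def check_segment(n: int, body_len: int, *, segment_count: int,
                  segment_size: int, received: Set[int]) -> Optional[str]:
    """Return a SWORD Error Type if an uploaded segment must be rejected."""
    if not 1 <= n <= segment_count:
        return "UnexpectedSegment"  # numbers beyond segment_count MUST be rejected
    if n in received:
        return "UnexpectedSegment"  # assumption: duplicates count as unexpected
    if n != segment_count and body_len != segment_size:
        return "InvalidSegmentSize"  # non-final segments must match segment_size
    if n == segment_count and body_len > segment_size:
        return "InvalidSegmentSize"  # assumption: final segment may not be larger
    return None
```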

The requirements of the protocol for File Segment Upload are:

Protocol Operation

Request Requirements

Server Requirements

Response Requirements

See the section Content Disposition for detailed information on the Content-Disposition header. Based on that section, the supplied Content-Disposition would be:

Content-Disposition: segment; segment_number=[n]

The Content-Type header MUST be application/octet-stream.

The Digest header MUST contain the Digest for the File Segment itself, so the server can confirm successful transfer of the segment.
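As a client-side illustration, the following Python sketch assembles the headers for a single File Segment Upload. Note that the Digest covers only this segment's bytes, whereas the digest parameter sent at initialisation covers the whole assembled file. The function name is hypothetical, and SHA-256 is chosen because servers must support it.

```python
import base64
import hashlib


def segment_headers(segment: bytes, segment_number: int) -> dict:
    """Headers for one File Segment Upload POST to the Temporary-URL."""
    # RFC 3230 style: base64 of the raw SHA-256 digest of this segment only
    sha = base64.b64encode(hashlib.sha256(segment).digest()).decode("ascii")
    return {
        "Content-Disposition": f"segment; segment_number={segment_number}",
        "Content-Type": "application/octet-stream",
        "Digest": f"SHA-256={sha}",
        "Content-Length": str(len(segment)),
    }
```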

16.5. Retrieving Information about a Segmented File Upload

At any point after creating a Temporary-URL, the client may request information on the state of their Segmented File Upload. This can be done via a GET to the Temporary-URL.

This returns a document as described in Segmented File Upload Document.

The requirements for this operation are:

Protocol Operation

Request Requirements

Server Requirements

Response Requirements

Note that the client cannot retrieve a copy of the full or partially uploaded Segmented File Upload from the Temporary-URL at any point.

16.6. Aborting an Upload

If, part way through a segmented upload (or even after completing it), the client wishes to abort, it can send a DELETE request to the Temporary-URL, with the following requirements:

Request Requirements

Server Requirements

Response Requirements

If a client submits the Temporary-URL to the server as a By-Reference deposit after completing the upload, the client SHOULD NOT delete the Temporary-URL itself; the server SHOULD take responsibility for this. If the client deletes the resource before the By-Reference deposit has completed, the server SHOULD record an error against the ingest.

16.7. Incomplete Upload Retention

Servers SHOULD delete incomplete Segmented File Uploads which have not been finalised with all segments after the idle period given in the stagingMaxIdle field of the Service Document.

16.8. Completed Upload Retention

Servers SHOULD delete completed Segmented File Uploads after a specified amount of time (in the Service Document). Servers MUST be able to tell when they have been given one of their own Temporary-URLs as a By-Reference deposit, and not delete that resource until after it has been ingested.
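A retention sweep might look like the following Python sketch. The Service Document only defines stagingMaxIdle for unfinished uploads, so applying the same idle period to completed uploads is an assumption here, as are the StagedUpload record and its field names. Uploads that have been received as a By-Reference deposit are never selected, since they must survive until ingested.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class StagedUpload:
    """Hypothetical server-side record for one Temporary-URL."""
    temporary_url: str
    last_activity: float  # epoch seconds of initialisation or latest segment
    referenced_by_deposit: bool = False  # received as a By-Reference deposit


def uploads_to_delete(uploads: Iterable[StagedUpload], staging_max_idle: int,
                      now: Optional[float] = None) -> List[str]:
    """Return the Temporary-URLs that have been idle longer than stagingMaxIdle."""
    now = time.time() if now is None else now
    doomed = []
    for u in uploads:
        if u.referenced_by_deposit:
            continue  # MUST NOT delete before ingest; idle time is ignored
        if now - u.last_activity > staging_max_idle:
            doomed.append(u.temporary_url)
    return doomed
```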

16.9. Errors

Servers MUST respond with Error documents under the following circumstances (in addition to the standard errors that may arise through using the protocol):

The server MAY respond with an Error document under the following circumstances:

If any other errors occur asynchronously, such as in reassembling or unpacking the resulting file, servers MUST provide an error status field and suitable log information in the link record in the Status document.

17. By-Reference Deposit

By-Reference Deposit is when the client provides the server with URLs for Files which it would like the server to retrieve asynchronously from the deposit request itself. This could be useful in a number of contexts, such as when the files are very large and are stored on specialist staging hardware, or where the files are already readily available elsewhere and there is no need to push them through a by-value deposit.

17.1. Announcing Support for By-Reference Deposit

Servers MAY support By-Reference deposit. If a server supports By-Reference it SHOULD indicate this in the Service Document using the field byReferenceDeposit:

{
  "byReferenceDeposit": true
}

17.2. Options for By-Reference Deposit

Clients may use a By-Reference Deposit anywhere a by-value deposit could be carried out. Instead of sending any Binary content, the client sends the By-Reference Document containing one or more (depending on context) URLs to files which the server can retrieve.

See the document SWORDv3 Behaviours for an expansion of the Protocol Requirements for requests to deposit By-Reference.

The Content Disposition for a By-Reference deposit is:

Content-Disposition: attachment; by-reference=true

17.2.1. Usage with Segmented File Upload

If carrying out a Segmented File Upload, the final deposit stage is to send the Temporary-URL to the server as part of a By-Reference deposit. In this case the client SHOULD omit the ttl and dereference fields from the By-Reference Document, thus:

{
  "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",

  "@type" : "ByReference",

  "byReferenceFiles" : [
    {
      "@id" : "[Temporary-URL]",
      "contentType" : "application/zip",
      "contentLength" : 123456,
      "contentDisposition" : "attachment; filename=file.zip",
      "packaging" : "http://purl.org/net/sword/packaging/SimpleZip",
      "digest" : "SHA-256=...."
    }
  ]
}

The server MUST recognise its own Temporary-URLs, and should implement ingest from them in the most efficient way possible. Because a copy of a Segmented File Upload cannot be retrieved from its Temporary-URL via GET, the server MUST have some other way of reading the content of those uploads. The server MUST NOT delete the resource until after it has been successfully ingested (i.e. the stagingMaxIdle time should be ignored once the server has received the resource as a By-Reference deposit).
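The following Python sketch builds the By-Reference Document shown above for a completed Segmented File Upload, deliberately omitting ttl and dereference. The function name and the example values passed to it are hypothetical; the field names and @context come from the example in this section.

```python
import json
from typing import Optional


def temporary_url_reference(temporary_url: str, content_type: str,
                            content_length: int, filename: str, digest: str,
                            packaging: Optional[str] = None) -> dict:
    """By-Reference Document for depositing a completed Segmented File Upload."""
    entry = {
        "@id": temporary_url,
        "contentType": content_type,
        "contentLength": content_length,
        "contentDisposition": f"attachment; filename={filename}",
        "digest": digest,  # digest of the whole assembled file
    }
    if packaging is not None:
        entry["packaging"] = packaging
    # ttl and dereference are intentionally not included
    return {
        "@context": "https://swordapp.github.io/swordv3/swordv3.jsonld",
        "@type": "ByReference",
        "byReferenceFiles": [entry],
    }


doc = temporary_url_reference(
    "https://example.com/staging/abc123", "application/zip", 123456,
    "file.zip", "SHA-256=....",
    "http://purl.org/net/sword/packaging/SimpleZip")
print(json.dumps(doc, indent=2))
```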

17.3. Server-Side Processing of By-Reference Deposits

The following is the procedure that MUST be followed by servers implementing By-Reference deposit.

  1. The server receives a By-Reference Document with one or more files listed

  2. The server creates records for each of these files that it plans to dereference, which then become visible in the Status Document. Files marked by the client not to be dereferenced are considered metadata, and need not appear in the Status Document. All other supplied Files MUST have the status pending in the Status Document.

  3. The server responds to the client with the appropriate response for the action (See Protocol Operations and Protocol Requirements)

  4. At its own pace, taking into account the ttl of the Files, the server obtains all the files that are marked for dereference and validates them against their Digest and any other supporting information such as contentType, contentLength, and packaging. During the download the server SHOULD set the status to downloading. The server SHOULD be able to resume an interrupted download.

  5. Once the Files are downloaded and processed, the server MUST set the status to ingested. If the Files need unpacking first, the server SHOULD first set the status to unpacking and then ingested when this operation is complete. The server MUST also remove the byReferenceDeposit rel.

  6. If there is an error in downloading or otherwise processing the file, the server MUST set the status to error and SHOULD provide a meaningful log message.

  7. The server MAY continue to record the original URL of the file if desired.
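The procedure above can be sketched in Python as a small state machine over a Status Document link record. The transition table is an assumption of this sketch: downloading and unpacking are SHOULD-level states, so a server may skip them. Dropping the byReferenceDeposit rel from unpacking onwards (and on error) mirrors the example records that follow; the default log text is the one used in the error example.

```python
from typing import Optional

BASE = "http://purl.org/net/sword/3.0/filestate/"
BYREF_REL = "http://purl.org/net/sword/3.0/terms/byReferenceDeposit"

# Hypothetical allowed transitions between file states
TRANSITIONS = {
    "pending": {"downloading", "unpacking", "ingested", "error"},
    "downloading": {"unpacking", "ingested", "error"},
    "unpacking": {"ingested", "error"},
    "ingested": set(),
    "error": set(),
}


def set_file_state(link: dict, state: str, log: Optional[str] = None) -> dict:
    """Return an updated copy of a By-Reference link record."""
    current = link["status"][len(BASE):]
    if state not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {state}")
    updated = dict(link, status=BASE + state)
    if state in ("unpacking", "ingested", "error"):
        # The examples drop the byReferenceDeposit rel once download is over
        updated["rel"] = [r for r in link["rel"] if r != BYREF_REL]
    if state == "error":
        updated["log"] = log or "There was an error ingesting your file"
    return updated
```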

17.3.1. Representation in the Status Document

While a By-Reference File is being processed, it MUST be represented in the Status Document under the link field. The following sections show how it is represented.

On Initial Deposit

{
  "status": "http://purl.org/net/sword/3.0/filestate/pending", 
  "eTag": "1", 
  "@id": "http://www.myorg.ac.uk/sword3/object1/reference.zip", 
  "byReference": "http://www.otherorg.ac.uk/by-reference/file2.zip", 
  "rel": [
    "http://purl.org/net/sword/3.0/terms/byReferenceDeposit", 
    "http://purl.org/net/sword/3.0/terms/originalDeposit", 
    "http://purl.org/net/sword/3.0/terms/fileSetFile"
  ]
}

During Download

{
  "status": "http://purl.org/net/sword/3.0/filestate/downloading", 
  "eTag": "1", 
  "@id": "http://www.myorg.ac.uk/sword3/object1/reference.zip", 
  "byReference": "http://www.otherorg.ac.uk/by-reference/file2.zip", 
  "rel": [
    "http://purl.org/net/sword/3.0/terms/byReferenceDeposit", 
    "http://purl.org/net/sword/3.0/terms/originalDeposit", 
    "http://purl.org/net/sword/3.0/terms/fileSetFile"
  ]
}

During Unpacking

{
  "status": "http://purl.org/net/sword/3.0/filestate/unpacking", 
  "eTag": "2", 
  "@id": "http://www.myorg.ac.uk/sword3/object1/reference.zip", 
  "byReference": "http://www.otherorg.ac.uk/by-reference/file2.zip", 
  "rel": [
    "http://purl.org/net/sword/3.0/terms/originalDeposit", 
    "http://purl.org/net/sword/3.0/terms/fileSetFile"
  ]
}

After Completion

{
  "status": "http://purl.org/net/sword/3.0/filestate/ingested", 
  "eTag": "2", 
  "@id": "http://www.myorg.ac.uk/sword3/object1/reference.zip", 
  "byReference": "http://www.otherorg.ac.uk/by-reference/file2.zip", 
  "rel": [
    "http://purl.org/net/sword/3.0/terms/originalDeposit", 
    "http://purl.org/net/sword/3.0/terms/fileSetFile"
  ]
}

In Case of Error

{
  "status": "http://purl.org/net/sword/3.0/filestate/error", 
  "log": "There was an error ingesting your file", 
  "byReference": "http://www.otherorg.ac.uk/by-reference/file2.zip", 
  "eTag": "2", 
  "rel": [
    "http://purl.org/net/sword/3.0/terms/originalDeposit", 
    "http://purl.org/net/sword/3.0/terms/fileSetFile"
  ], 
  "@id": "http://www.myorg.ac.uk/sword3/object1/reference.zip"
}
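The example documents above show a By-Reference file passing through pending, downloading, unpacking and then ingested (or error). A client polling the server might interpret each entry as in the following minimal Python sketch. It assumes each entry has already been parsed from JSON into a dict. It also assumes that ingested and error are the only final states, which the examples suggest but do not state outright.

```python
# Filestate IRIs and rels taken from the example documents above.
FILESTATE = "http://purl.org/net/sword/3.0/filestate/"
TERMINAL_STATES = {FILESTATE + "ingested", FILESTATE + "error"}
BY_REFERENCE_REL = "http://purl.org/net/sword/3.0/terms/byReferenceDeposit"


def describe_file_state(entry: dict) -> str:
    """Return a short human-readable state for one By-Reference file entry."""
    status = entry.get("status", "")
    if status == FILESTATE + "error":
        # The example error entry carries a human-readable "log" field.
        return "error: " + entry.get("log", "no log supplied")
    return status[len(FILESTATE):] if status.startswith(FILESTATE) else status


def should_keep_polling(entry: dict) -> bool:
    """Assumption: only 'ingested' and 'error' are final states."""
    return entry.get("status") not in TERMINAL_STATES


def still_by_reference(entry: dict) -> bool:
    """In the examples, the byReferenceDeposit rel disappears once download completes."""
    return BY_REFERENCE_REL in entry.get("rel", [])
```

A client would typically re-fetch the Status document at intervals while `should_keep_polling` returns True.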

17.4. Responsibilities of the client/reference server

To support By-Reference deposit, the reference server (where the file is initially hosted) SHOULD:

To use By-Reference deposit, the client SHOULD:

18. Metadata Deposit

SWORD allows the client to deposit arbitrary metadata onto the server by being agnostic about metadata formats. A metadata format is any document which expresses metadata in a given serialisation. SWORD defines a default format, which MUST be supported by the server, consisting of the set of DCMI Terms [DCMI] expressed as JSON (see Metadata Document).

In general, a metadata format has three aspects:

  1. The serialisation, such as JSON or XML

  2. The vocabulary of the metadata, such as Dublin Core, or MODS (sometimes the vocabulary and the serialisation will be conflated here)

  3. The profile of the metadata, such as the RIOXX profile for DC (+extensions)

Any format (combining the 3 aspects above) may be represented by an IRI in the protocol, or an opaque string if no IRI exists or can be minted.

18.1. Announcing Support for Metadata Formats

The server can list Metadata formats that it will accept in the acceptMetadata field of the Service Document.

If no acceptMetadata field is present, the client MUST assume the server only supports the default SWORD metadata format (http://purl.org/net/sword/3.0/types/Metadata).

{
  "acceptMetadata": [
    "http://purl.org/net/sword/3.0/types/Metadata"
  ]
}
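On the client side, the rule above (an absent acceptMetadata field means only the default format is supported) can be sketched in Python as follows. The function names are hypothetical, and the Service Document is assumed to have been parsed into a dict.

```python
from typing import List

DEFAULT_METADATA_FORMAT = "http://purl.org/net/sword/3.0/types/Metadata"


def accepted_metadata_formats(service_document: dict) -> List[str]:
    """Return the metadata formats a client may use with this Service."""
    if "acceptMetadata" not in service_document:
        # Absent field: the client MUST assume only the default format.
        return [DEFAULT_METADATA_FORMAT]
    return list(service_document["acceptMetadata"])


def can_deposit_metadata(service_document: dict, metadata_format: str) -> bool:
    """True if the Service advertises (or defaults to) the given format."""
    return metadata_format in accepted_metadata_formats(service_document)
```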

18.2. Indicating Metadata Format to the Server

During deposit, the client SHOULD specify a Metadata-Format header which contains the identifier for the format. For example, if supplying the default SWORD metadata format:

Metadata-Format: http://purl.org/net/sword/3.0/types/Metadata

If this header is not present, the server MUST assume it has the above value.
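The corresponding server-side rule might look like the following Python sketch. HTTP header names are case-insensitive, so the lookup ignores case. Treating an empty header as missing, and raising ValueError for an unaccepted format, are assumptions made for illustration; this section does not specify the server's response in that case.

```python
from typing import List, Mapping

DEFAULT_METADATA_FORMAT = "http://purl.org/net/sword/3.0/types/Metadata"


def resolve_metadata_format(headers: Mapping[str, str], accepted: List[str]) -> str:
    """Work out which metadata format a deposit uses (server side)."""
    value = ""
    for name, header_value in headers.items():
        if name.lower() == "metadata-format":
            value = header_value.strip()
            break
    if not value:
        # Missing header: the server MUST assume the default format.
        value = DEFAULT_METADATA_FORMAT
    if value not in accepted:
        # Hypothetical handling: the error response is not defined here.
        raise ValueError("unsupported metadata format: " + value)
    return value
```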

19. Metadata Formats

19.1. Default Format

In order to provide a baseline of interoperability, SWORD provides a default metadata format which MUST be supported by the server. This document has the following aspects (as per Metadata Deposit):

  1. It is serialised as JSON, with a JSON-LD @context [JSON-LD]

  2. It contains dc and dcterms vocabulary elements, and any other arbitrary elements added by the client

  3. It does not pre-suppose any particular profile of usage of these vocabulary elements.

Clients MAY choose to extend this document with their own metadata fields, though the server may not understand them and MAY ignore them.

When using this Metadata Format, the client should identify it in the Metadata-Format header with the following IRI:

http://purl.org/net/sword/3.0/types/Metadata
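A client assembling a default-format document could proceed roughly as in this Python sketch. The @context IRI below is a hypothetical placeholder, because the real value comes from the Metadata Document section rather than from this one. Keying fields with dc: and dcterms: prefixes is also an assumption. Any extra client-defined keys are passed through unchanged, and the server MAY ignore them.

```python
import json

# Hypothetical placeholder: substitute the @context defined for the
# SWORD Metadata Document.
SWORD_CONTEXT = "https://example.org/sword3/context.jsonld"
METADATA_FORMAT = "http://purl.org/net/sword/3.0/types/Metadata"


def build_default_metadata(fields: dict) -> bytes:
    """Serialise dc/dcterms (and any extra) fields with a JSON-LD @context."""
    document = {"@context": SWORD_CONTEXT}
    document.update(fields)
    return json.dumps(document, indent=2, sort_keys=True).encode("utf-8")


body = build_default_metadata({
    "dc:title": "An example deposit",
    "dcterms:abstract": "Deposited via SWORD",
    "dc:creator": ["Jones, R."],
})
headers = {
    "Content-Type": "application/json",  # assumption: plain JSON media type
    "Metadata-Format": METADATA_FORMAT,
}
```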

19.2. Depositing Other Formats

In addition to the standard SWORD metadata format described above, SWORD can support the deposit of arbitrary metadata schemas and serialisations.

Clients that wish to ensure that servers support all the metadata they send should consider minting a new identifier for their format, and looking for servers that explicitly declare support for it.

The following is a minimal example of the deposit of a MODS XML metadata file while creating a new Object:

POST Service-URL
Content-Type: application/xml
Content-Disposition: attachment; metadata=true
Digest: SHA-256=74b2851bd2760785b0987ba219debea69c228353f7ccc67a2bdcd9819f97fc71
Metadata-Format: http://www.loc.gov/mods/v3

<mods xmlns="http://www.loc.gov/mods/v3">
  <originInfo>
    <place>
      <placeTerm type="code" authority="marccountry">nyu</placeTerm>
      <placeTerm type="text">Ithaca, NY</placeTerm>
    </place>
    <publisher>Cornell University Press</publisher>
    <copyrightDate>1999</copyrightDate>
  </originInfo>
</mods>

If the server supports the MODS Metadata-Format, identified with the IRI http://www.loc.gov/mods/v3, then it will be able to create a new Object from this XML document and populate its Metadata from the data therein.
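The headers of a metadata-only deposit like the MODS example can be assembled as in the following Python sketch. The function names are hypothetical. The digest is base64-encoded, which is the encoding the IANA Digest Algorithm registry [IANA Digest] gives for SHA-256 under RFC 3230. The Digest value in the example request above is written in hex, so implementers should confirm which encoding their server expects.

```python
import base64
import hashlib

MODS_FORMAT = "http://www.loc.gov/mods/v3"


def sha256_digest_header(body: bytes) -> str:
    """Digest header value (RFC 3230) with a base64-encoded SHA-256."""
    digest = hashlib.sha256(body).digest()
    return "SHA-256=" + base64.b64encode(digest).decode("ascii")


def metadata_deposit_headers(body: bytes, content_type: str,
                             metadata_format: str) -> dict:
    """Headers for a metadata-only deposit, mirroring the MODS example."""
    return {
        "Content-Type": content_type,
        "Content-Disposition": "attachment; metadata=true",
        "Digest": sha256_digest_header(body),
        "Metadata-Format": metadata_format,
    }


mods = (b'<mods xmlns="http://www.loc.gov/mods/v3">'
        b'<titleInfo><title>Example</title></titleInfo></mods>')
headers = metadata_deposit_headers(mods, "application/xml", MODS_FORMAT)
```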

20. Packaged Content Deposit

SWORD allows clients to deposit both Files and Metadata simultaneously through support for Packaged Content. SWORD does not limit the number or type of packaging formats that the client or server supports, though see the section Packaging Formats for the packages that MUST be supported by the server.

20.1. Announcing Support for Packaged Content Deposit

The Service Document uses the acceptPackaging field to indicate that a Service will accept deposits of a particular packaging format, and the acceptArchiveFormat field to indicate the serialisation/compression formats that it understands.

Clients should refer to the treatment description in the Service Document to find out the treatment for a particular packaging type.

Package formats SHOULD be identified by an IRI, but MAY be identified by an arbitrary string.

If no acceptPackaging field is supplied, the client MUST assume that the server does not formally support any packaging formats, and should expect all content to be treated according to the server's policies for its MIME type, as advertised in the accept field.

If no acceptArchiveFormat field is supplied the client MUST assume that the server supports application/zip only.

{
  "acceptArchiveFormat": [
    "application/zip"
  ], 
  "acceptPackaging": [
    "*"
  ], 
  "accept": [
    "*/*"
  ]
}
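A client can apply the defaults above before choosing how to package a deposit, as in this Python sketch. The function names are hypothetical. Reading "*" in acceptPackaging as "any packaging format" is an assumption based on the example Service Document.

```python
from typing import List

BINARY = "http://purl.org/net/sword/3.0/package/Binary"


def accepted_archive_formats(service_document: dict) -> List[str]:
    """Absent acceptArchiveFormat: the client MUST assume application/zip only."""
    return list(service_document.get("acceptArchiveFormat", ["application/zip"]))


def archive_format_is_understood(service_document: dict, content_type: str) -> bool:
    return content_type in accepted_archive_formats(service_document)


def packaging_is_advertised(service_document: dict, packaging: str) -> bool:
    """Assumption: "*" in acceptPackaging means any packaging format."""
    advertised = service_document.get("acceptPackaging")
    if advertised is None:
        # Absent field: no packaging format is formally supported.
        return False
    return "*" in advertised or packaging in advertised
```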

20.2. Package support during resource creation

When depositing Packaged Content, the client SHOULD indicate the archive file MIME type using the Content-Type header, and SHOULD also give information about content packaging using the Packaging header.

The value of the Packaging header SHOULD match one of the values the server has advertised as acceptable for the service.

If a server receives a POST with an unacceptable Packaging header value, it MUST either reject the POST by returning an HTTP response with a status code of 415 (Unsupported Media Type) and a SWORD Error document with URI http://purl.org/net/sword/3.0/error/PackagingFormatNotAcceptable, or store the content without further processing.
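The two outcomes permitted for an unacceptable Packaging value can be sketched server-side in Python. The function and its return shape are hypothetical. The error IRI uses the purl.org/net/sword/3.0/error/ form, matching the other SWORD 3.0 IRIs. Only that IRI and the 415 status come from the text; the other fields of the error body are assumptions.

```python
import json
from typing import Iterable, Optional, Tuple

PACKAGING_NOT_ACCEPTABLE = (
    "http://purl.org/net/sword/3.0/error/PackagingFormatNotAcceptable")


def handle_packaging(packaging: str, accepted: Iterable[str],
                     store_unknown: bool = False
                     ) -> Tuple[str, Optional[int], Optional[bytes]]:
    """Return (action, http_status, error_body) for a POST's Packaging value.

    action is "process", "store" or "reject"; the status and body are
    only set when rejecting.
    """
    accepted = list(accepted)
    if "*" in accepted or packaging in accepted:
        return "process", None, None
    if store_unknown:
        # Permitted alternative: keep the bytes but do not process them.
        return "store", None, None
    # Hypothetical error body: only the error IRI is taken from the spec.
    error = {"@type": "Error", "error": PACKAGING_NOT_ACCEPTABLE,
             "log": "Unsupported Packaging: " + packaging}
    return "reject", 415, json.dumps(error).encode("utf-8")
```

Whether to reject or to store is a per-server policy choice, which the sketch models with the `store_unknown` flag.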

20.3. Package description in Status documents

Status documents can describe packaging in two distinct ways, depending on whether an entry in the links list refers to a file that was deposited, a file that is available for retrieval by the client, or both.

When a package has been deposited as the Original Deposit, the server SHOULD record its packaging format and content type alongside it in the links entry.

{
  "packaging": "http://purl.org/net/sword/3.0/package/SimpleZip", 
  "contentType": "application/zip", 
  "@id": "http://www.myorg.ac.uk/sword3/object1/package.zip", 
  "rel": [
    "http://purl.org/net/sword/3.0/terms/originalDeposit"
  ]
}

Similarly, when a package has been created by the server from the Object’s content and made available to the client as a service, the packaging format and content type MUST be presented alongside it:

{
  "packaging": "http://purl.org/net/sword/3.0/package/SimpleZip", 
  "contentType": "application/zip", 
  "@id": "http://www.myorg.ac.uk/sword3/object1/package.1.zip", 
  "rel": [
    "http://purl.org/net/sword/3.0/terms/packagedContent"
  ]
}
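A client reading a Status document could separate deposited packages from packages the server offers for retrieval, as in this Python sketch. It assumes the Status document has a "links" list of entries shaped like the two examples above. It uses the .../3.0/terms/ form of the packagedContent rel, consistent with the other SWORD 3.0 rels. An entry carrying both rels appears in both lists.

```python
from typing import Dict, List

ORIGINAL_DEPOSIT = "http://purl.org/net/sword/3.0/terms/originalDeposit"
PACKAGED_CONTENT = "http://purl.org/net/sword/3.0/terms/packagedContent"


def packages_by_role(status_document: dict) -> Dict[str, List[dict]]:
    """Split links into deposited packages and server-generated packages."""
    result: Dict[str, List[dict]] = {"deposited": [], "available": []}
    for link in status_document.get("links", []):
        rels = link.get("rel", [])
        summary = {"@id": link.get("@id"),
                   "packaging": link.get("packaging"),
                   "contentType": link.get("contentType")}
        if ORIGINAL_DEPOSIT in rels:
            result["deposited"].append(summary)
        if PACKAGED_CONTENT in rels:
            result["available"].append(summary)
    return result
```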

21. Packaging Formats

There are three packaging formats that all SWORD implementations MUST support.

21.1. Binary

URI: http://purl.org/net/sword/3.0/package/Binary

This format indicates that the package should be interpreted as an opaque blob, and the server SHOULD NOT attempt to extract any content from it. This is typically for use when depositing single files, which do not need unpacking of any kind.

Servers MAY choose, nonetheless, to extract content from Binary packages, if they have the capabilities, such as metadata from images, structural information from text documents, etc.

21.2. SimpleZip

URI: http://purl.org/net/sword/3.0/package/SimpleZip

This format indicates that the package is a compressed set of one or more files in an arbitrary directory structure. Neither the nature of the compression nor the structure of the compressed content is specified.

Servers MAY choose to extract the content from SimpleZip packages, and present the individual file components as derivedResources, if desired.
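A server that decides to expose SimpleZip contents as derivedResources first needs the list of files in the package. The following Python sketch handles ZIP archives only (the default acceptArchiveFormat), even though SimpleZip itself does not fix the compression. It lists names without extracting anything. A real server would also sanitise paths before writing files to disk.

```python
import io
import zipfile
from typing import Dict


def simplezip_components(package: bytes) -> Dict[str, int]:
    """Map each file path in a ZIP-serialised SimpleZip to its uncompressed size.

    Directory entries are skipped; each remaining path is a candidate
    derivedResource.
    """
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return {info.filename: info.file_size
                for info in archive.infolist()
                if not info.is_dir()}


# Usage: build a small package in memory and list its components.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("article.pdf", b"%PDF-1.4 ...")
    archive.writestr("data/results.csv", b"a,b\n1,2\n")
components = simplezip_components(buffer.getvalue())
```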

21.3. SWORDBagIt

URI: http://purl.org/net/sword/3.0/package/SWORDBagIt

This format is a profile of the BagIt directory structure, which has in turn been serialised (which may include compression). The nature of the serialisation/compression is not specified, though if the client wishes the server to extract the content, it SHOULD use one of the formats specified in the Service Document field acceptArchiveFormat.

SWORDBagIt
|-- bag-info.txt
|-- bagit.txt
|-- data
|   |-- bitstreams ...
|   \-- directories ...
|       \-- bitstreams ...
|-- manifest-sha-256.txt
|-- metadata
|   \-- sword.json
\-- tagmanifest-sha-256.txt

This allows the item to be represented as a combination of an arbitrary structure of bitstreams in the data directory (similar to SimpleZip) and metadata in the SWORD default format in metadata/sword.json. A manifest (and tagmanifest) of SHA-256 checksums is required, as well as the bagit.txt and bag-info.txt files.

The content of sword.json is exactly as defined in the SWORD default Metadata. Note that use of fetch.txt is not supported here.
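A client could build a ZIP-serialised SWORDBagIt in memory as in the following Python sketch. It makes several assumptions not fixed by this section. The bag's files sit at the root of the ZIP. The bag declares BagIt-Version 1.0. The bag-info.txt carries only a Payload-Oxum line (octet count and file count of the payload). Manifest lines use the BagIt "checksum, whitespace, path" form. The tag manifest covers every tag file except itself.

```python
import hashlib
import io
import json
import zipfile
from typing import Dict


def _manifest(files: Dict[str, bytes]) -> bytes:
    """BagIt manifest lines: '<sha-256 hex>  <path relative to bag root>'."""
    lines = [hashlib.sha256(data).hexdigest() + "  " + path
             for path, data in sorted(files.items())]
    return ("\n".join(lines) + "\n").encode("utf-8")


def build_sword_bagit(payload: Dict[str, bytes], metadata: dict) -> bytes:
    """Serialise a SWORDBagIt as an in-memory ZIP archive.

    payload maps paths under data/ (e.g. "article.pdf") to file contents;
    metadata is the SWORD default-format metadata as a dict.
    """
    data_files = {"data/" + name: content for name, content in payload.items()}
    octets = sum(len(content) for content in data_files.values())
    tag_files = {
        "bagit.txt": b"BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        # Assumption: a minimal bag-info.txt with only Payload-Oxum.
        "bag-info.txt": ("Payload-Oxum: %d.%d\n" % (octets, len(data_files))).encode("utf-8"),
        "metadata/sword.json": json.dumps(metadata).encode("utf-8"),
    }
    tag_files["manifest-sha-256.txt"] = _manifest(data_files)
    # The tag manifest lists every tag file, including the payload manifest.
    tag_files["tagmanifest-sha-256.txt"] = _manifest(tag_files)

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, content in {**data_files, **tag_files}.items():
            archive.writestr(path, content)
    return buffer.getvalue()
```

Consistent with the recommendation for a flat data directory, the payload keys would usually be plain file names.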

The server SHOULD unpack this file and process at least the Metadata. The contents of the data directory MAY be unpacked into derivedResources if the server desires. It is RECOMMENDED that the contents of the data directory be a flat file structure, to aid mutual comprehension by servers and clients.
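On receipt, a server might verify the bag before processing its Metadata, along the lines of this Python sketch. It assumes a ZIP serialisation with the bag at the archive root. It checks the required files and rejects fetch.txt. It verifies every SHA-256 checksum in both manifests and checks that every payload file is listed. `is_flat` simply reports whether the RECOMMENDED flat layout was followed. The function names are hypothetical.

```python
import hashlib
import io
import json
import zipfile
from typing import Dict, Tuple

REQUIRED = ("bagit.txt", "bag-info.txt", "manifest-sha-256.txt",
            "tagmanifest-sha-256.txt", "metadata/sword.json")


def read_sword_bagit(package: bytes) -> Tuple[dict, Dict[str, bytes]]:
    """Verify a ZIP-serialised SWORDBagIt and return (metadata, payload).

    Raises ValueError on a missing file, an unlisted payload file or a
    checksum mismatch.
    """
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        names = set(archive.namelist())
        for required in REQUIRED:
            if required not in names:
                raise ValueError("missing " + required)
        if "fetch.txt" in names:
            raise ValueError("fetch.txt is not supported in SWORDBagIt")
        payload, listed = {}, set()
        for manifest in ("manifest-sha-256.txt", "tagmanifest-sha-256.txt"):
            for line in archive.read(manifest).decode("utf-8").splitlines():
                if not line.strip():
                    continue
                checksum, path = line.split(None, 1)
                if path not in names:
                    raise ValueError("manifest lists missing file " + path)
                content = archive.read(path)
                if hashlib.sha256(content).hexdigest() != checksum.lower():
                    raise ValueError("checksum mismatch for " + path)
                listed.add(path)
                if path.startswith("data/"):
                    payload[path[len("data/"):]] = content
        unlisted = {n for n in names
                    if n.startswith("data/") and not n.endswith("/")} - listed
        if unlisted:
            raise ValueError("unlisted payload: " + ", ".join(sorted(unlisted)))
        metadata = json.loads(archive.read("metadata/sword.json"))
    return metadata, payload


def is_flat(payload: Dict[str, bytes]) -> bool:
    """True if no payload file sits in a subdirectory of data/."""
    return all("/" not in path for path in payload)
```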

22. Auto-Discovery

To help potential clients discover a server's capabilities, SWORD RECOMMENDS that the following auto-discovery features be embedded in any web interfaces associated with the service provider.

22.1. For Services

Embed an HTML link with a rel value of http://purl.org/net/sword/3.0/discovery/Service

<html:link rel="http://purl.org/net/sword/3.0/discovery/Service" href="[Service-URL]"/>

22.2. For Objects

Embed an HTML link with a rel value of http://purl.org/net/sword/3.0/discovery/Object in any page which represents a deposited resource.

<html:link rel="http://purl.org/net/sword/3.0/discovery/Object" href="[Object-URL]"/>
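A client can find these discovery links in a fetched page with Python's standard html.parser, as sketched below. The class and function names are hypothetical. The sketch accepts both a plain link element and the prefixed html:link form used in the examples. It treats rel as a space-separated list of tokens, as HTML does.

```python
from html.parser import HTMLParser
from typing import Dict, List

SERVICE_REL = "http://purl.org/net/sword/3.0/discovery/Service"
OBJECT_REL = "http://purl.org/net/sword/3.0/discovery/Object"


class SwordLinkFinder(HTMLParser):
    """Collect hrefs of link elements carrying the SWORD discovery rels."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Dict[str, List[str]] = {SERVICE_REL: [], OBJECT_REL: []}

    def handle_starttag(self, tag, attrs):
        # Self-closing <link .../> tags are routed here by HTMLParser too.
        if tag not in ("link", "html:link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        for token in (attributes.get("rel") or "").split():
            if href and token in self.found:
                self.found[token].append(href)


def discover(page: str) -> Dict[str, List[str]]:
    """Return the Service and Object URLs advertised in an HTML page."""
    finder = SwordLinkFinder()
    finder.feed(page)
    finder.close()
    return finder.found
```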

23. References

AtomPub Gregorio, J. and B. de hOra, "The Atom Publishing Protocol", RFC 5023, October 2007. http://www.ietf.org/rfc/rfc5023.txt

DCMI DCMI Metadata Terms, 2012-06-14 http://dublincore.org/documents/dcmi-terms/

IANA Auth Hypertext Transfer Protocol (HTTP) Authentication Scheme Registry https://www.iana.org/assignments/http-authschemes/http-authschemes.xhtml

IANA Digest Hypertext Transfer Protocol (HTTP) Digest Algorithm Values https://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml

JSON-LD JSON-LD 1.1, A JSON-based Serialization for Linked Data, 28 March 2018 https://json-ld.org/spec/latest/json-ld/

JSON-SCHEMA JSON Schema: A Media Type for Describing JSON Documents http://json-schema.org/latest/json-schema-core.html

OpenAPI OpenAPI Specification, Version 3.0.0 https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.0.md

RFC2119 Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. http://www.ietf.org/rfc/rfc2119.txt

RFC3230 Mogul, J. and A. Van Hoff, "Instance Digests in HTTP", RFC 3230, January 2002. https://www.ietf.org/rfc/rfc3230.txt

RFC5987 Reschke, J., "Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters", RFC 5987, August 2010. https://tools.ietf.org/html/rfc5987

RFC6266 Reschke, J., "Use of the Content-Disposition Header Field in the Hypertext Transfer Protocol (HTTP)", RFC 6266, June 2011. https://tools.ietf.org/html/rfc6266

RFC7232 Fielding, R. and J. Reschke, "Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests", RFC 7232, June 2014. https://tools.ietf.org/html/rfc7232

SWORD 1.3 Downing, J. "SWORD AtomPub Profile version 1.3", 2008. http://www.swordapp.org/docs/sword-profile-1.3.html

SWORD 2.0 Jones, R. and Lewis, S. "SWORD 2.0 Profile", 2011 http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html