SWORD 3.0 New Features

There are 4 major new features in SWORDv3:

1. Concurrency Control

Servers MAY implement Concurrency Control, to prevent clients from unintentionally overwriting data.

The Server provides the ETag header on every response, which contains a unique version number for the Object.

The client must then provide an If-Match header, carrying the latest ETag it has received for the Object, with every request to change data.

Objects may change for a number of reasons after their initial creation, such as:

1.1. Announcing Support for Concurrency Control

Servers are not required to support Concurrency Control.

Clients MUST check response headers for the presence of an ETag. Presence of the ETag indicates that the server requires the client to pay attention to its concurrency control procedures, and to carry out later requests with an If-Match header.
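As an illustration of the client-side check described above, the following is a minimal sketch in Python. The function name if_match_headers is hypothetical, and the sketch assumes the client has the headers of its most recent response for the Object available as a simple mapping.

```python
from typing import Mapping, Optional


def if_match_headers(last_response_headers: Mapping[str, str]) -> dict:
    """Return the extra headers to send with a request that changes data.

    Hypothetical helper: if the last response carried an ETag, echo it back
    in If-Match; otherwise assume the server does not use Concurrency Control.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in last_response_headers.items()}
    etag: Optional[str] = lowered.get("etag")
    return {"If-Match": etag} if etag is not None else {}


if __name__ == "__main__":
    print(if_match_headers({"ETag": '"v7"'}))
    print(if_match_headers({"Content-Type": "application/json"}))
```

A client would merge the returned dictionary into the headers of its next modifying request, and replace its stored ETag whenever a new response arrives.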

1.2. Key Requirements

2. Metadata Formats

SWORD allows the client to deposit arbitrary metadata onto the server through agnostic support for metadata formats.

2.1. Announcing Support for Metadata Formats

The server can list Metadata formats that it will accept in the acceptMetadata field of the Service Document.

If no acceptMetadata field is present, the client MUST assume the server only supports the default SWORD metadata format (http://purl.org/net/sword/3.0/types/Metadata).

{
  "acceptMetadata": [
    "http://purl.org/net/sword/3.0/types/Metadata"
  ]
}
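The fallback rule above could be applied by a client roughly as follows. This is a sketch only: the function names are hypothetical, and the Service Document is assumed to have already been parsed from JSON into a Python dictionary.

```python
from typing import List, Optional

DEFAULT_METADATA_FORMAT = "http://purl.org/net/sword/3.0/types/Metadata"


def accepted_metadata_formats(service_document: dict) -> List[str]:
    """List the formats the server accepts (hypothetical helper)."""
    # No acceptMetadata field: assume only the default SWORD format.
    return list(service_document.get("acceptMetadata", [DEFAULT_METADATA_FORMAT]))


def choose_metadata_format(service_document: dict, preferred: List[str]) -> Optional[str]:
    """Return the first of the client's preferred formats that the server accepts."""
    accepted = accepted_metadata_formats(service_document)
    for candidate in preferred:
        if candidate in accepted:
            return candidate
    return None


if __name__ == "__main__":
    wanted = ["http://www.loc.gov/mods/v3", DEFAULT_METADATA_FORMAT]
    print(choose_metadata_format({}, wanted))
```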

2.2. Indicating Metadata Format

During deposit, the client SHOULD specify a Metadata-Format header which contains the identifier for the format. For example, if supplying the default SWORD metadata format:

Metadata-Format: http://purl.org/net/sword/3.0/types/Metadata

2.3. HTTP Exchange

POST /Service-URL HTTP/1.1
Authorization: ...
Content-Disposition: ...
Content-Type: application/json
Digest: ...
Metadata-Format: http://purl.org/net/sword/3.0/types/Metadata

[Metadata Document]


HTTP/1.1 201
Content-Type: application/json

[Resource created, responds with Status Document]
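The request side of this exchange might be assembled as in the sketch below. Several details are assumptions rather than statements of the protocol: the function name is hypothetical, the Content-Disposition value is borrowed from the alternative format example in this document, the Digest is written as hex-encoded SHA-256 to match that same example, and the Authorization header is left to the caller.

```python
import hashlib
import json
from typing import Dict, Tuple

DEFAULT_METADATA_FORMAT = "http://purl.org/net/sword/3.0/types/Metadata"


def metadata_deposit_request(
    metadata: dict, metadata_format: str = DEFAULT_METADATA_FORMAT
) -> Tuple[Dict[str, str], bytes]:
    """Build headers and body for a metadata deposit (hypothetical helper)."""
    body = json.dumps(metadata).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: same disposition as the alternative format example.
        "Content-Disposition": "attachment; metadata=true",
        # Assumption: hex encoding, as in the document's Digest example.
        "Digest": "SHA-256=" + hashlib.sha256(body).hexdigest(),
        "Metadata-Format": metadata_format,
    }
    return headers, body


if __name__ == "__main__":
    headers, body = metadata_deposit_request({"dc:title": "The title"})
    for name, value in headers.items():
        print(f"{name}: {value}")
```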

2.4. Default Format

SWORD provides a default metadata format which MUST be supported by the server.

2.5. Default Metadata Example

{
  "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",

  "@id" : "http://example.com/object/1/metadata",
  "@type" : "Metadata",

  "dc:title" : "The title",
  "dcterms:abstract" : "This is my abstract",
  "dc:contributor" : "A.N. Other"
}

2.6. Alternative Format Example

POST Service-URL
Content-Type: application/xml
Content-Disposition: attachment; metadata=true
Digest: SHA-256=74b2851bd2760785b0987ba219debea69c228353f7ccc67a2bdcd9819f97fc71
Metadata-Format: http://www.loc.gov/mods/v3

<mods xmlns:mods="http://www.loc.gov/mods/v3">
  <originInfo>
    <place>
      <placeTerm type="code" authority="marccountry">nyu</placeTerm>
      <placeTerm type="text">Ithaca, NY</placeTerm>
    </place>
    <publisher>Cornell University Press</publisher>
    <copyrightDate>1999</copyrightDate>
  </originInfo>
</mods>

3. Segmented File Upload

If a client has a very large file that it wishes to transfer to the server by value, then it may be beneficial to do this in several small operations, rather than as a single large operation.

In order to transfer a large file, the client can break it down into a number of equally sized segments of binary data (the final segment may be a different size to the rest). It can then initialise a Segmented File Upload with the server, and then transfer the segments. The server will reconstitute these segments into a single file, and then the client may deposit this file by-reference.
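The segmentation step can be pictured with a short Python sketch. The function name is hypothetical, and for brevity it works on data already held in memory; a real client handling a very large file would more likely read each segment from disk as it is needed.

```python
from typing import List


def split_into_segments(data: bytes, segment_size: int) -> List[bytes]:
    """Split data into equally sized segments; the last one may be shorter."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [data[start:start + segment_size] for start in range(0, len(data), segment_size)]


if __name__ == "__main__":
    segments = split_into_segments(b"abcdefghij", 4)
    print([len(segment) for segment in segments])
```

Joining the segments in order reproduces the original bytes, which is the reconstitution the server is expected to perform.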

3.1. Announcing Support for Segmented File Upload

Servers MAY support Segmented File Upload. To do so, a server must provide a staging area where file segments can be uploaded before the client requests a specific deposit operation, and describe it in the Service Document:

{
  "maxAssembledSize": 30000000000000,
  "maxSegmentSize": 16777216000,
  "maxSegments": 1000,
  "minSegmentSize": 1,
  "staging": "http://example.com/staging",
  "stagingMaxIdle": 3600
}
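Before initialising an upload, a client could compare its intended segmentation against these fields. The sketch below is one possible check, not a prescribed algorithm: the function name and messages are hypothetical, a missing field is treated as "no limit", and minSegmentSize is applied only to the nominal segment size, not to the shorter final segment.

```python
from typing import List


def check_upload_plan(limits: dict, file_size: int, segment_size: int) -> List[str]:
    """Return a list of problems; an empty list means the plan fits the limits."""
    if file_size <= 0 or segment_size <= 0:
        raise ValueError("file_size and segment_size must be positive")
    problems = []
    # Integer ceiling division: the final segment may be smaller than the rest.
    segment_count = -(-file_size // segment_size)

    if "maxAssembledSize" in limits and file_size > limits["maxAssembledSize"]:
        problems.append("file size exceeds maxAssembledSize")
    if "maxSegmentSize" in limits and segment_size > limits["maxSegmentSize"]:
        problems.append("segment size exceeds maxSegmentSize")
    if "minSegmentSize" in limits and segment_size < limits["minSegmentSize"]:
        problems.append("segment size is below minSegmentSize")
    if "maxSegments" in limits and segment_count > limits["maxSegments"]:
        problems.append("segment count exceeds maxSegments")
    return problems


if __name__ == "__main__":
    limits = {
        "maxAssembledSize": 30000000000000,
        "maxSegmentSize": 16777216000,
        "maxSegments": 1000,
        "minSegmentSize": 1,
    }
    print(check_upload_plan(limits, 10_000_000, 2_000_000))
    print(check_upload_plan(limits, 10_000_000, 1_000))
```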

3.2. Process for Segmented File Upload

  1. Obtain the Staging-URL from the Service Document; this is the URL from which to request a Temporary-URL.

  2. Request a Temporary-URL from the Service, via a Segmented Upload Initialisation request.

  3. Upload all the file segments to the Temporary-URL.

  4. Carry out the desired deposit operation as a By-Reference deposit, using the Temporary-URL as the by-reference file.

3.3. Segmented Upload Initialisation

POST /Staging-URL HTTP/1.1


HTTP/1.1 201

[Temporary-URL created]

3.4. Uploading File Segments

POST /Temporary-URL HTTP/1.1
Authorization: ...
Content-Disposition: ...
Content-Length: ...
Digest: ...

[Segment to be added to the Resource.]


HTTP/1.1 204

[Segment Received]

3.5. Retrieving Information

At any point after creating a Temporary-URL, the client may request information on the state of their Segmented File Upload. This can be done via a GET to the Temporary-URL.

{
    "@context": "https://swordapp.github.io/swordv3/swordv3.jsonld",
    "@id": "http://example.com/temporary/1",
    "@type": "Temporary",

    "received": [
        1,
        2,
        4
    ],
    "expecting": [
        3,
        5
    ],
    "assembledSize": 10000000,
    "segmentSize": 2000000
}
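A client could use this response to decide which segments still need to be sent. The sketch below assumes, based only on the example above, that segments are numbered from 1 and that assembledSize is the total size of the finished file; the function name is hypothetical.

```python
from typing import List


def missing_segments(temporary_doc: dict) -> List[int]:
    """List segment numbers not yet received, derived from the sizes reported."""
    # Ceiling division gives the total number of segments in the upload.
    total = -(-temporary_doc["assembledSize"] // temporary_doc["segmentSize"])
    received = set(temporary_doc.get("received", []))
    # Assumption: segments are numbered 1..total, as in the example document.
    return [number for number in range(1, total + 1) if number not in received]


if __name__ == "__main__":
    doc = {
        "received": [1, 2, 4],
        "expecting": [3, 5],
        "assembledSize": 10000000,
        "segmentSize": 2000000,
    }
    print(missing_segments(doc))
```

For the example document, the derived list agrees with the server's own "expecting" field, which a client may prefer to read directly when it is present.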

4. By-Reference Deposit

In a By-Reference Deposit, the client provides the server with URLs for Files which it would like the server to retrieve asynchronously.

This could be useful in a number of contexts, such as when the files are very large, and are stored on specialist staging hardware, or where the files are already readily available elsewhere.

4.1. Announcing Support for By-Reference Deposit

Servers MAY support By-Reference deposit. If a server supports By-Reference it SHOULD indicate this in the Service Document using the field byReferenceDeposit:

{
  "byReferenceDeposit": true
}

4.2. Usage instead of Binary Deposit

Clients may use a By-Reference Deposit anywhere a by-value deposit could be carried out. Instead of sending any Binary content, the client sends the By-Reference Document containing one or more (depending on context) URLs to files which the server can retrieve.

{
  "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",

  "@type" : "ByReference",

  "byReferenceFiles" : [
    {
      "@id" : "http://www.otherorg.ac.uk/by-reference/file.zip",
      "contentType" : "application/zip",
      "contentLength" : 123456,
      "contentDisposition" : "attachment; filename=file.zip",
      "packaging" : "http://purl.org/net/sword/packaging/SimpleZip",
      "digest" : "SHA256=....",
      "ttl" : "2018-04-16T00:00:00Z",
      "dereference" : true
    }
  ]
}

4.3. Usage with Segmented File Upload

If carrying out a Segmented File Upload, the final deposit stage is to send the Temporary-URL to the server as part of a By-Reference deposit.

{
  "@context" : "https://swordapp.github.io/swordv3/swordv3.jsonld",

  "@type" : "ByReference",

  "byReferenceFiles" : [
    {
      "@id" : "[Temporary-URL]",
      "contentType" : "application/zip",
      "contentLength" : 123456,
      "contentDisposition" : "attachment; filename=file.zip",
      "packaging" : "http://purl.org/net/sword/packaging/SimpleZip",
      "digest" : "SHA256=...."
    }
  ]
}
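A client might construct such a document programmatically, as in the following sketch. The helper names are hypothetical, and the optional packaging and digest fields are omitted for brevity; the field names themselves are taken from the examples above.

```python
import json
from typing import Iterable

SWORD_CONTEXT = "https://swordapp.github.io/swordv3/swordv3.jsonld"


def by_reference_file(url: str, content_type: str, content_length: int, filename: str) -> dict:
    """Describe one file to be retrieved by the server (hypothetical helper)."""
    return {
        "@id": url,
        "contentType": content_type,
        "contentLength": content_length,
        "contentDisposition": f"attachment; filename={filename}",
    }


def by_reference_document(files: Iterable[dict]) -> dict:
    """Wrap one or more file descriptions in a By-Reference Document."""
    return {
        "@context": SWORD_CONTEXT,
        "@type": "ByReference",
        "byReferenceFiles": list(files),
    }


if __name__ == "__main__":
    # The URL here stands in for a Temporary-URL returned by the server.
    entry = by_reference_file("http://example.com/temporary/1", "application/zip", 123456, "file.zip")
    print(json.dumps(by_reference_document([entry]), indent=2))
```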

4.4. Server-Side processing

  1. The server receives a By-Reference Document with one or more files listed and creates records for each of these files that it plans to dereference.

  2. The server responds to the client with the appropriate response for the action.

  3. At its own pace the server obtains all the files that are marked for dereference.

  4. Once the Files are downloaded and processed, the server sets the file status appropriately in the Status Document.

  5. If there is an error in downloading or otherwise processing the file, the server sets the status to error and provides a meaningful log message.
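The steps above can be sketched as a simple loop. This is illustrative only and not a description of any particular server: the function name, the record layout, and the status values "pending" and "ingested" are hypothetical ("error" is the only status named above), the fetch callable is supplied by the caller, and a file with no dereference field is assumed to be marked for dereference.

```python
from typing import Callable, List


def process_by_reference(files: List[dict], fetch: Callable[[str], bytes]) -> List[dict]:
    """Create a record per file, then retrieve each one and update its status."""
    # Step 1: record every file the server plans to dereference.
    records = [
        {"@id": entry["@id"], "status": "pending", "log": None}
        for entry in files
        if entry.get("dereference", True)  # assumption: missing field means True
    ]
    # Steps 3 to 5: a real server would do this later, at its own pace.
    for record in records:
        try:
            fetch(record["@id"])
            record["status"] = "ingested"  # hypothetical status name
        except Exception as exc:
            record["status"] = "error"
            record["log"] = f"Could not retrieve {record['@id']}: {exc}"
    return records


if __name__ == "__main__":
    def fake_fetch(url: str) -> bytes:
        # Stand-in for a real download; fails for one URL to show error handling.
        if url.endswith("missing.zip"):
            raise IOError("not found")
        return b"content"

    listed = [{"@id": "http://example.com/file.zip"}, {"@id": "http://example.com/missing.zip"}]
    for record in process_by_reference(listed, fake_fetch):
        print(record)
```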