SWORD 001: HTTP header fields for packaged content delivery

Credits

SWORD 2.0 Technical Lead: Richard Jones, Cottage Labs

SWORD 2.0 Community Manager: Stuart Lewis, University of Auckland

SWORD 2.0 Technical Advisory Group
Julie Allinson, University of York
Tim Brody, University of Southampton
David Flanders, JISC
Graham Klyne, University of Oxford
Alister Miles, University of Oxford
Ben O'Steen, Cottage Labs
Mark MacGillivray, Cottage Labs
Rob Sanderson, LANL
Nick Sheppard, Leeds Metropolitan University
Eddie Shin, MediaShelf
Ian Stuart, University of Edinburgh
Ed Summers, Library of Congress
David Tarrant, University of Southampton
Graham Triggs, BioMed Central
Scott Wilson, University of Bolton

Further acknowledgements of input
Aaron Birkland (Cornell University), Tim Donohue (DuraSpace), Jim Downing (University of Cambridge), Ross Gardler (OSS Watch), Steve Midgley (US Department of Education), Glen Robson (National Library of Wales), Peter Sefton (University of Southern Queensland), Adrian Stevenson (UKOLN), Paul Walk (UKOLN), Nigel Ward (University of Queensland)

1. Introduction

Document management systems often need to move content between applications and servers as groups of files with associated metadata. The HTTP and AtomPub protocols are a good match for this requirement, but do not make certain information easily available to the systems exchanging data. This specification describes HTTP headers that enhance the delivery of packaged content over HTTP from client software to document servers and similar systems, and that convey authority information relating to that transfer.

The specification has been informed by the JISC-funded SWORD series of projects to develop deposit technology for repository systems. A first version of SWORD addressed simple one-time fire-and-forget transfer of packaged content from a client environment to a repository. Subsequently, the need to fully support CRUD (create, retrieve, update, delete) operations against such repositories and other scholarly systems has become evident, and the HTTP headers presented here have been derived from a detailed requirements analysis of the sector.

During the requirements analysis, existing internet standards were considered. In particular, extending the existing Accept headers and using Media Features were both considered in detail, and were found either to be insufficiently flexible or to add unnecessary complexity to the work.

1.1 Introduction to Packaging

"Packaging" in the context of this document refers to resources delivered over HTTP which are comprised of component resources. For example, a ZIP file containing multiple files can be considered a "package". Further, some ZIP files have a well defined internal structure which is not accurately portrayed by the Content-Type header (which would usually be application/zip).

The aim of this specification is to enable the client and server to talk meaningfully about those packaged, compound objects.

2. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

The version of BNF used in this document is taken from [RFC2616], and many of the nonterminals used are defined there. Note that the underlying charset is US-ASCII.

3. Packaging

The Packaging header applies to resources delivered over HTTP which are comprised of component resources. It uniquely identifies these well structured packaged objects in much the same way that Content-Type identifies MIME formats.

The Packaging header is constructed as follows:

Packaging = "Packaging" ":" absoluteURI

Note that absoluteURI is defined in [RFC2616].

For example:

POST /collection HTTP/1.1
Host: example.org
Content-Type: application/zip
Content-Length: [content length]
Content-MD5: [md5-digest]
Content-Disposition: attachment; filename=[filename]
Slug: [suggested identifier]
Packaging: http://purl.org/net/terms/package/default

[request entity]

The Packaging request header SHOULD be used by the client during HTTP POST or PUT to give information to the server about the packaging format used to construct the content being POSTed or PUT. Servers SHOULD use this information to unpack the supplied content into its component parts. If the server does not understand the package format it MUST either store the content as delivered without unpacking or respond with 415 (Unsupported Media Type).

The Packaging request header MUST contain a URI. At this point there is no registry of allowed URIs. MIME media types are not permitted in this field, and if a MIME media type exists to describe the packaging format, it SHOULD be used in the corresponding Content-Type header.
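
As an illustration only, the server-side rules in this section might be sketched as follows. The function names, the SUPPORTED_PACKAGES set, the 201 and 400 status codes and the treatment of a missing header are assumptions of this sketch and are not defined by this specification.

```python
from urllib.parse import urlsplit

# Hypothetical set of package formats this example server can unpack.
SUPPORTED_PACKAGES = {"http://purl.org/net/terms/package/default"}


def is_absolute_uri(value: str) -> bool:
    """Loose check for 'scheme:something'; not a full URI grammar."""
    parts = urlsplit(value)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


def decide_packaging_action(headers: dict, store_unknown: bool = True):
    """Return (status_code, action) for a deposit request.

    `headers` is assumed to be a plain dict keyed by the exact header name;
    a real server would look the name up case-insensitively.
    """
    value = headers.get("Packaging")
    if value is None:
        # Assumption: the header is only a SHOULD, so store the body as delivered.
        return 201, "store"
    value = value.strip()
    if not is_absolute_uri(value):
        # Assumption: the text names no status for a non-URI value
        # (for example a MIME type); 400 is this sketch's choice.
        return 400, "reject"
    if value in SUPPORTED_PACKAGES:
        return 201, "unpack"
    # Unknown format: either store without unpacking or answer 415.
    return (201, "store") if store_unknown else (415, "reject")
```

The store_unknown flag stands in for the server's policy choice between the two behaviours permitted for a package format it does not understand.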

4. Accept-Packaging

The Accept-Packaging header applies to resources retrieved over HTTP which are comprised of component resources, and is for requesting these well structured packaged objects from a server. It should be considered similar to the standard Accept headers provided by HTTP, but it currently takes only a single URI and does not support q values.

The Accept-Packaging header is constructed as follows:

Accept-Packaging = "Accept-Packaging" ":" absoluteURI

Note that absoluteURI is defined in [RFC2616].

For example:

GET /item/27 HTTP/1.1
Host: example.org
Accept-Packaging: http://purl.org/net/terms/package/default

The Accept-Packaging request header SHOULD be used by the client during HTTP GET to indicate to the server the package format in which the content identified by the URI should be returned. Servers MUST either return the content in the specified format or respond with 406 (Not Acceptable).
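
A minimal sketch of how a server might apply this rule is given here. The function name, its parameters and the use of a server default when the header is absent are assumptions of the sketch; this specification only defines the behaviour when the header is present.

```python
def negotiate_package(requested, available, default):
    """Return (status_code, package_uri) for a GET carrying Accept-Packaging.

    requested: the header value, or None if the header was absent.
    available: set of package format URIs the server can produce for the item.
    default:   format used when no header is sent (an assumption of this sketch).
    """
    if requested is None:
        return 200, default
    requested = requested.strip()
    if requested in available:
        # The single requested format is one the server can produce.
        return 200, requested
    # No q values and no alternatives: a mismatch is simply not acceptable.
    return 406, None
```

Because the header carries exactly one URI, negotiation reduces to a membership test rather than the ranked matching used for the standard Accept headers.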

5. On-Behalf-Of

The On-Behalf-Of request header MAY be used by the client to alert the server to the identity of the true owner of the content during PUT, POST, DELETE and GET. Traditional HTTP authentication is insufficient in this regard, as content may be delivered machine-to-machine, in which case the authenticating user will be the client rather than the content owner. Servers SHOULD use this information to determine whether and how they accept the content.

On-Behalf-Of = "On-Behalf-Of" ":" (token | quoted-string)

For example:

POST /collection HTTP/1.1
Host: example.org
Authorization: [auth]
Content-Type: application/zip
Content-Length: [content length]
Content-MD5: [md5-digest]
Content-Disposition: attachment; filename=[filename]
Slug: [suggested identifier]
Packaging: http://purl.org/net/terms/package/default
On-Behalf-Of: jbloggs

[request entity]
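
The value syntax (token | quoted-string) follows [RFC2616]. As a hedged sketch, a server could recover the owner identity from the header value as follows; the function name is hypothetical, and the quoted-string pattern is a simplification that does not handle folded (multi-line) header values.

```python
import re

# token: one or more characters that are neither CTLs nor separators.
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
# quoted-string: text in double quotes, with backslash-escaped characters.
_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_on_behalf_of(value: str) -> str:
    """Return the identity carried by an On-Behalf-Of header value."""
    value = value.strip()
    if _TOKEN.fullmatch(value):
        return value
    match = _QUOTED.fullmatch(value)
    if match:
        # Undo quoted-pair escaping: \" becomes " and \\ becomes \.
        return re.sub(r"\\(.)", r"\1", match.group(1))
    raise ValueError("On-Behalf-Of must be a token or a quoted-string")
```

Note that "@" and space are separators in [RFC2616], so an identity such as an email address or a full name only fits this grammar when sent as a quoted-string.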

6. In-Progress

The In-Progress request header MAY be used by the client during PUT, POST or DELETE to inform the server that the current content payload is not yet complete in some unspecified way. For example, there may be further content packages that the client plans to deliver to the server before the full content has been delivered, or the client may need to carry out other actions against the server before confirming that the server can proceed to fully process the content. Exact interpretation of this header is left to the server, so server/client pairs need a common understanding of its meaning, which is beyond the scope of this document.

In-Progress = "In-Progress" ":" ("true" | "false")

For example:

POST /collection HTTP/1.1
Host: example.org
Authorization: Basic ZGFmZnk6c2VjZXJldA==
Content-Type: application/zip
Content-Length: [content length]
Content-MD5: [md5-digest]
Content-Disposition: attachment; filename=[filename]
Slug: [suggested identifier]
Packaging: http://purl.org/net/terms/package/default
On-Behalf-Of: jbloggs
In-Progress: true

[request entity]
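
A server might read the "true" | "false" value along these lines. This is a sketch only: the function name is hypothetical, and treating an absent header as false is an assumption, since this document does not define a default. Literal strings in the BNF of [RFC2616] are case-insensitive, so "True" is accepted as well as "true".

```python
def parse_boolean_header(value, default=False):
    """Interpret a header whose value is "true" or "false".

    value:   the raw header value, or None if the header was absent.
    default: result for an absent header (an assumption of this sketch).
    """
    if value is None:
        return default
    normalised = value.strip().lower()
    if normalised == "true":
        return True
    if normalised == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {value!r}")
```

The same function applies to any header in this document that shares the ("true" | "false") grammar.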

7. Metadata-Relevant

The Metadata-Relevant request header MAY be used by the client during PUT, POST or DELETE to instruct the server to (attempt to) extract metadata from the supplied content package. Content packages commonly contain both file content and metadata about that content, and during unpacking servers may process this metadata in a way which is meaningful to them. If the content package is being supplied to an HTTP resource which is not interested in metadata, then the enclosed information may not be correctly or adequately treated. This directive allows the client to indicate to the server that there is metadata contained within the package which may be of interest to related resources (for example a resource which contains the resource receiving the content), and that the server should be free to update those resources accordingly.

Metadata-Relevant = "Metadata-Relevant" ":" ("true" | "false")

For example:

POST /media-resource/27 HTTP/1.1
Host: example.org
Authorization: [auth]
Content-Type: application/zip
Content-Length: [content length]
Content-MD5: [md5-digest]
Content-Disposition: attachment; filename=[filename]
Slug: [suggested identifier]
Packaging: http://purl.org/net/terms/package/default
On-Behalf-Of: jbloggs
In-Progress: true
Metadata-Relevant: false

[request entity]
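
On the client side, the headers described in this document could be assembled as in the following sketch. The function names and parameters are hypothetical, the filename is assumed to need no quoting, and the Slug and Authorization headers shown in the examples are left to the caller.

```python
import base64
import hashlib
import re

# token as defined in [RFC2616]; anything else is sent as a quoted-string.
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def _token_or_quoted(value: str) -> str:
    """Return value unchanged if it is a token, else as a quoted-string."""
    if _TOKEN.fullmatch(value):
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_deposit_headers(payload: bytes, filename: str, packaging: str,
                          on_behalf_of=None, in_progress=None,
                          metadata_relevant=None) -> dict:
    """Build request headers for depositing a ZIP package."""
    # Content-MD5 is the base64 encoding of the binary MD5 digest.
    digest = base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")
    headers = {
        "Content-Type": "application/zip",
        "Content-Length": str(len(payload)),
        "Content-MD5": digest,
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": packaging,
    }
    # Optional headers are only sent when the caller supplies a value.
    if on_behalf_of is not None:
        headers["On-Behalf-Of"] = _token_or_quoted(on_behalf_of)
    if in_progress is not None:
        headers["In-Progress"] = "true" if in_progress else "false"
    if metadata_relevant is not None:
        headers["Metadata-Relevant"] = "true" if metadata_relevant else "false"
    return headers
```

Leaving the optional headers out when no value is given, rather than sending "false", lets the server apply whatever default it has agreed with the client.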

8. References

8.1. Normative References

[RFC2119] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

Copyright and License Notice

Copyright (C) SWORD (2011). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to SWORD, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by SWORD or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and SWORD DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.