Network Working Group                                         O. Mazahir
Internet Draft                                                 D. Thaler
Intended status: Standards Track                                  M. Cox
Expires: August 2014                                       G. Montenegro
                                                   Microsoft Corporation
                                                        14 February 2014


                       Deterministic URI Encoding
                draft-montenegro-httpbis-uri-encoding-00

Abstract

   The "http" and "https" URI schemes do not have a fixed character
   encoding. This document defines HTTP headers to enable an explicit
   indication of the character encoding.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

Mazahir, et. al.                                                [Page 1]

Internet-Draft        Deterministic URI Encoding           February 2014

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.
   This Internet-Draft will expire in August 2014.

Copyright

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Language
   2. URI Path and Query Encoding Headers
   3. IANA Considerations
      3.1. URI-Path-Encoding
      3.2. URI-Query-Encoding
   4. Security Considerations
   5. Acknowledgments
   6. References
      6.1. Normative References
      6.2. Informative References
   7. Author's Addresses

1. Introduction

   The "http" and "https" URI schemes do not have a fixed character
   encoding. The URI RFC [RFC3986] describes the generic syntax for URI
   components:

   o  Legacy URI components (before 2005) tend to use UTF-8 "or some
      other superset of the US-ASCII character encoding".

   o  New schemes (after 2005) use UTF-8 with percent encoding for
      reserved characters.

   The first bullet explains why the character encoding for "http" and
   "https" URIs is not deterministic.
   This is particularly problematic when parsing URIs at the server side
   or at intermediate proxies (e.g., when looking for a cache hit).

   URIs have different components with different character encoding
   issues. Per the IDNA rules in [RFC5890], the host component is
   encoded using A-labels. There is more non-determinism with respect to
   the path and query components. Furthermore, these two components are
   not necessarily encoded the same way [Handbook].

   This document defines HTTP headers that explicitly state the
   character encoding for the path and query components.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. URI Path and Query Encoding Headers

   The URI Path encoding is conveyed in the following header:

      URI-Path-Encoding = "URI-Path-Encoding" ":" 1charset

   The URI Query encoding is conveyed in the following header:

      URI-Query-Encoding = "URI-Query-Encoding" ":" 1charset

   charset is defined in Section 3.4 of [RFC2616]. The header value
   names the character encoding that the path or query component of the
   URI was in prior to percent encoding. (A value of UTF-8 does not mean
   that the URI carries raw UTF-8; the octets are still percent-encoded.)

   If the user agent is certain that the path component was formed from
   percent-encoded UTF-8, it sets the header as follows:

      URI-Path-Encoding: UTF-8

   Similarly, for the query component:

      URI-Query-Encoding: UTF-8

   This signals that the query component in the URI is in UTF-8 with
   percent encoding.

   Absence of the URI-Path-Encoding or URI-Query-Encoding header is
   equivalent to the legacy situation of non-determinism with respect to
   the path or query component, respectively, as mentioned above in
   Section 1.
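
   As an illustration only, the following Python sketch shows one way a
   user agent might set URI-Path-Encoding and one way a recipient might
   use it. The function names encode_path and decode_component are
   hypothetical and are not defined by this document. Returning None
   for a missing or unrecognized charset models the legacy case; doing
   the same when the octets do not decode under the declared charset is
   an assumption of this sketch, since this document does not specify
   that situation.

```python
import codecs
from typing import Dict, Optional, Tuple
from urllib.parse import quote, unquote_to_bytes


def encode_path(path: str, charset: str = "UTF-8") -> Tuple[str, Dict[str, str]]:
    """User-agent side (hypothetical helper).

    Encodes `path` in `charset`, percent-encodes the resulting octets,
    and returns the encoded path plus the header that declares the charset.
    """
    encoded = quote(path, safe="/", encoding=charset)
    return encoded, {"URI-Path-Encoding": charset}


def decode_component(raw: str, declared: Optional[str]) -> Optional[str]:
    """Recipient side (hypothetical helper).

    Returns the decoded component, or None when the caller has to fall
    back to its legacy, non-deterministic handling.
    """
    if declared is None:
        return None  # header absent: legacy case
    try:
        codec = codecs.lookup(declared.strip())
    except LookupError:
        return None  # invalid or unrecognized charset: legacy case
    try:
        # Undo percent encoding first, then interpret the octets.
        return unquote_to_bytes(raw).decode(codec.name)
    except UnicodeDecodeError:
        # Assumption of this sketch: a mismatch is also treated as legacy.
        return None


if __name__ == "__main__":
    path, headers = encode_path("/caf\u00e9")
    print(path, headers)
    print(decode_component(path, headers.get("URI-Path-Encoding")))
```

   The same decode_component helper applies unchanged to the query
   component when it is given the value of URI-Query-Encoding, which
   reflects that the two components carry independent declarations.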

   Likewise, if the URI-Path-Encoding or URI-Query-Encoding header is
   set to an invalid value or an unrecognized charset, this is
   equivalent to the legacy situation of non-determinism with respect to
   the path or query component, respectively, mentioned above in
   Section 1.

3. IANA Considerations

   IANA is requested to add these headers to the "Permanent Message
   Header Field Names" registry. Per [RFC3864], the template for these
   headers is specified below.

3.1. URI-Path-Encoding

   Applicable protocol: http

   Status: standard

   Author/change controller: IETF (iesg@ietf.org)

   Specification document(s): This document.

3.2. URI-Query-Encoding

   Applicable protocol: http

   Status: standard

   Author/change controller: IETF (iesg@ietf.org)

   Specification document(s): This document.

4. Security Considerations

   Due to the non-deterministic character encoding of URIs, URI parsing
   at servers or proxies may currently involve trying different possible
   character encodings in search of a match. This represents a potential
   attack vector [RFC6943]. The headers proposed in this document could
   be used to reduce the attack surface by enabling a more explicit
   interpretation of the data within a URI, thus preventing unintended
   consequences.

5. Acknowledgments

   Thanks to Ivan Pashov and Wade Hilmo for useful discussions in this
   space.

   This document was prepared using 2-Word-v2.0.template.doc.

6. References

6.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
             Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
             Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
             Resource Identifier (URI): Generic Syntax", STD 66,
             RFC 3986, January 2005.

6.2. Informative References

   [Handbook] Zalewski, M., "Browser Security Handbook, part 1",
             http://code.google.com/p/browsersec/wiki/Part1,
             March 2011.

   [RFC3864] Klyne, G., Nottingham, M., and J. Mogul, "Registration
             Procedures for Message Header Fields", BCP 90, RFC 3864,
             September 2004.

   [RFC5890] Klensin, J., "Internationalized Domain Names for
             Applications (IDNA): Definitions and Document Framework",
             RFC 5890, August 2010.

   [RFC6943] Thaler, D., Ed., "Issues in Identifier Comparison for
             Security Purposes", RFC 6943, May 2013.

7. Author's Addresses

   Osama Mazahir
   Microsoft Corporation

   Email: OsamaM@microsoft.com

   Dave Thaler
   Microsoft Corporation

   Email: DThaler@microsoft.com

   Matthew Cox
   Microsoft Corporation

   Email: MaCox@microsoft.com

   Gabriel Montenegro
   Microsoft Corporation

   Phone:
   Email: gabriel.montenegro@microsoft.com