rfc9485v4.txt | rfc9485.txt
---|---
Internet Engineering Task Force (IETF) C. Bormann | Internet Engineering Task Force (IETF) C. Bormann | |||
Request for Comments: 9485 Universität Bremen TZI | Request for Comments: 9485 Universität Bremen TZI | |||
Category: Standards Track T. Bray | Category: Standards Track T. Bray | |||
ISSN: 2070-1721 Textuality | ISSN: 2070-1721 Textuality | |||
September 2023 | October 2023 | |||
I-Regexp: An Interoperable Regular Expression Format | I-Regexp: An Interoperable Regular Expression Format | |||
Abstract | Abstract | |||
This document specifies I-Regexp, a flavor of regular expression that | This document specifies I-Regexp, a flavor of regular expression that | |||
is limited in scope with the goal of interoperation across many | is limited in scope with the goal of interoperation across many | |||
different regular expression libraries. | different regular expression libraries. | |||
Status of This Memo | Status of This Memo | |||
skipping to change at line 81 | skipping to change at line 81
(abbreviated as "regexp") flavor, I-Regexp. | (abbreviated as "regexp") flavor, I-Regexp. | |||
I-Regexp does not provide advanced regular expression features such | I-Regexp does not provide advanced regular expression features such | |||
as capture groups, lookahead, or backreferences. It supports only a | as capture groups, lookahead, or backreferences. It supports only a | |||
Boolean matching capability, i.e., testing whether a given regular | Boolean matching capability, i.e., testing whether a given regular | |||
expression matches a given piece of text. | expression matches a given piece of text. | |||
I-Regexp supports the entire repertoire of Unicode characters | I-Regexp supports the entire repertoire of Unicode characters | |||
(Unicode scalar values); both the I-Regexp strings themselves and the | (Unicode scalar values); both the I-Regexp strings themselves and the | |||
strings they are matched against are sequences of Unicode scalar | strings they are matched against are sequences of Unicode scalar | |||
values (often represented in UTF-8 encoding form [STD63] for | values (often represented in UTF-8 encoding form [RFC3629] for | |||
interchange). | interchange). | |||
I-Regexp is a subset of XML Schema Definition (XSD) regular | I-Regexp is a subset of XML Schema Definition (XSD) regular | |||
expressions [XSD-2]. | expressions [XSD-2]. | |||
This document includes guidance for converting I-Regexps for use with | This document includes guidance for converting I-Regexps for use with | |||
several well-known regular expression idioms. | several well-known regular expression idioms. | |||
The development of I-Regexp was motivated by the work of the JSONPath | The development of I-Regexp was motivated by the work of the JSONPath | |||
Working Group (WG). The WG wanted to include support for the use of | Working Group (WG). The WG wanted to include support for the use of | |||
skipping to change at line 337 | skipping to change at line 337
libraries in severely constrained environments may not be able to | libraries in severely constrained environments may not be able to | |||
support I-Regexp conformance. | support I-Regexp conformance. | |||
7. IANA Considerations | 7. IANA Considerations | |||
This document has no IANA actions. | This document has no IANA actions. | |||
8. Security Considerations | 8. Security Considerations | |||
While technically out of the scope of this specification, Section 10 | While technically out of the scope of this specification, Section 10 | |||
("Security Considerations") of [STD63] applies to implementations. | ("Security Considerations") of [RFC3629] applies to implementations. | |||
Particular note needs to be taken of the last paragraph of Section 3 | Particular note needs to be taken of the last paragraph of Section 3 | |||
("UTF-8 definition") of [STD63]; an I-Regexp implementation may need | ("UTF-8 definition") of [RFC3629]; an I-Regexp implementation may | |||
to mitigate limitations of the platform implementation in this | need to mitigate limitations of the platform implementation in this | |||
regard. | regard. | |||
As discussed in Section 6, more complex regexp libraries may contain | As discussed in Section 6, more complex regexp libraries may contain | |||
exploitable bugs, which can lead to crashes and remote code | exploitable bugs, which can lead to crashes and remote code | |||
execution. There is also the problem that such libraries often have | execution. There is also the problem that such libraries often have | |||
performance characteristics that are hard to predict, leading to | performance characteristics that are hard to predict, leading to | |||
attacks that overload an implementation by matching against an | attacks that overload an implementation by matching against an | |||
expensive attacker-controlled regexp. | expensive attacker-controlled regexp. | |||
I-Regexps have been designed to allow implementation in a way that is | I-Regexps have been designed to allow implementation in a way that is | |||
skipping to change at line 444 | skipping to change at line 444
ietf-jsonpath-base-20>. | ietf-jsonpath-base-20>. | |||
[PCRE2] "Perl-compatible Regular Expressions (revised API: | [PCRE2] "Perl-compatible Regular Expressions (revised API: | |||
PCRE2)", <http://pcre.org/current/doc/html/>. | PCRE2)", <http://pcre.org/current/doc/html/>. | |||
[RE2] "RE2 is a fast, safe, thread-friendly alternative to | [RE2] "RE2 is a fast, safe, thread-friendly alternative to | |||
backtracking regular expression engines like those used in | backtracking regular expression engines like those used in | |||
PCRE, Perl, and Python. It is a C++ library.", commit | PCRE, Perl, and Python. It is a C++ library.", commit | |||
73031bb, <https://github.com/google/re2>. | 73031bb, <https://github.com/google/re2>. | |||
| | [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO |
| | 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November |
| | 2003, <https://www.rfc-editor.org/info/rfc3629>. |
[RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493, | [RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493, | |||
DOI 10.17487/RFC7493, March 2015, | DOI 10.17487/RFC7493, March 2015, | |||
<https://www.rfc-editor.org/info/rfc7493>. | <https://www.rfc-editor.org/info/rfc7493>. | |||
[STD63] Yergeau, F., "UTF-8, a transformation format of ISO | ||||
10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November | ||||
2003, <https://www.rfc-editor.org/info/rfc3629>. | ||||
[UNICODE-GLOSSARY] | [UNICODE-GLOSSARY] | |||
Unicode, Inc., "Glossary of Unicode Terms", | Unicode, Inc., "Glossary of Unicode Terms", | |||
<https://unicode.org/glossary/>. | <https://unicode.org/glossary/>. | |||
Acknowledgements | Acknowledgements | |||
Discussion in the IETF JSONPATH WG about whether to include a regexp | Discussion in the IETF JSONPATH WG about whether to include a regexp | |||
mechanism into the JSONPath query expression specification and | mechanism into the JSONPath query expression specification and | |||
previous discussions about the YANG pattern and Concise Data | previous discussions about the YANG pattern and Concise Data | |||
Definition Language (CDDL) .regexp features motivated this | Definition Language (CDDL) .regexp features motivated this | |||
End of changes. 6 change blocks. | ||||
9 lines changed or deleted | 9 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |
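The Introduction text above says that both I-Regexp strings and the strings they are matched against are sequences of Unicode scalar values, often exchanged as UTF-8. Python is used here only for illustration. In Python, a `str` can also hold lone surrogate code points (U+D800 through U+DFFF). Those code points are not scalar values. The following minimal sketch checks for them before encoding text for interchange. The helper names `is_scalar_value_sequence` and `to_utf8` are hypothetical and are not part of any specification or library.

```python
def is_scalar_value_sequence(s: str) -> bool:
    """Return True if every code point in s is a Unicode scalar value.

    Python strings never exceed U+10FFFF, so the only non-scalar code
    points they can contain are surrogates (U+D800..U+DFFF), which can
    appear e.g. via the "surrogateescape" error handler.
    """
    return all(not (0xD800 <= ord(ch) <= 0xDFFF) for ch in s)


def to_utf8(s: str) -> bytes:
    """Encode a scalar-value string as UTF-8 for interchange (hypothetical helper)."""
    if not is_scalar_value_sequence(s):
        raise ValueError("string contains surrogate code points")
    return s.encode("utf-8")
```

Rejecting surrogates up front gives a clearer error than relying on the encoder to fail later. Python's strict UTF-8 encoder would also refuse them.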
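The Security Considerations text points to the last paragraph of Section 3 of RFC 3629. That paragraph requires UTF-8 decoders to protect against invalid sequences. Its example is a naive decoder that turns the overlong sequence C0 80 into U+0000. This sketch assumes the subject text arrives as UTF-8 bytes and uses a hypothetical helper, `decode_subject`. It decodes strictly and declines to match if decoding fails, so no repaired or lossy text is handed to the regexp engine.

```python
from typing import Optional


def decode_subject(data: bytes) -> Optional[str]:
    """Strictly decode UTF-8 input before matching; return None if invalid."""
    try:
        # errors="strict" (the default) makes Python reject overlong forms,
        # encoded surrogates, truncated sequences, and values above U+10FFFF.
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
```

This sketch returns `None` instead of substituting U+FFFD. That design choice keeps malformed input from ever reaching the matcher.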
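The Security Considerations text also warns that complex regexp libraries have hard-to-predict performance when matching attacker-controlled patterns. Automaton-based engines, such as the RE2 library cited in the references above, avoid backtracking by tracking every possible match position at once. The sketch below illustrates that idea on a toy language, not on I-Regexp. The toy language is an assumption: literal characters, `.`, and postfix `*`, with no escapes or grouping. Here `.` follows the XSD meaning of any character except newline and carriage return. All function names are hypothetical. The result is a Boolean that requires the whole text to match, as XSD regular expressions do. The work per input character is bounded by the number of pattern tokens, so the total cost is O(len(pattern) × len(text)).

```python
from typing import List, Set, Tuple

# Each token is (character, starred); '.' is the only wildcard character.
Token = Tuple[str, bool]


def compile_subset(pattern: str) -> List[Token]:
    """Parse the toy syntax: any character, optionally followed by '*'."""
    tokens: List[Token] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            raise ValueError(f"dangling '*' at offset {i}")
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append((ch, starred))
        i += 2 if starred else 1
    return tokens


def _char_ok(tok: str, ch: str) -> bool:
    # '.' follows the XSD meaning: any character except \n and \r.
    if tok == ".":
        return ch not in "\n\r"
    return tok == ch


def _closure(tokens: List[Token], states: Set[int]) -> Set[int]:
    """Add every state reachable by taking zero repetitions of starred tokens."""
    result: Set[int] = set()
    stack = list(states)
    while stack:
        s = stack.pop()
        if s in result:
            continue
        result.add(s)
        if s < len(tokens) and tokens[s][1]:
            stack.append(s + 1)
    return result


def matches(pattern: str, text: str) -> bool:
    """Boolean whole-string match without backtracking."""
    tokens = compile_subset(pattern)
    states = _closure(tokens, {0})
    for ch in text:
        nxt: Set[int] = set()
        for s in states:
            if s < len(tokens):
                tok, starred = tokens[s]
                if _char_ok(tok, ch):
                    # A starred token may repeat, so stay in the same state.
                    nxt.add(s if starred else s + 1)
        states = _closure(tokens, nxt)
        if not states:
            return False
    # Accept only if the end of the pattern is reachable after all input.
    return len(tokens) in states
```

Take a pattern of many `a*` tokens followed by `b`, matched against a long run of `a` characters. A naive backtracking engine has to try many ways of splitting that input among the `a*` tokens. This simulation never holds more than one entry per token in `states`.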