What characters are permitted #846
This encodes the conclusions from interim discussions:

1. CR, LF, and NUL cannot appear anywhere in field names or values.
2. SP and HTAB cannot appear at the start or end of field names or values.
3. COLON cannot appear anywhere in a field name, except for the colon at the start of a pseudo-header field name.

The strong requirements about validating fields according to ABNF have been replaced.

The text is clearer about how the pieces fit together: it is HPACK that allows any octet, whereas HTTP/2 makes certain choices invalid.

Closes httpwg#815.
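For readers who want to see the three conclusions as executable checks, here is a minimal Python sketch. The function name `violates_minimum_rules` is hypothetical and does not come from the draft or any library, and the sketch covers only the three rules as summarized in this description, not the full text of the draft (for example, it does not check that a name starting with a colon is a known pseudo-header field).

```python
def violates_minimum_rules(name: bytes, value: bytes) -> bool:
    """Return True if the field breaks one of the three rules summarized above.

    Hypothetical helper for illustration; not taken from the draft.
    """
    forbidden_anywhere = b"\r\n\x00"
    edge_whitespace = b" \t"

    # Rule 1: CR, LF, and NUL are never allowed in a name or a value.
    if any(octet in forbidden_anywhere for octet in name + value):
        return True

    # Rule 2: SP and HTAB may not start or end a name or a value.
    for part in (name, value):
        if part and (part[0] in edge_whitespace or part[-1] in edge_whitespace):
            return True

    # Rule 3: a colon is only tolerated as the very first octet of a name,
    # which is where pseudo-header field names carry it.
    if b":" in name[1:]:
        return True

    return False
```

Under these three rules alone, a name such as `b"foo\tbar"` is not rejected, because the whitespace is neither leading nor trailing.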
Another good cleanup
This allows whitespace and control characters in field names. Is that intentional?
see #846 (comment)
This was intentional, but it can be revised by expanding the set of characters we prohibit. I had imagined that this would not be different from HTTP/1.1 where "Foo\tBar: ?1" might parse, but is not valid in the same way that "Foo<BEL>Bar: ?1" is not. I thought that we had agreed that a close policing of the field name grammar was not required from HTTP/2 implementations.
draft-ietf-httpbis-http2bis.xml
message. A request or response containing an uppercase character ('A' to 'Z', ASCII 0x41
to 0x5a) MUST be treated as <xref target="malformed">malformed</xref>.
Obvious observation, but this sentence loses the "field" subject, which seems worse than the previous version.
Co-authored-by: Lucas Pardue <[email protected]>
Co-authored-by: Mark Nottingham <[email protected]>
@mnot, take another look. I've refactored your suggestion.
colon (ASCII COLON, 0x3a).
</li>
<li>
A field value MUST NOT contain the zero value (ASCII NUL, 0x0), line feed (ASCII LF,
This seems to repeat a requirement from above.
FWIW, it's also not clear to me why we would allow whitespace inside the field name...
Where does it say those things?
It does not. I was confused because of the statement about field values. Sorry.
Yes.
@gregw, if you can suggest text that would address your remaining concerns, that would be good, but I think we're going to include this text in the next draft. Of course, there will be ample time to improve on this; we just want to make sure that we have an updated draft to discuss at the upcoming interim meeting.
@martinthomson What I think is missing is a description of what this change is trying to achieve. The text identifies that there is a set of
If it is indeed option 3. then I assume that there are some fields in the set
So in short, I can see lots of text defining
Finally, to harp on my original complaint, I still do not understand why

h2-field-name = h2-pseudo-field-name / 1*fn-char
h2-pseudo-field-name = ":" 1*fn-char ; or list of actual names
fn-char = %x21-39 / %x3b-40 / %x5B-7E ; visible symbols except ':', digits and lowercase alpha
h2-field-value = 1*fv-char
fv-char = %x01-09 / %x0b-0c / %x0e-ff ; exclude null, CR and LF

Or better yet, just go with HTTP field names, but with names in lower-case:

h2-field-name = h2-pseudo-field-name / 1*fn-char
h2-pseudo-field-name = ":" 1*fn-char ; or list of actual names
fn-char = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / %x61-7A
h2-field-value = field-value

Edit: removed comment about ESC... I was looking at an ASCII table from before I was born!
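As a rough illustration of the first grammar in this comment (the one built from octet ranges), the rules can be transcribed into byte-oriented regular expressions. This is only a sketch: `matches_first_grammar` and the constant names are hypothetical, and the pseudo-header rule is approximated as an optional leading colon rather than a list of actual names.

```python
import re

# fn-char = %x21-39 / %x3b-40 / %x5B-7E
FN_CHAR = rb"[\x21-\x39\x3b-\x40\x5b-\x7e]"

# h2-field-name = h2-pseudo-field-name / 1*fn-char
# h2-pseudo-field-name = ":" 1*fn-char
FIELD_NAME = re.compile(rb":?" + FN_CHAR + rb"+")

# h2-field-value = 1*fv-char
# fv-char = %x01-09 / %x0b-0c / %x0e-ff
FIELD_VALUE = re.compile(rb"[\x01-\x09\x0b\x0c\x0e-\xff]+")


def matches_first_grammar(name: bytes, value: bytes) -> bool:
    """Hypothetical helper: True if both parts match the ranges quoted above."""
    return bool(FIELD_NAME.fullmatch(name)) and bool(FIELD_VALUE.fullmatch(value))
```

Two consequences of transcribing the grammar literally: `1*fv-char` requires at least one octet, so an empty field value does not match, and a double quote (0x22) in a field name does match because it falls inside `%x21-39`.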
No it doesn't. It gives a text description of a set of invalid H2 fields. The text does not say these are the only limitations on h2 field validity (noting that there is also an ABNF description that applies when -semantics applies), neither does it say that any field that doesn't contain any invalid characters as described in this section is valid. I do think the text could be clearer in saying that this is a non-exhaustive list of reasons why a header field may be invalid, but at the same time I'm reluctant to have this document spend too much time re-hashing reasons that exist in other specifications that are presumed to apply.
This question is one for -semantics, not for HTTP/2. -semantics gives the answer to this: you MUST NOT emit (§ 2.2), but as a receiver you are cautioned that (§ 2.3):
Additionally (§ 2.4):
This gives you the permission you want: you are encouraged to parse the field defensively and you may attempt to recover, which implies that by default you may fail, and that -semantics does not specify exactly how you will handle these failures. HTTP/2 continues to grant your application that freedom. The specification does not bind you: you can handle this as you want.
OK True. Which is actually worse, as it leaves the set of
Also worse, as it means the document only vaguely defines validity. There is actually no definition of what is a valid h2-field.
This reason could be used to say that there should be no additional invalidity check in this document. Just allow any hpack-valid field and leave the rest to be a matter of semantics.

So I'm back to why this document is imprecisely defining valid/invalid h2-fields. It is not for intermediaries, as there is a section in the document dedicated to that. It is not for locally interpreted HTTP, as you say HTTP semantics should be applied. Why does this document say that a field name containing space is invalid, yet one containing a double quote is not? Neither is valid HTTP; both are valid hpack. There is no good reason I can see for either, and whilst I don't know of any specific attack that either could be used for, I would not be surprised if both could be used to some evil ends.

This vague definition is going to have a significant carbon footprint. Many implementations will ultimately end up double validating field names: once to exclude the
Maybe there is a reason for this double validation and the resulting CPU cycles, but I have yet to see why validity is only partially defined.
It suggests it but does not require it. The other text requires validation. Most intermediaries will not apply the full ABNF validation because a) it requires perfect understanding of the ABNF of all header fields, which they will not have, and b) even if they did, the -semantics ABNF is sufficiently costly to validate that they won't do it. Cheaper validation steps are more likely to be implemented.
This document (and -semantics) is imprecisely defining them because a precise definition runs afoul of the real world. The ABNF you're citing represents, fundamentally, guidance. If you want to reject things that don't conform to the ABNF you are free to do so (as -semantics says), but there is no document that obligates you to reject them. This is deliberate, because there is (IMO) absolutely no point in adding normative requirements that, if followed, would harm interoperability.

In this case the document is calling out specific cases that MUST be rejected. Practically speaking, the intention of this is to encode requirements from other messaging formats, such as -messaging, to ensure that HTTP/2 implementations do not encode header fields that cannot be safely translated to HTTP/1.1. This provides a lower bound on validation: so long as everybody does this, we can safely rely on HTTP/2 messages being represented in HTTP/1.1's framing. Note that this doesn't mean they'll be valid HTTP/1.1 messages, just that they are not going to parse incorrectly. With that in mind, the constraints added here are, in order:
This problem is solved by not doing that. The ABNF in -semantics, if enforced, is a strict superset of the requirements here except for (2). If your implementation will enforce the requirements in -semantics, you can choose to not enforce these ones at the h2 layer, knowing that the -semantics layer will cover you. The RFC does not bind your implementation, only your observable behaviour: if your implementation ends up rejecting header fields that meet these characteristics, it doesn't matter why you did it, only that you did.
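To make the "safely translated to HTTP/1.1" concern concrete, here is a small Python sketch of what can happen when a field value containing CR and LF is written out as HTTP/1.1 field lines without validation. The serializer `to_http1_lines` and the field names are hypothetical, and the line-splitting parser is deliberately naive; real HTTP/1.1 parsers are more involved.

```python
def to_http1_lines(fields):
    """Naively serialize (name, value) pairs as HTTP/1.1 field lines.

    Hypothetical helper: it performs no validation at all, which is the point.
    """
    return b"".join(name + b": " + value + b"\r\n" for name, value in fields)


# A single HTTP/2 field whose value carries an embedded CRLF.
wire = to_http1_lines([(b"x-note", b"hello\r\nx-injected: 1")])

# A line-oriented HTTP/1.1 parser splits on CRLF and now sees two fields.
parsed = [line.split(b": ", 1) for line in wire.split(b"\r\n") if line]
```

One field went in and two field lines came out, which is the kind of incorrect parse that the mandatory minimum checks are meant to rule out.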
I don't get that. We're still talking about field names, right?
No, all of this applies to names and values:
Well. Validating names and values are very different things. It would be best to clearly separate these topics.
I get it that there is a minimum standard being set. In section "HTTP Fields" it says that h2 implementations MUST treat as malformed any fields that violate any of the described conditions and an intermediary MUST NOT forward any such headers. I don't see any normative text that says that an implementation MAY or SHOULD validate against the HTTP ABNF and that if they do so then that is a superset of the conditions in section "HTTP Fields".

I would very much prefer to see text in the "HTTP Fields" section that clearly says h2 implementations MAY validate against the HTTP ABNF and that any implementations that do not MUST validate fields against the following conditions. I'll have a try at this later today.... stand by....
Field names are strings of ASCII characters that are compared in a case-insensitive
fashion. Field names MUST be converted to lowercase when constructing a HTTP/2
message. A request or response containing an uppercase character ('A' to 'Z', ASCII 0x41
to 0x5a) in a field name MUST be treated as <xref target="malformed">malformed</xref>.
</t>
<t>
HPACK is capable of carrying field names or values that are not valid in HTTP. Though
HPACK can carry any octet, fields are not valid in the following cases:
</t>
Field names are strings of ASCII characters that are compared in a case-insensitive
fashion. Field names MUST be converted to lowercase when constructing a HTTP/2
message. A request or response containing an uppercase character ('A' to 'Z', ASCII 0x41
to 0x5a) in a field name MUST be treated as <xref target="malformed">malformed</xref>.
</t>
<t>
HPACK is capable of carrying field names or values that are not valid in HTTP. Though
HPACK can carry any octet, fields are not valid in the following cases:
</t>
Field names are strings of ASCII characters that are compared in a case-insensitive
fashion. Field names MUST be converted to lowercase when constructing a HTTP/2
message. A request or response containing an uppercase character ('A' to 'Z', ASCII 0x41
to 0x5a) in a field name MUST be treated as <xref target="malformed">malformed</xref>.
</t>
<t>
HPACK is capable of carrying field names or values that are not valid in HTTP, thus endpoints
MUST perform some additional validation and treat as <xref target="malformed">malformed</xref>
that are not <xref target="PseudoHeaderFields">pseudo-header fields</xref> and that do not comply
with the validation. An endpoint SHOULD, in addition to the lowercase condition above, validate
fields against the HTTP ABNF grammar from <xref target="HTTP" section="5"/>. Any endpoint that
does not validate against the HTTP ABNF grammar MUST at least validate against the following cases:
</t>
Leaving aside the grammar of the text for a moment (I only want to discuss that if we decide to go ahead with adding it), I think the core question is whether or not this is a SHOULD. If -semantics does not make this a SHOULD, why should we?
Validating the HTTP fields in -semantics is a MUST, but I'm not proposing that here. This change works equally well with MAY instead of SHOULD. The point is to make it clear that the conditions listed below are a minimum validation and that stricter validation may be done instead.
Of course it is yet clearer with ABNF:
Field names are strings of ASCII characters that are compared in a case-insensitive
fashion. Field names MUST be converted to lowercase when constructing a HTTP/2
message. A request or response containing an uppercase character ('A' to 'Z', ASCII 0x41
to 0x5a) in a field name MUST be treated as <xref target="malformed">malformed</xref>.
</t>
<t>
HPACK is capable of carrying field names or values that are not valid in HTTP. Though
HPACK can carry any octet, fields are not valid in the following cases:
</t>
Field names are strings of ASCII characters that are compared in a case-insensitive
fashion. Field names MUST be converted to lowercase when constructing a HTTP/2
message. A request or response containing an uppercase character ('A' to 'Z', ASCII 0x41
to 0x5a) in a field name MUST be treated as <xref target="malformed">malformed</xref>.
</t>
<t>
Since HPACK is capable of carrying field names or values that are neither valid in HTTP
nor secure to transmit as HTTP, fields MUST at least be minimally validated by treating as
<xref target="malformed">malformed</xref> any field that does not comply with the ABNF:
<pre>
field-name = h2-pseudo-field-name / 1*fn-char
pseudo-field-name = ":" 1*fn-char
fn-char = %x21-39 / %x3b-40 / %x5B-7E ; lowercase visible symbols except ':'
field-value = 1*fv-char
fv-char = %x01-09 / %x0b-0c / %x0e-ff ; octets excluding null, CR and LF
</pre>
Alternately, fields MAY be more strictly validated and treated as
<xref target="malformed">malformed</xref> if they do not comply with the
HTTP ABNF grammar from <xref target="HTTP" section="5"/> modified for
lowercase field names:
<pre>
field-name = h2-pseudo-field-name / 1*fn-char
pseudo-field-name = ":" 1*fn-char
fn-char = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / %x61-7A
field-value = *field-content
field-content = field-vchar [ 1*( SP / HTAB / field-vchar ) field-vchar ]
field-vchar = VCHAR / %x80-FF
</pre>
</t>
then delete all the bullet points below.
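For comparison, the stricter alternative in the suggestion (HTTP token characters restricted to lowercase, plus the `field-content` rule for values) can be sketched the same way. The names `matches_strict_grammar`, `STRICT_NAME`, and `STRICT_VALUE` are hypothetical, and a pseudo-header name is again approximated as an optional leading colon.

```python
import re

# fn-char = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
#           "^" / "_" / "`" / "|" / "~" / DIGIT / %x61-7A
TOKEN_LOWER = rb"[!#$%&'*+\-.^_`|~0-9a-z]+"
STRICT_NAME = re.compile(rb":?" + TOKEN_LOWER)

# field-vchar = VCHAR / %x80-FF
VCHAR = rb"[\x21-\x7e\x80-\xff]"

# field-value = *field-content
# field-content = field-vchar [ 1*( SP / HTAB / field-vchar ) field-vchar ]
# Equivalent shape: empty, or starts and ends with a field-vchar, with
# SP / HTAB / field-vchar allowed in between.
STRICT_VALUE = re.compile(
    rb"(?:" + VCHAR + rb"(?:[ \t\x21-\x7e\x80-\xff]*" + VCHAR + rb")?)?"
)


def matches_strict_grammar(name: bytes, value: bytes) -> bool:
    """Hypothetical helper: True if both parts match the stricter grammar."""
    return bool(STRICT_NAME.fullmatch(name)) and bool(STRICT_VALUE.fullmatch(value))
```

Unlike the minimal grammar, this one accepts an empty value and rejects a double quote in a field name, as well as control characters and leading or trailing whitespace in a value.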
Validating the HTTP fields in -semantics is a MUST, but I'm not proposing that here.
Is it? I can’t find that text in the document, can you link me to it?
In section 2.4 it says:
A recipient MUST interpret a received protocol element according to the semantics defined for it by this specification
The ABNF is given for the fields in section 5, so by my reading that would be a protocol element that MUST be interpreted according to the ABNF defined by the specification.
I don't disagree with the reading, but section 2.4 does then caveat that MUST pretty substantially. For example, the sentence you quote reads, in its entirety:
A recipient MUST interpret a received protocol element according to the semantics defined for it by this specification, including extensions to this specification, unless the recipient has determined (through experience or configuration) that the sender incorrectly implements what is implied by those semantics.
On top of that unless, the next paragraph is:
Unless noted otherwise, a recipient MAY attempt to recover a usable protocol element from an invalid construct.
I don't think this drastically changes the MUST, but I do think that we potentially want to walk the SHOULD back to a MAY here.
@Lukasa, which "SHOULD" are you referring to?
@martinthomson I believe @Lukasa is referring to the "SHOULD" in the first version of suggested text above:
An endpoint SHOULD, in addition to the lowercase condition above, validate fields against the HTTP ABNF grammar ...
If text like that is adopted, I don't particularly care if that is a SHOULD or MAY. The key thing I suggest is that this document explicitly calls out the possibility of validating against the full HTTP grammar in this "HttpFields" section and indicates that the minimal validation described is just that: a minimal validation.
However, I prefer the form of my second suggested text above:
Since HPACK is capable of carrying field names or values that are neither valid in HTTP nor secure to transmit as HTTP, fields MUST at least be minimally validated ...
Alternately, fields MAY be more strictly validated ... with the HTTP ABNF grammar ...
Finally, although I prefer the ABNF usage in my second suggested text (especially as it clarifies the lowercase situation), even if a text description is used I think that can work with the form: "MUST minimally validate... Alternatively MAY more strictly validate...."
@gregw, I'm reading your comments as supportive of the technical change, but with a strong preference for a different way of presenting the information.

I want to clarify that this change is about the mandatory validation that endpoints perform on fields. It looks like we all agree with stipulating a minimum amount of validation that applies to all implementations, especially those that would otherwise just forward fields without processing. Additionally, we all agree that additional validation is permitted, up to the point that knowledge of the semantics for the fields is applied in validating the values.

Where we might disagree is in the way that these requirements are described. I've chosen to use words; @gregw would prefer ABNF. As we agree on the technical substance, and the question of presentation is an editorial decision in which Cory and I have discretion, I'm going to merge this pull request. We would like to publish a revision ahead of the upcoming interim. If you disagree with this decision, especially the technical aspects, then I encourage you to open a new issue.

One aspect that seems like it might be worth discussing is whether we encourage additional validation rather than simply permitting it. That is, "SHOULD/MAY fully validate".
@martinthomson that's a correct summation of my position. I would have preferred that some text for "SHOULD/MAY fully validate" be resolved in this PR (with or without ABNF), but I will open another issue.
Improve on httpwg#846, which fixes httpwg#815, by adding extra clarity:
+ The validation for uppercase characters is no longer listed separately
+ It is clearly stated that violations of the full HTTP ABNF field definition MAY be treated as *Malformed*