Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What characters are permitted #846

Merged
merged 7 commits into from
Jun 3, 2021

Conversation

martinthomson
Copy link
Collaborator

This encodes the conclusions from interim discussions:

  1. CR, LF, and NUL cannot appear anywhere in field names or values.
  2. SP and HTAB cannot appear at the start or end of field names or
    values.
  3. COLON cannot appear anywhere in a field name, except for the colon at
    the start of a pseudo-header field name.

The strong requirements about validating fields according to ABNF has
been replaced.

The text is clearer about how the pieces fit together: it is HPACK that
allows any octet, whereas HTTP/2 makes certain choices invalid.

Closes #815.

This encodes the conclusions from interim discussions:

1. CR, LF, and NUL cannot appear anywhere in field names or values.
2. SP and HTAB cannot appear at the start or end of field names or
values.
3. COLON cannot appear anywhere in a field name, except for the colon at
the start of a pseudo-header field name.

The strong requirements about validating fields according to ABNF has
been replaced.

The text is clearer about how the pieces fit together: it is HPACK that
allows any octet, whereas HTTP/2 makes certain choices invalid.

Closes httpwg#815.
Copy link
Collaborator

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good cleanup

@mnot
Copy link
Member

mnot commented Apr 27, 2021

This allows whitespace and control characters in field names. Is that intentional?

Copy link
Contributor

@reschke reschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martinthomson
Copy link
Collaborator Author

This was intentional, but it can be revised by expanding the set of characters we prohibit. I had imagined that this would not be different from HTTP/1.1 where "Foo\tBar: ?1" might parse, but is not valid in the same way that "Foo<BEL>Bar: ?1" is not. I thought that we had agreed that a close policing of the field name grammar was not required from HTTP/2 implementations.

Comment on lines 2852 to 2853
message. A request or response containing an uppercase character ('A' to 'Z', ASCII 0x41
to 0x5a) MUST be treated as <xref target="malformed">malformed</xref>.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obvious observation but this sentence loses the "field" subject, which seems worse than the previous version

draft-ietf-httpbis-http2bis.xml Outdated Show resolved Hide resolved
draft-ietf-httpbis-http2bis.xml Show resolved Hide resolved
@martinthomson
Copy link
Collaborator Author

@mnot, take another look. I've refactored your suggestion.

colon (ASCII COLON, 0x3a).
</li>
<li>
A field value MUST NOT contain the zero value (ASCII NUL, 0x0), line feed (ASCII LF,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to repeat a requirement from above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, it's also not clear to me why we would allow whitespace inside the field name...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does it say those things?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not. I was confused because of the statement about field values. Sorry.

@reschke
Copy link
Contributor

reschke commented May 25, 2021

Yes.

@martinthomson
Copy link
Collaborator Author

@gregw, if you can suggest text that would address your remaining concerns, that would be good, but I think we're going to include this text in the next draft. Of course, there will be ample time to improve on this; we just want to make sure that we have an updated draft to discuss at the upcoming interim meeting.

@gregw
Copy link
Contributor

gregw commented May 26, 2021

@martinthomson What I think is missing is a description of what this change is trying to achieve.

The text identifies that there is a set of valid-hpack-fields, but notes that that is a super-set of of valid-HTTP-fields.
It then gives text description of what the set of valid-h2-fields is, but it is not clear exactly what the intent of that set is meant to be? The possibilities are:

  1. valid-h2-fields == valid-HTTP-fields-with-lowercase-names union h2-pseudo-fields. But if this was the case then why not define it in terms of the HTTP ABNF?
  2. valid-h2-fields == valid-HTTP-fields-with-lowercase-names union h2-pseudo-fields union invalid-fields-from-grandfathered-implementations. This seems a strange thing to do, but the motivation was originally given that this was describing what existing implementations do? If this is the case then it should be described as such.
  3. valid-h2-fields == valid-hpack-fields intersection fields-safe-from-encapsulation-attacks. I think this is what the text is actually striving for, but a) it would be good to say so; b) it is unclear to me why h2 needs a superset of option 1. above... if so it should be said why.

If it is indeed option 3. then I assume that there are some fields in the set valid-h2-fields that are nether in the valid-HTTP-fields nor h2-pseudo-fields sets. What are those fields and what should an implementation do with them? We are told later in the update that if we are an intermediary then we can validate strictly against the HTTP ABNF and that if we don't we risk invalid fields (and by inference of the section title encapsulation attacks). But what do we do with those fields if we are not an intermediary? Do we pass them onto our applications and let them work it out that they are not members of valid-HTTP-fields? There is nothing there that says we MAY treat such messages as malformed - so at the very least, it would be a good addition to the text to say that non-intermediaries can do stricter validation.

So in short, I can see lots of text defining valid-h2-headers, but it is entirely unmotivated as to why it is << valid-hpack-fields and >> valid-HTTP-fields-with-lowercase-names.

Finally, to harp on my original complaint, I still do not understand why valid-h2-fields is defined in 4 bullet points of prose that have already confused some readers. There is a paragraph about how field names have to be lowercase and then the bullet points ignore that and define all visible characters except space (so if space is visible does that mean tab, CR, LF are also visible?) and except colon.
Some simple ABNF would be so much better:

h2-field-name = h2-pseudo-field-name / 1*fn-char
h2-pseudo-field-name = ":" 1*fn-char ; or list of actual names
fn-char = %x21-39 / %x3b-40 / %x5B-7E ; visible symbols except ':', digits and lowercase alpha

h2-field-value = 1*fv-char
fv-char = %x01-09 / %x0b-0c / %x0e-ff ; exclude null, CR and LF

Or better yet, just go with HTTP field names, but with names in lower-case:

h2-field-name = h2-pseudo-field-name / 1*fn-char
h2-pseudo-field-name = ":" 1*fn-char ; or list of actual names
fn-char = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / %x61-7A

h2-field-value = field-value

Edit: removed comment about ESC... I would looking at an ASCII table from before I was born!

@Lukasa
Copy link
Collaborator

Lukasa commented May 26, 2021

It then gives text description of what the set of valid-h2-fields is

No it doesn't. It gives a text description of a set of invalid H2 fields. The text does not say these are the only limitations on h2 field validity (noting that there is also an ABNF description that applies when -semantics applies), neither does it say that any field that doesn't contain any invalid characters as described in this section is valid.

I do think the text could be clearer in saying that this is a non-exhaustive list of reasons why a header field may be invalid, but at the same time I'm reluctant to have this document spend too much time re-hashing reasons that exist in other specifications that are presumed to apply.

But what do we do with those fields if we are not an intermediary? Do we pass them onto our applications and let them work it out that they are not members of valid-HTTP-fields? There is nothing there that says we MAY treat such messages as malformed - so at the very least, it would be a good addition to the text to say that non-intermediaries can do stricter validation.

This question is one for -semantics, not for HTTP/2. -semantics gives the answer to this: you MUST NOT emit (§ 2.2), but as a receiver you are cautioned that (§ 2.3):

A recipient should parse a received protocol element defensively, with only marginal expectations that the element will conform to its ABNF grammar and fit within a reasonable buffer size.

Additionally (§ 2.4):

A recipient must interpret a received protocol element according to the semantics defined for it by this specification, including extensions to this specification, unless the recipient has determined (through experience or configuration) that the sender incorrectly implements what is implied by those semantics. For example, an origin server might disregard the contents of a received Accept-Encoding header field if inspection of the User-Agent header field indicates a specific implementation version that is known to fail on receipt of certain content codings.

Unless noted otherwise, a recipient may attempt to recover a usable protocol element from an invalid construct. HTTP does not define specific error handling mechanisms except when they have a direct impact on security, since different applications of the protocol require different error handling strategies. For example, a Web browser might wish to transparently recover from a response where the Location header field doesn't parse according to the ABNF, whereas a systems control client might consider any form of error recovery to be dangerous.

This gives you the permission you want: you are encouraged to parse the field defensively and you may attempt to recover, which implies that by default you may fail, and that -semantics does not specify exactly how you will handle these failures. HTTP/2 continues to grant your application that freedom. The specification does not bind you: you can handle this as you want.

@gregw
Copy link
Contributor

gregw commented May 26, 2021

It then gives text description of what the set of valid-h2-fields is

No it doesn't. It gives a text description of a set of invalid H2 fields.

OK True. Which is actually worse, as it leaves the set of valid-h2-fields undefined other than we know it is a subset of the compliment of the set of invalid fields define here.

The text does not say these are the only limitations on h2 field validity

Also worse, as it means the document only vaguely defines validity. There is actually no definition of what is a valid h2-field.

This question is one for -semantics, not for HTTP/2. -semantics gives the answer to this: you MUST NOT emit (§ 2.2), but as a receiver you are cautioned that (§ 2.3):

This reason could be used to say that there should be no additional invalidity check in this document. Just allow any hpack-valid field and leave the rest to be a matter of semantics.
The fields excluded by this definition are only dangerous if forwarded, which is already covered by "Intermediary Encapsulation Attacks" section that suggests that full HTTP ABNF grammar should be applied (in addition to the vaguely defined exclusions of "HTTP Fields").

So I'm back to why is this document in-precisely defining valid/invalid h2-fields. It is not for intermediaries as there is a section in the document dedicated to that. It is not for locally interpreted HTTP as you say HTTP semantics should be applied. Why does this document say that a field name containing space is invalid, yet one containing a double quote is not? Neither are valid HTTP, both are valid hpack. There is no good reason I can see for either and whilst I don't know of any specific attack that either could be used for, I would not be surprised if both could be used to some evil ends.

This vague definition is going to have a significant carbon footprint. Many Implementations will ultimately end up double validating field names: once to exclude the invalid-h2-fields in their h2 layer and then later in their semantic layer to include only the valid-HTTP-fields. Double validation could be avoided if this document either didn't restrict field validity or precisely defined valid fields as a subset of valid-HTTP-fields so that semantic layers could trust the fields received from their h2 layers as being valid HTTP.

Maybe there is a reason for this double validation and the resulting CPU cycles, but I have yet to see why validity is only partially defined.

@Lukasa
Copy link
Collaborator

Lukasa commented May 27, 2021

The fields excluded by this definition are only dangerous if forwarded, which is already covered by "Intermediary Encapsulation Attacks" section that suggests that full HTTP ABNF grammar should be applied (in addition to the vaguely defined exclusions of "HTTP Fields").

It suggests it but does not require it. The other text requires validation. Most intermediaries will not apply the full ABNF validation because a) it requires perfect understanding of the ABNF of all header fields, which they will not have, and b) even if they did, the -semantics ABNF is sufficiently costly to validate that they won't do it. Cheaper validation steps are more likely to be implemented.

So I'm back to why is this document in-precisely defining valid/invalid h2-fields.

This document (and -semantics) is imprecisely defining them because a precise definition runs afoul of the real world. The ABNF you're citing represents, fundamentally, guidance. If you want to reject things that don't conform to the ABNF you are free to do so (as -semantics says), but there is no document that obligates you to reject them. This is deliberate, because there is (IMO) absolutely no point in adding normative requirements that, if followed, would harm interoperability.

In this case the document is calling out specific cases that MUST be rejected. Practically speaking the intention of this is to encode requirements from other messaging formats, such as -messaging, to ensure that HTTP/2 implementations do not encode header fields that cannot be safely translated to HTTP/1.1. This provides a lower bound on validation: so long as everybody does this, we can safely rely on HTTP/2 messages being represented in HTTP/1.1's framing. Note that this doesn't meant they'll be valid HTTP/1.1 messages, just that they are not going to parse incorrectly.

With that in mind, the constraints added here are, in order:

  1. A cheap bitwise check that can be safely vectorised and applied to reject most invalid field names. This is sufficient to get close to the token ABNF, but is cheaper.
  2. A check that reserves pseudo-headers and is a single-byte comparison.
  3. Three vectorizeable field-value checks (NUL, LF, CR) all ensuring valid HTTP/1.1 framing.
  4. 4 single-byte comparisons against the first byte of the field-value.

This vague definition is going to have a significant carbon footprint. Many Implementations will ultimately end up double validating field names: once to exclude the invalid-h2-fields in their h2 layer and then later in their semantic layer to include only the valid-HTTP-fields.

This problem is solved by not doing that.

The ABNF in -semantics, if enforced, is a strict superset of the requirements here except for (2). If your implementation will enforce the requirements in -semantics, you can choose to not enforce these ones at the h2 layer, knowing that the -semantics layer will cover you. The RFC does not bind your implementation, only your observable behaviour: if your implementation ends up rejecting header fields that meet these characteristics, it doesn't matter why you did it, only that you did.

@reschke
Copy link
Contributor

reschke commented May 27, 2021

a) it requires perfect understanding of the ABNF of all header fields, which they will not have, and b) even if they did, the -semantics ABNF is sufficiently costly to validate that they won't do it.

I don't get that. We're still talking about field names, right?

@Lukasa
Copy link
Collaborator

Lukasa commented May 27, 2021

I don't get that. We're still talking about field names, right?

No, all of this applies to names and values:

An intermediary can reject fields that contain invalid field names or values for other
reasons, in particular those that do not conform to the HTTP ABNF grammar from <xref
target="HTTP" section="5"/>. Intermediaries that do not perform any validation of fields
other than the minimum required by <xref target="HttpHeaders"/> could forward messages
that contain invalid field names or values.

@reschke
Copy link
Contributor

reschke commented May 27, 2021

Well.

Validating names and values are very different things. It would be best to clearly separate these topics.

@gregw
Copy link
Contributor

gregw commented May 27, 2021

I get it that there is a minimum standard being set.

In section "HTTP Fields" it says that h2 implementations MUST treat as malformed any fields that violate any of the described conditions and an intermediary MUST NOT forward any such headers.
Then again in section "Intermediary Encapsulation Attacks" is says the validations of the "HTTP Fields" MUST be implemented, which is kind of a duplication as they are already at MUST strength.

I don't see any normative text that says that an implementation MAY or SHOULD validate against the HTTP ABNF and that if they do so then that is a superset of the conditions in section "HTTP Fields"

I would very much prefer to see text in the "HTTP Fields" section that clearly says h2 implementations MAY validate against the HTTP ABNF and that any implementations that do not MUST validate fields against the following conditions. I'll have a try at this later today.... stand by....

Comment on lines +2850 to +2858
Field names are strings of ASCII characters that are compared in a case-insensitive
fashion. Field names MUST be converted to lowercase when constructing a HTTP/2
message. A request or response containing an uppercase character ('A' to 'Z', ASCII 0x41
to 0x5a) in a field name MUST be treated as <xref target="malformed">malformed</xref>.
</t>
<t>
HPACK is capable of carrying field names or values that are not valid in HTTP. Though
HPACK can carry any octet, fields are not valid in the following cases:
</t>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Field names are strings of ASCII characters that are compared in a case-insensitive
fashion. Field names MUST be converted to lowercase when constructing a HTTP/2
message. A request or response containing an uppercase character ('A' to 'Z', ASCII 0x41
to 0x5a) in a field name MUST be treated as <xref target="malformed">malformed</xref>.
</t>
<t>
HPACK is capable of carrying field names or values that are not valid in HTTP. Though
HPACK can carry any octet, fields are not valid in the following cases:
</t>
Field names are strings of ASCII characters that are compared in a case-insensitive
fashion. Field names MUST be converted to lowercase when constructing a HTTP/2
message. A request or response containing an uppercase character ('A' to 'Z', ASCII 0x41
to 0x5a) in a field name MUST be treated as <xref target="malformed">malformed</xref>.
</t>
<t>
HPACK is capable of carrying field names or values that are not valid in HTTP, thus endpoints
MUST perform some additional validation and treat as <xref target="malformed">malformed</xref>
that are not <xref target="PseudoHeaderFields">pseudo-header fields</xref> and that do not comply
with the validation. An endpoint SHOULD, in addition to the lowercase condition above, validate
fields against the HTTP ABNF grammar from <xref target="HTTP" section="5"/>. Any endpoint that
does not validate against the HTTP ABNF grammar MUST at least validate against the following cases:
</t>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving aside the grammar of the text for a moment (I only want to discuss that if we decide to go ahead with adding it), I think the core question is whether or not this is a SHOULD. If -semantics does not make this a SHOULD, why should we?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validating the HTTP fields in -semantics is a MUST, but I'm not proposing that here. This change works equally as well with MAY instead of SHOULD. The point being making it clear that the conditions listed below are a minimum validation and that more strict validation may be done instead.

Of course it is yet clearer with ABNF:

Suggested change
Field names are strings of ASCII characters that are compared in a case-insensitive
fashion. Field names MUST be converted to lowercase when constructing a HTTP/2
message. A request or response containing an uppercase character ('A' to 'Z', ASCII 0x41
to 0x5a) in a field name MUST be treated as <xref target="malformed">malformed</xref>.
</t>
<t>
HPACK is capable of carrying field names or values that are not valid in HTTP. Though
HPACK can carry any octet, fields are not valid in the following cases:
</t>
Field names are strings of ASCII characters that are compared in a case-insensitive
fashion. Field names MUST be converted to lowercase when constructing a HTTP/2
message. A request or response containing an uppercase character ('A' to 'Z', ASCII 0x41
to 0x5a) in a field name MUST be treated as <xref target="malformed">malformed</xref>.
</t>
<t>
Since HPACK is capable of carrying field names or values that are neither valid in HTTP
nor secure to transmit as HTTP, fields MUST at least be minimally validated by treating as
<xref target="malformed">malformed</xref> any field that does not comply with the ABNF:
<pre>
field-name = h2-pseudo-field-name / 1*fn-char
pseudo-field-name = ":" 1*fn-char
fn-char = %x21-39 / %x3b-40 / %x5B-7E ; lowercase visible symbols except ':'
field-value = 1*fv-char
fv-char = %x01-09 / %x0b-0c / %x0e-ff ; octets excluding null, CR and LF
</pre>
Alternately, fields MAY be more strictly validated and treated as
<xref target="malformed">malformed</xref> if they do not comply with the
HTTP ABNF grammar from <xref target="HTTP" section="5"/> modified for
lowercase field names:
<pre>
field-name = h2-pseudo-field-name / 1*fn-char
pseudo-field-name = ":" 1*fn-char
fn-char = "!" / "#" / "$" / "%" / "&amp;" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / %x61-7A
field-value = *field-content
field-content = field-vchar [ 1*( SP / HTAB / field-vchar ) field-vchar ]
field-vchar = VCHAR / %x80-FF
</pre>
</t>

then delete all the bullet points below.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validating the HTTP fields in -semantics is a MUST, but I'm not proposing that here.

Is it? I can’t find that text in the document, can you link me to it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In section 2.4 it says:

A recipient MUST interpret a received protocol element according to the semantics defined for it by this specification

The ABNF is given for the fields in section 5., so by my reading that would be a protocol element that MUST be interpreted according to the ABNF defined by the specification.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree with the reading, but section 2.4 does then caveat that MUST pretty substantially. For example, the sentence you quotes reads, in its entirety:

A recipient MUST interpret a received protocol element according to the semantics defined for it by this specification, including extensions to this specification, unless the recipient has determined (through experience or configuration) that the sender incorrectly implements what is implied by those semantics.

On top of that unless, the next paragraph is:

Unless noted otherwise, a recipient MAY attempt to recover a usable protocol element from an invalid construct.

I don't think this drastically changes the MUST, but I do think that we potentially want to walk the SHOULD back to a MAY here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa, which "SHOULD" are you referring to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martinthomson I believe @Lukasa is referring to the "SHOULD" in the first version of suggested text above:

An endpoint SHOULD, in addition to the lowercase condition above, validate fields against the HTTP ABNF grammar ...
If text like that is adopted, I don't particularly care if that is a SHOULD or MAY. The key thing I suggestion is that this document explicitly calls out the possibility of validating against the full HTTP grammer in this "HttpFields" section and indicates that the minimal validation described is just that - a minimal validation.

However, I prefer the form of my second suggested text above:

Since HPACK is capable of carrying field names or values that are neither valid in HTTP nor secure to transmit as HTTP, fields MUST at least be minimally validated ...
Alternately, fields MAY be more strictly validated ... with the HTTP ABNF grammar ...

Finally, althought I prefer the ABNF usage in my second suggested text (specially as it clarifies the lowercase situation), but even if a text description is used I think that casn work with the form: "MUST minimally validate... Alternatively MAY more strictly validate...."

@martinthomson
Copy link
Collaborator Author

@gregw, I'm reading your comments as supportive of the technical change, but a strong preference for a different way of presenting the information.

I want to clarify that this change is about the mandatory validation that endpoints perform on fields. It looks like we all agree with stipulating a minimum amount of validation that applies to all implementations, especially those that would otherwise just forward fields without processing. Additionally, we all agree that additional validation is permitted, up to the point that knowledge of the semantics for the fields is applied in validating the values.

Where we might disagree is in the way that these requirements are described. I've chosen to use words; @gregw would prefer ABNF.

As we agree on the technical substance and the question of presentation is an editorial decision in which Cory and I have discretion I'm going to merge this pull request. We would like to publish a revision ahead of the upcoming interim.

If you disagree with this decision, especially the technical aspects, then I encourage you to open a new issue. One aspect that seems like it might worth discussing is whether we encourage additional validation rather than simply permitting it. That is, "SHOULD/MAY fully validate".

@martinthomson martinthomson merged commit b242930 into httpwg:main Jun 3, 2021
@gregw
Copy link
Contributor

gregw commented Jun 3, 2021

@martinthomson that's a correct summation of my position. I would have preferred that some text for SHOULD/MAY fully validate to be resolved in this PR (with or without ABNF), but will open another issue.

gregw added a commit to gregw/http2-spec that referenced this pull request Jun 3, 2021
Improve on httpwg#846 that fixes for httpwg#815 by adding extra clarity:
 + The validation for uppercase characters is no longer listed separately
 + It is clearly stated that violations of the full HTTP ABNF field definition MAY be treated as *Malformed*
@gregw gregw mentioned this pull request Jun 3, 2021
@martinthomson martinthomson deleted the allowed-characters branch June 3, 2021 04:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

characters allowed in field values
6 participants