Automatic decompression should sanitize `Content-Encoding` and `Content-Length` headers from the response #1729

lucacasonato · 2023-12-21T19:54:06Z

What is the issue with the Fetch Standard?

The fetch() spec allows browsers to perform decompression of HTTP responses in fetch() if an appropriate content-encoding header is set on the response. In this case, the Response.prototype.body stream no longer reflects the raw bytes (modulo protocol framing) received on the wire, but instead a processed version of the bytes after being passed through a decompression routine.

This decompression is meant to be transparent to users: they do not have to explicitly opt in or enable it. Further, they cannot even disable it (ref #1524).

Unfortunately, the decompression is currently not very transparent: given an arbitrary response object, it is ambiguous whether the Response's body has been decompressed or is still compressed.

This causes real-world problems:

it poses a hazard when implementers add new encodings for automatic decompression: a user who was previously manually decompressing a response with an unsupported content encoding can no longer tell whether they still need to perform decompression once a browser adds native support for that content encoding
proxies cannot tell which headers they need to send downstream (see wintercg/fetch#23, "For compressed responses, omit Content-Length and Content-Encoding headers")
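To make the first hazard concrete, here is a minimal sketch. The function name and the `autoDecoded` sets are hypothetical and only model a runtime. They are not claims about which encodings any particular browser supports, and the sketch assumes a single encoding in the header even though `Content-Encoding` may list several.

```typescript
// Hypothetical model: `autoDecoded` stands for the set of encodings that a
// given runtime's fetch() decompresses automatically. Real user code has no
// way to read this set, which is exactly the problem described above.
function bodyIsStillCompressed(
  contentEncoding: string | null,
  autoDecoded: ReadonlySet<string>,
): boolean {
  if (contentEncoding === null) return false;
  return !autoDecoded.has(contentEncoding.trim().toLowerCase());
}

// Illustrative sets only (assumption): a runtime before and after it gains
// native support for one more encoding.
const olderRuntime = new Set(["gzip", "deflate", "br"]);
const newerRuntime = new Set(["gzip", "deflate", "br", "zstd"]);

// The response headers are identical in both cases...
const headerValue = "zstd";

// ...but whether the body still needs manual decompression differs.
const needsManualBefore = bodyIsStillCompressed(headerValue, olderRuntime);
const needsManualAfter = bodyIsStillCompressed(headerValue, newerRuntime);
```

Code that keys manual decompression off the header alone would decompress twice on the newer runtime, because nothing on the Response distinguishes the two cases.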

Proposal

I propose we strip out Content-Length (because it represents the content length prior to decompression) and Content-Encoding (because it represents the encoding prior to decompression) from Response headers when we perform automatic response body decompression in fetch(). I am not suggesting that this affect responses created with new Response(), or responses returned from fetch() that do not have automatic response body decompression performed.
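A rough sketch of the proposed behavior, written as user-level code rather than spec text. `sanitizeDecompressedHeaders` and its `wasDecompressed` parameter are hypothetical names for illustration. In a real implementation this would happen inside fetch() and would not be exposed as an API.

```typescript
// Hypothetical helper, not part of the Fetch Standard or any runtime.
// Headers that describe the body as it was on the wire, before decompression.
const STRIPPED_AFTER_DECOMPRESSION = ["content-encoding", "content-length"];

function sanitizeDecompressedHeaders(
  headers: Headers,
  wasDecompressed: boolean,
): Headers {
  // Copy so the caller's Headers object is left untouched.
  const result = new Headers(headers);
  if (wasDecompressed) {
    for (const name of STRIPPED_AFTER_DECOMPRESSION) {
      result.delete(name);
    }
  }
  return result;
}

// Headers as received on the wire (example values).
const wire = new Headers({
  "Content-Type": "text/plain",
  "Content-Encoding": "gzip",
  "Content-Length": "42",
});

// Decompressed response: both headers are removed.
const exposed = sanitizeDecompressedHeaders(wire, true);

// Response that was not decompressed: headers are left as they were.
const untouched = sanitizeDecompressedHeaders(wire, false);
```

With this shape, the presence of Content-Encoding on a fetch() response would mean the body is still encoded, which removes the ambiguity described earlier.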

Compatibility

I don't think this change will break any existing code. It may skew some folks' monitoring tools. I make this assumption based on the following thoughts:

The Content-Length before decompression is meaningless if you only have the decompressed body. With either gzip or br, you cannot infer the length of the decompressed body from the Content-Length.
The original Content-Encoding is not useful in combination with a decompressed body. The only use I can think of is monitoring use cases where you want to determine what percentage of your assets were served with compression (and with which compression).

Prior art

In the JavaScript space:

Both Deno and Cloudflare implement this proposed fix to allow for the proxy use case mentioned above.
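As an illustration of why the fix helps proxies, here is a minimal sketch of a pass-through under the assumption that the runtime has already stripped Content-Encoding and Content-Length from decompressed responses. `forwardDownstream` is a hypothetical name, and the upstream Response is constructed by hand here as a stand-in for the result of fetch().

```typescript
// Hypothetical proxy step: forward an upstream response downstream as-is.
// This is only safe if the headers describe the body the proxy actually
// holds, i.e. if decompressed responses no longer carry Content-Encoding
// or Content-Length.
function forwardDownstream(upstream: Response): Response {
  return new Response(upstream.body, {
    status: upstream.status,
    statusText: upstream.statusText,
    headers: upstream.headers,
  });
}

// Stand-in for a fetch() result whose body was decompressed and whose
// headers were sanitized by the runtime (assumption for this example).
const upstream = new Response("hello", {
  status: 200,
  headers: { "Content-Type": "text/plain" },
});

const downstream = forwardDownstream(upstream);
```

Without the sanitization, the same pass-through would advertise a compressed encoding and the compressed length while sending decompressed bytes.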

In other programming languages:

Go's net/http standard library package has automatic decompression enabled by default. It strips out Content-Length and Content-Encoding when it performs decompression. It has a flag on the response that indicates whether automatic decompression has taken place. See https://pkg.go.dev/net/http#Response.Uncompressed
Rust's reqwest crate supports automatic decompression and enables it by default for clients if the gzip or brotli compile-time feature flags are set. It strips out Content-Length and Content-Encoding when it performs decompression. It has no flag to check whether decompression has been performed. See https://docs.rs/reqwest/latest/reqwest/struct.ClientBuilder.html#method.gzip
Python's requests does automatic decompression by default and sets Content-Length to the post-decompression content length. It does not remove the Content-Encoding header.
Ruby's Net::HTTP does automatic decompression by default, removing Content-Encoding and rewriting Content-Length to the length after decompression.


annevk · 2024-01-08T12:30:20Z
