Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

redirects and etag/if-none-match #1770

Open
wanderview opened this issue Aug 23, 2024 · 20 comments
Open

redirects and etag/if-none-match #1770

wanderview opened this issue Aug 23, 2024 · 20 comments

Comments

@wanderview
Copy link
Member

What is the issue with the Fetch Standard?

@MrPickles and I were looking into how redirects and http caching work together recently. We noticed that redirects (301/302) don't seem to send if-none-match headers even if the redirect response contains an etag. This behavior seems consistent across chrome, firefox, and safari.

Does the spec currently reflect this? I can't seem to find where it defines this restriction. Step 25.2.2.2.1 of http-network-or-cache-fetch doesn't seem to check the status code of the stored response at all. I also don't see it in the http-caching RFC. Am I missing it somewhere or do we need to add this check to the spec?

For reference, I think this is where chromium restricts the status code:

https://source.chromium.org/chromium/chromium/src/+/main:net/http/http_cache_transaction.cc;l=3029;drc=aeeec53b304d068aa96fe84816d1d6c9d96f454e

@mnot
Copy link
Member

mnot commented Aug 26, 2024

Just to clarify:

Given request reqA that receives a response resA, and resA contains a redirection status code and Location header pointing to B. You seem to be expecting reqB to contain If-None-Match header fields, but it's not clear where the Etags in them were obtained: is it from a stored response from B, or from resA?

If it's a stored response from B, I agree that the request should contain a conditional (although it is not required to by HTTP, at least). However, if it's from reqA, that's problematic; ETags obtained from A have nothing to do with B.

@MrPickles
Copy link

it's not clear where the Etags in them were obtained: is it from a stored response from B, or from resA?

This is the order of operations:

  1. You visit A. The response contains an Etag header and a Location header that points to B. The response has a redirection status code.
  2. You get redirected to B. The response from B doesn't matter in this case.
  3. You visit A again. There's no If-None-Match header in the request, even though the earlier response from A had an Etag header.

So the question here is whether it's appropriate for subsequent requests to A to contain the header since a previous response had an ETag, even though the status code is a redirect.

@mnot
Copy link
Member

mnot commented Aug 26, 2024

Ah, I see, thanks.

@annevk
Copy link
Member

annevk commented Aug 26, 2024

Okay so if HTTP-network-or-cache fetch step 8.25.1 gives us a non-200 response we should not be pretending we can revalidate.

I guess the Chromium code says it makes sense for 206 as well, but isn't a 304 always translated to a 200?

@MrPickles
Copy link

Okay so if HTTP-network-or-cache fetch step 8.25.1 gives us a non-200 response we should not be pretending we can revalidate.

That's my understanding, at least. (I suppose it's also plausible that the storedResponse variable not having any body contents could cause it to be considered null.)

I guess the Chromium code says it makes sense for 206 as well, but isn't a 304 always translated to a 200?

What does 304 translating to a 200 mean? From what I can tell, a 304 response causes the client to completely ignore the response body.

So 304 behavior in practice seems inconsistent among browsers. See https://cr.kungfoo.net/mrpickles/http_cache/res_304_with_200_first.php for a quick demo of 304 and ETag behavior.

  • On Chrome and Safari, the client will send If-None-Match headers on subsequent requests if the original response had a 200 and an ETag.
  • On Firefox, the client never seems to be send If-None-Match headers. I'm not sure if this is some browser setting, but it was the case for both normal and private windows.
  • On Chrome, making a request to a page that always 304's is effectively a no-op.
  • On Safari and Firefox, making a request to a page that always 304's will render the (empty) page.

@MrPickles
Copy link

Okay so if HTTP-network-or-cache fetch step 8.25.1 gives us a non-200 response we should not be pretending we can revalidate.

Does something like MrPickles@92fee8d look to be on the right track? (This is my first time doing web standards stuff.) If so, I can open up a pull request.

@annevk
Copy link
Member

annevk commented Aug 26, 2024

What does 304 translating to a 200 mean?

We end up replacing a 304 network response with its stored response (so that a fetch caller almost never sees a 304, unless they control the revalidating headers themselves). We currently don't assert that the stored response has a 200 status code, but I think the HTTP specifications do end up requiring that. Might need to be further clarified as well. @mnot's input on this would be appreciated.

Does something like MrPickles@92fee8d look to be on the right track?

That does look good, modulo whether we should check for "ok status", "200", or "200 or 206".

@MrPickles
Copy link

For reference, I think this is where chromium restricts the status code:

https://source.chromium.org/chromium/chromium/src/+/main:net/http/http_cache_transaction.cc;l=3029;drc=aeeec53b304d068aa96fe84816d1d6c9d96f454e

There may be additional filtering elsewhere. From observation, the browser doesn't send If-None-Match headers if the original response returned an ETag but had a 206 status code.

Does something like MrPickles@92fee8d look to be on the right track?

That does look good, modulo whether we should check for "ok status", "200", or "200 or 206".

Would it reasonable to start with specifying "ok status" and then tightening the spec as we gain more certainty around the intended behavior? Since step 8.25.2 currently specifies no restrictions, I'd imagine we'd want to start with the smallest reasonable "logical spec diff" and incrementally proceed from there.

@annevk
Copy link
Member

annevk commented Aug 28, 2024

Typically for changes to the standard we try to apply "measure twice cut once". As per https://whatwg.org/working-mode#changes we'll also need tests and from those we'd be able to inform ourselves as to what to require and which implementations might need changes. (This isn't always possible, but I think for this change it should be.)

@MrPickles
Copy link

Ack, sounds good.

I did a bit of poking around to see how the different browsers react (demo here). From what I'm seeing, the browser behaviors are inconsistent, and I haven't been able to pattern-match on what they actually do.

Let me get back to you on this once I've investigated a bit more and have a more rigorous description of each browser's behavior.

@MrPickles
Copy link

Alright, so I've documented some patterns from the three browsers. This is from making subsequent requests to the same URL that always returns an ETag but returns a random response status code.

Below is the behavior of each browser if you make multiple requests to the same URL. I can't believe I'm saying this, but it's easier to describe the existing behavior using (semi-)normative language...

Chrome

  1. Let etag be the browser's stored ETag value. By default, etag is null.
  2. Let sendIfNoneMatchHeaders be a boolean flag. By default, it is false.
  3. Let req be the HTTP request the browser is going to send.
  4. If sendIfNoneMatchHeaders is true and etag is not null, then:
    1. Set the If-None-Match headers of req to be etag.
  5. Send req to the URL.
  6. Let res be the HTTP response from req.
  7. Let status be the HTTP status code from res.
  8. Let responseEtag be the contents of the ETag header from res.
  9. If status is 200, then:
    1. If etag null, then:
      1. Set etag to be responseEtag.
    2. Set sendIfNoneMatchHeaders to true.
  10. If status is not 200 or 206, then:
    1. Set sendIfNoneMatchHeaders to true.
  11. Rinse and repeat.

So the way Chrome works is that it'll start storing ETags once it gets a 200. And it'll keep sending the ETag as long as it keeps getting 200's or 206's from the same URL. If the "chain gets broken" by a non-200/206, getting another 200 response will allow the browser to continue sending the original ETag.

Firefox

  1. Let etag be the browser's stored ETag value. By default, etag is null.
  2. Let neverSendIfNoneMatchHeadersEverAgain be a boolean flag. By default, it is false.
  3. Let req be the HTTP request the browser is going to send.
  4. If neverSendIfNoneMatchHeadersEverAgain is false and etag is not null, then:
    1. Set the If-None-Match headers of req to be etag.
  5. Send req to the URL.
  6. Let res be the HTTP response from req.
  7. Let status be the HTTP status code from res.
  8. Let responseEtag be the contents of the ETag header from res.
  9. If status is 2xx and etag is null, then:
    1. Set etag to be responseEtag.
  10. If status is not 2xx, then:
    1. Set neverSendIfNoneMatchHeadersEverAgain to true.
  11. Rinse and repeat.

Firefox seems to stop sending ETags indefinitely as soon as it receives a non-2xx from the server. But otherwise, it will save the ETag from a 2xx response and keep sending the ETag back as long as it has a 2xx response.

Safari

  1. Let etag be the browser's stored ETag value. By default, etag is null.
  2. Let req be the HTTP request the browser is going to send.
  3. If etag is not null, then:
    1. Set the If-None-Match headers of req to be etag.
  4. Send req to the URL.
  5. Let res be the HTTP response from req.
  6. Let responseEtag be the contents of the ETag header from res.
  7. Let shouldClearEtag to be whether (etag is not null).
  8. Set etag to be responseEtag.
  9. If shouldClearEtag, then:
    1. Set etag to null.
  10. Rinse and repeat.

Safari is weird. It doesn't care about the response code. Rather, it seems that for every other request, it clears the ETag.

@MrPickles
Copy link

The browser behaviors seem pretty inconsistent, so I'm at a loss for what the proper spec should be. Unless we keep the spec super vague, at least one browser will be violating the spec. How would you recommend proceeding?

@wanderview
Copy link
Member Author

Any opinions? It seems like chrome and firefox are pretty close. We could try to sidebar about it at TPAC.

@MrPickles
Copy link

I was chatting with @wanderview and it would seem as if what Chrome does makes the most sense (and the spec should generally match Chrome's behavior). Here are some thoughts:

  • Safari's behavior effectively ignores status code, which doesn't seem reasonable. For example, why would a browser want to cache if there's a 500?
  • Firefox never sends if-none-match headers ever again if it receives a non 2xx code, which also doesn't seem quite reasonable. A server could have an intentional 302 and then later legitimately change to a 200. But then Firefox won't cache future variants.
  • Chrome is pretty reasonable all things considered. And 200 and 206 are the only status codes that imply that there's content to cache.

I can draft a PR with the proposed spec changes and write a matching WPT. (But please let me know if the proposed spec or the approach seems unreasonable.)

@mnot
Copy link
Member

mnot commented Oct 7, 2024

For example, why would a browser want to cache if there's a 500?

And 200 and 206 are the only status codes that imply that there's content to cache.

Non-200 status codes can be stored and revalidated, and are in practice. Please don't deviate from the HTTP caching spec without good reason - reinventing it causes interop issues with non-browser caches.

@wanderview
Copy link
Member Author

wanderview commented Oct 8, 2024

Non-200 status codes can be stored and revalidated, and are in practice. Please don't deviate from the HTTP caching spec without good reason - reinventing it causes interop issues with non-browser caches.

Are any browsers following that spec currently here, though? See the results of testing here:

#1770 (comment)

Edit: And we are open to suggestions for a path to interop here.

@mnot
Copy link
Member

mnot commented Oct 11, 2024

I was speaking more in general - e.g., see the link to the cache tests, where browsers do cache non-200 status codes.

@annevk
Copy link
Member

annevk commented Dec 6, 2024

@MrPickles apologies for the late reply, but the general approach sounds reasonable. Mark has a point though that we need to figure out the discrepancies with https://cache-tests.fyi/#status as following HTTP is a goal where that is possible.

@MrPickles
Copy link

I took a look through the cache tests and spec, and from what I can tell, it doesn't actually seem to conflict with the fetch spec changes. (Note that this isn't any proposal to change any spec, but rather my observations on the existing specs and whether they'd conflict.)

If we look at the cache tests, the logic basically goes as follows:

  1. The client sends a request with Cache-Control headers set accordingly.
  2. The server gives a response with any status code.
  3. The client seconds a second request, and the server gives a 304 response.
  4. In such a case, the original response should have been cached.

The section of the fetch spec goes over the slightly different logic:

  1. The client sends a request (not necessarily with Cache-Control headers set accordingly).
  2. The server sends an ETag value back. The response status code may vary.
  3. Depending on the response status code, should the client include an If-None-Match header in subsequent requests?

The fetch spec changes discussed in this thread focus more on when the client chooses to send If-None-Match headers, whereas the caching spec is more about when the client is obligated to cache results. So they don't necessarily conflict. For example, the browser may arbitrarily choose whether or not to send If-None-Match headers, but if the HTTP response is a 304, it'll still need to use the cached response from a previous request.

So from what I can tell, tweaking behavior around the fetch spec does not have to conflict with the HTTP cache spec, since the behavior we're looking to adjust is around request headers. This is my understanding after looking around at the specs, so please double check my thought process and correct me if I'm mistaken or missing something.

Miscellaneous/tangential observations

To amend a previous quote:

And 200 and 206 are the only status codes that imply that there's content to cache.

And 200 and 206 are the only status codes that imply that there's probably interesting content to cache, such that the browser might actively want to ask the server to confirm whether it should use the cached results.

Also, from what I can tell, the "conditional requests" section of the cache tests (https://cache-tests.fyi/#conditional-inm) cover how the cache should response if If-None-Match headeres are sent, but it does not specify when the client must send those headers. So that doesn't conflict with the fetch spec discussion in this thread either.

The caching spec does bring up a fair point that all response status codes should be cache-able. So checking status code in step 8.25.2 of network-or-cache-fetch might be too broad. It may make sense to modify step 8.25.2.2.2 or around there. (This is just me musing to myself, not an actual change proposal.)

@annevk
Copy link
Member

annevk commented Dec 19, 2024

I initially thought that would go against the specification (as Mark did mention revalidation above, which is what you're talking about), but https://httpwg.org/specs/rfc9111.html#rfc.section.4.3 does suggest to me that revalidation is optional (emphasis mine):

it can use the conditional request mechanism

It seems that we could decide that for the web platform that while we can cache all status codes, we're not going to validate all of them as we don't believe that to be useful or web compatible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants