Incorrect values in `pages.jsonl` for javascript redirects #760

Mr0grog · 2025-02-09T19:56:36Z

This is an unusual situation, and what’s right is probably debatable. On a few pages I’m crawling, the server responds with a 403 error, but the error page includes JavaScript that immediately navigates to a different URL, which returns a 200 status. The entry in `pages.jsonl` records values from the page that was redirected to via JS, i.e. it lists a 200 status and the title of the target page.

This page is a good example: https://www.energy.gov/justice/no-fear-act-data. The page has been removed, but instead of stopping to show the error, it immediately directs the user’s browser to the DOE home page at https://www.energy.gov/. If you use a client that doesn’t run JS, you’ll see this snippet in the source:

<script type="text/javascript">
  window.location.href = "https://www.energy.gov/";
</script>

There’s probably room for debate as to what should be recorded in `pages.jsonl` here. I’m hitting this in a case where the redirect target is not really a meaningful equivalent and functions more as a way to hide the error, so I’d like to clearly differentiate HTTP redirects from client-side redirects. But I can also imagine lots of sites on static file servers (e.g. GitHub Pages) using this technique to implement dynamic routing. Maybe the `pages.jsonl` entry could record info about both responses in this kind of case?
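To make the "record both responses" idea concrete, here is a minimal sketch of what such an entry could look like. The `clientRedirect` field and the `build_page_entry` helper are hypothetical names made up for illustration, not part of the crawler's actual output format, and the target page's title is left as a placeholder.

```python
import json


def build_page_entry(requested, final):
    """Combine the initial HTTP response and the post-redirect page into one record.

    `requested` describes the response for the URL the crawler asked for;
    `final` describes the page the browser ended up on.
    """
    entry = {
        "url": requested["url"],
        "status": requested["status"],
        "title": requested.get("title"),
    }
    # Only add redirect details when client-side navigation changed the URL.
    if final["url"] != requested["url"]:
        # "clientRedirect" is a hypothetical field name, used only for illustration.
        entry["clientRedirect"] = {
            "url": final["url"],
            "status": final["status"],
            "title": final.get("title"),
        }
    return entry


requested = {
    "url": "https://www.energy.gov/justice/no-fear-act-data",
    "status": 403,
    "title": None,
}
final = {
    "url": "https://www.energy.gov/",
    "status": 200,
    "title": "(title of the target page)",  # placeholder, not the real title
}

entry = build_page_entry(requested, final)
print(json.dumps(entry))  # one JSON object per line, as in a .jsonl file
```

With this shape, the top-level `status` keeps the 403 from the original request, while the 200 and the title of the redirect target sit in a clearly separate place.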

I imagine the same or similar issues exist with `<meta http-equiv="refresh">` redirects, too.
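For anyone who wants to flag these pages outside the crawler, a rough sketch of spotting both kinds of client-side redirect in the raw (non-JS) HTML might look like the following. The `find_client_redirects` helper and the regexes are assumptions for illustration only. They catch simple literal assignments such as the snippet above and would miss redirects built dynamically in script.

```python
import re
from html.parser import HTMLParser

# Matches simple literal assignments like: window.location.href = "https://example.com/"
JS_REDIRECT = re.compile(
    r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""
)
# Matches the URL part of a meta refresh value like: 0; url=https://example.com/
META_URL = re.compile(r"""url\s*=\s*['"]?([^'"]+)""", re.IGNORECASE)


class RedirectFinder(HTMLParser):
    """Collects (kind, target_url) pairs for client-side redirects found in HTML."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.redirects = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script":
            self.in_script = True
        elif tag == "meta" and (attributes.get("http-equiv") or "").lower() == "refresh":
            match = META_URL.search(attributes.get("content") or "")
            if match:
                self.redirects.append(("meta-refresh", match.group(1).strip()))

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Only scan text that sits inside a <script> element.
        if self.in_script:
            for match in JS_REDIRECT.finditer(data):
                self.redirects.append(("javascript", match.group(1)))


def find_client_redirects(html):
    """Return a list of (kind, target_url) tuples for redirects found in `html`."""
    finder = RedirectFinder()
    finder.feed(html)
    finder.close()
    return finder.redirects


snippet = """<script type="text/javascript">
  window.location.href = "https://www.energy.gov/";
</script>"""
print(find_client_redirects(snippet))
```

Combined with the HTTP status of the original response, a non-empty result here would be one way to tell an error page that redirects away apart from an ordinary 200 page.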


github-project-automation bot moved this to Triage in Webrecorder Projects Feb 9, 2025

github-project-automation bot added this to Webrecorder Projects Feb 9, 2025
