fix: Exclude TIMEOUT errors when disabling streams #7369

Merged
merged 2 commits into from
Sep 26, 2024

joeyparrish
Member

@joeyparrish joeyparrish commented Sep 24, 2024

In #7368, we get stuck in a loop loading forever. This regression was introduced in v4.4.0 and affects all v4.4, v4.5, v4.6, v4.7, and v4.8 releases, as well as v4.9.0-28, v4.9.2-caf1, v4.10.0-20, and v4.11.0-6.

The loop is composed of these elements:

  1. an error that triggers disabling a stream
  2. an error that doesn't resolve itself over time
  3. an error that is slow enough that the first streams get re-enabled in the meantime
  4. VOD content that doesn't change while we sit in the loop
  5. enough streams to avoid exhausting them during the cycle
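
To make the interaction of these five elements concrete, here is a minimal sketch of the cycle as a toy model. It is not Shaka Player code: `LoopModel`, `timeUntilExhausted`, and all of the numbers are hypothetical and chosen only for illustration. The model assumes each failing request takes a fixed time to fail and that a stream stays disabled for a fixed period afterwards.

```typescript
// Hypothetical model of the loop; none of these names come from Shaka Player.
interface LoopModel {
  streamCount: number;     // streams available to switch between
  errorSeconds: number;    // how long each failing request takes to fail
  disableSeconds: number;  // how long a stream stays disabled after an error
}

// Returns the simulated time (in seconds) at which every stream is disabled
// at once, or null if the player is still cycling after maxSeconds.
function timeUntilExhausted(model: LoopModel, maxSeconds: number): number | null {
  const disabledUntil: number[] = [];
  for (let i = 0; i < model.streamCount; i++) {
    disabledUntil.push(0);
  }
  let now = 0;
  while (now < maxSeconds) {
    // Pick the first stream whose disable period has expired.
    let next = -1;
    for (let i = 0; i < disabledUntil.length; i++) {
      if (disabledUntil[i] <= now) {
        next = i;
        break;
      }
    }
    if (next === -1) {
      return now;  // Nothing left to try, so the error would finally surface.
    }
    // The request fails again (VOD content does not change), and the
    // stream is disabled for a while.
    now += model.errorSeconds;
    disabledUntil[next] = now + model.disableSeconds;
  }
  return null;
}

// A slow error with enough streams: the first stream is re-enabled before
// the last one is tried, so the streams are never exhausted.
const slow = timeUntilExhausted(
    {streamCount: 5, errorSeconds: 10, disableSeconds: 30}, 3600);

// A fast error with the same streams: all of them are disabled quickly.
const fast = timeUntilExhausted(
    {streamCount: 5, errorSeconds: 1, disableSeconds: 30}, 3600);
```

In this model the loop persists whenever cycling through the streams takes at least as long as the disable period, which is why a slow error combined with many streams is the problematic case.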

Only TIMEOUT errors can trigger this bug AFAICT, so we should exclude those from the logic to disable streams. Note also that live streaming already retries indefinitely by default, and that normal ABR logic will change streams for us if we time out due to a lack of bandwidth.
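
As a rough illustration of what excluding TIMEOUT means, the decision can be sketched as a predicate over the error code. This is a sketch under assumptions, not the code changed in this PR: `NetworkErrorCode`, `NetworkError`, and `shouldDisableStream` are hypothetical names, and which other error codes still disable a stream is assumed here rather than taken from the diff.

```typescript
// Illustrative stand-ins for network error codes. This is not Shaka Player's
// real error enum, and the set of codes is an assumption.
type NetworkErrorCode = 'BAD_HTTP_STATUS' | 'HTTP_ERROR' | 'TIMEOUT';

interface NetworkError {
  code: NetworkErrorCode;
}

// Hypothetical predicate: should this error cause the current stream to be
// disabled so that playback can fall back to another one?
function shouldDisableStream(error: NetworkError): boolean {
  switch (error.code) {
    case 'BAD_HTTP_STATUS':
    case 'HTTP_ERROR':
      // Assumed to still indicate a stream that may be broken.
      return true;
    case 'TIMEOUT':
      // Excluded: a timeout is slow and may never resolve for VOD, which is
      // the combination that produces the loading loop.
      return false;
    default:
      return false;
  }
}

const disableOnTimeout = shouldDisableStream({code: 'TIMEOUT'});
const disableOnBadStatus = shouldDisableStream({code: 'BAD_HTTP_STATUS'});
```

With a predicate like this, a timed-out request is left to the normal retry and ABR paths instead of marking the stream as disabled.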

Disabling streams on TIMEOUT was suggested initially in #4764, but was not a requirement of the OP. It was added out of caution in #4769, but not really vetted. Because it was never explicitly needed, excluding it is not a regression.

Closes #7368

@shaka-bot
Collaborator

shaka-bot commented Sep 25, 2024

Incremental code coverage: 100.00%

Member

@avelad avelad left a comment

For me this is not valid. We should also accept errors of the TIMEOUT or BAD_HTTP_STATUS type; for Low Latency these errors are common, and we should handle them as well.

@joeyparrish
Member Author

> For me this is not valid, we should also accept errors of the TIMEOUT or BAD_HTTP_STATUS type, for Low Latency these errors are common and we should treat them as well.

The TIMEOUT error is what triggers the loading loop described in #7368.

@joeyparrish
Member Author

See my latest comments in the issue.

@joeyparrish joeyparrish changed the title from "fix: Restore missing logic for disabling streams on error" to "fix: Exclude TIMEOUT errors when disabling streams" Sep 26, 2024
Contributor

@JulianDomingo JulianDomingo left a comment

I don't have any objections to this approach; I agree that TIMEOUT errors caused by transient network events do not make the stream inherently broken (plus, Joey found that disabling streams on TIMEOUT errors was initially added as a cautionary measure).

@joeyparrish joeyparrish merged commit 67826ac into shaka-project:main Sep 26, 2024
16 of 17 checks passed
@joeyparrish joeyparrish deleted the fix-vod-loop-1 branch September 26, 2024 18:26
joeyparrish added a commit that referenced this pull request Sep 26, 2024
In #7368, we get stuck in a loop loading forever. This regression was
introduced in v4.4.0 and affects all v4.4, v4.5, v4.6, v4.7, and v4.8
releases, as well as v4.9.0-28, v4.9.2-caf1, v4.10.0-20, and v4.11.0-6.

The loop is composed of these elements:

1. an error that triggers disabling a stream
2. an error that doesn't resolve itself over time
3. an error that is slow enough that the first streams get
re-enabled in the meantime
4. VOD content that doesn't change while we sit in the loop
5. enough streams to avoid exhausting them during the cycle

Only `TIMEOUT` errors can trigger this bug AFAICT, so we should exclude
those from the logic to disable streams. Note also that live streaming
already retries indefinitely by default, and that normal ABR logic will
change streams for us if we time out due to a lack of bandwidth.

Disabling streams on `TIMEOUT` was suggested initially in #4764, but was
not a requirement of the OP. It was added out of caution in #4769, but
not really vetted. Because it was not ever explicitly needed, excluding
it is not a regression.

Closes #7368

Backported to v4.9.2-caf

Release-As: 4.9.2-caf2
joeyparrish added a commit that referenced this pull request Sep 26, 2024
joeyparrish added a commit that referenced this pull request Sep 26, 2024
joeyparrish added a commit that referenced this pull request Sep 26, 2024
@avelad avelad added the "type: bug" and "priority: P1" labels Oct 18, 2024
@avelad avelad added this to the v4.12 milestone Oct 18, 2024
@shaka-bot shaka-bot added the "status: archived" label Nov 25, 2024
@shaka-project shaka-project locked as resolved and limited conversation to collaborators Nov 25, 2024
Labels
priority: P1 (Big impact or workaround impractical; resolve before feature release)
status: archived (Archived and locked; will not be updated)
type: bug (Something isn't working correctly)
Development

Successfully merging this pull request may close these issues.

Stuck in a loop when VOD segments timeout
5 participants