
Add more exit codes to detect interruption reason #764

Merged
9 commits merged into webrecorder:main on Feb 10, 2025

Conversation

benoit74
Contributor

@benoit74 benoit74 commented Feb 10, 2025

Fix #584

Following the recent changes to exit-code handling, and as suggested, I propose refining the exit codes a bit further, especially so that we are better informed of situations where we would like to indicate something to the end user in Kiwix usage.

I replaced interrupted and browserCrashed properties of the crawler with a single interrupt_reason property, whose value is an enum indicating (when possible) the interruption reason.
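To illustrate the shape of this change, here is a minimal sketch (the property and enum names follow the PR description, but the enum members and the setter are invented for illustration, not the actual crawler source):

```typescript
// Illustrative sketch only: a single interruptReason property replaces
// the old `interrupted` and `browserCrashed` boolean flags.
// Enum member names are invented, not the crawler's real values.
enum InterruptReason {
  Cancelled = "cancelled",
  SignalInterrupted = "signalInterrupted",
  BrowserCrashed = "browserCrashed",
}

class CrawlerStateSketch {
  interruptReason?: InterruptReason;

  // Record the first interruption reason; later calls keep the original,
  // so the exit code reflects what happened first.
  setInterruptReason(reason: InterruptReason): void {
    if (!this.interruptReason) {
      this.interruptReason = reason;
    }
  }
}
```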

I slightly changed the cancellation message, since it now repeated "gracefully finishing current pages".

These changes made me realize there may be a flaw in the serializeAndExit function: according to my code analysis, the interrupted property is never set when this function is called, unless I am missing something.

Tested locally and it works as expected; some cases (the ones reasonably easy to reproduce) are covered by automated tests.

Finally, there is a small documentation fix.

@benoit74 benoit74 marked this pull request as draft February 10, 2025 11:47
Store interrupt reason directly instead of interrupted + browser crashed
flags, use it to infer the proper exit code, and add more exit codes, one
per interruption reason.

Other small changes:
- Rename the `InterruptedGraceful` exit code to `Cancelled` for clarity /
  consistency with the redis operation
- Rename the `InterruptedImmediate` exit code to `Interrupted`, since there
  is no longer any possible confusion with `Cancelled`, and other graceful
  interruptions have their own exit codes
- Fix handling of SIGINT in serializeAndExit, for which no interrupted
  value was set
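The reason-to-exit-code mapping described in the commit message could be sketched like this (the numeric codes below are invented placeholders, not the crawler's real values):

```typescript
// Hypothetical mapping from interruption reason to process exit code.
// Numeric values are placeholders for illustration only.
const EXIT_CODE_BY_REASON: Record<string, number> = {
  cancelled: 11,      // graceful stop requested (e.g. via redis)
  interrupted: 13,    // immediate interrupt (e.g. second SIGINT)
  browserCrashed: 14, // browser process died
};

function exitCodeFor(reason?: string): number {
  if (reason === undefined) {
    return 0; // no interruption: normal successful exit
  }
  return EXIT_CODE_BY_REASON[reason] ?? 1; // unknown reason: generic failure
}
```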
@benoit74 benoit74 marked this pull request as ready for review February 10, 2025 12:38
@benoit74
Contributor Author

@tw4l @ikreymer this is ready for review

@ikreymer
Member

Thanks, overall looks good! I actually have another PR that conflicts with this, but I will clean it up.

These changes made me realize that there was maybe a flaw in the serializeAndExit function since according to my code analysis the interrupted property is never set when this function is called, unless I miss something.

It's possible for interrupted to be true if an interrupt request was made but the crawler has not yet exited.
The first interrupt attempt waits for the crawler to exit gracefully (finish current pages); the second shuts down the browser immediately.

ikreymer and others added 4 commits February 10, 2025 10:16
- only attempt to close browser if not browser crashed
- add timeout for browser.close()
- ensure browser crash results in healthchecker failure
- bump to 1.5.3
- keep browser.crashed flag in Browser, skip close if crashed
- healthchecker checks browser.crash
- keep markBrowserCrashed()
@tw4l
Member

tw4l commented Feb 10, 2025

Nice to see this improvement! It would be great to document these exit codes in the crawler documentation so that folks don't have to dig into the code to understand what the codes mean.

@benoit74
Contributor Author

It's possible for interrupted to be true, if an interrupt request was made, but the crawler has not yet exited.
The first interrupt attempt will wait for crawler to exit gracefully (finish current pages), the second will shut down the browser immediately.

Indeed, my bad.

It would be great to document these exit codes in the crawler documentation so that folks don't have to dig into the code to understand what the codes mean.

Definitely, I will tackle this as well, probably in a distinct PR so this code change can reach a release asap should the documentation take more discussion than expected ... Where should it be placed? Should we create a new Exit Codes page in the User Guide, right after the Outputs page, or add it as a section of the Outputs page?

@tw4l
Member

tw4l commented Feb 10, 2025

Definitely, I will tackle this as well, probably in a distinct PR so this code change can reach a release asap should the documentation take more discussion than expected ... Where should it be placed? Should we create a new Exit Codes page in the User Guide, right after the Outputs page, or add it as a section of the Outputs page?

Thank you! Much appreciated :) I think a new Exit Codes section would be the simplest/most easily discovered way to document this.

Member

@tw4l tw4l left a comment


Looks good, thanks for the contribution!

@ikreymer
Copy link
Member

Yep, looks good! I merged some changes that re-added browser.crashed, which is also used by the healthchecker.
Tested with a crawler that was getting stuck per #763, and it seems to be good now.

Can add docs in a follow-up, this should be good to go!

src/main.ts (outdated)

-    crawler.gracefulFinishOnInterrupt();
+    if (!crawler.interruptReason) {
+      logger.info("SIGNAL: interrupt request received, finishing current pages before exit...");
+      crawler.gracefulFinishOnInterrupt(InterruptReason.Cancelled);
Member


Actually, this isn't quite correct: this isn't necessarily a cancellation.
When we run in k8s, an interrupt does not mean the crawl is cancelled; it means the crawler should stop and possibly restart.
The signal could come from the user or from the k8s system (memory pressure, etc.).

@ikreymer
Member

ikreymer commented Feb 10, 2025

Ah, but there is some confusion between cancellation and interruption.
The way it works is: the first interrupt signal results in waiting for the crawler to finish current pages; the second interrupt signal results in immediate termination.

However, this does not mean the crawl is cancelled, especially in k8s; it means the crawler needs to be restarted (for whatever reason). It could mean that it is being moved to a different node, etc. It is very important that deletion of the pod does not actually cancel the crawl automatically. The 'cancel' state is only entered through a special message sent via redis.
The exit codes should differentiate between the graceful and the immediate interrupt (though on k8s, it may only ever be the second one anyway). When a crawl is cancelled by request of the user, the exit code is actually success (so the pod may shut down, nothing more to do!).

This is slightly more confusing because we need to support different environments, both k8s and plain docker/podman.

Note: it might make sense to revisit cancellation to make it clearer, but that can be done as a follow-up.
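As a sketch of the semantics described above (the names and the non-zero code are invented for illustration): a redis-driven cancel is a successful exit, while a signal interrupt means the crawl should resume later.

```typescript
// Cancellation is an explicit redis-driven request and exits successfully;
// a signal interrupt (e.g. k8s pod eviction) means the crawler should be
// restarted later. Illustrative only, not the crawler's actual logic.
type StopCause = "redisCancel" | "signalInterrupt";

function podExitCode(cause: StopCause): number {
  if (cause === "redisCancel") {
    return 0; // user cancelled: success, pod may shut down for good
  }
  return 11; // invented non-zero code: crawl should resume elsewhere
}
```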

@ikreymer ikreymer merged commit fc56c2c into webrecorder:main Feb 10, 2025
2 checks passed
Successfully merging this pull request may close these issues:

Better indicate the interruption reason