
Fix: Ensure executeRequest waits for standard output to complete when using comlink #144

Closed
wants to merge 3 commits

Conversation


@higoo-higoo higoo-higoo commented Nov 26, 2024

Description of Issue

When using comlink, the executeRequest function sometimes finishes before the standard output has been fully processed.
This behavior appears to be caused by a lack of synchronization between callback handling and the plain function-call path in comlink.


How to Reproduce the Issue

  1. Run npm run quickstart, jlpm build, and jlpm docs:lite with "exposeAppInBrowser": true, then jlpm serve.
  2. Access JupyterLite.
  3. Execute the following code in your browser's console:
window.jupyterapp.serviceManager.kernels._kernelConnections.values().next().value.iopubMessage.connect((e, t) => {
    const c = t.content;
    
    if (c?.execution_state === 'busy') {
        console.log('+++++++execution start+++++++');
    }

    console.log('!!', t.content);
    
    if (c?.execution_state === 'idle') {
        console.log('-------execution finish-------');
    }
});
  4. Execute any code in JupyterLite.
  • Expected behavior: The log -------execution finish------- should only appear after all standard output is processed.
  • Actual behavior: The log sometimes appears prematurely, before the standard output is fully handled.

Changes Made

I identified the issue as being caused by an inconsistency in the execution order between callback handling and the plain function-call path when using comlink.

To resolve this:

  • The worker now sends a message through a callback at the end of the process.
  • The kernel waits for this message before concluding the executeRequest function.
  • This ensures proper synchronization and prevents premature termination of executeRequest (a hedged sketch of the pattern follows below).
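
A minimal sketch of the waiting pattern, assuming PromiseDelegate from @lumino/coreutils and a worker message named execute_return as in this PR's diff; the class and field names are simplified assumptions, not the actual implementation:

```typescript
import { PromiseDelegate } from '@lumino/coreutils';

// Hypothetical shape of the comlink-wrapped worker kernel.
interface IRemoteKernel {
  execute(content: unknown, parent: unknown): Promise<{ execution_count?: number }>;
}

class ExecuteWaitSketch {
  constructor(private _remoteKernel: IRemoteKernel) {}

  // Resolved once the worker reports that all output for the current
  // execute request has been flushed.
  private _executed = new PromiseDelegate<void>();

  // Callback registered with the worker (wrapped with comlink's proxy());
  // it receives every message the worker forwards to the main thread.
  protected _processWorkerMessage(msg: { type: string }): void {
    if (msg.type === 'execute_return') {
      // The worker signals that output handling for this request is done.
      this._executed.resolve();
    }
    // ... stream, display_data, etc. are handled here as before
  }

  async executeRequest(content: unknown, parent: unknown): Promise<unknown> {
    const result = await this._remoteKernel.execute(content, parent);
    // Wait for the worker's completion message before returning, so the
    // kernel does not report 'idle' while output is still in flight.
    await this._executed.promise;
    this._executed = new PromiseDelegate<void>();
    return result;
  }
}
```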


@jtpio added the bug label on Dec 5, 2024
@@ -198,6 +198,9 @@ export class PyodideKernel extends BaseKernel implements IKernel {
);
break;
}
case 'execute_return': {
Member

Wondering if such a message should be named something else, or prefixed with an underscore or similar? Otherwise it looks like it may be an official message that is part of the Jupyter protocol, while it isn't.

Contributor

Sure, jpk: would be solid. It's not really private, and a downstream may well want to react to it. We'd pay a minimal tax to do the .split, but adding more fields at this time seems even worse and more breaking.

Most of these should be hanging off a const or namespace somewhere (but not an enum: those can break in round-tripped JSON settings).

Author

So, should we rename the message to jpk:execute_return after all?

@martenrichter
Contributor

Do you think this also solves the ipywidget problem, which is also related to the commit introducing comlink?
See the last comment in #143 for my analysis.
Or do you think it is a separate issue in the comlink communication?

@jtpio
Member

jtpio commented Dec 12, 2024

> Do you think this also solves the ipywidget problem, which is also related to the commit introducing comlink?

Not sure yet, but probably worth trying with this link (preview for the PR): https://jupyterlite-pyodide-kernel--144.org.readthedocs.build/en/144/_static/lab/index.html

@martenrichter
Contributor

Nope, it still persists. So it must be another race condition introduced by comlink (or something else in the comlink commit).

@@ -244,6 +247,8 @@ export class PyodideKernel extends BaseKernel implements IKernel {
await this.ready;
const result = await this._remoteKernel.execute(content, this.parent);
result.execution_count = this.executionCount;
await this._executed.promise;
this._executed = new PromiseDelegate<void>();
Contributor

I have a stupid question. There is always one PromiseDelegate, and the execute request starts before interacting with the promise delegate. If two execution requests are started at almost the same time, what identifies which promise should be resolved once the message arrives?

@martenrichter
Contributor

Another question: does it make sense to add this message mechanism? I think the diagnosis is that comlink sometimes does not accurately handle calls to async functions in the worker. But this means that potentially all async functions are affected by the same issue, so it may be better to find the root cause of why comlink is failing. Its protocol is not that simple and uses UUIDs for every message, but I am also not sure whether it handles async correctly.

@martenrichter
Contributor

The real problem is that comlink changes the ordering between the messages from this._sendWorkerMessage and the returns of the worker's async methods. So the _sendWorkerMessage messages arrive after the return from the async method that sets the parent header with the msg_id.

@martenrichter
Contributor

Ok, I have found the underlying problem.
This line:

remote.registerCallback(proxy(this._processWorkerMessage.bind(this)));

ends up in comlink:
https://github.com/GoogleChromeLabs/comlink/blob/fd4b52666b1ec62784f8b45cb1108c7e40bc481d/src/comlink.ts#L213
creating a separate MessageChannel for the callback used in _sendWorkerMessage.

This explains all the troubles: the calls to the worker go over the worker's normal message port, but _sendWorkerMessage uses a separate MessageChannel. User agents do not guarantee that messages on two separate channels are delivered in the same order, but ordering is crucial for the Jupyter protocol.
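
A hedged illustration of the two code paths involved, using comlink's public wrap/proxy API; the registerCallback and execute names mirror the kernel's pattern but are simplified assumptions here:

```typescript
import { wrap, proxy, Remote } from 'comlink';

// Hypothetical interface exposed by the worker.
interface IWorkerApi {
  registerCallback(cb: (msg: unknown) => void): void;
  execute(code: string): Promise<unknown>;
}

async function demo(): Promise<void> {
  const worker = new Worker('worker.js'); // hypothetical worker script
  const remote: Remote<IWorkerApi> = wrap<IWorkerApi>(worker);

  // The callback is transferred as a comlink proxy: internally comlink
  // creates a dedicated MessageChannel for it, so messages the worker
  // sends through this callback travel over a different port than the
  // return values of remote method calls.
  await remote.registerCallback(proxy((msg) => console.log('iopub', msg)));

  // The resolution of this promise arrives on the worker's own port.
  // Browsers do not guarantee ordering across the two ports, so this
  // return can overtake callback messages that were sent earlier.
  const result = await remote.execute('print("hello")');
  console.log('execute returned', result);
}
```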
I see the following options:
1.) Proceed forward in the spirit of this PR and add a completion message to every call to the worker, but with a separate promise for every call, perhaps using a Map keyed by some kind of unique id, maybe the message id (a hedged sketch of this follows below).
2.) Find a way to use comlink so that _sendWorkerMessage does not use a separate channel from the other mechanism.
3.) Go back to using postMessage alongside comlink, maybe with an added id jupyterMessage so that comlink knows this is not a message it should process.
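
A minimal sketch of option 1.), assuming every worker call carries a unique id (for example the Jupyter msg_id) and that the worker echoes that id back in its completion message; all names here are hypothetical:

```typescript
import { PromiseDelegate } from '@lumino/coreutils';

class PerCallCompletion {
  // One pending promise per in-flight worker call, keyed by a unique id.
  private _pending = new Map<string, PromiseDelegate<void>>();

  // Callback registered with the worker; the worker echoes the id of the
  // call whose output it has finished flushing.
  protected _onWorkerMessage(msg: { type: string; id: string }): void {
    if (msg.type === 'execute_return') {
      this._pending.get(msg.id)?.resolve();
    }
  }

  // Wrap a worker call so that it only completes once the matching
  // completion message has arrived.
  async callAndWait<T>(id: string, call: () => Promise<T>): Promise<T> {
    const done = new PromiseDelegate<void>();
    this._pending.set(id, done);
    try {
      const result = await call();
      // Even with overlapping requests, only the promise registered
      // under this id is awaited, so completions cannot be mixed up.
      await done.promise;
      return result;
    } finally {
      this._pending.delete(id);
    }
  }
}
```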

@jtpio
Member

jtpio commented Dec 16, 2024

Many thanks @martenrichter for the thorough investigation!

> 2.) Find a way to use comlink so that _sendWorkerMessage does not use a separate channel from the other mechanism.

Right, if there is a way to do that with comlink, sticking to the comlink API if possible, that would be great. Otherwise switching back to raw postMessage would likely be fine, since it's more important to fix the actual user issue 👍

@martenrichter
Contributor

> Many thanks @martenrichter for the thorough investigation!
>
> > 2.) Find a way to use comlink so that _sendWorkerMessage does not use a separate channel from the other mechanism.
>
> Right, if there is a way to do that with comlink, sticking to the comlink API if possible, that would be great. Otherwise switching back to raw postMessage would likely be fine, since it's more important to fix the actual user issue 👍

I have posted a PR for 3.). My follow-up investigation showed that using native comlink is only possible if I submit an upstream patch, which would be complex, and I do not know how welcome that would be on the comlink team's side.
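
For reference, a minimal sketch of the option 3.) approach on the main-thread side, assuming Jupyter protocol traffic is sent with plain postMessage and tagged with a marker field (jupyterMessage, as suggested in the options list above) so that comlink's own listener can ignore it; the function and field names are hypothetical:

```typescript
// Listen on the worker's single port so message ordering is preserved,
// handling only the tagged Jupyter messages and letting everything else
// fall through to comlink's own 'message' listener.
function listenForJupyterMessages(
  worker: Worker,
  onMessage: (msg: unknown) => void
): void {
  worker.addEventListener('message', (event: MessageEvent) => {
    const data = event.data as { jupyterMessage?: boolean } | undefined;
    if (data?.jupyterMessage) {
      onMessage(data);
    }
  });
}
```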

@jtpio closed this in #148 on Dec 17, 2024