Callback not getting invoked for exec call #2133

Open
quoctruong opened this issue Dec 30, 2024 · 3 comments

@quoctruong

Describe the bug
I am looking for a way to run a long-running job in a Kubernetes container that can withstand temporary network disconnections (a disconnection normally lasts less than 1 second). I attempted to do this using exec. However, the callback function for exec is never called if there is a network disconnect.

To Reproduce

  1. Call exec with a long-running command such as sleep.
  2. Disconnect the connection to the node (I simulate this by scaling down or deleting the konnectivity-agent pods on the server).
  3. The callback is never invoked.

Expected behavior
An error or some other indication of the disconnect would be helpful. Even better would be a way to re-establish the connection.

If not, is there a suggested way to run these long-running jobs so that they are not disrupted when a network disconnect occurs? For context, I am using GKE, and these disruptions happen during maintenance events.

Example Code
The code that I am using (taken from https://github.com/actions/runner-container-hooks/blob/main/packages/k8s/src/k8s/index.ts#L223), where neither the resolve nor the reject function is called in the callback:

  await new Promise(function (resolve, reject) {
    exec
      .exec(
        namespace,
        podName,
        containerName,
        command,
        process.stdout,
        process.stderr,
        stdin ?? null,
        false /* tty */,
        resp => {
          // kube.exec returns an error if exit code is not 0, but we can't actually get the exit code
          if (resp.status === 'Success') {
            resolve(resp.code)
          } else {
            reject(resp?.message)
          }
        }
      )
      .catch(e => reject(e))
      .finally(() => console.log('done'))
  })
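
A caller-side guard, sketched under assumptions: because the promise above settles only when the callback fires, racing it against a timer at least turns a silent hang into a rejection. `withTimeout` is a hypothetical helper, not part of @kubernetes/client-node, and the timeout value is something the caller has to choose. It does not reconnect, and it does not cancel the command running in the container.

```typescript
// Hypothetical helper (not a client-library API): rejects if `work` has not
// settled after `ms` milliseconds. The underlying operation is NOT cancelled;
// the caller simply stops waiting for it.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`exec did not settle within ${ms} ms`)),
      ms
    )
    work.then(
      value => {
        clearTimeout(timer) // settled in time: drop the pending timer
        resolve(value)
      },
      err => {
        clearTimeout(timer)
        reject(err) // pass the original failure through unchanged
      }
    )
  })
}
```

The `new Promise(...)` expression from the snippet above would be passed as `work`, for example `await withTimeout(execPromise, maxRunMs)`, where `execPromise` and `maxRunMs` are placeholder names.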
@brendandburns
Contributor

brendandburns commented Dec 30, 2024

I don't think exec is what you want.

It sounds like what you want is a Kubernetes Job:

https://kubernetes.io/docs/concepts/workloads/controllers/job/

When you create a Job you can monitor it for completion and get its logs/status code.
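
As a rough illustration of the Job approach, the sketch below builds a minimal `batch/v1` Job manifest and interprets the `status.conditions` that the Job controller reports. `buildJobManifest` and `jobState` are hypothetical helper names, and the image and command are placeholders. The calls that submit the Job and read its status are deliberately left out, since their signatures depend on the client version in use.

```typescript
// Minimal subset of the batch/v1 Job schema used by this sketch.
interface JobManifest {
  apiVersion: string
  kind: string
  metadata: { name: string; namespace: string }
  spec: {
    backoffLimit: number
    template: {
      spec: {
        restartPolicy: 'Never' | 'OnFailure'
        containers: { name: string; image: string; command: string[] }[]
      }
    }
  }
}

// Hypothetical helper: builds the manifest only; it does not talk to the cluster.
function buildJobManifest(
  name: string,
  namespace: string,
  image: string,
  command: string[]
): JobManifest {
  return {
    apiVersion: 'batch/v1',
    kind: 'Job',
    metadata: { name, namespace },
    spec: {
      backoffLimit: 0, // assumption: do not retry a failed run
      template: {
        spec: {
          restartPolicy: 'Never',
          containers: [{ name, image, command }]
        }
      }
    }
  }
}

type JobState = 'succeeded' | 'failed' | 'running'

// Maps the Job's status conditions to a coarse state. A Job is finished
// once a 'Complete' or 'Failed' condition has status 'True'.
function jobState(status: {
  conditions?: { type: string; status: string }[]
}): JobState {
  for (const c of status.conditions ?? []) {
    if (c.status !== 'True') continue
    if (c.type === 'Complete') return 'succeeded'
    if (c.type === 'Failed') return 'failed'
  }
  return 'running'
}
```

Because the Job's state lives in the API server, a caller that loses its connection can reconnect later and read the status again, rather than depending on one long-lived stream.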

If you really do want to run a command within an existing container, I would suggest implementing some sort of REST or RPC endpoint within that container to start the command; that is a better approach.

exec is intended for relatively short, interactive commands within an existing container; it's really not built for what you are trying to do.

@quoctruong
Author

Thank you @brendandburns. I'm using third-party software from GitHub (GitHub Actions), and unfortunately that's how they designed it, so I am looking for an easy workaround if possible. I guess I will try the Job approach instead. Thanks for the suggestion!

@brendandburns
Contributor

fwiw, I would expect an error in the case of a true network disconnect (e.g. socket closed). If the underlying TCP connection stays open, I would expect this to continue to work correctly.

If this is using the 1.0.0 branch, this may be related to #2127
