
Proposal: send logs directly from builder pods back to the builder #31

Open · Cryptophobia opened this issue Mar 21, 2018 · 5 comments

From @arschles on February 29, 2016 19:35

Note: I believe others have suggested a similar or identical solution to this problem in the past. Hopefully this issue solidifies those ideas.

Related: deis/builder#185, deis/builder#199, #298

Problem Statement

As of this writing, the builder does the following to perform a build (roughly sketched in code after the list):

  1. Launch a builder pod (slugbuilder or dockerbuilder)
  2. Poll the k8s API for the pod's existence
  3. Begin streaming pod logs after the pod exists
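
For concreteness, here's a minimal sketch of that poll-then-stream flow, assuming client-go; the clientset wiring and namespace come from the builder's existing configuration, and `streamBuildLogs` is a hypothetical name rather than the builder's actual API:

```go
// Rough sketch of the poll-then-stream flow described above, using client-go.
package gitreceive

import (
	"context"
	"io"
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func streamBuildLogs(ctx context.Context, cs kubernetes.Interface, ns, pod string) error {
	// Step 2: poll until the API server reports the pod. If the pod starts,
	// finishes, and is garbage collected between polls, this loop never
	// sees it -- the race described below.
	for {
		if _, err := cs.CoreV1().Pods(ns).Get(ctx, pod, metav1.GetOptions{}); err == nil {
			break
		}
		time.Sleep(time.Second)
	}

	// Step 3: follow the pod's logs and relay them to the git client.
	rc, err := cs.CoreV1().Pods(ns).GetLogs(pod, &corev1.PodLogOptions{Follow: true}).Stream(ctx)
	if err != nil {
		return err
	}
	defer rc.Close()
	_, err = io.Copy(os.Stdout, rc)
	return err
}
```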

We've found issues with this approach, all of which stem from the fact that the pod may not be reported as running during any polling event. This is a race condition, and so far we've found the following symptoms:

  1. The pod has started and completed within a single polling interval.
    1. Attempted solution in fix(pkg/gitreceive): use a watch for pod additions (Prototype) (deis/builder#185). Note that this will not address the problem laid out in (2).
  2. The pod has started, completed, and been garbage collected within a single polling interval.
    1. Temporary fix that relies on the internal k8s GC implementation: feat(race): change heritage label for every pod launch (deis/builder#206).

Solution Details

Because of this race condition, we can't rely on polling, and even if we successfully use the event stream (#185), k8s GC doesn't guarantee that pod logs will still be available after the pod is done. This proposal calls for the builder pod to stream its logs back to the builder that launched it.

The following changes (as of this writing) would need to happen to make this work:

  1. Each git-receive hook process runs a websocket server (on a unique port, assigned by the builder SSH server) that accepts incoming logs from the builder pod. It uses these logs for the following purposes:
    1. Writes them to STDOUT (for the builder to write back to the SSH connection)
    2. Looks for a FINISHED message that indicates the builder pod is done
  2. Each git-receive process launches builder pods with its "phone-home" IP and port, which point to the websocket server that they should write their logs to
  3. The builder pods now include a program that launches the builder logic (a shell script for slugbuilder and a Python program for dockerbuilder). This program (sketched after this list) would:
    1. Stream STDOUT & STDERR via a websocket connection to the phone-home address
    2. Send a FINISHED message when the builder logic exits
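
A hedged sketch of that pod-side shim, assuming gorilla/websocket as the transport; the FINISHED sentinel comes from the proposal, while the package and function names and the line-oriented framing are illustrative assumptions:

```go
// Sketch of the shim that would wrap the builder logic inside the
// slugbuilder/dockerbuilder pod and stream its output to the phone-home
// address, e.g. ws://10.2.3.4:12345/logs.
package phonehome

import (
	"bufio"
	"io"
	"os/exec"

	"github.com/gorilla/websocket"
)

func runAndStream(phoneHomeURL, builderCmd string, args ...string) error {
	conn, _, err := websocket.DefaultDialer.Dial(phoneHomeURL, nil)
	if err != nil {
		return err
	}
	defer conn.Close()

	// Merge STDOUT and STDERR into a single pipe.
	pr, pw := io.Pipe()
	cmd := exec.Command(builderCmd, args...)
	cmd.Stdout = pw
	cmd.Stderr = pw
	if err := cmd.Start(); err != nil {
		return err
	}
	go func() {
		cmd.Wait()
		pw.Close() // unblocks the scanner below once the builder logic exits
	}()

	// Ship each log line to the git-receive hook's websocket server.
	scanner := bufio.NewScanner(pr)
	for scanner.Scan() {
		if err := conn.WriteMessage(websocket.TextMessage, scanner.Bytes()); err != nil {
			return err
		}
	}

	// Signal completion so the hook can tear down its server.
	return conn.WriteMessage(websocket.TextMessage, []byte("FINISHED"))
}
```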

After the builder's git-receive hook receives the FINISHED message, or after a generous timeout, it can shut down the websocket server and continue with the logic it already has. The builder would no longer need to poll the k8s API if this proposal were implemented.
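
The hook side might look roughly like the following, again assuming gorilla/websocket; the /logs path, the single-connection assumption, and the timeout handling are illustrative, not part of the proposal:

```go
// Sketch of the hook-side websocket server: echo incoming log lines to
// STDOUT (which the builder relays over the SSH connection) and return
// once FINISHED arrives or a generous timeout elapses. Assumes a single
// builder-pod connection per hook process.
package phonehome

import (
	"fmt"
	"net/http"
	"time"

	"github.com/gorilla/websocket"
)

func serveLogs(addr string, timeout time.Duration) error {
	done := make(chan struct{})
	upgrader := websocket.Upgrader{}

	mux := http.NewServeMux()
	mux.HandleFunc("/logs", func(w http.ResponseWriter, r *http.Request) {
		conn, err := upgrader.Upgrade(w, r, nil)
		if err != nil {
			return
		}
		defer conn.Close()
		for {
			_, msg, err := conn.ReadMessage()
			if err != nil {
				return
			}
			if string(msg) == "FINISHED" {
				close(done)
				return
			}
			fmt.Println(string(msg)) // relayed back over the SSH connection
		}
	})

	srv := &http.Server{Addr: addr, Handler: mux}
	go srv.ListenAndServe()
	defer srv.Close()

	select {
	case <-done: // builder pod reported completion
	case <-time.After(timeout): // generous timeout, in case FINISHED never arrives
	}
	return nil
}
```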

Copied from original issue: deis/builder#207

From @smothiki on February 29, 2016 22:46

We are in any case thinking about implementing Jobs, which might change a lot of this behavior. Also, a pod getting garbage collected immediately, without the event type changing, is not expected k8s behavior. The intended behavior is:

Event - pod status
Added - a pod is created
Modified - status changes from Pending to Running
Deleted - status is Succeeded, or a failure with an exit code of 0 or greater

Because of a labels mess, we are not observing the pod's status change from Pending to Running; instead, GC starts collecting the pod and the event comes through as Deleted directly, even though the pod is still running. That is not the intended behavior. There is no point in streaming the logs back if the pod is garbage collected in the middle of an execution.
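
For reference, a minimal watch loop that would surface that event sequence, assuming client-go; the label selector and function name are placeholders:

```go
// Watch build pods and print the event sequence described above:
// ADDED (Pending) -> MODIFIED (Running) -> a terminal phase. A DELETED
// event while the pod is still Running is the GC misbehavior in question.
package gitreceive

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func watchBuildPods(ctx context.Context, cs kubernetes.Interface, ns string) error {
	w, err := cs.CoreV1().Pods(ns).Watch(ctx, metav1.ListOptions{
		LabelSelector: "heritage=deis", // placeholder selector
	})
	if err != nil {
		return err
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		pod, ok := ev.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		fmt.Printf("%s: %s phase=%s\n", ev.Type, pod.Name, pod.Status.Phase)
	}
	return nil
}
```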

From @smothiki on February 29, 2016 22:48

deis/builder#185 will solve a lot of things. I feel there is no need for a special websocket connection to stream logs back.

From @arschles on March 3, 2016 23:39

@smothiki I'm not sure how #185 would solve this particular problem if we don't launch jobs. However, I am 👍 on using jobs for our builds when they come out of extensions. If I understand http://kubernetes.io/v1.1/docs/user-guide/jobs.html correctly, we'll be able to make an API call to get the logs of the job even if it's complete at the time of calling.
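
If that reading is right, retrieving the logs of a completed build Job could look something like this sketch, assuming client-go; the job-name label is the convention the Job controller applies to the pods it creates, and the function name is hypothetical:

```go
// Fetch logs for the pods belonging to a (possibly completed) Job. The
// logs remain retrievable for as long as the pod objects themselves exist.
package gitreceive

import (
	"context"
	"io"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func jobLogs(ctx context.Context, cs kubernetes.Interface, ns, jobName string) error {
	// The Job controller labels its pods with job-name=<job>.
	pods, err := cs.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
		LabelSelector: "job-name=" + jobName,
	})
	if err != nil {
		return err
	}
	for _, p := range pods.Items {
		rc, err := cs.CoreV1().Pods(ns).GetLogs(p.Name, &corev1.PodLogOptions{}).Stream(ctx)
		if err != nil {
			return err
		}
		if _, err := io.Copy(os.Stdout, rc); err != nil {
			rc.Close()
			return err
		}
		rc.Close()
	}
	return nil
}
```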

From @arschles on April 15, 2016 17:39

Promoting to beta3

From @arschles on April 21, 2016 15:53

Punting to beta4
