CI: show build commands on terminal #1965

Open
wants to merge 2 commits into master
Conversation

@kinkie (Contributor) commented Dec 17, 2024

test-builds.sh records output from each build in
per-layer logs (bt<layer>.log) and we are already
collecting those. Send the build output also to
standard output, for more convenient reading in
GitHub's 'actions' interface.
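As a rough sketch of the general idea (build_layer and the variable names below are placeholders rather than the actual test-builds.sh code, so the real diff may differ), the change amounts to tee-ing each layer's build output so that it lands in the per-layer log and on the Actions console at the same time:

  # placeholder sketch: keep writing the per-layer log, but also stream
  # the same output to stdout so it shows up in GitHub's Actions console
  build_layer "$layer" 2>&1 | tee "bt${layer}.log"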
@rousskov rousskov self-requested a review December 17, 2024 22:29
@rousskov rousskov added the S-waiting-for-reviewer label Dec 17, 2024
@rousskov (Contributor) left a comment

test-builds.sh record output from the build in bt.log

It does not. Output is sent to layer-specific log files.

Send output from the build also to standard output, for more convenient reading in Github's 'actions' interface

I expect the result to be less convenient for many, especially when the build fails (which is the primary use case here IMO!): Navigating long logs using GitHub Actions API is painful and archived logs are just a few mouse clicks away. If anything, we should be reducing noise in the current GitHub Actions output.

I do not know whether it would help your use case, but I would support adding the tail of a failed log to test-builds.sh output (even by default).

I also do not like proposed code duplication, but I hope that will go away while addressing the primary concern discussed above.

@rousskov rousskov added the S-waiting-for-author label (author action is expected) and removed the S-waiting-for-reviewer label Dec 17, 2024
@kinkie (Contributor, Author) commented Dec 17, 2024

test-builds.sh record output from the build in bt.log

It does not. Output is sent to layer-specific log files.

Markdown ate my homework. I had written "bt_layer_.log" . Correcting

Send output from the build also to standard output, for more convenient reading in Github's 'actions' interface

I expect the result to be less convenient for many, especially when the build fails (which is the primary use case here IMO!): Navigating long logs using GitHub Actions API is painful and archived logs are just a few mouse clicks away. If anything, we should be reducing noise in the current GitHub Actions output.

After this PR:
in case of failure to build a PR,

  1. I click on "details" for the failed check in the "discussion" page
  2. expand the appropriate section
  3. fill in "search" with the relevant keyword, and then scroll through the logs. It's not great, but 90% of the time it's sufficient. And for the other 10% I can still fall back to the current workflow below, which is not going away

Before this PR:
in case of failure to build a PR,

  1. I click on "details" for the failed check in the "discussion" page
  2. expand the "publish search logs" section
  3. search for the link to download the artifact
  4. click on it to download it to local disk
  5. switch to a terminal (or, on macOS, to the Finder)
  6. figure out where the file was downloaded
  7. change to that directory
  8. figure out if the file was renamed (e.g. to avoid overwriting a previous one), then unzip it
  9. open the uncompressed content with 'less' or any other similar program
  10. search for the relevant keyword and scroll through the logs
  11. delete the compressed and uncompressed logs from my local disk

I'm not sure what your workflow is, but for me this PR simplifies things significantly.

Anyone who prefers the "before" version of the workflow is unaffected: the 'Run' subsection of the workflow output with all the logs is closed by default (at least, it is for me), so it doesn't occupy any more screen real estate than it does now.

In other words, this proposal improves things for at least me; I would like to get a concrete example of how it causes a measurable regression for you as a representative of the many you mention.

In fact, after this change, another workflow becomes possible if one has installed the gh CLI tool:

  1. gh run list - make a note of the "id field" corresponding to the workflow one is interested in
  2. gh run view --log <id> downloads the workflow output and opens it using less
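For example, something like the following should work (the run ID below is taken from one of this PR's own job runs, referenced later in this thread):

  $ gh run list --limit 5                  # note the ID of the run of interest
  $ gh run view 12408472448 --log | less   # read the full console output locally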

I do not know whether it would help your use case, but I would support adding the tail of a failed log to test-builds.sh output (even by default).

It can help, but not as much as having the full log. Sometimes, especially in cases of high parallelism in builds, an error can be dozens of lines back from the end of the log

I also do not like proposed code duplication, but I hope that will go away while addressing the primary concern discussed above.

We could make verbose output the default for test-builds.sh, but I expect that would be more disruptive than this change. I am not sure if you had something else in mind.

@kinkie kinkie added the S-waiting-for-reviewer label and removed the S-waiting-for-author label Dec 17, 2024
yadij previously approved these changes Dec 19, 2024
@yadij (Contributor) left a comment

Just one fix of a pre-existing bug that shows up in the touched lines of this PR.
I do not insist on this.

.github/workflows/quick.yaml (review comment, outdated, resolved)
@yadij yadij requested a review from rousskov December 19, 2024 04:27
Co-authored-by: Amos Jeffries <[email protected]>
@kinkie (Contributor, Author) commented Dec 19, 2024

Just one fix of a pre-existing bug that shows up in the touched lines of this PR. I do not insist on this.

@yadij accepting this dismissed your review; please re-approve

@rousskov (Contributor) commented
I would like to get a concrete example of how it causes a measurable regression for you

I will work on providing one, but it will take me some time to simulate the problems I experience when searching for errors in large GitHub Action console output and to check suspicious assertions in your response to my review. I seriously doubt that spending this time is the best use of our scarce resources, but c'est la vie.

I would support adding the tail of a failed log to test-builds.sh output (even by default).

It can help, but not as much as having the full log. Sometimes, especially in cases of high parallelism in builds, an error can be dozens of lines back from the end of the log

We can easily tail "dozens of lines". Obviously, the interesting bits may not be present in any partial output, but the full log is always a few mouse clicks/seconds away.
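A minimal illustration of that alternative (build_layer and the log name are placeholders, not actual test-builds.sh code):

  # hypothetical: keep the build output in the per-layer log as today,
  # but on failure echo its last lines to the Actions console
  if ! build_layer "$layer" > "bt${layer}.log" 2>&1; then
      echo "=== tail of bt${layer}.log ==="
      tail -n 100 "bt${layer}.log"
  fi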

@kinkie (Contributor, Author) commented Dec 19, 2024

I would like to get a concrete example of how it causes a measurable regression for you

I will work on providing one, but it will take me some time to simulate the problems I experience when searching for errors in large GitHub Action console output and to check suspicious assertions in your response to my review. I seriously doubt that spending this time is the best use of our scarce resources, but c'est la vie.

I agree you should not spend the time! I do not understand what the problem is, to be honest. You would not be forced to search for errors in large console output; your current way of operating will not be taken away from you or anyone else.
As you say, the full log is always a few clicks away. In fact, it would be exactly the same clicks away as it is now.

@rousskov (Contributor) commented

I agree you should not spend the time! I do not understand what the problem is, to be honest.

Glad we agree. Unfortunately, I do not see how I can explain what the problem is and address your rebuttal of my blocking review without spending that time. Clearly, my initial attempt to explain has failed. Now I need to validate my "this change makes things worse for primary use case" assumption with a specific example and, ideally, also illustrate that searching through large console logs for errors is not as easy as the author says. That takes time.

@kinkie (Contributor, Author) commented Dec 19, 2024

I agree you should not spend the time! I do not understand what the problem is, to be honest.

Glad we agree. Unfortunately, I do not see how I can explain what the problem is and address your rebuttal of my blocking review without spending that time. Clearly, my initial attempt to explain has failed. Now I need to validate my "this change makes things worse for primary use case" assumption with a specific example and, ideally, also illustrate that searching through large console logs for errors is not as easy as the author says. That takes time.

Sorry about that.
Hoping to be of help, could you walk me through your process with
https://github.com/squid-cache/squid/actions/runs/12408472448/job/34640210517 (one of the job runs for this same PR) as opposed to https://github.com/squid-cache/squid/actions/runs/12400872862/job/34619011412 (the same job for PR 1964)? It has (or at least should have) the exact same contents.

@rousskov (Contributor) commented

could you walk me through your process with https://github.com/squid-cache/squid/actions/runs/12408472448/job/34640210517 (one of the job runs for this same PR) as opposed to https://github.com/squid-cache/squid/actions/runs/12400872862/job/34619011412 (the same job for PR 1964)? It has (or at least should have) the exact same contents.

Sure: My primary "process" for both of the given examples is the same -- I typically ignore successful builds. My primary concerns (in this PR scope) are not about the builds we ignore.

If you would like for me to stray from the primary path and use your first example to illustrate some of the (secondary) problems, then keep reading. YMMV, but when navigating large "Run build..." step logs using GitHub Actions web UI in my environment, I can observe the following:

  1. GitHub Actions search UI lies about the number of times character sequence "error" occurs in that log. UI says 100. There are about 1922 matches in that build log AFAICT! The last "error" according to UI (i.e. "error" match number 100) is clearly not the last one. Thus, search -- a key function of any log navigation UI -- does not really work well here.
  2. The browser only shows a portion of the log when using a scroll bar and "scrolling to the end" for the very first time after loading that page (before the entire log gets fetched/cached by UI code, I guess). I have to "scroll to the end" several times to get to the real end of the log. Needless to say, the end of the output is often the interesting part or the starting point. Thus, scrolling with a mouse -- another key GUI function -- does not work well here either. Using keyboard-based navigation appears to work, but some users may not know that they have not reached the end (using a scroll bar) or that navigating using keyboard works better.

AFAICT, longer logs exacerbate these problems. Certain errors may significantly increase log size. In general, logs tend to grow with time as well. The current example cannot illustrate that, of course.

Time spent so far: 37 minutes.

@kinkie (Contributor, Author) commented Dec 19, 2024

could you walk me through your process with https://github.com/squid-cache/squid/actions/runs/12408472448/job/34640210517 (one of the job runs for this same PR) as opposed to https://github.com/squid-cache/squid/actions/runs/12400872862/job/34619011412 (the same job for PR 1964)? It has (or at least should have) the exact same contents.

Sure: My primary "process" for both of the given examples is the same -- I typically ignore successful builds. My primary concerns (in this PR scope) are not about the builds we ignore.

If you would like for me to stray from the primary path and use your first example to illustrate some of the (secondary) problems, then keep reading. YMMV, but when navigating large "Run build..." step logs using GitHub Actions web UI in my environment, I can observe the following:

  1. GitHub Actions search UI lies about the number of times character sequence "error" occurs in that log. UI says 100. There are about 1922 matches in that build log AFAICT! The last "error" according to UI (i.e. "error" match number 100) is clearly not the last one. Thus, search -- a key function of any log navigation UI -- does not really work well here.

Then don't use them; you can continue downloading the artefact and searching there. I'm not proposing to take that away.

on top of that:
$ gh run view --job 34640210517 --log | grep error | wc -l
2821
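(As an aside, not something from the PR itself: once the output is on the console, the same command makes it easy to jump straight to the interesting part from a terminal, e.g.

  $ gh run view --job 34640210517 --log | grep -n 'error:' | tail -n 20

which prints the last few compiler-style error lines together with their positions in the log.)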

  2. The browser only shows a portion of the log when using a scroll bar and "scrolling to the end" for the very first time after loading that page (before the entire log gets fetched/cached by UI code, I guess). I have to "scroll to the end" several times to get to the real end of the log. Needless to say, the end of the output is often the interesting part or the starting point. Thus, scrolling with a mouse -- another key GUI function -- does not work well here either. Using keyboard-based navigation appears to work, but some users may not know that they have not reached the end (using a scroll bar) or that navigating using keyboard works better.

Sure. After this PR lands - if ever - you can keep downloading the artefacts and searching there.

AFAICT, longer logs exacerbate these problems. Certain errors may significantly increase log size. In general, logs tend to grow with time as well. The current example cannot illustrate that, of course.

Time spent so far: 37 minutes.

Sorry about that. Unfortunately I still do not see how this change will prevent you from continuing to operate in the same way you have so far, while allowing me to choose on a case by case basis whether to rely on the 2 additional workflows that will be available

I feel like my message is not getting through: you and anyone else who wants or needs to rely on the artefacts will be perfectly able to continue using artefacts, without ever seeing or interacting with the second copy of the output that is emitted to stdout/stderr. I am not encouraging, much less forcing, anyone to stop relying on artefacts as their primary - or only - source of debugging information.

You have so far not articulated any regression in functionality for yourself.
I feel that you are answering a different question than the one I've asked. My question was "how would your workflow be worse off if this lands", not "how is this functionality worse than the alternative in some cases"

@rousskov (Contributor) commented Dec 20, 2024

Francesco: could you walk me through your process with ...?

Alex: [Walks through the process of using the referenced run pages, to the best of his ability].

Francesco: I'm not proposing to take [your ability to download artifacts] away.

Yes, I am well aware of that fact (naturally?). None of my concerns state or imply otherwise!

Unfortunately I still do not see how this change will prevent you from continuing to operate in the same way you have so far

I know. As I said, I need to validate my assumption (that this change may make things worse for primary use case) with a specific example. That should overcome the current barriers, but it takes time. So far, I was just responding to your other (mostly unrelated) request.

We are making progress though because the second (and secondary) item from the same TODO has been completed: "illustrate that searching through large console logs for errors is not as easy as the author says".

I feel like my message is not getting through: you and anyone else ... will be perfectly able to continue using artefacts ... I am not encouraging, much less forcing, anyone to stop relying on artefacts as their primary - or only - source of debugging information.

Your message is getting through just fine AFAICT. However, that message is not relevant to my change request [1]. FWIW, the same situation happens in the vast majority of cases where the rebuttal is based on an obvious, surface-level, in-your-face fact that I ought to be aware of. There are always exceptional cases, of course!

You have so far not articulated any regression in functionality

I believe my change request described my concerns reasonably well, but the level of "articulation" [2] is naturally a matter of opinion, and I am sorry I failed to be articulate enough. I need more time to detail what has been described with an example, as discussed earlier, and I hope that example would help get us on the same page.

My question was "how would your workflow be worse off if this lands",

Not exactly AFAICT: My original review can already be used to answer this simpler "how things can become worse" question, at least on a superficial level. Your actual request went a lot further: "I would like to get a concrete example of how it causes a measurable regression for you" (emphasis mine). That is still a TODO for me.

not "how is this functionality worse than the alternative in some cases"

It may be important to keep in mind that GitHub Actions GUI being discussed here is the primary/first/initial source of information for many, especially among less experienced/new contributors. If a PR makes it worse (in some cases), then it is a valid argument against that PR, regardless of whether there are other/unaffected ways to obtain the same or similar info. This is an aggravating factor if you will. The task of proving that my concerns are valid (with a "concrete example" that demonstrates "measurable regression") is still a TODO on my side.

Time spent so far: 69 minutes.

Footnotes

  1. There are other red flags in that "message", but I want to stay focused on the primary problem.

  2. Just to avoid misunderstanding, the level of "articulation" is not the same as the level of detail. I did not detail or illustrate my concerns (in the hope of saving time).

@kinkie (Contributor, Author) commented Dec 20, 2024

Francesco: could you walk me through your process with ...?

Alex: [Walks through the process of using the referenced run pages, to the best of his ability].

Francesco: I'm not proposing to take [your ability to download artifacts] away.

Yes, I am well aware of that fact (naturally?). None of my concerns state or imply otherwise!

Unfortunately I still do not see how this change will prevent you from continuing to operate in the same way you have so far

I know. As I said, I need to validate my assumption (that this change may make things worse for primary use case) with a specific example. That should overcome the current barriers, but it takes time. So far, I was just responding to your other (mostly unrelated) request.

We are making progress though because the second (and secondary) item from the same TODO has been completed: "illustrate that searching through large console logs for errors is not as easy as the author says".

I feel like my message is not getting through: you and anyone else ... will be perfectly able to continue using artefacts ... I am not encouraging, much less forcing, anyone to stop relying on artefacts as their primary - or only - source of debugging information.

Your message is getting through just fine AFAICT. However, that message is not relevant to my change request [1]. FWIW, the same situation happens in the vast majority of cases where the rebuttal is based on an obvious, surface-level, in-your-face fact that I ought to be aware of. There are always exceptional cases, of course!

You have so far not articulated any regression in functionality

I believe my change request described my concerns reasonably well, but the level of "articulation" [2] is naturally a matter of opinion, and I am sorry I failed to be articulate enough. I need more time to detail what has been described with an example, as discussed earlier, and I hope that example would help get us on the same page.

I am glad we are making progress. The original review comment states:

I expect the result to be less convenient for many, especially when the build fails (which is the primary use case here IMO!):

We have progressed in that we agree that the primary focus should be the case of failed builds.

My question was "how would your workflow be worse off if this lands",

Let me reframe that: you invoked "many", whom we have not heard from yet, and whom you expect would be inconvenienced in the primary use case of a build failure.

I countered that by invoking many, of whom I am one, who would benefit in a measurable way in a significant fraction of the occurrences of the primary use case (based on real-world experience).

Not exactly AFAICT: My original review can already be used to answer this simpler "how things can become worse" question, at least on a superficial level. Your actual request went a lot further: "I would like to get a concrete example of how it causes a measurable regression for you" (emphasis mine). That is still a TODO for me.

"How things can become worse" is not what I asked. How things do become worse is what we need to find here. And so far the best concrete case you have presented is "someone who is not experienced in how we do CI might turn up and contribute , and might be tempted to use a sometimes less effective workflow" which is not particularly compelling: it applies in a fraction of a fraction of a fraction of events, and remediation is one github comment from one of us away - hardly a high barrier.

But then, once we have found an adverse example and assessed its likelihood of happening, we have to match that against the measurable benefit of the alternative (landing this PR) and then we can take an informed decision

@rousskov (Contributor) commented
My question was "how would your workflow be worse off if this lands",

Let me reframe that: ...

IMO, the new framing and past "measurements" are rather unfair. I bet you disagree, but I hope we can continue to make progress even under these adverse conditions.

BTW, while we are talking about structuring the debate, we should keep in mind that the primary problem can be addressed in several ways. Where feasible, we should be selecting the "best feasible" way rather than dealing with a "this way or nothing" false dichotomy.

My original review can already be used to answer this simpler "how things can become worse" question, at least on a superficial level.

"How things can become worse" is not what I asked. How things do become worse is what we need to find here.

We can use "do become" phrasing if you prefer: It is your question, and you can shape it any way you like, of course. AFAICT, the next things I have to do are exactly the same regardless of that phrase choice ("would be worse", "can become worse", or "does become worse"). In other words, for me, all those variations are effectively the same in this context.

Overall, we are (or should be) evaluating expected PR outcomes. That set of outcomes is not limited to a subset of cases one of us has already witnessed (or is asked to witness). We cannot afford to support every expectation with a test case. Where feasible, I hope we can continue to agree on judgement calls not backed by a dedicated test case. That said, I will spend time to test my assumption in this case.

And so far the best concrete case you have presented is "someone who is not experienced in how we do CI might turn up and contribute , and might be tempted to use a sometimes less effective workflow" which is not particularly compelling: it applies in a fraction of a fraction of a fraction of events, and remediation is one github comment from one of us away - hardly a high barrier.

I agree that the relative number of relevant events is small from our point of view [1]. Our current chances of encountering a new contributor are small to begin with! It looks like I value new contributor satisfaction and my time spent to address contributor dissatisfaction much higher than you do (so my multiplier is much higher), but I hope we can make progress despite those valuation differences.

once we have found an adverse example and assessed its likelihood of happening, we have to match that against the measurable benefit of the alternative (landing this PR) and then we can take an informed decision

Yes, we need to assess and balance expected benefits/harms; that will require judgement calls, and, FWIW, I will continue to place a high premium on new contributor experience. And, again, rejecting or landing current PR code are not the only two alternatives we should be considering.

Time spent so far: 131 minutes.

Footnotes

  1. On the other hand, that same "relevant event" probability is close to 100% from a new contributor point of view!

@kinkie (Contributor, Author) commented Dec 21, 2024

My question was "how would your workflow be worse off if this lands",

Let me reframe that: ...

IMO, the new framing and past "measurements" are rather unfair. I bet you disagree, but I hope we can continue to make progress even under these adverse conditions.

You are right, I disagree. But I also hope we can progress

BTW, while we are talking about structuring the debate, we should keep in mind that the primary problem can be addressed in several ways. Where feasible, we should be selecting the "best feasible" way rather than dealing with a "this way or nothing" false dichotomy.

If "best feasible" is the goal, the solution gets naturally biased to "more options are better than fewer options" just by virtue of having more choice, but I bet you disagree ;)

My original review can already be used to answer this simpler "how things can become worse" question, at least on a superficial level.

"How things can become worse" is not what I asked. How things do become worse is what we need to find here.

We can use "do become" phrasing if you prefer: It is your question, and you can shape it any way you like, of course. AFAICT, the next things I have to do are exactly the same regardless of that phrase choice ("would be worse", "can become worse", or "does become worse"). In other words, for me, all those variations are effectively the same in this context.

Well, in terms of risk, "can become worse" or "would be worse" is about the likelihood of becoming worse. "Do become" requires certainty, or at least a high-confidence prediction.

Overall, we are (or should be) evaluating expected PR outcomes. That set of outcomes is not limited to a subset of cases one of us has already witnessed (or is asked to witness). We cannot afford to support every expectation with a test case. Where feasible, I hope we can continue to agree on judgement calls not backed by a dedicated test case. That said, I will spend time to test my assumption in this case.

I agree we should not spend infinite amounts of time on wild goose chases.
At the same time, I hope you agree we need to bias our reasoning to maximise impact, which we can define using a common definition: the effect of an event multiplied by its likelihood (e.g. we should weigh the impact of buying a lottery ticket [£2.50 * 1 today] against the impact of winning the lottery [£22m * 0.00000001 today]).

And so far the best concrete case you have presented is "someone who is not experienced in how we do CI might turn up and contribute , and might be tempted to use a sometimes less effective workflow" which is not particularly compelling: it applies in a fraction of a fraction of a fraction of events, and remediation is one github comment from one of us away - hardly a high barrier.

I agree that the relative number of relevant events is small from our point of view [1]. Our current chances of encountering a new contributor are small to begin with!

Yes, unfortunately.

It looks like I value new contributor satisfaction and my time spent to address contributor dissatisfaction much higher than you do (so my multiplier is much higher), but I hope we can make progress despite those valuation differences.

I disagree with this statement, in part because you are misassessing what I value, and in part because it feels like you are assuming that a new contributor would be dissatisfied by having more options to choose among - and we do not even know that!
In fact, my assumption would be exactly the opposite: that a new contributor (and at least one existing one, possibly two looking at the current PR approvals) would be more satisfied by having more options, even though not all options are ideal for all use cases.

once we have found an adverse example and assessed its likelihood of happening, we have to match that against the measurable benefit of the alternative (landing this PR) and then we can take an informed decision

Yes, we need to assess and balance expected benefits/harms; that will require judgement calls, and, FWIW, I will continue to place a high premium on new contributor experience.

We share the goal. The problem is that, so far, we can't agree on what it is that would make new developers' experience as good as it can be.
Two problems:

  1. new contributors are hypothetical so far, in that none have chimed in in this discussion, while existing contributors are a certainty; and
  2. we do not know what would make a new developer's experience better among the options we have on the table so far. We can try to guess, but your guesses and mine differ.

And, again, rejecting or landing current PR code are not the only two alternatives we should be considering.

I appreciate you offered to print some lines from the output, but I feel it is a worse choice than printing everything. Do you have any alternative idea to propose?

Time spent so far: 131 minutes.

Footnotes

  1. On the other hand, that same "relevant event" probability is close to 100% from a new contributor point of view!

We do not know this. It would be great if any new developer could chime in here!

@rousskov (Contributor) commented

I see no progress in the last exchange -- just another round full of misinterpretations and false assumptions. We can probably clarify most of them in a few rounds, but I doubt it's worth the cost, so I suggest that we pause. I hope that my next message will discuss a specific error example that demonstrates or dismisses the alleged PR problem. It will take time to construct that, but I do plan on doing it.

Labels
S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box
3 participants