-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add failfast mode #2308
base: master
Are you sure you want to change the base?
Add failfast mode #2308
Conversation
Commit adds new command line option, -K, that aborts all running jobs after first failed job.
Fixing missing parentheses.
Reporting on the current status of this PR: On Linux, it works flawless! The build stops on first error, which is great! Very nice work! On Windows, it stops on first error, but I do get a terminal prompt:
I am running |
Thanks for the checks, @emmenlau! As I read some userexchange discussions, this message on Windows CMD is "by design". :-( https://superuser.com/a/35701/982937 Do you have any suggestions how to continue here? |
Yep, after some googling I was also under the impression that CTRL-C is not really awesome for these kind of things.
Well, I'm really no expert, but sending CTRL-C to oneself seems anyways a "hard" way to end the whole process tree. Are these even separate processes or "just" threads, or even something else? What other ways does Win32 provide to kill them? Do we even need to kill, or can we just send an "interrupt" signal and catch that on the receiving side to exit the processes? For example https://stackoverflow.com/a/1173396/7200132 has some suggestions. But I did not look into ninja's code, I'm not even sure what kind of processes we are talking about, and how to cascade the kill to sub-processes that ninja started... |
Dear @kuchara , this would be a nice feature for us, but I'm not knowledgeable on how to proceed with this feature on Windows. Are you planning to look into this? Thanks for the nice work! |
Previous attempt has problems on passing signals on Windows. This time, reusing similar functions as used when closing (Clear(), which is expected to interrupt its running children). For posix, when there is no external signal, 0 is passed to kill() syscall (thus doing nothing). To mitigate this, setting SIGINT signal just before calling Clear(), which is essentially simulating SIGINT handling.
Hello @emmenlau ! I'd love to finish this. Finally was able to spent some more time to inspect ninja source code, and I've noticed that regular Clear() function done in build.cc is actually sending signals to its children, so decided to give it a try. Could you try the newest change I've just uploaded? If that does not work, my only suggestion would be to enable this feature only for posix systems, until there will be someone who could help with Windows case. |
Awesome @kuchara , this works very nicely on Windows! I have not tested Linux and MacOSX yet, but the change looks very promising! Great work! |
|
||
void SubprocessSet::Abort() { | ||
SetInterruptedFlag(SIGINT); | ||
Clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patch. Can you ensure that SIGINT is also sent to the console subprocess, if there is one?
Otherwise if the failure comes from a non-console command, the console subprocess will continue to run and block Ninja.
Let me clarify:
Ninja launches subprocesses in their own process group, with the exception of commands in the "console" pool. These are launched in the same process group as Ninja itself. This is done to ensure that when Ctrl-C is pressed, both Ninja and the console subprocess receive a SIGINT (I suspect this is required for proper terminal handling too, but I don't remember the details).
By default, the console subprocess receiving the SIGINT would exit, while Ninja would catch the SIGINT, report it when exiting DoWork() to the Builder(), which ends up invoking Clear(). The latter relays the SIGINT to all non-console subprocesses to stop them as well (hence the use_console check in that function).
With your code, if the failure comes from a non-console subprocess, calling SetInterruptedFlag(SIGINT) will tell the next DoWork() call to report an interrupt, but will not send SIGINT to the console process which will continue running. And because it also shared the stdout / stderr descriptors, it will continue printing to the terminal. And Ninja will likely be stuck waiting for it in the Subprocess destructor (which ends up calling waitpid()).
It looks like something similar is true for Win32.
Also, because this is very subtle, I strongly recommend adding a robust unit test for this behavior to your PR.
For example one with a long console subprocess that sleeps for 10 seconds, and another non-console one that fails immediately. The test should verify that Ninja exits immediately with -K. You should also test the case where the console process fails first of course.
Hope this helps
Commit adds new command line option, -K, that aborts all running jobs after first failed job.
This implements #2083.