Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up the Git for Windows SDK initialization again #1841

Closed

Conversation

dscho
Copy link
Member

@dscho dscho commented Dec 17, 2024

While waiting for way too many builds in https://github.com/gitgitgadget/git/actions to finish, I noticed that the minimal Git for Windows SDK initialization now takes a whopping 2.5 minutes. That's way too much. It used to take a little over a minute when uncached, and 2-5 seconds when cached.

Let's fix this regression by reverting to using the setup-git-for-windows-sdk GitHub Action (also because that Action will soon see another dramatic speed-up, see git-for-windows/setup-git-for-windows-sdk#965).

Cc: Patrick Steinhardt [email protected]

…K again

It used to be the case that initializing the minimal SDK (i.e. a
radically slimmed-down subset of Git for Windows' development
environment intended to perform the CI builds and little else) took
a bit over one minute, would then be cached, and subsequent jobs would
take at most half a dozen seconds to initialize said minimal SDK.

It is important that this step is fast because we have to run the test
suite in parallel, in a set of matrix jobs, to offset the slowness of
the shell-based test suite, and each and every job has to initialize the
very same minimal SDK.

While it may sound as if parallelizing the jobs might only waste the
generously-provided build minutes but at least the _wallclock_ time
would pass quick, in reality it matters a lot: Frequently Git for
Windows' or GitGitGadget PRs get stuck waiting for quite a while before
CI builds start because other PRs' builds still spend substantial
amounts of time to run, blocking due to the concurrency limit being
reached.

Since 91839a8 (ci: create script to set up Git for Windows SDK,
2024-10-09), the situation has worsened: every job that requires the
minimal Git for Windows SDK spends roughly two-and-a-half minutes doing
so.

With the switch away from the GitHub Action `setup-git-for-windows-sdk`,
we incurred more downsides:

- It is no longer possible for said Action to fix problems independently
  from the Git repository, e.g. when new rules about GitHub Actions
  require changes in the way the minimal SDK is initialized.

- The minimal SDK was installed specifically outside of the worktree so
  as not to clutter it nor incur an additional cost to verify that the
  worktree is clean.

Therefore, even if it would be nice to have a shared process between
GitHub and GitLab based CI builds, let's switch the GitHub-based CI back
to the tried-and-tested `setup-git-for-windows-sdk` Action.

This commit partially reverts 91839a8 (ci: create script to set up
Git for Windows SDK, 2024-10-09).

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho
Copy link
Member Author

dscho commented Dec 17, 2024

/submit

Copy link

gitgitgadget bot commented Dec 17, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1841/dscho/ci-windows-jobs-speedup-v1

To fetch this version to local tag pr-1841/dscho/ci-windows-jobs-speedup-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1841/dscho/ci-windows-jobs-speedup-v1

Copy link

gitgitgadget bot commented Dec 17, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <[email protected]>
writes:

> This commit partially reverts 91839a88277 (ci: create script to set up
> Git for Windows SDK, 2024-10-09).

Thanks, will queue.

> diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
> index 9301a1edd6d..916a64b6736 100644
> --- a/.github/workflows/main.yml
> +++ b/.github/workflows/main.yml
> @@ -113,15 +113,13 @@ jobs:
>        cancel-in-progress: ${{ needs.ci-config.outputs.skip_concurrent == 'yes' }}
>      steps:
>      - uses: actions/checkout@v4
> -    - name: setup SDK
> -      shell: powershell
> -      run: ci/install-sdk.ps1
> +    - uses: git-for-windows/setup-git-for-windows-sdk@v1
>      - name: build
> -      shell: powershell
> +      shell: bash
>        env:
>          HOME: ${{runner.workspace}}
>          NO_PERL: 1
> -      run: git-sdk/usr/bin/bash.exe -l -c 'ci/make-test-artifacts.sh artifacts'
> +      run: . /etc/profile && ci/make-test-artifacts.sh artifacts
>      - name: zip up tracked files
>        run: git archive -o artifacts/tracked.tar.gz HEAD
>      - name: upload tracked files and build artifacts
> @@ -149,12 +147,10 @@ jobs:
>      - name: extract tracked files and build artifacts
>        shell: bash
>        run: tar xf artifacts.tar.gz && tar xf tracked.tar.gz
> -    - name: setup SDK
> -      shell: powershell
> -      run: ci/install-sdk.ps1
> +    - uses: git-for-windows/setup-git-for-windows-sdk@v1
>      - name: test
> -      shell: powershell
> -      run: git-sdk/usr/bin/bash.exe -l -c 'ci/run-test-slice.sh ${{matrix.nr}} 10'
> +      shell: bash
> +      run: . /etc/profile && ci/run-test-slice.sh ${{matrix.nr}} 10
>      - name: print test failures
>        if: failure() && env.FAILED_TEST_ARTIFACTS != ''
>        shell: bash
>
> base-commit: 631ddbbcbd912530e1b78e5d782e72879f7f1fb2

Copy link

gitgitgadget bot commented Dec 17, 2024

This patch series was integrated into seen via git@6a94bd7.

@gitgitgadget gitgitgadget bot added the seen label Dec 17, 2024
Copy link

gitgitgadget bot commented Dec 18, 2024

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Tue, Dec 17, 2024 at 12:33:10PM -0800, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <[email protected]>
> writes:
> 
> > This commit partially reverts 91839a88277 (ci: create script to set up
> > Git for Windows SDK, 2024-10-09).
> 
> Thanks, will queue.

Okay. Too bad that things regressed this badly with the change. I agree
that reverting is the right thing to do for now. I may revisit this
again in the next release cycle to investigate whether we can get it up
to par with the GitHub Actions. It would be great if the build infra was
shared between our CIs, so I think there's some value in it. But if the
answer is "no" then I guess that's ultimately fine, as well.

Thanks!

Patrick

Copy link

gitgitgadget bot commented Dec 18, 2024

This branch is now known as js/github-windows-setup-fix.

Copy link

gitgitgadget bot commented Dec 18, 2024

This patch series was integrated into seen via git@972174e.

Copy link

gitgitgadget bot commented Dec 18, 2024

This patch series was integrated into next via git@6d59340.

@gitgitgadget gitgitgadget bot added the next label Dec 18, 2024
Copy link

gitgitgadget bot commented Dec 19, 2024

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Patrick,

On Wed, 18 Dec 2024, Patrick Steinhardt wrote:

> On Tue, Dec 17, 2024 at 12:33:10PM -0800, Junio C Hamano wrote:
> > "Johannes Schindelin via GitGitGadget" <[email protected]>
> > writes:
> >
> > > This commit partially reverts 91839a88277 (ci: create script to set up
> > > Git for Windows SDK, 2024-10-09).
> >
> > Thanks, will queue.
>
> Okay. Too bad that things regressed this badly with the change. I agree
> that reverting is the right thing to do for now. I may revisit this
> again in the next release cycle to investigate whether we can get it up
> to par with the GitHub Actions.

The way I implemented it should be eminently possible in PowerShell, too.

Something like `Start-Process`, launching not an `Invoke-WebRequest` but
instead `C:\Windows\system32\curl.exe` [*1*] to download the `.tar.gz`
file (not the `.zip` file, more about that below). Another `Start-Process`
should then be opened that executes `tar.exe` [*2*], and then the `stdout`
of the former should be piped to the latter [*3*].

I do think that it makes a lot of sense to start extracting as soon as the
first byte arrives instead of storing the archive as a temporary file and
extracting it only once it has arrived completely.

Now, why not use the `.zip` file instead of the `.tar.gz` file? In my
analysis [*4*], I pointed out that the `.zip` file is about 10MB smaller,
after all, and BSD tar (at least the version in `C:\Windows\system32`) is
able to handle those, too, right? Not so fast. In my experiments, when
streaming the `.zip` file to the `tar.exe -xf -` process, the `etc/`
and `.sparse/` directories were consistently dropped. A bug, I guess. I
ran out of time to investigate this in more depth.

Since the artifacts are now hosted in a GitHub Release and updated
regularly, and since those updates cannot be atomic (you can only upload a
release asset if no asset of the same name exists already, read: the
automation has to _delete_ that asset before uploading a new version), it
would also be good to kind of expect that the file may be intermittently
absent and add a back-off strategy [*5*].

> It would be great if the build infra was shared between our CIs, so I
> think there's some value in it. But if the answer is "no" then I guess
> that's ultimately fine, as well.

It _could_ be done. But the advantages of having it versioned outside of
the Git repository outweigh the benefits of that shared infrastructure, I
believe.

Ciao,
Johannes

Footnote *1*:
https://github.com/git-for-windows/setup-git-for-windows-sdk/pull/965/commits/6db65223de699c4f75ab083f82f43947a53ad6ff#diff-6855ef61b94227f9264adab3ff9f2de95c2d7b4e451019cc0105896d32550eb0R58-R73

Footnote *2*:
https://github.com/git-for-windows/setup-git-for-windows-sdk/pull/965/commits/6db65223de699c4f75ab083f82f43947a53ad6ff#diff-6855ef61b94227f9264adab3ff9f2de95c2d7b4e451019cc0105896d32550eb0R77-R86

Footnote *3*:
https://github.com/git-for-windows/setup-git-for-windows-sdk/pull/965/commits/6db65223de699c4f75ab083f82f43947a53ad6ff#diff-6855ef61b94227f9264adab3ff9f2de95c2d7b4e451019cc0105896d32550eb0R88

Footnote *4*:
https://github.com/git-for-windows/git-sdk-64/pull/87/commits/fdb0cea373893ce7d40bcfcfbeb7fd091a3c4020

Footnote *5*:
https://github.com/git-for-windows/setup-git-for-windows-sdk/pull/965/commits/3d4ea07041d0740b21160a9d9a4181f569e706d8

Copy link

gitgitgadget bot commented Dec 19, 2024

This patch series was integrated into seen via git@819a28a.

Copy link

gitgitgadget bot commented Dec 20, 2024

There was a status update in the "New Topics" section about the branch js/github-windows-setup-fix on the Git mailing list:

Revert recent changes to the way windows environment is set up for
GitHub CI.

Will merge to 'master'.
source: <[email protected]>

Copy link

gitgitgadget bot commented Dec 21, 2024

This patch series was integrated into seen via git@fc27f9a.

Copy link

gitgitgadget bot commented Dec 22, 2024

This patch series was integrated into seen via git@5f6765d.

Copy link

gitgitgadget bot commented Dec 22, 2024

This patch series was integrated into seen via git@6c7681b.

Copy link

gitgitgadget bot commented Dec 23, 2024

This patch series was integrated into seen via git@bad5d1a.

Copy link

gitgitgadget bot commented Dec 23, 2024

This patch series was integrated into master via git@bad5d1a.

Copy link

gitgitgadget bot commented Dec 23, 2024

This patch series was integrated into next via git@bad5d1a.

@gitgitgadget gitgitgadget bot added the master label Dec 23, 2024
@gitgitgadget gitgitgadget bot closed this Dec 23, 2024
Copy link

gitgitgadget bot commented Dec 23, 2024

Closed via bad5d1a.

@dscho dscho deleted the ci-windows-jobs-speedup branch December 23, 2024 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant