Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tokenless V3 #533

Merged
merged 4 commits into from
Jun 11, 2024
Merged

Tokenless V3 #533

merged 4 commits into from
Jun 11, 2024

Conversation

joseph-sentry
Copy link
Contributor

@joseph-sentry joseph-sentry commented May 1, 2024

Purpose/Motivation

We don't want to make GH API calls when doing tokenless anymore

Links to relevant tickets

codecov/engineering-team#1574

What does this PR do?

  • Remove GitHub API calls from TokenlessAuthentication

@joseph-sentry joseph-sentry requested a review from a team as a code owner May 1, 2024 15:15
@codecov-qa
Copy link

codecov-qa bot commented May 1, 2024

Codecov Report

Attention: Patch coverage is 94.11765% with 2 lines in your changes missing coverage. Please review.

Project coverage is 91.45%. Comparing base (9b8d730) to head (eca8697).

✅ All tests successful. No failed tests found.

@@            Coverage Diff             @@
##             main     #533      +/-   ##
==========================================
- Coverage   91.49%   91.45%   -0.04%     
==========================================
  Files         615      615              
  Lines       16370    16354      -16     
==========================================
- Hits        14977    14957      -20     
- Misses       1393     1397       +4     
Flag Coverage Δ
unit 91.45% <94.11%> (-0.04%) ⬇️
unit-latest-uploader 91.45% <94.11%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
upload/views/reports.py 100.00% <ø> (ø)
codecov_auth/authentication/repo_auth.py 96.87% <96.42%> (-1.63%) ⬇️
upload/views/commits.py 97.14% <83.33%> (-2.86%) ⬇️

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

Copy link

codecov-public-qa bot commented May 1, 2024

Codecov Report

Attention: Patch coverage is 94.11765% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 91.45%. Comparing base (9b8d730) to head (eca8697).

✅ All tests successful. No failed tests found ☺️

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #533      +/-   ##
==========================================
- Coverage   91.49%   91.45%   -0.04%     
==========================================
  Files         615      615              
  Lines       16370    16354      -16     
==========================================
- Hits        14977    14957      -20     
- Misses       1393     1397       +4     
Flag Coverage Δ
unit 91.45% <94.11%> (-0.04%) ⬇️
unit-latest-uploader 91.45% <94.11%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
upload/views/reports.py 100.00% <ø> (ø)
codecov_auth/authentication/repo_auth.py 96.87% <96.42%> (-1.63%) ⬇️
upload/views/commits.py 97.14% <83.33%> (-2.86%) ⬇️

Impacted file tree graph

Copy link

codecov bot commented May 1, 2024

Codecov Report

Attention: Patch coverage is 94.11765% with 2 lines in your changes missing coverage. Please review.

Project coverage is 95.91%. Comparing base (9b8d730) to head (eca8697).

✅ All tests successful. No failed tests found.

Files Patch % Lines
codecov_auth/authentication/repo_auth.py 96.42% 1 Missing ⚠️
upload/views/commits.py 83.33% 1 Missing ⚠️
Additional details and impacted files
@@               Coverage Diff                @@
##               main       #533        +/-   ##
================================================
- Coverage   95.92000   95.91000   -0.01000     
================================================
  Files           793        793                
  Lines         17688      17671        -17     
================================================
- Hits          16968      16949        -19     
- Misses          720        722         +2     
Flag Coverage Δ
unit 91.45% <94.11%> (-0.04%) ⬇️
unit-latest-uploader 91.45% <94.11%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@giovanni-guidini
Copy link
Contributor

We need to be careful merging these changes to avoid re-introducing the bug of having an upload from a fork branch accidentally overwrite coverage in the upstream branch.

Considering the upload endpoints used by the CLI either we change the CLI to send the branch info every time (maybe through headers) OR we use the info we have to make the validation that the branch name is in the format fork:branch. If that is not the case, reject the upload.

Currently I believe it should be possible to make this check:

  • For the commit creation the branch should be part of the body
  • For the subsequent requests (create report and send the upload) we can pull the commit from the database and check its branch information.

Personally I feel that we should be able to accept / reject a request without having to look at the request body. So I'd opt to make the CLI send specific headers with the information we need.
(Also by removing the current checks maybe X-Tokenless-PR header is useless?)

@joseph-sentry
Copy link
Contributor Author

joseph-sentry commented May 2, 2024

So I'd opt to make the CLI send specific headers with the information we need.
(Also by removing the current checks maybe X-Tokenless-PR header is useless?)

We could just validate that the x-tokenless-pr header is in the format we expect and has the correct repo name (it matches the one in the url)?

@giovanni-guidini
Copy link
Contributor

X-Tokenless-PR is a number [1] that we used to get the correct PR from the provider. That does little for us considering the changes in this PR. If you just change the value the header name will be misleading.

X-Tokenless should not match the repo name in the URL if it comes from a fork [1], cause it's supposed to be the fork slug, while the slug in the URL should be the upstream's.

[1] https://github.com/codecov/codecov-cli/blob/7b028499a521f8df3cf3bb6b642246a19e07bee9/codecov_cli/services/report/__init__.py#L57-L58

@joseph-sentry joseph-sentry force-pushed the joseph/new-tokenless branch from 960927f to fecbe1b Compare May 6, 2024 20:06
@joseph-sentry joseph-sentry changed the title fix: don't make api calls in tokenless auth Tokenless V3 May 6, 2024
if ":" not in tokenless:
raise exceptions.AuthenticationFailed(tokenless_auth_failed_message)

# make sure it's backwards compatible with the old way that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't be compatible if validation fails (in the line above) because "the old way" doesn't include a : in the X-Tokenless header, ...right?

Copy link
Contributor

@matt-codecov matt-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably belongs in another PR, but in addition to getting rid of the get_pull_request_info() check like you do here, we also want to get rid of the "recent CI run" check. that happens in upload/helpers.py. with that gone, i think we can delete the whole upload/tokenless directory

also, the other day you made a great point about the report/upload endpoints being able to get the branch from the commit rather than needing it to be sent by the CLI. i think we can take advantage of that to simplify this logic across the API, CLI, and CI actions

on the CLI side, we already have to send the branch name in the request body for the create-commit step, and we already modify the branch name for forks there. the other two steps include which commit they are working with, so the API will be able to look up the commit and get the branch from there. so none of the commands need to set X-Tokenless, and the action doesn't have to set any extra env vars. (i think this also avoids an edge case where an attacker passes authentication by passing X-Tokenless: fork:hahaha to the do-upload endpoint but the commit SHA they passed is actually a commit on main)

then on the API side, i think the only difference between the commit endpoint and the others is where we get the branch name from? if so, i think we can merge back into one TokenAuthentication class, and if the commit sha is part of the URL we get the branch from the commit DB object, otherwise we look for the branch in the request body because we're creating a new commit

my brain is a little fried right now so this comment might not be very useful. i'll do another pass tomorrow, or maybe we could hop on a call and talk about it

@joseph-sentry joseph-sentry force-pushed the joseph/new-tokenless branch from fecbe1b to b791fc6 Compare May 22, 2024 18:34
@codecov-notifications
Copy link

codecov-notifications bot commented May 22, 2024

Codecov Report

Attention: Patch coverage is 94.11765% with 2 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files Patch % Lines
codecov_auth/authentication/repo_auth.py 96.42% 1 Missing ⚠️
upload/views/commits.py 83.33% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Contributor

@matt-codecov matt-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you can do this with one TokenAuthentication class. the regex with the commit sha already makes that part an optional group so the same regex should work for all three endpoints. the logic could be something like:

def _get_branch(self, request, commit_from_url):
    if commit_from_url:
        return commit_from_url.branch
    else:
        body = json.loads(str(request.body, "utf8"))
        return body.get("branch")

def authenticate(self, request):
    repo, commit_from_url = self._get_info_from_request_path(request)
    branch = self._get_branch(request, commit_from_url)

    if not branch:
        raise Whatever()
    ...

also, i left a couple comments in test files themselves but as a general note, can you make sure there are test cases that ensure a tokenless upload for a unprefixed branch name fails? the test cases covering tokenless rejections are all due to the repo being private, but we also want to reject based on the branch name

Comment on lines 270 to 279
# Validate provider
service = match.group(1)
try:
service_enum = Service(service)
# Currently only Github is supported
# TODO [codecov/engineering-team#914]: Extend tokenless support to other providers
if service_enum != Service.GITHUB:
raise exceptions.AuthenticationFailed(self.auth_failed_message)
raise exceptions.AuthenticationFailed(tokenless_auth_failed_message)
except ValueError:
raise exceptions.AuthenticationFailed(self.auth_failed_message)
raise exceptions.AuthenticationFailed(tokenless_auth_failed_message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this stuff isn't necessary anymore. if we aren't making requests to the git provider it shouldn't matter who that provider is

we shouldn't close the ticket in the comment because the CLI still needs work to support other providers, but this logic here can go

upload/tests/views/test_commits.py Outdated Show resolved Hide resolved
upload/tests/views/test_commits.py Show resolved Hide resolved
@@ -100,6 +114,7 @@ def test_reports_post_no_auth(db, mocker):
repository = RepositoryFactory(
name="the_repo", author__username="codecov", author__service="github"
)
repository.private = False
token = "BAD"
commit = CommitFactory(repository=repository)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test never calls commit.save(). is it failing for the reason we expect, which is an invalid token? or is it failing because the commit isn't in the DB?

@joseph-sentry joseph-sentry force-pushed the joseph/new-tokenless branch from b791fc6 to f220f21 Compare May 30, 2024 16:36
raise exceptions.Throttled(
wait=retry_after, detail=self.rate_limit_failed_message
)
commitid = match.group(3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be better to aggregate the match.group(x) calls into one paragraph so it's easier to see


def get_branch(self, request, commitid=None):
if commitid:
commit = Commit.objects.get(commitid=commitid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think .get will throw an error if this doesn't exist instead of None

@joseph-sentry joseph-sentry force-pushed the joseph/new-tokenless branch 2 times, most recently from 113e734 to bdf2bf1 Compare June 3, 2024 15:22
Copy link
Contributor

@matt-codecov matt-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you got rid of all of the exceptions that we used to throw? i think returning None means something different from raising an exception to Django. if we want to stop throwing we should probably do it as a separate change

aside from that, the implementation looks good! now just drilling into the tests because this area keeps biting us (and our users) and i want to make sure this is rock-solid so we can be free of it for a while haha. the tests generally exercise all the important behaviors, but with some tweaks we can be more confident that the system works the way we think and that the tests will start failing if any part of it breaks

  • logically, we have two inputs: private and prefixed_branch, and we want to test all combinations of those two inputs. all of them are tested at one point or another in your PR currently, but IMO we should have versions of all combos in all four test files:
    • private=False, prefixed_branch=True, auth succeeds
    • private=False, prefixed_branch=False, auth fails
    • private=True, prefixed_branch=True, auth fails
    • private=True, prefixed_branch=False, auth fails (we can actually skip this one; both of the failure reasons are already covered by other tests)
  • when deciding whether a branch is prefixed or not, we can get the branch from two different places. what should happen if the branch is found in both places? our tests should address that one way or the other:
    • write tests for get_branch() directly to guarantee that the request body's branch is ignored if the commit sha is passed in, and/or
    • in test cases that put the commit ID in the URL, put an "opposite" branch in the request body. in other words, if the URL commit's branch is prefixed, put an unprefixed branch in the request body. if the URL commit's branch is unprefixed, put a prefixed branch in the request body. that way, the test will fail if we ever start reading the branch name from the wrong place.

codecov_auth/authentication/repo_auth.py Show resolved Hide resolved
upload/views/commits.py Show resolved Hide resolved
codecov_auth/tests/unit/test_repo_authentication.py Outdated Show resolved Hide resolved
codecov_auth/tests/unit/test_repo_authentication.py Outdated Show resolved Hide resolved
codecov_auth/tests/unit/test_repo_authentication.py Outdated Show resolved Hide resolved
upload/tests/views/test_reports.py Outdated Show resolved Hide resolved
upload/tests/views/test_commits.py Outdated Show resolved Hide resolved
upload/tests/views/test_reports.py Outdated Show resolved Hide resolved
upload/tests/views/test_uploads.py Outdated Show resolved Hide resolved
upload/tests/views/test_uploads.py Outdated Show resolved Hide resolved
if a request is creating a commit then we require
the name of the branch it's creating a commit on to
contain a colon, because that means that it won't
affect the coverage of the branches that already belong
to that repository. This will only work for public
repositories.

Subsequent API calls to the reports and uploads endpoints
will check the branch and visibility of the repo those
requests are targetting.

Signed-off-by: joseph-sentry <[email protected]>
- consolidate Tokenless authentication methods
- TokenlessAuthencation now accepts GET requests without the user having
  to provide the branch.
- Ensures TokenlessAuthentication sends the clear error message every
  time it fails

Signed-off-by: joseph-sentry <[email protected]>
Signed-off-by: joseph-sentry <[email protected]>
Signed-off-by: joseph-sentry <[email protected]>
@joseph-sentry joseph-sentry force-pushed the joseph/new-tokenless branch from bdf2bf1 to eca8697 Compare June 6, 2024 20:31
Copy link
Contributor

@matt-codecov matt-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work!

@joseph-sentry joseph-sentry added this pull request to the merge queue Jun 11, 2024
Merged via the queue into main with commit 7bed118 Jun 11, 2024
19 of 22 checks passed
@joseph-sentry joseph-sentry deleted the joseph/new-tokenless branch June 11, 2024 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants