Unable connect dvc to Google Drive. Access blocked! #10516

Open
psaboia opened this issue Aug 10, 2024 · 17 comments
Labels
A: data-sync Related to dvc get/fetch/import/pull/push blocked help wanted p2-medium Medium priority, should be done, but less important

Comments

@psaboia

psaboia commented Aug 10, 2024

Added by @shcheklein :

See details and workaround here - #10516 (comment)


Failed to authenticate GDrive: "This app is blocked"

Description

When I use DVC commands with a gdrive remote storage configuration, I encounter an issue where it's impossible to authenticate with my Google account.

Reproduce

After initiating the command

dvc get https://github.com/my-data-registry data/samples

a browser window opens for authentication, but upon selecting my Google account, I'm directed to a page displaying the message:

This app is blocked

This app tried to access sensitive info in your Google Account. To keep your account safe, Google blocked this access.

Environment information

Output of dvc doctor:

$ dvc doctor

Platform: Python 3.11.6 on macOS-13.5-arm64-arm-64bit
Subprojects:
	dvc_data = 3.15.2
	dvc_objects = 5.1.0
	dvc_render = 1.0.2
	dvc_task = 0.4.0
	scmrepo = 3.3.7
Supports:
	gdrive (pydrive2 = 1.20.0),
	http (aiohttp = 3.10.2, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.10.2, aiohttp-retry = 2.8.3)
Config:
	Global: /Users/myself/Library/Application Support/dvc
	System: /Library/Application Support/dvc

I'm not sure if this is a bug, but any help with this issue would be greatly appreciated!

@fabricionarcizo

I have precisely the same issue.

@psaboia psaboia changed the title Failed to authenticate GDrive: "This app is blocked" Unable connect dvc to Google Drive. Access blocked! Aug 14, 2024
@JohnConnor123

JohnConnor123 commented Aug 14, 2024

Same problem

@shcheklein
Member

TL;DR:

The DVC app (which DVC uses by default) is blocked by Google because they changed some policies and we need to pass verification again. Nothing bad happened on our end (no security breaches or violations). There is no easy way to pass it. For now, the recommended way (and it was always the recommended way) is to create a custom app. Here is the link. It's not very complicated and should work just fine for everyone.
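
As a rough illustration of the custom-app route, the sketch below builds the two `dvc remote modify` commands that point a gdrive remote at your own OAuth client. The remote name `myremote` and the credential values are placeholders, and the helper `custom_app_commands` is hypothetical (it is not part of DVC); it only assembles the command lines so they can be reviewed before being run.

```python
import shlex


def custom_app_commands(remote, client_id, client_secret):
    """Build `dvc remote modify` commands for a custom Google Cloud OAuth app.

    Hypothetical helper for illustration; it only returns argument lists.
    """
    return [
        # The client ID goes into the regular project config.
        ["dvc", "remote", "modify", remote, "gdrive_client_id", client_id],
        # --local writes the secret to .dvc/config.local instead of .dvc/config.
        ["dvc", "remote", "modify", "--local", remote,
         "gdrive_client_secret", client_secret],
    ]


if __name__ == "__main__":
    # "myremote" and both credential values are placeholders.
    for cmd in custom_app_commands("myremote", "<client-id>", "<client-secret>"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` inside a DVC repository once the placeholders are replaced. Using `--local` for the secret is a design choice in this sketch, meant to keep it out of the shared config file.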

Longer version

Tue, Nov 14, 2023 - Google reached out with this message:

As part of our commitment to user privacy and security, Google requires developers that use our APIs to demonstrate that their apps comply with our policies. We have identified that your app’s use of Restricted Drive API scopes may require additional verification steps.


The DVC app indeed depends on the drive.files OAuth scope (which gives full access to all files and directories in Google Drive). We need it because we don't know in advance which directory users will want to use as remote storage, and also for things like dvc import-url and dvc import (if a different remote is used).

  • All the tokens are stored locally, we don't use any servers, the DVC team doesn't see them, etc. To our mind that is safe enough for the default mode; otherwise it is of course better to use the custom app, as mentioned above.

Anyway, it would be better to have more granular permissions. It seems Google understands this, and we like the idea too. The only issue is that there is no API or any other way to let users pick a specific directory from the CLI. Here is the relevant ticket for this, but it's not resolved yet.

So we're kind of stuck in limbo with this: we can't pass verification (since they are requesting a video explainer where it's clear why we need drive.file), and we can't implement granular scope management for the default app at the moment.

I'm open to any ideas on this.

Also a relevant discussion on the rclone forum - https://forum.rclone.org/t/google-drive-builtin-app-verification/43919/5 .

@psaboia
Author

psaboia commented Aug 15, 2024

@shcheklein, thank you for the clarification! We will proceed with the custom app option.

@skshetry skshetry pinned this issue Aug 16, 2024
@tharhtetsan

Same problem here

@psaboia
Author

psaboia commented Aug 20, 2024

Same problem here

@tharhtetsan Find the answer here - #10516 (comment)

@ryukinix

ryukinix commented Sep 9, 2024

😠 google disgraceful policy

@Drakunal

The custom app using the Google Cloud option works, but I would have preferred the older way of authenticating with gdrive, which was fairly easy.

@shcheklein shcheklein added p2-medium Medium priority, should be done, but less important and removed p1-important Important, aka current backlog of things to do labels Oct 5, 2024
@kell18

kell18 commented Oct 22, 2024

Even the Google Cloud option didn't work for me. It failed with ERROR: unexpected error - Failed to authenticate GDrive: 'access_token' during dvc push.

@RodionfromHSE

You can authenticate with Google on your own. First, you need to create an OAuth client ID (like here). Then download the client ID JSON and use the following code:

from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client.tools import run_flow

# Path to the OAuth client secrets JSON downloaded from Google Cloud
CLIENT_SECRET_FILE = 'client_secret.json'

# Scopes requested for the Google Drive API
SCOPES = ['https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/drive.appdata']

def get_token_oauth2client():
    # Build the OAuth flow from the client secrets file
    flow = flow_from_clientsecrets(CLIENT_SECRET_FILE, scope=SCOPES)

    # Run the browser-based authentication flow; run_flow also caches
    # the credentials in token_oauth2client.json via the Storage object
    storage = Storage('token_oauth2client.json')
    credentials = run_flow(flow, storage)

    # Write the credentials as JSON for DVC to use
    with open('generated_token.json', 'w') as token_file:
        token_file.write(credentials.to_json())

    print("Token information saved to generated_token.json")

if __name__ == '__main__':
    get_token_oauth2client()

Then move generated_token.json to the path configured as gdrive_user_credentials_file (look here).

Profit!

Yes, it's a crutch, but it's the only way I've found so far.

cc: @kell18
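
The 'access_token' error reported earlier in the thread reads like a missing key, although the thread does not confirm the cause. Before moving generated_token.json into place, a quick sanity check along the following lines may help. The list of required keys is an assumption for illustration, not taken from the PyDrive2 or DVC documentation, and `missing_token_fields` is a hypothetical helper.

```python
import json

# Assumed-required keys; this list is an illustration, not an official schema.
REQUIRED_KEYS = ("access_token", "refresh_token", "client_id", "client_secret")


def missing_token_fields(path):
    """Return the assumed-required keys that are absent or empty in a token file."""
    with open(path) as fh:
        data = json.load(fh)
    return [key for key in REQUIRED_KEYS if not data.get(key)]


if __name__ == "__main__":
    import sys

    # Usage: python check_token.py generated_token.json
    missing = missing_token_fields(sys.argv[1]) if len(sys.argv) > 1 else []
    print("Missing fields:", missing or "none")
```

An empty result only means the keys are present; it does not prove that Google will accept the token.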

@kell18

kell18 commented Oct 26, 2024

Thanks for the answer, @RodionfromHSE. It would work if I needed to do it only once for myself, but it's for everyone on the team... I hope DVC will fix this issue soon!

@SchindlerTo

SchindlerTo commented Nov 24, 2024

The solution from @RodionfromHSE is the only one that worked for me, thanks!
Related question/problem: according to the DVC documentation, the Google token expires after 7 days. Any ideas on how to extend this? Re-authentication is always a hassle for my headless machines.

@ryukinix

ryukinix commented Nov 25, 2024 via email

@shcheklein
Member

@SchindlerTo @ryukinix take a look here iterative/PyDrive2#184 (comment) . I think that was a relevant discussion.

@canergen

canergen commented Dec 12, 2024

Hi, thanks for the suggested fix.
I would like to be able to share uploaded DVC files from Google Drive with users who can't set this up. The files are too large for HuggingFace, and Google Drive is easy to set up. To clarify, these files are all publicly accessible. Is there a way to generate, within the API, a link from which users can download a file using requests or gdown? https://dvc.org/doc/command-reference/get-url and other ways to download the file are unfortunately also blocked. It should be relatively easy to save the exact URL and also to enable get-url and download public DVC-backed files.
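
To make the request concrete: for a file that is already shared publicly and whose Google Drive file ID is known, a direct-download link can be assembled with a commonly used URL pattern, as sketched below. This is not a DVC feature, and the thread does not show a way to obtain the file ID from DVC, so `file_id` here is a hypothetical input; very large files may also need extra handling that tools such as gdown implement.

```python
from urllib.parse import urlencode


def public_download_url(file_id):
    """Build a direct-download URL for a publicly shared Google Drive file.

    The file ID must be obtained separately (for example from the file's
    sharing link); this helper is hypothetical and not part of DVC.
    """
    query = urlencode({"export": "download", "id": file_id})
    return "https://drive.google.com/uc?" + query


if __name__ == "__main__":
    # "FILE_ID" is a placeholder, not a real Drive file.
    print(public_download_url("FILE_ID"))
```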

@ryukinix

ryukinix commented Dec 13, 2024 via email

@canergen

@shcheklein Do you have any suggestion on how to output the URL instead of the file path from dvc on gdrive?
