[Databricks] Supporting OAuth & Serverless compute #127
base: main
Conversation
…ng to SDK for auth. This should enable full OAuth support.
R/databricks-utils.R
Outdated
```r
# # Checks for OAuth Databricks token inside the RStudio API
# if (is.null(token) && exists(".rs.api.getDatabricksToken")) {
#   getDatabricksToken <- get(".rs.api.getDatabricksToken")
#   token <- set_names(getDatabricksToken(databricks_host()), "oauth")
# }
```
This should be handled by SDK config component.
Hey, are we talking about this SDK? https://github.com/databricks/databricks-sdk-py/ And if so, can you point me to where it handles the RStudio token? I can't seem to find it
The SDK won't detect the `.rs.api.getDatabricks*` functions. But maybe there's a gap in my understanding: I thought Connect would also write to a config file, which the SDK should pick up?
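For context, the config file referred to here would be the standard `.databrickscfg` INI file that the Databricks Python SDK scans for profiles. A minimal, hypothetical sketch of what such a file contains and how a profile resolves (this is stdlib `configparser`, not the SDK's own loader, and all values are made up):

```python
import configparser

# Hypothetical sample of a ~/.databrickscfg file (INI format). A tool like
# Posit Connect could write this, and the Databricks Python SDK would detect it.
SAMPLE_CFG = """\
[DEFAULT]
host = https://example.cloud.databricks.com
token = dapi-example-default

[dev]
host = https://dev.cloud.databricks.com
token = dapi-example-dev
"""

def read_profile(cfg_text: str, profile: str = "DEFAULT") -> dict:
    """Return the host/token pair for one profile (illustration only)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser[profile]  # named section; [DEFAULT] backfills missing keys
    return {"host": section.get("host"), "token": section.get("token")}
```

Note that INI `[DEFAULT]` values backfill any named profile, which mirrors why an SDK-side "DEFAULT profile" fallback exists at all.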
Thank you for sending over this PR, it's looking great!
```diff
@@ -71,22 +72,28 @@ spark_connect_method.spark_method_databricks_connect <- function(
   method <- method[[1]]
   token <- databricks_token(token, fail = FALSE)
```
Based on your comment on line 137, I think we should remove this line, and have `token` only populated when the user passes it as an argument in the `spark_connect()` call.
My thinking for leaving this was that users explicitly setting the `DATABRICKS_TOKEN` and `DATABRICKS_HOST` vars should have those respected, since they were set explicitly. The Databricks Python SDK won't detect those when it's done from R.
The `databricks_token()` function also looks for `CONNECT_DATABRICKS_TOKEN`, so I think it's probably important to leave that intact?
I was expecting the hierarchy to be:
1. Explicit `token`
2. `DATABRICKS_TOKEN`
3. `CONNECT_DATABRICKS_TOKEN`
4. `.rs.api.getDatabricksToken(host)`
5. Python SDK explicit setting of profile
6. Python SDK detection of `DEFAULT` profile

Where 1-4 are handled by `databricks_token`.
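The proposed hierarchy is a first-match resolver, which can be sketched as follows. This is a hypothetical illustration, not the package's code: the function name and source labels are made up, `rstudio_api_token` stands in for `.rs.api.getDatabricksToken(host)`, and steps 5-6 are represented only by the final fall-through to the SDK.

```python
import os

def resolve_token(explicit_token=None, env=None, rstudio_api_token=None):
    """Sketch of steps 1-4 of the hierarchy above (the part handled by
    databricks_token on the R side). Returns (token, source), where a
    (None, None) result means 'defer to the Python SDK' (steps 5-6)."""
    env = os.environ if env is None else env
    if explicit_token:                            # 1. explicit `token` argument
        return explicit_token, "argument"
    if env.get("DATABRICKS_TOKEN"):               # 2. environment variable
        return env["DATABRICKS_TOKEN"], "environment"
    if env.get("CONNECT_DATABRICKS_TOKEN"):       # 3. Posit Connect variable
        return env["CONNECT_DATABRICKS_TOKEN"], "environment_connect"
    if rstudio_api_token:                         # 4. RStudio API OAuth token
        return rstudio_api_token, "oauth"
    return None, None                             # 5-6. defer to the SDK
```

Passing `env` as a plain dict keeps the resolver testable without mutating the real process environment.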
R/sparklyr-spark-connect.R
Outdated
```r
# sdk config
conf_args <- list(host = master)
# if token is found, propagate
# otherwise trust in sdk to detect and do what it can?
```
Yes, if we remove line 72, then this `if` makes sense to leave.
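The "propagate the token if found, otherwise leave detection to the SDK" behavior being discussed amounts to building a sparse argument set. A hypothetical sketch (names mirror the `conf_args` snippet above, but this is an illustration, not the package's actual code):

```python
def build_conf_args(master, token=None):
    """Start from the workspace host; add the token only when one was
    resolved upstream. An absent token key leaves the dict sparse so the
    Databricks SDK's own credential detection can run."""
    conf_args = {"host": master}
    if token is not None:
        conf_args["token"] = token
    return conf_args
```

Omitting the key entirely, rather than passing `token=None`, is the important detail: it lets the SDK fall through to env vars and config-file profiles.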
R/sparklyr-spark-connect.R
Outdated
```r
conn <- exec(databricks_session, !!!remote_args)
sdk_config <- db_sdk$core$Config(!!!conf_args)

# unsure if this is needed anymore?
```
I think we need to remove this from here, especially since we can't use `httr2:::is_hosted_session()`, since a `:::` is not allowed. Do you think this is important for the package to do if the user is on desktop? If so, what do you think about isolating it in its own exported function? Maybe `pysparklyr::databricks_desktop_login()`?
I don't think this is required, will do some testing without.
I'm reviewing this with a clearer mind, and I think we should attempt to defer as much as possible to the SDK for Databricks auth and logic. Will have an attempt.
Hey, want me to do another review, or are you still working on this?
@edgararuiz it's at the point where I'd appreciate some input if you have a spare moment. It's not finalised, but I want to ensure we agree on the direction / shape it's taking! 🙏
@zacdav-db - Looking good. Feel free to remove …
@edgararuiz not passing tests locally, seems there's an install issue. There's also one complication with using the SDK for auth everywhere: it requires the Python env to be loaded, which depends on … We probably don't want to force …
Not the prettiest first attempt, have added support for serverless and OAuth.
- `sdkConfig` as a mechanism to connect and authenticate
- `FALSE`, and currently still requires `version` to be specified (this strictly isn't required)