[Unity Catalog] - Change/test Explore, Standardize and Transform notebooks #983

Open
3 tasks done
thesqlpro opened this issue Dec 19, 2024 · 3 comments · May be fixed by #997

thesqlpro (Contributor) commented Dec 19, 2024

Task for #765
Three notebooks that use mount paths to process data potentially need updating to work with Unity Catalog. Research and testing are required for this task. Currently, each of the notebooks uses a mount path in ADLS to perform transformation/data engineering tasks.
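
For illustration, the kind of change under review is roughly the following (a minimal sketch only; the mount path, schema, and table names are placeholders, not taken from the actual notebooks):

```python
# Hypothetical sketch of the pattern being evaluated, not the repo's actual code.

# Today: the notebooks read staged data from an ADLS mount path.
df = spark.read.parquet("/mnt/datalake/staged/sensordata")  # placeholder mount path

# With Unity Catalog: address the data through catalog.schema.table instead.
spark.sql("USE CATALOG sensordata")                    # catalog name from the discussion below
df_uc = spark.table("sensordata.staged.readings")      # placeholder schema/table names
```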

DoD

  • Explore.py notebook works with Unity Catalog
  • Standardize.py notebook works with Unity Catalog
  • Transform.py notebook works with Unity Catalog
ydaponte (Collaborator)

@thesqlpro, please do add a DoD and fill in the metadata.

thesqlpro added the P1 High Priority label on Dec 23, 2024
thesqlpro (Contributor, Author)

So far these notebooks function as expected with Unity Catalog, since the mounts are created properly in setup.py.
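
For reference, the kind of mount creation being referred to looks roughly like this (a sketch only; the storage account, container, secret scope, and tenant are placeholders, not the actual setup.py):

```python
# Hedged sketch of an ABFS mount in a Databricks notebook (dbutils is only
# available on Databricks). All names and secret references are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("scope", "spn-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("scope", "spn-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://datalake@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)
```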

thesqlpro changed the title from "[Unity Catalog] - Change/test Standardize and Transform notebooks" to "[Unity Catalog] - Change/test Explore, Standardize and Transform notebooks" on Dec 26, 2024
thesqlpro (Contributor, Author)

For explore.py, no changes are needed: the mount points work fine as long as the ADF pipeline runs and stages the data in the proper paths.
For the other two notebooks, the fix is simply adding USE CATALOG sensordata at the beginning of the notebook. In addition, some user permission changes are needed.
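
For illustration, in a Python cell that change could look like this (a sketch; exactly where it goes in Standardize.py/Transform.py is up to the notebooks):

```python
# Hedged sketch: make "sensordata" the active catalog before any reads/writes,
# so unqualified table references resolve against Unity Catalog.
spark.sql("USE CATALOG sensordata")
```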

These are the permissions needed:

[screenshot: the Unity Catalog privileges required]

Because the SPN used by ADF is generated at deployment time, it will be difficult to put together the permissions needed. The Data Control Language (DCL) in the notebook is SQL based, which makes dynamic SQL awkward (the SPN would have to be injected somehow into the GRANT syntax for granting privileges). The APIs for setting permissions currently do not support Unity Catalog actions (it must be done through the UI or SQL).
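
For context, the dynamic-SQL approach being described would look roughly like the following; it is awkward precisely because the SPN's application ID is only known after the ADF deployment generates it (the privilege list here is an assumption, the screenshot above is authoritative):

```python
# Hedged sketch of the dynamic GRANT the comment calls difficult.
# spn_app_id is hypothetical; in practice it only exists post-deployment.
spn_app_id = "<adf-spn-application-id>"

spark.sql(f"GRANT USE CATALOG ON CATALOG sensordata TO `{spn_app_id}`")
spark.sql(f"GRANT USE SCHEMA, SELECT, MODIFY ON CATALOG sensordata TO `{spn_app_id}`")
```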

My solution is to grant these permissions to all account users, since this catalog is only for testing purposes. This can be added to setup.py (SQL) or the readme (via the UI).
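
Sketched out, the setup.py/SQL variant of that workaround could look like this (again, the privilege list is an assumption to be matched against the screenshot above):

```python
# Hedged sketch of the proposed workaround: grant to the built-in
# "account users" group rather than the deploy-time-generated SPN.
for privilege in ["USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY"]:
    spark.sql(f"GRANT {privilege} ON CATALOG sensordata TO `account users`")
```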

Currently available APIs for permissions and Unity Catalog:
https://docs.databricks.com/api/azure/workspace/permissions
https://docs.databricks.com/api/azure/workspace/catalogs
