Databricks Sample - Terraform IaC for Azure Databricks and Asset Bundle Deployment via CI/CD #911
@@ -0,0 +1,8 @@
# Azure Databricks

[Azure Databricks](https://docs.microsoft.com/en-us/azure/databricks/) is a data analytics platform optimized for the Microsoft Azure cloud services platform. It lets you set up your Apache Spark™ environment in minutes, autoscale, and collaborate on shared projects in an interactive workspace.

## Samples

- [IaC deployment of Azure Databricks](./databricks_ci_cd/README.md) - This sample demonstrates how to deploy an Azure Databricks environment using ARM templates.
- [IaC Deployment of Azure Databricks using Terraform](./databricks_terraform/README.md) - This sample demonstrates how to deploy an Azure Databricks environment using Terraform and promote the source code across environments using Databricks Asset Bundles.
@@ -0,0 +1,18 @@
# Use the Dev Containers Python 3.11 image (Debian bullseye)
FROM mcr.microsoft.com/devcontainers/python:3.11-bullseye

# Update and install required system dependencies
RUN apt-get update \
    && apt-get install -y sudo vim software-properties-common curl unzip \
    && apt-get clean

# Copy and install dev dependencies
COPY requirements-dev.txt /tmp/requirements-dev.txt
RUN pip install -r /tmp/requirements-dev.txt && \
    rm /tmp/requirements-dev.txt

# Set the working directory
WORKDIR /workspace

# Default command
CMD ["/bin/bash"]
@@ -0,0 +1,25 @@ | ||
{ | ||
"name": "Python DevContainer", | ||
"dockerFile": "Dockerfile", | ||
"context": "..", | ||
"features": { | ||
"ghcr.io/devcontainers/features/terraform:1": { | ||
"installTerrafromDocs": true | ||
}, | ||
"ghcr.io/devcontainers/features/azure-cli:1": { | ||
"extensions": "" | ||
}, | ||
"ghcr.io/devcontainers/features/github-cli:1": {}, | ||
"ghcr.io/audacioustux/devcontainers/taskfile:1": {} | ||
}, | ||
"customizations" :{ | ||
"vscode": { | ||
"extensions": [ | ||
"yzhang.markdown-all-in-one", | ||
"DavidAnson.vscode-markdownlint", | ||
"-dbaeumer.vscode-eslint" | ||
] | ||
} | ||
} | ||
} | ||
|
@@ -0,0 +1,61 @@
name: "Asset Bundle Dev Deployment"

on:
  workflow_run:
    workflows: ["Asset Bundle Sandbox Deployment"]
    types:
      - completed

env:
  ENV: dev
  WORKING_DIR: single_tech_samples/databricks/databricks_terraform/

jobs:
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest
    environment: development
    defaults:
      run:
        working-directory: ${{ env.WORKING_DIR }}
    if: |
      github.event.workflow_run.conclusion == 'success' &&
      github.event.workflow_run.head_branch == 'main'

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Setup Databricks CLI
        uses: databricks/setup-cli@main

      - name: Azure Login Using Service Principal
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_DEV_CREDENTIALS }}

      - name: Deploy Databricks Bundle
        run: |
          databricks bundle validate -t ${{ env.ENV }} -o json
          databricks bundle deploy -t ${{ env.ENV }}
        env:
          DATABRICKS_BUNDLE_ENV: ${{ env.ENV }}

      - name: Install Task
        uses: arduino/setup-task@v2
        with:
          version: 3.x
          repo-token: ${{ secrets.GITHUB_TOKEN }}

      - name: Set Test Flows
        run: task collect-tests

      - name: Run test workflows
        run: task run-tests
        env:
          # test_flows is exported by the Set Test Flows step
          # and passed to the run-tests task
          test_flows: ${{ env.test_flows }}
          # variables required by the bundle file
          DATABRICKS_BUNDLE_ENV: ${{ env.ENV }}
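The `Run test workflows` step expects `collect-tests` to have published a `test_flows` value into the job environment. The Taskfile itself is not part of this diff, so the following is only a sketch of how such a task body could hand the value off using the standard `$GITHUB_ENV` mechanism; the `tests/` glob is a hypothetical layout:

```bash
# Hypothetical body of the collect-tests task: gather test workflow names
# and export them as a comma-separated list to later steps via GITHUB_ENV.
test_flows=$(ls tests/*_test.yml | xargs -n1 basename | paste -sd ',' -)
echo "test_flows=${test_flows}" >> "$GITHUB_ENV"
```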
@@ -0,0 +1,36 @@
name: "ADB Asset Bundle CI Linting"

on:
  pull_request:
    branches:
      - main
    paths:
      - "single_tech_samples/databricks/databricks_terraform/**"

env:
  UV_VERSION: ">=0.4.26"
  PYTHON_VERSION: "3.11"

jobs:
  linting:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout the repository
        uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v3
        with:
          enable-cache: true
          version: ${{ env.UV_VERSION }}
          cache-dependency-glob: "**/requirements**.txt"

      - name: Install Python and Dependencies
        run: |
          uv python install ${{ env.PYTHON_VERSION }}
          uv tool install ruff

      - name: Run Ruff Lint
        run: |
          uv run ruff check single_tech_samples/databricks/databricks_terraform
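The same gate can be reproduced locally before pushing; assuming uv is installed, `uvx` runs Ruff without a permanent install:

```bash
# Run the identical Ruff check locally from the repository root.
uvx ruff check single_tech_samples/databricks/databricks_terraform
```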
@@ -0,0 +1,70 @@ | ||
name: "Asset Bundle Sandbox Deployment" | ||
|
||
on: | ||
push: | ||
branches: | ||
- main | ||
paths: | ||
- "single_tech_samples/databricks/databricks_terraform/**" | ||
pull_request: | ||
branches: | ||
- main | ||
paths: | ||
- "single_tech_samples/databricks/databricks_terraform/**" | ||
|
||
env: | ||
ENV: sandbox | ||
WORKING_DIR: single_tech_samples/databricks/databricks_terraform/ | ||
|
||
jobs: | ||
deploy: | ||
name: "Deploy bundle" | ||
runs-on: ubuntu-latest | ||
environment: sandbox | ||
|
||
defaults: | ||
run: | ||
working-directory: ${{ env.WORKING_DIR }} | ||
|
||
steps: | ||
- name: Checkout Repository | ||
uses: actions/checkout@v4 | ||
|
||
- name: Setup Databricks CLI | ||
uses: databricks/setup-cli@main | ||
|
||
- name: Azure Login Using Service Principal | ||
uses: azure/login@v2 | ||
with: | ||
creds: ${{ secrets.AZURE_INT_CREDENTIALS }} | ||
|
||
- name: Deploy Databricks Bundle | ||
run: | | ||
if [ "${{ github.event_name }}" == "pull_request" ]; then | ||
databricks bundle validate -t ${{ env.ENV }} -o json | ||
elif [ "${{ github.event_name }}" == "push" ]; then | ||
databricks bundle deploy -t ${{ env.ENV }} -o json | ||
fi | ||
env: | ||
DATABRICKS_BUNDLE_ENV: ${{ env.ENV }} | ||
|
||
- name: Install Task | ||
if: github.event_name == 'push' | ||
uses: arduino/setup-task@v2 | ||
with: | ||
version: 3.x | ||
repo-token: ${{ secrets.GITHUB_TOKEN }} | ||
|
||
- name: Set Test Flows | ||
if: github.event_name == 'push' | ||
run: task collect-tests | ||
|
||
- name: Run test workflows | ||
if: github.event_name == 'push' | ||
run: task run-tests | ||
env: | ||
# gets test_flows from Set Test Flows step | ||
# and passes to the run-tests task | ||
test_flows: ${{ env.test_flows }} | ||
# bundle file required variables | ||
DATABRICKS_BUNDLE_ENV: ${{ env.ENV }} |
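Since pull requests only validate while pushes to `main` deploy, the validation half of this gate can be run locally before opening a PR (assuming you are authenticated and inside the bundle directory):

```bash
# Validate the bundle against the sandbox target without deploying anything.
databricks bundle validate -t sandbox -o json
```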
@@ -0,0 +1,70 @@
# Terraform Code for Multi Environment Databricks Medallion Deployment

![Multi Environment Image](../images/architecture.png)

[Visio Drawing](https://microsoft.sharepoint.com/:u:/t/ExternalEcolabKitchenOS/EWM3kB69NGBBiy2s563pjJ0BeKWy1qgtgEznRvvufiseFg?e=RieWOu)
## Overview

The **`Infra/modules`** folder has three modules, applied sequentially by the deploy script (as sketched below):

- **`adb-workspace`** - Deploys the Databricks workspace.
- **`metastore-and-users`** - Creates the Databricks access connector, creates the storage account, grants the connector storage access rights, creates the metastore and assigns the workspace to it, and finally retrieves all users, groups, and service principals from Azure AD.
- **`adb-unity-catalog`** - Grants the connector Databricks access rights, creates containers in the storage account and external locations for them, creates the Unity Catalog and grants permissions to user groups, and finally creates the **`bronze`**, **`silver`**, and **`gold`** schemas under the catalog with the required permissions for the user groups.
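`dev.deploy.sh` itself is not shown in this excerpt; given the stated behavior (sequential module deployment against per-module state), its core loop plausibly looks like the sketch below, where the module paths and flags are assumptions:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical core of dev.deploy.sh: apply each module in dependency
# order, stopping at the first failure so the state stays consistent.
for module in adb-workspace metastore-and-users adb-unity-catalog; do
  terraform -chdir="../modules/${module}" init
  terraform -chdir="../modules/${module}" apply -auto-approve
done
```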
> **Review comment:** If the script fails with an error - lack of permissions or something else - is the script idempotent? Meaning, can we re-run and the script will continue where it left off? Can we make a note about that?
>
> **Author reply:** Yes, the script can be re-run from where it left off. It references the state files of each Terraform module when running, and based on the state it can continue where it left off. I will make a note of that.
>
> **Author reply:** There is a note on that in the README of `Infra`, at the end of the doc.
**NOTE** - *When the **`adb-workspace`** module runs it creates the Databricks workspace, which by default creates a metastore in the same region. Databricks allows only **ONE METASTORE** per region. The **`metastore-and-users`** module deploys a new metastore with the required configurations, so you must delete the existing metastore before running the module.*

**NOTE** - *During script execution you may receive the error `Error: cannot create metastore: This account with id <Account_ID> has reached the limit for metastores in region <Region>`. This means the metastore limit for the region has been reached. To fix this, delete the existing metastore and re-run the script.*
> **Review comment:** Can you add a new "Known Issues" section at the end and add this note there?
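For reference, the existing metastore can be located and removed with the Databricks CLI before re-running the script; a minimal sketch, assuming you are authenticated against the Databricks account:

```bash
# Find the ID of the region's existing metastore, then delete it.
# --force is needed when the metastore still has workspace assignments.
databricks metastores list
databricks metastores delete <metastore-id> --force
```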
## How to Run

### Pre-requisites

> **Review comment:** Can we add security pre-requisites for the account? What RBAC or other privileges on the Azure subscription does the user that will run the script need to have?

- `Infra/deployment/.env` - Update the values as per your requirements.
- Databricks account admin access is required. Log in to [accounts.azuredatabricks.net](https://accounts.azuredatabricks.net/) to get your Databricks account ID. (A quick pre-flight check for both is sketched below.)
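A pre-flight check can be done with standard Azure CLI commands; the group name comes from the sample `.env` shown later in this diff:

```bash
# Confirm the signed-in account and active subscription.
az account show --query '{user: user.name, subscription: name}'

# Confirm membership in the Unity Catalog admin group referenced in
# Infra/deployment/.env (prints {"value": true} when you are a member).
az ad group member check \
  --group account_unity_admin \
  --member-id "$(az ad signed-in-user show --query id -o tsv)"
```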
### Steps

1. Login to Azure

    ```bash
    az login
    ```

2. Set the subscription

    ```bash
    az account set --subscription <subscription-id>
    ```

3. Change directory to `Infra/deployment`

    ```bash
    cd Infra/deployment
    ```

4. Make the script executable

    ```bash
    chmod +x dev.deploy.sh
    ```

5. Run the script to deploy the modules sequentially

    ```bash
    ./dev.deploy.sh
    ```
## Destroy

### Steps

1. Change directory to `Infra/deployment`

    ```bash
    cd Infra/deployment
    ```

2. Make the script executable

    ```bash
    chmod +x dev.destroy.sh
    ```

3. Run the script to destroy the modules by passing the `--destroy` flag

    ```bash
    ./dev.destroy.sh --destroy
    ```
## Error Handling

If the script fails during resource creation, rerun it. The script references each module's local Terraform state file and retries only the resources that have not yet been created.
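Before rerunning, it can help to see what each module has already recorded; a sketch using standard Terraform commands, where the `-chdir` paths are assumptions based on the `Infra/modules` layout described above:

```bash
# List the resources each module's local state already tracks.
terraform -chdir=../modules/adb-workspace state list
terraform -chdir=../modules/metastore-and-users state list
terraform -chdir=../modules/adb-unity-catalog state list
```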
@@ -0,0 +1,11 @@
region=""
environment="dev"
subscription_id=""
resource_group_name=""
metastore_name=""
account_id="" # Log in to https://accounts.azuredatabricks.net/ to get the account ID.
prefix="dev"

# Ensure these groups exist in Microsoft Entra ID.
# Make sure you are a member of the account_unity_admin group when running the script locally.
aad_groups='["account_unity_admin","data_engineer","data_analyst","data_scientist"]'
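If any of these groups do not exist yet, they can be created up front; a minimal Azure CLI sketch using the names from `aad_groups` (requires rights to create groups in the tenant):

```bash
# Create the Entra ID groups referenced by aad_groups.
for group in account_unity_admin data_engineer data_analyst data_scientist; do
  az ad group create --display-name "$group" --mail-nickname "$group"
done
```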
> 🚫 **[linkspector] reported by reviewdog** 🐶
> Cannot reach `./databricks_ci_cd/README.md`. Status: 404 - Cannot find: `./databricks_ci_cd/README.md`