
Databricks Premium Runtime Terraform module

Terraform module for creating Databricks Premium Runtime resources

Usage

Requires a Workspace with the "Premium" SKU

The main idea behind this module is to deploy resources for Databricks Workspaces with the Premium SKU only.
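
If the workspace does not exist yet, it can be created with the azurerm provider. A minimal sketch, where all names and the location are placeholders:

# Hypothetical workspace creation; all names are placeholders
resource "azurerm_databricks_workspace" "example" {
  name                = "example-workspace"
  resource_group_name = "example-rg"
  location            = "eastus"
  sku                 = "premium" # this module targets Premium SKU workspaces only
}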

Here we provide some examples of how to provision it with different options.

The example below covers these features of the module:

  1. Workspace admins assignment, custom Workspace group creation, group assignments, group entitlements
  2. Clusters (i.e., for Unity Catalog and Shared Autoscaling)
  3. Workspace IP Access list creation
  4. ADLS Gen2 Mount
  5. Create Secret Scope and assign permissions to custom groups
  6. SQL Endpoint creation and configuration
  7. Create Cluster policy
  8. Create an Azure Key Vault-backed secret scope
  9. Connect to an already existing Unity Catalog Metastore
# Prerequisite resources

# Databricks Workspace with Premium SKU
data "azurerm_databricks_workspace" "example" {
  name                = "example-workspace"
  resource_group_name = "example-rg"
}

# Databricks Provider configuration
provider "databricks" {
  alias                       = "main"
  host                        = data.azurerm_databricks_workspace.example.workspace_url
  azure_workspace_resource_id = data.azurerm_databricks_workspace.example.id
}

# Key Vault where Service Principal's secrets are stored. Used for mounting Storage Container
data "azurerm_key_vault" "example" {
  name                = "example-key-vault"
  resource_group_name = "example-rg"
}

# Storage Account to be mounted; referenced by the 'mountpoints' parameter below
data "azurerm_storage_account" "example" {
  name                = "examplestorageaccount"
  resource_group_name = "example-rg"
}
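
The examples also assume provider requirements along these lines (a minimal sketch, consistent with the Requirements section below):

terraform {
  required_providers {
    databricks = {
      source  = "databricks/databricks"
      version = "~> 1.0"
    }
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}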

# Example usage of module for Runtime Premium resources.
module "databricks_runtime_premium" {
  source  = "data-platform-hq/databricks-runtime-premium/databricks"

  project  = "datahq"
  env      = "example"
  location = "eastus"

  # Parameters of Service principal used for ADLS mount
  # Imports App ID and Secret of Service Principal from target Key Vault
  key_vault_id             = data.azurerm_key_vault.example.id
  sp_client_id_secret_name = "sp-client-id" # secret's name that stores Service Principal App ID
  sp_key_secret_name       = "sp-key" # secret's name that stores Service Principal Secret Key
  tenant_id_secret_name    = "infra-arm-tenant-id" # secret's name that stores tenant id value

  # 1.1 Workspace admins 
  workspace_admins = {
    user = ["user@example.com"]
    service_principal = ["example-app-id"]
  }

  # 1.2 Custom Workspace group with assignments.
  # In addition, provides an ability to create group and entitlements.
  iam = [{
    group_name = "DEVELOPERS"
    permissions  = ["ADMIN"]
    entitlements = [
      "allow_instance_pool_create",
      "allow_cluster_create",
      "databricks_sql_access"
    ] 
  }]

  # 2. Databricks clusters configuration, and assign permission to a custom group on clusters.
  databricks_cluster_configs = [ {
    cluster_name       = "Unity Catalog"
    data_security_mode = "USER_ISOLATION"
    availability       = "ON_DEMAND_AZURE"
    spot_bid_max_price = 1
    permissions        = [{ group_name = "DEVELOPERS", permission_level = "CAN_RESTART" }]
  },
  {
    cluster_name       = "shared autoscaling"
    data_security_mode = "NONE"
    availability       = "SPOT_AZURE"
    spot_bid_max_price = -1
    permissions        = [{group_name = "DEVELOPERS", permission_level = "CAN_MANAGE"}]
  }]

  # 3. Workspace can be accessed only from these IP addresses:
  ip_rules = {
    "ip_range_1" = "10.128.0.0/16",
    "ip_range_2" = "10.33.0.0/16",
  }
  
  # 4. ADLS Gen2 Mount
  mountpoints = {
    example = {
      storage_account_name = data.azurerm_storage_account.example.name
      container_name       = "example-container" # Azure container names cannot contain underscores
    }
  }

  # 5. Create Secret Scope and assign permissions to custom groups 
  secret_scope = [{
    scope_name = "extra-scope"
    acl        = [{ principal = "DEVELOPERS", permission = "READ" }] # Only custom workspace group names are allowed. If left empty, only Workspace admins can access these keys
    secrets    = [{ key = "secret-name", string_value = "secret-value"}]
  }]

  # 6. SQL Warehouse Endpoint
  databricks_sql_endpoint = [{
    name        = "default"  
    enable_serverless_compute = true  
    permissions = [{ group_name = "DEVELOPERS", permission_level = "CAN_USE" },]
  }]

  # 7. Databricks cluster policies
  custom_cluster_policies = [{
    name     = "custom_policy_1",
    can_use  =  "DEVELOPERS", # custom workspace group name, that is allowed to use this policy
    definition = {
      "autoscale.max_workers": {
        "type": "range",
        "maxValue": 3,
        "defaultValue": 2
      },
    }
  }]

  # 8. Azure Key Vault-backed secret scope
  key_vault_secret_scope = [{
    name         = "external"
    key_vault_id = data.azurerm_key_vault.example.id
    dns_name     = data.azurerm_key_vault.example.vault_uri
  }]  
    
  providers = {
    databricks = databricks.main
  }
}

# 9. Assign an already existing Unity Catalog Metastore
module "metastore_assignment" {
  source  = "data-platform-hq/metastore-assignment/databricks"
  version = "1.0.0"

  workspace_id = data.azurerm_databricks_workspace.example.workspace_id
  metastore_id = "<uuid-of-metastore>"

  providers = {
    databricks = databricks.main
  }
}
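
Since system schemas only work with an assigned metastore (see system_schemas_enabled in Inputs), they can be enabled once the assignment above exists. A hypothetical extension of the module call from the example:

  # Inside module "databricks_runtime_premium" above (hypothetical addition):
  system_schemas_enabled = true
  system_schemas         = ["access", "billing"] # subset of the supported schema names

  depends_on = [module.metastore_assignment]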

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0 |
| databricks | ~> 1.0 |

Providers

| Name | Version |
|------|---------|
| databricks | ~> 1.0 |

Modules

No modules.

Resources

| Name | Type |
|------|------|
| databricks_cluster.this | resource |
| databricks_cluster_policy.overrides | resource |
| databricks_cluster_policy.this | resource |
| databricks_entitlements.this | resource |
| databricks_group.this | resource |
| databricks_ip_access_list.allowed_list | resource |
| databricks_mount.adls | resource |
| databricks_permissions.clusters | resource |
| databricks_permissions.policy | resource |
| databricks_permissions.sql_endpoint | resource |
| databricks_secret.main | resource |
| databricks_secret.this | resource |
| databricks_secret_acl.this | resource |
| databricks_secret_scope.main | resource |
| databricks_secret_scope.this | resource |
| databricks_sql_endpoint.this | resource |
| databricks_system_schema.this | resource |
| databricks_token.pat | resource |
| databricks_workspace_conf.this | resource |
| databricks_current_metastore.this | data source |
| databricks_group.account_groups | data source |
| databricks_sql_warehouses.all | data source |

Inputs

cloud_name
  Description: Cloud Name
  Type: string
  Default: n/a
  Required: yes

clusters
  Description: Set of objects with parameters to configure Databricks clusters and assign permissions to them for certain custom groups
  Type:
    set(object({
      cluster_name = string
      spark_version = optional(string, "15.3.x-scala2.12")
      spark_conf = optional(map(any), {})
      spark_env_vars = optional(map(any), {})
      data_security_mode = optional(string, "USER_ISOLATION")
      aws_attributes = optional(object({
        availability = optional(string)
        zone_id = optional(string)
        first_on_demand = optional(number)
        spot_bid_price_percent = optional(number)
        ebs_volume_count = optional(number)
        ebs_volume_size = optional(number)
        ebs_volume_type = optional(string)
      }), {
        availability = "ON_DEMAND"
        zone_id = "auto"
        first_on_demand = 0
        spot_bid_price_percent = 100
        ebs_volume_count = 1
        ebs_volume_size = 100
        ebs_volume_type = "GENERAL_PURPOSE_SSD"
      })
      azure_attributes = optional(object({
        availability = optional(string)
        first_on_demand = optional(number)
        spot_bid_max_price = optional(number, 1)
      }), {
        availability = "ON_DEMAND_AZURE"
        first_on_demand = 0
      })
      node_type_id = optional(string, null)
      autotermination_minutes = optional(number, 20)
      min_workers = optional(number, 1)
      max_workers = optional(number, 2)
      cluster_log_conf_destination = optional(string, null)
      init_scripts_workspace = optional(set(string), [])
      init_scripts_volumes = optional(set(string), [])
      init_scripts_dbfs = optional(set(string), [])
      init_scripts_abfss = optional(set(string), [])
      single_user_name = optional(string, null)
      single_node_enable = optional(bool, false)
      custom_tags = optional(map(string), {})
      permissions = optional(set(object({
        group_name = string
        permission_level = string
      })), [])
      pypi_library_repository = optional(set(string), [])
      maven_library_repository = optional(set(object({
        coordinates = string
        exclusions = set(string)
      })), [])
    }))
  Default: []
  Required: no

custom_cluster_policies
  Description: Provides the ability to create custom cluster policies, assign them to clusters, and grant CAN_USE permissions on them to certain custom groups:
    name - name of the custom cluster policy to create;
    can_use - list of strings, where values are custom group names; these groups have to be created with Terraform;
    definition - JSON document expressed in Databricks Policy Definition Language. No need to call 'jsonencode()' on it when providing a value.
  Type:
    list(object({
      name = string
      can_use = list(string)
      definition = any
    }))
  Default:
    [
      {
        "can_use": null,
        "definition": null,
        "name": null
      }
    ]
  Required: no

custom_config
  Description: Map of Azure Databricks workspace custom configuration parameters
  Type: map(string)
  Default:
    {
      "enable-X-Content-Type-Options": "true",
      "enable-X-Frame-Options": "true",
      "enable-X-XSS-Protection": "true",
      "enableDbfsFileBrowser": "false",
      "enableExportNotebook": "false",
      "enableIpAccessLists": "true",
      "enableNotebookTableClipboard": "false",
      "enableResultsDownloading": "false",
      "enableUploadDataUis": "false",
      "enableVerboseAuditLogs": "true",
      "enforceUserIsolation": "true",
      "storeInteractiveNotebookResultsInCustomerAccount": "true"
    }
  Required: no

default_cluster_policies_override
  Description: Provides the ability to override default cluster policies:
    name - name of the cluster policy to override;
    family_id - family id of the corresponding policy;
    definition - JSON document expressed in Databricks Policy Definition Language. No need to call 'jsonencode()' on it when providing a value.
  Type:
    list(object({
      name = string
      family_id = string
      definition = any
    }))
  Default:
    [
      {
        "definition": null,
        "family_id": null,
        "name": null
      }
    ]
  Required: no

iam_account_groups
  Description: List of objects with group name and entitlements for this group
  Type:
    list(object({
      group_name = optional(string)
      entitlements = optional(list(string))
    }))
  Default: []
  Required: no

iam_workspace_groups
  Description: Used to create workspace groups. Map of group names to parameters, such as the users and service principals added to the group. Group entitlements can also be configured.
  Type:
    map(object({
      user = optional(list(string))
      service_principal = optional(list(string))
      entitlements = optional(list(string))
    }))
  Default: {}
  Required: no

ip_addresses
  Description: A map of IP address ranges
  Type: map(string)
  Default:
    {
      "all": "0.0.0.0/0"
    }
  Required: no

key_vault_secret_scope
  Description: List of objects with Azure Key Vault parameters required for the creation of Azure-backed Databricks Secret Scopes
  Type:
    list(object({
      name = string
      key_vault_id = string
      dns_name = string
      tenant_id = string
    }))
  Default: []
  Required: no

mount_configuration
  Description: Configuration for mounting storage, including only service principal details
  Type:
    object({
      service_principal = object({
        client_id = string
        client_secret = string
        tenant_id = string
      })
    })
  Default:
    {
      "service_principal": {
        "client_id": null,
        "client_secret": null,
        "tenant_id": null
      }
    }
  Required: no

mount_enabled
  Description: Boolean flag that determines whether a mount point for the storage account filesystem is created
  Type: bool
  Default: false
  Required: no

mountpoints
  Description: Mountpoints for Databricks
  Type:
    map(object({
      storage_account_name = string
      container_name = string
    }))
  Default: {}
  Required: no

pat_token_lifetime_seconds
  Description: The lifetime of the token, in seconds. If no lifetime is specified, the token remains valid indefinitely
  Type: number
  Default: 315569520
  Required: no

secret_scope
  Description: Provides the ability to create custom Secret Scopes, store secrets in them, and assign ACLs for access management:
    scope_name - name of the Secret Scope to create;
    scope_acl - list of objects, where 'principal' is a custom group name (the group is created in the 'Premium' module) and 'permission' is one of "READ", "WRITE", or "MANAGE";
    secrets - list of objects, where 'key' is the name of the created key and 'string_value' is its value.
  Type:
    list(object({
      scope_name = string
      scope_acl = optional(list(object({
        principal = string
        permission = string
      })))
      secrets = optional(list(object({
        key = string
        string_value = string
      })))
    }))
  Default: []
  Required: no

sql_endpoint
  Description: Set of objects with parameters to configure SQL Endpoints and assign permissions to them for certain custom groups
  Type:
    set(object({
      name = string
      cluster_size = optional(string, "2X-Small")
      min_num_clusters = optional(number, 0)
      max_num_clusters = optional(number, 1)
      auto_stop_mins = optional(string, "30")
      enable_photon = optional(bool, false)
      enable_serverless_compute = optional(bool, false)
      spot_instance_policy = optional(string, "COST_OPTIMIZED")
      warehouse_type = optional(string, "PRO")
      permissions = optional(set(object({
        group_name = string
        permission_level = string
      })), [])
    }))
  Default: []
  Required: no

suffix
  Description: Optional suffix that is appended to resource names
  Type: string
  Default: ""
  Required: no

system_schemas
  Description: Set of strings with all possible System Schema names
  Type: set(string)
  Default:
    [
      "access",
      "billing",
      "compute",
      "marketplace",
      "storage"
    ]
  Required: no

system_schemas_enabled
  Description: System Schemas only work with an assigned Unity Catalog Metastore. Boolean flag to enable this feature
  Type: bool
  Default: false
  Required: no

workspace_admin_token_enabled
  Description: Boolean flag to specify whether to create a Workspace Admin Token
  Type: bool
  Default: n/a
  Required: yes
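
For instance, default_cluster_policies_override might be used as follows (a sketch; the family_id value is an assumption and should be verified against the policy families available in your workspace):

  # Hypothetical: inside the module call
  default_cluster_policies_override = [{
    name      = "Personal Compute"
    family_id = "personal-vm" # assumed policy family id; verify in your workspace
    definition = {
      "autotermination_minutes" : {
        "type" : "fixed",
        "value" : 30,
        "hidden" : true
      }
    }
  }]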

Outputs

| Name | Description |
|------|-------------|
| clusters | Provides name and unique identifier for the clusters |
| metastore_id | The ID of the current metastore in the Databricks workspace |
| sql_endpoint_data_source_id | ID of the data source for this endpoint |
| sql_endpoint_jdbc_url | JDBC connection string of the SQL Endpoint |
| sql_warehouses_list | List of IDs of all SQL warehouses in the Databricks workspace |
| token | Databricks Personal Authorization Token |
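
These outputs can be consumed from the root module as usual. For instance, assuming the module call from the example above:

output "sql_endpoint_jdbc_url" {
  description = "JDBC connection string of the SQL Endpoint created by the module"
  value       = module.databricks_runtime_premium.sql_endpoint_jdbc_url
}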

License

Apache 2 Licensed. For more information, please see LICENSE.
