lab | ||||
---|---|---|---|---|
|
In this lab, you will learn how to secure a Synapse Analytics workspace and its supporting infrastructure. You will observe the SQL Active Directory Admin, manage IP firewall rules, manage secrets with Azure Key Vault and access those secrets through a Key Vault linked service and pipeline activities. You will understand how to implement column-level security, row-level security, and dynamic data masking when using dedicated SQL pools.
After completing this lab, you will be able to:
- Secure Azure Synapse Analytics supporting infrastructure
- Secure the Azure Synapse Analytics workspace and managed services
- Secure Azure Synapse Analytics workspace data
This lab will guide you through several security-related steps that cover an end-to-end security story for Azure Synapse Analytics. Some key take-aways from this lab are:
-
Leverage Azure Key Vault to store sensitive connection information, such as access keys and passwords for linked services as well as in pipelines.
-
Introspect the data that is contained within the SQL Pools in the context of potential sensitive/confidential data disclosure. Identify the columns representing sensitive data, then secure them by adding column-level security. Determine at the table level what data should be hidden from specific groups of users then define security predicates to apply row level security (filters) on the table. If desired, you also have the option of applying Dynamic Data Masking to mask sensitive data returned in queries on a column by column basis.
Before starting this lab, you must complete at least the setup steps in Lab 4: Explore, transform, and load data into the Data Warehouse using Apache Spark.
This lab uses the dedicated SQL pool you created in the previous lab. You should have paused the SQL pool at the end of the previous lab, so resume it by following these instructions:
-
Open Azure Synapse Studio (https://web.azuresynapse.net/).
-
Select the Manage hub.
-
Select SQL pools in the left-hand menu. If the SQLPool01 dedicated SQL pool is paused, hover over its name and select ▷.
-
When prompted, select Resume. It will take a minute or two to resume the pool.
-
Continue to the next exercise while the dedicated SQL pool resumes.
Important: Once started, a dedicated SQL pool consumes credits in your Azure subscription until it is paused. If you take a break from this lab, or decide not to complete it; follow the instructions at the end of the lab to pause your SQL pool!
Azure Synapse Analytics (ASA) is a powerful solution that handles security for many of the resources that it creates and manages. In order to run ASA, however, some foundational security measures need to be put in place to ensure the infrastructure that it relies upon is secure. In this exercise, we will walk through securing the supporting infrastructure of ASA.
The SQL Active Directory Admin can be a user (the default) or group (best practice so that more than one user can be provided these permissions) security principal. The principal assigned to this will have administrative permissions to the SQL Pools contained in the workspace.
-
In the Azure Portal (https://portal.azure.com), browse to your lab resource group, and from the list of resources open your Synapse workspace (do not launch Synapse Studio).
-
On the left menu, select Azure Active Directory and observe who is listed as a SQL Active Directory Admin. Is it a user or group?
Having robust Internet security is a must for every technology system. One way to mitigate internet threat vectors is by reducing the number of public IP addresses that can access the Azure Synapse Analytics Workspace through the use of IP firewall rules. The Azure Synapse Analytics workspace will then delegate those same rules to all managed public endpoints of the workspace, including those for SQL pools and SQL Serverless endpoints.
-
In the Azure Portal, on the blade for your Synapse workspace, select Networking.
-
Notice that an IP Firewall rule of Allow All has already been created for you in the lab environment. If you wanted to add your specific IP address you would instead select + Add Client IP from the taskbar menu (you do not need to do this now).
Note: When connecting to Synapse from your local network, certain ports need to be open. To support the functions of Synapse Studio, ensure outgoing TCP ports 80, 443, and 1433, and UDP port 53 are open.
When dealing with connectivity to external data sources and services, sensitive connection information such as passwords and access keys should be properly handled. It is recommended that this type of information be stored in an Azure Key Vault. Leveraging Azure Key Vault not only protects against secrets being compromised, it also serves as a central source of truth; meaning that if a secret value needs to be updated (such as when cycling access keys on a storage account), it can be changed in one place and all services consuming this key will start pulling the new value immediately. Azure Key Vault encrypts and decrypts information transparently using 256-bit AES encryption, which is FIPS 140-2 compliant.
-
In the Azure Portal, open the resource group for this lab, and from the list of resources, select the Key vault resource.
-
On the left menu, under Settings, select Access Policies.
-
Observe that Managed Service Identity (MSI) representing your Synapse workspace (it has a name similar to asaworkspacexxxxxxx) has already been listed under Application and it has 4 selected Secret Management Operations.
-
Select the drop-down that reads 4 selected under Secret Management Operations, observe that Get (which allows your workspace to retrieve the values of secrets from Key Vault) and List (which allows your workspace to enumerate secrets) are set.
Linked Services are synonymous with connection strings in Azure Synapse Analytics. Azure Synapse Analytics linked services provides the ability to connect to nearly 100 different types of external services ranging from Azure Storage Accounts to Amazon S3 and more. When connecting to external services, having secrets related to connection information is almost guaranteed. The best place to store these secrets is the Azure Key Vault. Azure Synapse Analytics provides the ability to configure all linked service connections with values from Azure Key Vault.
In order to leverage Azure Key Vault in linked services, you must first add your key vault resource as a linked service in Azure Synapse Analytics.
-
In Azure Synapse Studio, select the Manage hub from the left menu.
-
Beneath External Connections, select Linked Services, observe that a Linked Service pointing to your Key Vault has been created in the environment.
Since we have the Azure Key Vault set up as a linked service, we can leverage it when defining new linked services. Every New linked service provides the option to retrieve secrets from Azure Key Vault. The form requests the selection of the Azure Key Vault linked service, the secret name, and (optional) specific version of the secret.
It is recommended to store any secrets that are part of your pipeline in Azure Key Vault. In this task you will retrieve these values using a Web activity, just to show the mechanics. The second part of this task demonstrates using a Web activity in the pipeline to retrieve a secret from the Key Vault.
-
Return to the Azure portal.
-
In the blade for your asakeyvaultxxxxxxx Azure Key Vault resource, and select Secrets from the left menu. Then, in the top toolbar, select + Generate/Import.
-
Create a secret, with the name
PipelineSecret
and assign it a value ofIsNotASecret
, and select the Create button. -
Open the secret that you just created, drill into the current version, and copy the value in the Secret Identifier field. Save this value in a text editor, or retain it in your clipboard for a future step.
-
Switch back to Synapse Studio, then select the Integrate hub from the left menu.
-
On the Integrate pane, in the + menu, select Pipeline.
-
On the Pipeline tab, in the Activities pane search for Web and then drag an instance of a Web activity to the design area.
-
Select the Web1 web activity, and select the Settings tab. Fill out the form as follows:
-
URL: Paste the Key Vault Secret Identifier value you copied in step 4 above, then append
?api-version=7.1
to to the end of this value. For example, it should look something like:https://asakeyvaultNNNNN.vault.azure.net/secrets/PipelineSecret/f808d4fa99d84861872010f6c8d25c68?api-version=7.1
. -
Method: Select Get.
-
For Authentication select Managed Identity. We have already established an Access Policy for the Managed Service Identity of our Synapse workspace, this means that the pipeline activity has permissions to access the key vault via an HTTP call.
-
Resource: Enter https://vault.azure.net
-
-
From the Activities pane, add a Set variable activity to the design surface of the pipeline.
-
On the design surface of the pipeline, select the Web1 activity and drag a Success activity pipeline connection (green box) to the Set variable1 activity.
-
With the pipeline selected in the designer (e.g., neither of the activities are selected), select the Variables tab and add a new String parameter named
SecretValue
. -
Select the Set variable1 activity and select the Variables tab. Fill out the form as follows:
-
Debug the pipeline by selecting Debug from the toolbar menu. When it runs observe the inputs and outputs of both activities from the Output tab of the pipeline.
Note: On the Web1 activity, on the General tab there is a Secure Output checkbox that when checked will prevent the secret value from being logged in plain text, for instance in the pipeline run, you would see a masked value ***** instead of the actual value retrieved from the Key vault. Any activity that consumes this value should also have their Secure Input checkbox checked.
Transparent Data Encryption (TDE) is a feature of SQL Server that provides encryption and decryption of data at rest, this includes: databases, log files, and back ups. When using this feature with Synapse Analytics dedicated SQL pools, it will use a built-in symmetric Database Encryption Key (DEK) that is provided by the pool itself. With TDE, all stored data is encrypted on disk, when the data is requested, TDE will decrypt this data at the page level as it's read into memory, and vice-versa encrypting in-memory data before it gets written back to disk. As with the name, this happens transparently without affecting any application code. When creating a dedicated SQL pool through Synapse Analytics, Transparent Data Encryption is not enabled. The first part of this task will show you how to enable this feature.
-
In the Azure Portal, open your resource group, then locate and open the SqlPool01 dedicated SQL pool resource.
-
On the SQL pool resource blade, select Transparent data encryption from the left-hand menu. DO NOT turn on data encryption.
By default, this option is turned off. When you enable data encryption on this dedicated SQL pool, the pool is taken offline for a few minutes while TDE is applied.
It is important to identify data columns that hold sensitive information. Types of sensitive could be social security numbers, email addresses, credit card numbers, financial totals, and more. Azure Synapse Analytics allows you define permissions that prevent users or roles select privileges on specific columns.
- In Azure Synapse Studio, in the Develop hub, expand the SQL scripts section, and select Column Level Security.
- In the toolbar, connect to the SQLPool01 database.
- In the query window, run each step individually by highlighting the statement(s) in the step in the query window, and selecting the Run button from the toolbar (or press F5).
- Close the script tab. If prompted select Discard all changes.
- In the Develop hub, in the SQL scripts section, select Row Level Security.
- In the toolbar, connect to the SQLPool01 database.
- In the query window, run each step individually by highlighting the statement(s) for the step in the query window, and selecting the Run button from the toolbar (or press F5).
- Close the script tab. If prompted select Discard all changes.
- In the Develop hub, in the SQL scripts section, select Dynamic Data Masking.
- In the toolbar, connect to the SQLPool01 database.
- In the query window, run each step individually by highlighting the statement(s) for the step in the query window, and selecting the Run button from the toolbar (or press F5).
- Close the script tab. If prompted select Discard all changes.
Complete these steps to free up resources you no longer need.
-
In Synapse Studio, select the Manage hub.
-
Select SQL pools in the left-hand menu. Hover over the SQLPool01 dedicated SQL pool and select ||.
-
When prompted, select Pause.