Databricks Notebook Promotion using Azure DevOps

Productionize Databricks Notebooks

Himansu Sekhar
road to data engineering


This article shows how to promote Databricks notebooks through the stages of a code-promotion process using Azure Databricks and Azure DevOps.

Prerequisites:

To follow along, please create the following:

- An Azure subscription
- An Azure DevOps organization
- An Azure Databricks workspace
- An Azure Key Vault

I have already created all of the above services, as shown in the screenshot below.

Log in to the Azure DevOps portal, click New Project, fill in the form as shown below, and hit Create.

Click Repos and then click Initialize near the bottom to create the empty repository where we’ll link our notebooks.

Log in to your Azure Databricks workspace and click on your user icon in the top right corner then select User Settings.

Click the Git integration tab and select Azure DevOps Services to connect to your project repository.

Next, click Workspace and right-click to create a new notebook.

The notebook's contents don't matter much for this walkthrough, since we're not doing any real coding — just adding some comments to show the process — but choose Python as the language so the remaining instructions and code will work. Hit Create once done.

Once created, click on Git: Not Linked under Revision History.

Click Link, type a name for your feature branch, and click Create Branch. We'll work on this branch and use main as the collaboration branch. Click Save to begin working.

Type a comment or some code into the notebook.
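Anything works here. For example, a placeholder cell like the following (the message text is an arbitrary assumption, purely for demonstration):

```python
# Placeholder notebook cell used to demonstrate the promotion process.
# The message text is arbitrary -- any comment or statement will do.
message = "Promoted from feature branch via Azure DevOps"
print(message)
```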

Open your repository in Azure DevOps; you'll see that the main branch is empty while the feature branch contains your comment. From your Databricks workspace, click the Git: Synced button under Revision History, then click Create PR to create a pull request. This launches Azure DevOps (you can also create the PR directly in DevOps if you prefer; the link simply opens the interface for you).

Select your feature branch and add comments as necessary then click create.

Approve the request and complete the merge to finish. You should now change your branch in Databricks to main or create a new feature branch. The old feature branch will be removed during the merge.

In your Azure Databricks workspace, click your user icon and open User Settings. Under Access Tokens, click Generate New Token to create the token Azure DevOps will use to connect securely to the Databricks API.

Give the token a name and select a suitable lifespan. Note that it should become a normal part of your administration to rotate these tokens.

Copy the generated token now as you won’t be able to access it again.
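As a quick sanity check, you can use the token to call the Databricks REST API yourself before handing it to DevOps. The sketch below builds an authenticated request to the real workspace/list endpoint; the workspace URL and token shown are placeholders you would replace with your own:

```python
import urllib.parse
import urllib.request

def build_list_request(host: str, token: str, path: str = "/") -> urllib.request.Request:
    """Build an authenticated GET request for the Databricks workspace/list API."""
    url = f"{host}/api/2.0/workspace/list?" + urllib.parse.urlencode({"path": path})
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})

# Placeholder values -- substitute your real workspace URL and generated token.
req = build_list_request("https://adb-1234567890123456.7.azuredatabricks.net",
                         "dapiXXXXXXXX")
# urllib.request.urlopen(req) would return the workspace listing as JSON.
```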

Open your Key Vault and click Secrets in the menu, then Generate/Import. Enter your token as the secret value and give the secret a descriptive name. Add the expiration timestamp to ensure you have this recorded centrally.

In your DevOps project, click Pipelines and then Library. Next, click + Variable Group.

Give the group a name such as Azure Key Vault. Select to link secrets from an Azure Key Vault, then select your subscription and Key Vault from the drop-down lists. You may need to click Authorize to configure the connections. Add the Databricks token variable and click Save.

In Azure DevOps, click Pipelines and then Create Pipeline.

Click “Use the classic editor” to use the GUI to create this pipeline. Choose your repository and then the main branch, since that’s our collaboration branch for this build.

Select Empty Job

Give your build a name such as Build Databricks Artifact.

Click the + next to Agent Job 1 and add a Publish build artifacts task.

Select the Notebooks directory in your repository as the path to publish, and name the artifact Notebooks.

Click Triggers in the menu and enable Continuous Integration. Select your main branch.

Click Save and Queue to complete the build task and create the first build. Add a comment such as “created build job”, then click Save and Run. Your task should now run and build the first artifact containing your notebook.
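If you prefer YAML pipelines over the classic editor, a roughly equivalent build definition might look like the sketch below — assuming your notebooks live in a Notebooks folder at the repository root (the folder name and agent image are assumptions, not from the original):

```yaml
trigger:
  - main

pool:
  vmImage: 'ubuntu-latest'

steps:
  - task: PublishBuildArtifacts@1
    inputs:
      PathtoPublish: '$(Build.SourcesDirectory)/Notebooks'
      ArtifactName: 'Notebooks'
```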

Click Pipelines, Releases and create your first release pipeline.

Click start with an empty job and name the first stage Testing. You can create identical jobs for pre-prod and production later.

Click the Add Artifacts button and select your build pipeline, which will show that it last created an artifact called Notebooks.

Click the lightning icon next to the artifact to enable continuous deployment.

Click Variables on the menu and add in the variable group so that your pipeline can find the secret we set up earlier.

Click Tasks in the menu to set up the job. Add a PowerShell task to the pipeline.

Configure the PowerShell task to use an inline script that calls the Databricks Workspace API to deploy the notebook from the build artifact.
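The original inline script isn't reproduced in this extract. As a hedged sketch of what it needs to do — shown in Python rather than PowerShell, with the workspace URL, token, and paths all placeholder assumptions — the script base64-encodes the notebook source and POSTs it to the real workspace/import endpoint, authenticating with the token from the variable group:

```python
import base64
import json
import urllib.request

def build_import_request(host: str, token: str, target_path: str,
                         notebook_source: str) -> urllib.request.Request:
    """Build a POST request for the Databricks workspace/import API.

    The notebook source must be base64-encoded, as the API requires.
    """
    payload = {
        "path": target_path,  # e.g. derived from the $newNotebookName variable
        "format": "SOURCE",
        "language": "PYTHON",
        "content": base64.b64encode(notebook_source.encode()).decode(),
        "overwrite": True,
    }
    return urllib.request.Request(
        f"{host}/api/2.0/workspace/import",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Placeholder values -- in the release pipeline, the token comes from the
# Key Vault variable group and the target path from $newNotebookName.
req = build_import_request("https://adb-1234567890123456.7.azuredatabricks.net",
                           "dapiXXXXXXXX", "/Shared/demo-notebook",
                           "# promoted notebook")
# urllib.request.urlopen(req) would perform the actual deployment.
```

In the real release the same call is made from the PowerShell task (for example with Invoke-RestMethod), with the token injected as a pipeline secret rather than hard-coded.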

Finally, click Create Release in the menu. In the future this won’t be necessary since we set up the trigger, but as we won’t have another build to fire that trigger, we need to start this one manually. Click Create, and your tasks will run and deploy the notebook to your workspace using the $newNotebookName variable as the name. Since we only have one workspace in this demo, that’s where the notebook will go.

Great — you have made it to the end of the tutorial. Thanks for your patience; I hope it helped you learn something new.
