DP-200 Implementing an Azure Data Solution Exam preparation guide

Background

Most of us are working from home due to the ongoing COVID-19 situation. This has saved me some commuting time, and I managed to utilize part of it to prepare for Azure role-based certifications. In the past 3 months, I cleared the AZ-301, AZ-400 and DP-200 certification exams. Many people have asked me how I went about preparing for these certifications. This post and the accompanying YouTube video cover my preparation for the DP-200 exam.

DP-200 Implementing an Azure Data Solution Exam

In case you are new to Microsoft role-based certifications, have a look at this excellent post by Thomas Maurer about selecting the right exam. The DP-200 exam is focused on the Data Engineer role, falls under the Data & AI category of exams and is at the Associate level. I have been part of data engineering teams since late 2015 and have worked with Big Data technologies on the Hortonworks Data Platform as well as on Microsoft Azure. The exam consists of 3 main areas:

  • Implement data storage solutions (40-50%)
  • Manage and develop data processing (25-30%)
  • Monitor and optimize data solutions (30-35%)

Your skills are tested against the following core services from Microsoft Azure:
  • Azure Data Lake Storage (ADLS) Gen 2
  • Azure Data Factory
  • Azure Databricks
  • Azure SQL
  • Cosmos DB
  • Azure Stream Analytics
  • Azure Key Vault
  • Azure Monitor
Since the test is focused on implementing data solutions, we need to know the different options available for securing, managing, integrating and monitoring the solutions. I used the Linux Academy course for DP-200 to prepare for this exam. If you are new to Azure, I recommend going through the MS Learn paths related to DP-200. The following learning paths are available on MS Learn.

We need to understand batch processing, stream processing, structured & unstructured data, the different APIs supported by Cosmos DB and the different consistency levels. In terms of data integration, it is important to understand the capabilities of Azure Data Factory. For data processing, Azure Databricks, Synapse Analytics and Stream Analytics play a very important role. If you are not familiar with stream processing and real-time data integration, focus on understanding the different windowing mechanisms like session window, sliding window, tumbling window, hopping window etc.

The questions are all multiple choice questions (MCQ) based on case studies. For some questions, there is more than one correct answer, and in such cases we are also required to sequence the steps in the correct order. An example could be a data integration process which requires 5 steps; we are required to sequence all the steps correctly. Practice is very important due to this requirement.

Refer to the YouTube video below for more details about how I prepared for this exam.


Conclusion

It is very important to practice as much as possible. The questions are not straightforward and involve selecting the right choices as well as arranging the steps in the right sequence. I would also recommend using more than one reference material; do not rely on only one source of information. I prefer to combine the eLearning courses from one of the platforms like Linux Academy, Pluralsight, Udemy etc. with MS Learn. I also do not recommend using dumps. In my opinion, dumps reduce the actual value of the certification. Please don't take shortcuts when learning new technology.

Until next time, Code with Passion and Strive for Excellence.


Install and upgrade software on Windows using Chocolatey

Background

There are multiple ways of installing software packages. These can include common utilities such as 7-Zip or WinZip, browsers such as Google Chrome, Firefox, Brave etc. We also install editors like Notepad++, Visual Studio Code, and terminals such as Fluent Terminal, cmder etc. The list can go on and on. Things get complicated when you move from one operating system to another, like macOS or Linux. Let's add one more twist with the processor architecture or type: 32-bit or 64-bit, Intel or AMD processors.

The way we install software can vary based on many of the factors listed above. Even for technical people, it can sometimes be a challenge to identify the right version of the software to download and install. This is where a package manager can be quite handy.

Chocolatey

In this post, we will focus our attention on a package manager specific to the Windows operating system. A package manager helps to search for packages, install a package along with its dependencies, identify outdated packages, uninstall packages, pin the version of a piece of software and many other features.

I have been using Chocolatey to install and upgrade the versions of more than 75 software packages. I also managed to automate the setup of a new Windows machine using Chocolatey. The source code for this can be found in my GitHub repository; a minimal sketch of such a setup script is shown below.
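As an illustration, here is a minimal sketch of how an automated machine setup can look with Chocolatey. The package IDs below (git, vscode, 7zip, googlechrome) are just examples, not the exact list from my repository.

# install Chocolatey itself first (see https://chocolatey.org/install for the official command)
# then install a set of packages unattended
choco install -y git vscode 7zip googlechrome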

Demo

In the YouTube video below, see Chocolatey in action: we use it to search for packages, list all the installed packages, find information about packages and upgrade packages as well as extensions for Visual Studio Code.
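For reference, these are the kinds of commands shown in the demo. This is only a sketch; the package name used here (nodejs) is an example.

choco search nodejs          # search the community repository for a package
choco list --local-only      # list the packages installed on this machine
choco info nodejs            # show detailed information about a package
choco outdated               # identify installed packages with newer versions available
choco upgrade all -y         # upgrade everything that is outdated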


Conclusion

Using a package manager to install software can make our life much easier. We do not need to visit websites to look for the appropriate package, the dependencies get resolved automatically and we can identify outdated packages easily. I hope you found this tip useful.

Until next time, Code with Passion and Strive for Excellence.

How to Manage Kubernetes Secrets with Azure Key Vault

Background

There are different ways in which we can manage environment-specific settings for containerized applications. Kubernetes provides ConfigMaps and Secrets as two options to manage environment configuration. ConfigMaps are good for storing key-value pairs in plain text. Sensitive information such as connection strings, usernames and passwords, certificates etc. should not be stored in plain text. Kubernetes Secret objects store data in Base64-encoded form.

Extend Kubernetes Secrets by storing them externally in Azure Key Vault (AKV)

Storing secrets in Base64-encoded form provides only a first line of defense. As the popularity of Kubernetes increases, the tools in the surrounding ecosystem are also improving on a regular basis. More and more vendors are providing extensions of their services to work with Kubernetes.

One such area is the way secrets are managed. In an enterprise scenario, we might use a secure key vault to store keys externally. Azure Key Vault (AKV) and HashiCorp Vault are examples of such vaults. In this demo we are going to use Azure Key Vault. AKV allows us to store:
  • Keys
  • Secrets
  • Certificates
The keys, secrets and certificates are stored in a secure manner, and we can provide very fine-grained access to different users.

Azure Kubernetes Service (AKS) is used to provision a managed Kubernetes cluster running Kubernetes version 1.18.2. We are also using Azure Container Registry (ACR) to store the Docker images for the application containers. The AKS cluster is created with a Managed Identity, which assigns an identity to the VMSS agent pool. We use this managed identity and grant it only the Get privilege to retrieve the secrets stored in AKV.
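As a rough sketch, granting that access with the Azure CLI could look like the commands below. The vault name, resource group, cluster name and secret name are placeholders, not the ones used in the demo.

# create a vault and add a secret to it
az keyvault create --name myDemoKeyVault --resource-group myResourceGroup
az keyvault secret set --vault-name myDemoKeyVault --name ConnectionString --value "<secret-value>"

# client ID of the managed identity used by the AKS node pool (kubelet identity)
az aks show --resource-group myResourceGroup --name myAKSCluster --query identityProfile.kubeletidentity.clientId -o tsv

# grant only the Get permission on secrets to that identity
az keyvault set-policy --name myDemoKeyVault --secret-permissions get --spn <client-id-from-previous-command>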

The secrets from AKV are pulled when the pod is created as part of the Kubernetes deployment. We use the Secrets Store Container Storage Interface (CSI) driver, and the Azure Key Vault provider for the Secrets Store CSI driver specifies the Azure-related properties. The secrets are synced with a Kubernetes Secret object, mounted as a volume in the pod, and finally the data from the Kubernetes Secret is passed into environment variables used by the pod.
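For illustration, here is a rough sketch of what a SecretProviderClass for the Azure provider can look like, wrapped in a kubectl apply. The resource names (akv-sync, app-secrets, ConnectionString) and the identity/vault placeholders are mine, not the ones from the demo, and the apiVersion and field names should be checked against the Secrets Store CSI driver documentation for the driver version you install (older versions use v1alpha1).

cat <<'EOF' | kubectl apply -f -
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: akv-sync
spec:
  provider: azure
  secretObjects:                     # sync the mounted content into a Kubernetes Secret
    - secretName: app-secrets
      type: Opaque
      data:
        - objectName: ConnectionString
          key: ConnectionString
  parameters:
    useVMManagedIdentity: "true"
    userAssignedIdentityID: "<kubelet-identity-client-id>"
    keyvaultName: "<key-vault-name>"
    tenantId: "<tenant-id>"
    objects: |
      array:
        - |
          objectName: ConnectionString
          objectType: secret
EOF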

Demo

All the above functionality is demonstrated as part of the YouTube video on Integrating AKV with AKS.


Conclusion

The image below depicts the 5-step process to integrate AKV with AKS.


Integrate Azure Container Registry with AKS in 5 easy steps

Background

When we start working with Docker and Kubernetes, we need a container registry to publish our images. Most people start with the public Docker Hub registry, which is good for getting started. However, as we become more proficient with container images and orchestration using something like Kubernetes, we need enterprise-grade features. The public container registries do not provide all of these features, so in such scenarios we need to choose a private container registry. In this post we will see how to integrate a private Azure Container Registry (ACR) with an Azure Kubernetes Service (AKS) cluster.

Advantages of using a private container registry with ACR

Similar to other private container registries, ACR provides the following features:

  • Support for Docker and Open Container Initiative (OCI) images
  • Simplify container lifecycle management
    • Build
    • Store
    • Secure
    • Scan
    • Replicate
  • Connect across environments
    • Azure Kubernetes Service (AKS)
    • Azure Redhat OpenShift
    • Azure Services (App Service, Machine Learning, Azure Batch)


Integrate ACR with AKS using Admin User

In the YouTube video below, I demonstrate how to integrate ACR with AKS in 5 easy steps.



The 5 steps demonstrated in the video are as follows.

2 steps to integrate ACR with AKS


We use the Admin user to push images to the ACR registry using docker login. The images are then pulled to the AKS cluster using the Managed Identity associated with the cluster. The Managed Identity is granted the AcrPull role when we create the AKS cluster using the --attach-acr flag with the az aks create command, as shown in the sketch below.
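To make the flow concrete, here is a rough sketch of the commands involved. The registry and cluster names are placeholders, not the ones used in the video.

# create the registry with the Admin user enabled, and fetch its credentials
az acr create --resource-group myResourceGroup --name myregistry --sku Basic --admin-enabled true
az acr credential show --name myregistry

# push an image using the Admin user credentials
docker login myregistry.azurecr.io --username myregistry --password <admin-password>
docker tag myapp:v1 myregistry.azurecr.io/myapp:v1
docker push myregistry.azurecr.io/myapp:v1

# create the AKS cluster and grant its managed identity the AcrPull role on the registry
az aks create --resource-group myResourceGroup --name myAKSCluster --node-count 2 --attach-acr myregistry --generate-ssh-keys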

Authenticate ACR with AKS using Managed Identity


Integrate ACR with AKS using AAD identity

After I published the video on YouTube, Sergio Rodrigo shared a blog post about building & pulling Docker images to ACR. I replied to his tweet suggesting that the readers of his blog could benefit from the video version of my post. This tweet caught the eye of Steve Lasker, who is a PM on the ACR team at Microsoft. Steve suggested that instead of using the Admin User to connect to the ACR registry to push images, there is a better way. By default, Admin User access is disabled when we create a new ACR registry.

We can make use of our own identity linked to Azure Active Directory (AAD) to authenticate with ACR. When we log in to ACR using the AAD identity, a token is generated and our local Docker config file is updated with it. We can push the images using this token, which eliminates the need to enable the Admin User for ACR.
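A minimal sketch of the AAD-based flow looks like this; again, the registry name is a placeholder.

az login                                   # authenticate with your AAD identity
az acr login --name myregistry             # obtains a token and updates the local Docker config
docker tag myapp:v1 myregistry.azurecr.io/myapp:v1
docker push myregistry.azurecr.io/myapp:v1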

Based on this suggestion I created an updated video.

The change in authentication method is highlighted in the diagram below


Conclusion

Private container registries are helpful in preserving the intellectual property of an organization; we do not have to publish it to a publicly available container registry. They also help improve the security posture by providing role-based access control (RBAC). We can separate activities such as who can push images to the registry and who can pull them.

I hope these two videos were helpful in improving the readers' understanding of integrating ACR with AKS.

Until next time, Code with Passion and Strive for Excellence.



YouTube – Autoscaling containers with KEDA on AKS

Background

Over the last few months, it has become difficult to dedicate time to blogging. I also wanted to explore the option of doing interactive videos. As a result, I have started a YouTube channel and posted a few videos so far. First of all, I started with a 3-part series on autoscaling containers with Kubernetes-based Event Driven Autoscaling (KEDA).

KEDA playlist

During the last 6 months or so, I happened to deliver multiple sessions at different community events about event-driven autoscaling of applications using containers and Kubernetes. KEDA is an interesting project. I built, or rather refactored, one of my sample applications to run with KEDA.

In this example I built a .Net Core producer which generates a configurable number of messages and pushes them to a RabbitMQ queue. This is developed as a .Net Core API project. On the consumer side, we can configure the .Net consumer to pick up a batch of messages and process them. This is where the event-driven autoscaling feature of KEDA comes into the picture: based on the number of messages present in the queue, KEDA scales the consumer instances.
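To give an idea of how this scaling rule is expressed, here is a rough sketch of a KEDA ScaledObject for a RabbitMQ queue. The names (rabbitmq-consumer, hello, RabbitMqHost) are placeholders, and the schema follows the current KEDA v2 documentation; the sample in the repository may have been built against an earlier KEDA release whose schema differs slightly.

cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-consumer-scaler
spec:
  scaleTargetRef:
    name: rabbitmq-consumer        # deployment running the .Net Core consumer
  triggers:
    - type: rabbitmq
      metadata:
        queueName: hello
        mode: QueueLength
        value: "10"                # target number of messages per consumer instance
        hostFromEnv: RabbitMqHost  # env var on the deployment holding the AMQP connection string
EOF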

The producer, the consumer and the RabbitMQ broker are all deployed onto a Kubernetes cluster. The Kubernetes cluster is a managed cluster deployed using Azure Kubernetes Service (AKS). In the 3 parts we cover the following topics:

Part 1 – Provision AKS cluster

We look at all the software that I use in order to develop and deploy the application. We also look at the generic PowerShell script which automates the AKS cluster provisioning.


Part 2 – Deploy application containers

In this video we look at the code and how it interacts with the RabbitMQ broker on both the producer and the consumer side. We build the Docker images using docker-compose and publish them to the DockerHub public container registry.


Part 3 – KEDA autoscale in action

In the final part of the series we look at the KEDA architecture, the steps to deploy KEDA using Helm, and deploy the autoscaled application driven by the number of messages in the queue. A sketch of the Helm deployment is shown below.
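For reference, deploying KEDA with Helm currently looks roughly like the commands below; the chart location and install syntax may differ for older KEDA and Helm versions.

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda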


Conclusion

I hope that the video series provides a much better option compared to taking screenshots and putting them in a blog post. It also gives me a better opportunity to express myself and provides another medium to share knowledge. I hope you like the series. In case you find the content useful, please share it and subscribe to my YouTube channel.

The complete source code for the series is available in my GitHub repo named pd-tech-fest-2019. I am very proud to say that this is also one of the official samples listed in the KEDA project.

Until next time, Code with Passion and Strive for Excellence.


Upgrade SQL Server 2017 to 2019 with HA for containers

Background

I have been using the SQL Server 2017 Linux container image for quite some time in my learnings related to Docker and Kubernetes. Earlier, I had written about how to build a custom SQL Server 2017 Docker image and also how to integrate SQL Server 2017 Linux with ASP.Net Core. Microsoft has announced the preview of SQL Server 2019 with the Always On Availability Groups feature. I decided to upgrade from 2017 to 2019 with the HA features and deploy it to a Kubernetes cluster on Azure Kubernetes Service (AKS). This post is about the changes required to migrate from the older SQL 2017 Linux version to the 2019 preview.

In this post we will be performing the following activities:
  • Deploy Kubernetes cluster with 4 nodes on Azure using Azure Kubernetes Service (AKS)
  • Deploy SQL Server 2019 Linux with Always On Availability Group
  • Update the application code to redirect to SQL 2019 instance from SQL 2017
  • Quick look at the differences between 2017 & 2019 SQL Server Linux containers

I will be using the code from my AKS learning series repository. Note that the code is in the feature/sql2019 branch and has not yet been merged into master.

1 - Deploy Kubernetes cluster with 4 nodes on Azure using Azure Kubernetes Service (AKS)

I have got used to creating AKS clusters frequently for different demos and presentations. Instead of typing the lengthy commands on the terminal, I developed a PowerShell script which does all the heavy lifting. There are a bunch of other PowerShell scripts which I use quite often; these are available under the Powershell folder in the repo. I use the initializeAKS.ps1 script to create an AKS cluster with the following configuration:
  • Resource Group Name – techtalksgr
  • Resource Group Location – South East Asia
  • Cluster Name – techtalkscluster
  • DNS Name prefix – techtalksdns
  • Worker node count – 4
  • Kubernetes version – 1.13.5
All these parameters can be overridden at the time of invoking the PowerShell script. Apart from these, the script also uses the following parameters, which are currently hardcoded:
  • VM Node size – Standard_D2_V2
  • Kubernetes addon – http_application_routing
Along with the AKS cluster creation, the script also sets up the kube config with the credentials required to run kubectl commands. It also creates a cluster role binding so that the Kubernetes dashboard can be accessed using a proxy URL.
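For readers who prefer plain Azure CLI over the PowerShell wrapper, the provisioning boils down to something like the following sketch. The resource group creation and the --generate-ssh-keys flag are my additions and not necessarily how initializeAKS.ps1 does it.

az group create --name techtalksgr --location southeastasia

az aks create --resource-group techtalksgr \
              --name techtalkscluster \
              --dns-name-prefix techtalksdns \
              --node-count 4 \
              --node-vm-size Standard_D2_V2 \
              --kubernetes-version 1.13.5 \
              --enable-addons http_application_routing \
              --generate-ssh-keys

# merge the cluster credentials into the local kube config
az aks get-credentials --resource-group techtalksgr --name techtalkscluster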

In order to run the script, you can clone the repo and execute initializeAKS.ps1 from the PowerShell prompt. You should see an output similar to the screenshot shown below.


After the resource group is created, it takes a few minutes for the AKS cluster to spin up, depending on the size of the cluster. In my case, provisioning the 4-node cluster took about 10-15 minutes. Once the initialization script is completed, the output should look similar to the screenshot below.





The green lines in the output show that the cluster has been successfully created with Kubernetes version 1.13.5. With the cluster ready, let's get started with deploying SQL Server 2019 with Always On Availability Groups.

2 - Deploy SQL Server 2019 Linux with Always On Availability Group

In order to deploy SQL Server 2019 Linux with Always On Availability Groups, we need to take the following actions:
  • Deploy SQL Server Operator
  • Create the sa password and master key password
  • Deploy SQL Server custom resource
  • Deploy load balancing services

All these steps are documented in the Always On Availability Groups for SQL Server containers deployment documentation. Once again, I have automated the steps listed in the docs as part of a PowerShell script. The script is a series of kubectl commands which deploy the required objects to the Kubernetes cluster, and it is part of the overall application stack which deploys the SQL database, the Web API and the ASP.Net Core front end. The script is available on GitHub as deployTectTalks-AKS.ps1.
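For orientation, the individual steps boil down to kubectl commands along these lines. The manifest file names and the secret name/keys here are illustrative placeholders; the exact names come from the Microsoft deployment documentation and the script in the repo.

kubectl create namespace ag1

# sa password and master key password used by the SQL Server instances
kubectl create secret generic sql-secrets \
    --from-literal=sapassword="<strong-password>" \
    --from-literal=masterkeypassword="<strong-password>" \
    --namespace ag1

# deploy the SQL Server operator, the SQL Server custom resource and the AG services
kubectl apply -f operator.yaml --namespace ag1
kubectl apply -f sqlserver.yaml --namespace ag1
kubectl apply -f ag-services.yaml --namespace ag1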


Upon successful execution of the script, we will get an output similar to the screenshot below.

We can verify that the required objects are created by querying the Kubernetes pods using the command line:

kubectl get pods --namespace ag1





We can see that the mssql initialize pod has completed and that the operator and mssql pods are running successfully. As part of this deployment, the mssql pod is running two replicas. Apart from the pods, there are multiple services also deployed to the Kubernetes cluster.

We can query these services with the command:

kubectl get svc --namespace ag1

Initially, we will see that it takes some time to assign external IPs to the load-balanced services.




After a few minutes, we will see that all the services are assigned external IPs.




Once we have the IP address assigned to the ag1-primary service, we can use that IP to connect to SQL Server 2019 from the application.


3 - Update the application code to redirect to SQL 2019 instance from SQL 2017

The way the TechTalks application works is that the TechTalksAPI sits between the web front end and the database. I will use the init container technique to initialize the database when the TechTalks API related objects are deployed to Kubernetes. This is the same approach I used with SQL 2017. Note down the public IP of the ag1-primary service, which we will need in the deployment file api-deployment.yml. Replace the server IP and the Data Source in the connection string with the IP address 104.215.193.33. With this change we will be able to connect to the SQL Server instance with the Availability Group.

There is one more minor change I had to make to the initialize-database.sql script which I trigger via the init container. I put in a check to see if the current version of SQL Server is SQL 2019. If that is the case, we add the TechTalksDB to an availability group named AG1. This makes the initialize database script compatible with both SQL Server 2017 and 2019.




With these modifications we are all set to deploy the TechTalksAPI. Navigate to the k8s/AKS/TechTalksAPI folder in the source code and apply the Kubernetes manifest files using the command:

kubectl apply --recursive -f . --namespace ag1

The API deployment and service objects will be created as shown below




We now have all the components of the application stack deployed to the AKS cluster. Let's browse the front end using the IP of the TechTalksWeb service, which in our case is 104.215.185.167. If everything went fine, we should see an output similar to the one shown below.

We have successfully connected to the SQL Server 2019 database with Availability Group on Kubernetes.

4 - Quick look at the differences between 2017 & 2019 SQL Server Linux containers

The SQL Server 2019 Linux container is still in preview. It offers benefits over the SQL 2017 version by providing built-in high availability by means of Availability Groups. The SQL operator makes it easier to support features such as health checks and automatic selection of primary and secondary replicas in case of failover. With SQL Server 2017, Kubernetes manages failover via StatefulSets, and it can take some time to recover from a pod or node failure; the recovery is much faster in SQL 2019.
There was one difference I encountered between SQL 2017 & 2019. With the 2017 version, I was able to access SQL Server via its service name, e.g. db-deployment, from the TechTalksAPI deployment; I did not need to know the IP address of the container. This is not the case with the Availability Group service: I need to know the IP to access SQL Server 2019. I hope this will be addressed by the time SQL 2019 goes GA.

Conclusion

Migrating from SQL 2017 to SQL 2019 took some time. Getting SQL 2017 up and running on Kubernetes is quite straightforward, with a StatefulSet deployment and a service object which can be accessed by the API or the front-end application. SQL 2019 requires additional steps in terms of deploying the operator, the SQL Server custom objects and also the availability group services. Although these additional steps can be tedious to debug, they add much needed resiliency and make the database highly available in the event of node or pod failure. I hope readers will find it easier, with the help of the PowerShell script which handles some of these complexities, to deploy and experiment with the SQL 2019 Linux features on containers.

Until next time, Code with Passion and Strive for Excellence.

My mantra to clear AZ-300 Azure Architect Technologies certification

Background


Recently I cleared the AZ-300: Microsoft Azure Architect Technologies certification. Many people have asked me how I went about preparing for the certification exam and what resources I used to clear it. This post is about the approach I took to prepare for this exam. I'll share my experience with the following topics:
  • Skills measured
  • Online courses
  • Hands-on labs
  • Sample test
  • Notes


Skills measured

The first thing to do while preparing for the exam is to understand what skills are measured as part of this test. Head over to the test details at https://www.microsoft.com/en-us/learning/exam-AZ-300.aspx and scroll down to the skills measured section. It tells us about the percentage of questions that are likely to appear in the test from different sections. Here is a quick summary of the main topics:
  • Deploy and configure infrastructure (25-30%)
  • Implement workloads and security (20-25%)
  • Create and deploy apps (5-10%)
  • Implement authentication and secure data (5-10%)
  • Develop for the cloud and for Azure Storage (20-25%)
In summary, around 80% of this exam is on topics related to infrastructure, networking, security and storage.


Online courses

There are multiple options when it comes to online courses, and each has its pros and cons. Even before taking the certifications, I had personally used multiple online learning platforms including Pluralsight, Udemy, edX, LinkedIn Learning etc. For this exam in particular, I referred to 3 main online resources:
  1. Pluralsight learning path for AZ-300
  2. Udemy course by Scott Duffy
  3. Microsoft Learning with Hands-on Labs

Pluralsight learning path for AZ-300

Pluralsight is one of my favourites when it comes to technical learning. When I started looking for online resources, I was happy to see that Pluralsight had already put together a learning path, which is a collection of courses related to the AZ-300 exam.

There are 28 different courses authored by multiple authors. The total content length is more than 60 hours if you wish to cover each and every course. Most of the courses are short and to the point, around 1 to 3 hours in length, while a few are quite lengthy, running more than 5 hours. One of the difficulties I found with the Pluralsight courses is that there is no predefined structure; you can take them up in any order, and it is difficult to understand which course should be taken up first and which ones later.


Udemy course by Scott Duffy

Personally, I always find it easier to refer to multiple resources. I got to know from a couple of other folks who were also preparing for the same exam that Udemy also has course material. A few months back I had subscribed to the 70-353 exam course. Since Microsoft replaced the older exam with AZ-300, the author Scott Duffy had restructured his course to cater to the needs of the new exam.




I found the course structure to be very well laid out, and it helped me quite a lot. The content may not be very deep in this course, but it covers all the basics required to get through the exam. There are also a lot of references provided for additional learning.


Microsoft learning with Hands-on Labs

Not many people are aware that Microsoft Learning offers free educational content. This is organized along different axes, such as by role or by technology or product. I selected the Solution Architect role-based learning. The content is very specific and to the point.

I like the part where you can perform hands-on labs without having an Azure subscription. You are given a temporary Azure account for the duration of the module and you can familiarize yourself with the capabilities of Azure. This is not just limited to the portal; in some cases predefined VMs are also created with the required tools in order to complete the tasks.




Microsoft Learning also has a bit of gamification. For every course you complete, you earn some points in the form of experience. Although the points cannot be redeemed, it's a good incentive to collect more of them, and indirectly it shows how many features related to Azure you have covered via Microsoft Learning resources.


Hands-on Labs

Although the online courses are helpful, I am one of those people who does not feel comfortable unless I try things out myself. In most cases, I used my Azure subscription to get familiar with the concepts. In some cases it is not possible to do so, such as migrating an on-premises VM to the cloud; there I had to rely on the online course. Things related to storage accounts, virtual networks, VMs, App Services and containers are all easily doable using your own Azure subscription.

I would highly recommend practising with your own subscription as much as possible. There are multiple reasons for this. The first is that many of the online courses were recorded more than 6 months ago, and some of the features discussed in a module may no longer match the current Azure portal. Secondly, there are two sections in the exam where you will be required to perform tasks in the live Azure portal. If you have never tried it yourself, I can bet that you will not be able to perform these live tasks during the actual exam.


Sample Test

Some test providers offer an option to retake the test once in case you do not clear it on the first attempt. In my case, I had used my MVP benefits to get a discount on the online test, and there was no such option for me, which meant that if I did not clear the test on the first attempt, I would have to pay the full amount again.

Luckily I found a sample test on Udemy by Scott Duffy and Riaan Lowe named AZ-300 Azure Architecture Technologies Practice Test. It has 2 tests with 50 questions each. Although the format does not match the actual exam, it still gives you an idea of what kind of questions to expect. It was also helpful for me to focus on areas I was not very comfortable with. One point to note is that this sample test does not contain the live lab feature and is restricted to multiple-choice questions.



I had never taken any online certification test before, so this sample test really helped me a lot. If you have already taken other online certifications, maybe you will have a different experience.


Notes

Every individual has different ways of studying. Some learn by reading books, some by watching videos; others are good at learning by discussing with peers. I personally like to take notes in digital form. OneNote is my favourite note-taking app, and I made good use of its capabilities during the preparation. Apart from using it on multiple devices like an iPad Pro and a Surface Book, I found handwritten notes along with highlighters and a combination of images useful. Here is an example of my handwritten notes using OneNote.




Conclusion

This was the first time I was taking a certification exam. In the past, I have never been a fan of certifications. I am thankful to Puneet Ghanshani for changing my perception of certifications. On multiple occasions, I found my discussions with Mayur Tendulkar very helpful. He was the one who suggested that I try Udemy along with other resources.

The certification exam looks at the breadth of your knowledge of resources around Azure. I was able to complete the exam well before the stipulated time of 3 hours. If I were to go back and prepare for this exam again, I would follow a slightly different path: I would start with the Udemy course and use Microsoft Learning in parallel with it, then go through the Pluralsight courses and do the hands-on labs for deep-dive sessions.

During the exam, there are different scenario-based questions. At first, I found it a bit difficult to understand the structure of the test. You are given a case study along with supporting documents such as the existing infrastructure, technical requirements and target scenario. The multiple-choice questions are then related to these sections. The different sections are separated into a multi-tabbed interface, and if you do not go through each of the tabs, the multiple-choice questions do not make sense. It took me some time to understand this pattern, but once I did, the rest of the exam went well.

I feel relieved that the monkey is off my back. It is a good starting point and I hope to continue with a few more certifications. I hope that people who are looking to clear this exam find this post useful.
Until next time, Code with Passion and Strive for Excellence.