DP-201 Designing an Azure Data Solution Exam Preparation Guide

Background

As part of the Azure role-based certifications, the DP-201 Designing an Azure Data Solution exam is the second of the two exams required for the Azure Data Engineer Associate certificate. I had cleared the DP-200 exam earlier. Yesterday, I cleared DP-201 and achieved the Associate certification.

DP-201 Designing an Azure Data Solution Exam

This exam focuses on assessing the candidate in the following areas:

  • Design Azure Data Solutions (40-45%)
  • Design Data Processing Solutions (25-30%)
  • Design for Data Security and Compliance (25-30%)

The primary skills are tested against the following core services from Azure:

  • Azure Cosmos DB
  • Azure Synapse Analytics
  • Azure Data Lake Storage
  • Azure Data Factory
  • Azure Stream Analytics
  • Azure Databricks
  • Azure Blob Storage

Note: The contents of the exam changed as of 31 July 2020. One notable exclusion from the list of services is Azure SQL Database.

The test focuses on assessing the candidate's design skills. Knowing the code in depth is less important than making the right design choices when selecting the services. Design options related to batch processing and stream processing are important, and we also need to understand the different options for disaster recovery and high availability.

All the questions are multiple-choice questions (MCQ). There are 2 case studies at the beginning of the test consisting of 9 questions. We can go back and forth between the questions within each case study and revise the answers, but once we mark a case study as completed, we cannot go back and change those answers. The two case studies are followed by 30 questions related to the different services. Finally, there is one more case study with 3 questions towards the end. The trick here is that once a question is answered, we cannot go back to it, unlike in the 2 earlier case studies.

Refer to the YouTube video below for more details about how I prepared for the test.


Visual Note Taking

During the preparation, I also started experimenting with a visual note-taking approach. Instead of taking notes in plain text, I started to make them more visual. Here are some examples of this approach.





I have published these visual notes to a GitHub repository in PDF as well as OneNote format. I hope people will find them useful in their preparation.

Conclusion

I found this test easier compared to DP-200. The focus of this test is on design skills, and it is important to understand the differences between the various options available with the services. I hope you find this useful.

Until next time, Code with Passion and Strive for Excellence.

Hacktoberfest DevOps with GitHub Actions

Background

In the month of October, DigitalOcean, in partnership with Intel and DEV, is celebrating the Hacktoberfest event across the globe. This is an annual event which aims to increase awareness of open source in communities all over the world, with meetups and events scheduled everywhere in this regard. If you are an open-source contributor, you can get a free t-shirt by registering for the event and submitting at least 4 pull requests during the month of October. You can find out more about the event on its website.

DevOps with GitHub Actions

The Hacktoberfest Singapore meetup was scheduled as a half-day event on Saturday, 10 October. I had the opportunity to present a topic on building DevOps pipelines with GitHub Actions. GitHub Actions allows us to automate workflows when certain events are triggered on our repository.

In this session, we demonstrated 3 different scenarios (a basic workflow skeleton that all three share is sketched after the list):

  • A simple workflow for linting the codebase with the GitHub Super Linter
  • A workflow involving third-party integration with SonarCloud for static code analysis
  • A CI/CD workflow for deploying a containerized app with Azure Container Registry (ACR) and Azure Kubernetes Service (AKS)
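
All three workflows share the same basic structure: a YAML file under .github/workflows that declares the events which trigger it and one or more jobs to run. The snippet below is a minimal, generic sketch of that skeleton (the file name and steps are illustrative, not the exact workflows from the session):

```yaml
# .github/workflows/build.yml
name: Build
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2
      - name: Run a build or analysis step
        run: echo "Replace this with a linter, scanner or deployment step"
```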

The recording of the session is available on YouTube


Slides

The slides used during the session are available online

Slideshare - https://www.slideshare.net/nileshgule/devops-with-github-actions

Speakerdeck - https://speakerdeck.com/nileshgule/devops-with-github-actions


Source code

The source code of the demo used during the session is available on the GitHub repository

Conclusion

GitHub Actions provides an excellent option for automating workflows that run specific tasks when an event such as a code push or a release is triggered on the repository. The Marketplace offers an opportunity for third-party vendors to provide actions for their products that integrate into automated workflows on repositories.

Until next time, Code with Passion and Strive for Excellence.

Scaling .Net containers with Azure Kubernetes Service and Serverless ACI

Background

Following my virtual session for the Microsoft Cloud South Florida user group last week, I was invited to deliver a similar virtual session, this time for the Dear Azure user group in India. I am thankful to Kasam Shaikh, the organizer of this event. The event was scheduled for Sunday afternoon, and given the relaxed time of 12:30 PM IST, we decided to have a 2-hour session. This was the first time I presented to the vibrant community in India.

Scaling .Net Containers with AKS and Serverless ACI

The theme of the session was similar to the last one; the main difference was the duration. We scheduled it for 2 hours, which provided me with an opportunity to do a deep dive into some of the topics. We started off by looking at the components of the application, consisting of a Web API acting as the producer for RabbitMQ. Then we looked at the consumer, which is built as a .Net Core executable. We went through the steps of building Docker images using Docker Compose and also looked at the benefits of Docker.

Next, we looked at private and public container registries. Kubernetes was the next logical step, and we started by looking at its main features.
RabbitMQ and KEDA were installed on the Kubernetes cluster, and the demo application was deployed using Kubernetes manifest files. In the closing stages of the demo, we looked at different options for scaling applications on a Kubernetes cluster. These included manual scaling and Horizontal Pod Autoscaling (HPA) based on CPU or RAM utilization. There are cases when we need to scale not just on resource usage but on some external factor; this is where we explored Kubernetes-based Event Driven Autoscaling (KEDA), which allows us to scale based on events.
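
For reference, CPU-based HPA boils down to a small manifest like the one below. This is a minimal sketch assuming a hypothetical deployment named rabbitmq-consumer and the autoscaling/v2 API available on recent Kubernetes versions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rabbitmq-consumer-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rabbitmq-consumer        # hypothetical deployment name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # add replicas when average CPU utilization crosses 60%
```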

Slight glitch

Usually, due to time constraints, I prepare the demo environment beforehand and show the relevant bits during the live demo. This time, since we had 2 hours at our disposal, I created the AKS cluster right at the start of the session. Most of the things I wanted to show worked fine, except for the scenario of scaling onto serverless Azure Container Instances (ACI). My backup cluster also had a problem. Lesson learnt: next time I do a live cluster setup, the backup cluster needs to be tested thoroughly. I have done a similar demo at least 8-10 times in different forums; maybe I became a bit overconfident that it would work.

YouTube video recording


The recording of this session is available on YouTube


Slides

The slides used during the session are available online
Slideshare - https://www.slideshare.net/nileshgule/scaling-containers-with-aks-and-aci

Conclusion

It was a wonderful experience to present to the vibrant developer community in India. The questions asked during the session prompted me to make changes to my demo, which will be helpful for future sessions.

Until next time, Code with Passion and Strive for Excellence.

Automate SonarCloud code scans using GitHub Actions

Background

In an earlier post and YouTube video, I demonstrated how to automate the code linting process using the GitHub Super Linter and GitHub Actions. In this post, we will explore how to use GitHub Actions to automate static code analysis using SonarCloud.

SonarCloud

You might have heard about SonarQube, which offers scanners for different programming languages. SonarCloud is a cloud service which scans codebases for bugs, vulnerabilities and code smells. At the time of this writing, 24 mainstream programming languages are supported, including:

  • C#
  • Java
  • Python
  • JavaScript
  • TypeScript
  • Go
  • Kotlin and others

SonarCloud provides detailed analysis across multiple dimensions of the code. These are helpful in identifying common mistakes made by developers and in ensuring that the code is of high quality. SonarCloud also gives an indication of how much time is required to fix all the reported issues and remove the technical debt. The different dimensions are:
  • Reliability
  • Security
  • Maintainability
  • Coverage
  • Duplications

SonarCloud also has Quality Gates and Quality Profiles. Quality profiles can be used to refine the rules which are applied to the code while scanning the files.

Automate code scans with GitHub Actions

In the video, we can see how to use a GitHub Actions workflow to automate the code scan using the SonarCloud GitHub Action.
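
As a rough sketch (not necessarily the exact workflow from the video), the SonarCloud scan can be wired up as shown below. It assumes a sonar-project.properties file with the project key and organization, and a SONAR_TOKEN secret configured on the repository:

```yaml
# .github/workflows/sonarcloud.yml
name: SonarCloud analysis
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  sonarcloud:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0                                 # full history improves analysis of new code
      - name: SonarCloud scan
        uses: SonarSource/sonarcloud-github-action@master
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}      # provided automatically by GitHub
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}        # generated in SonarCloud, stored as a repo secret
```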


Conclusion

SonarCloud offers a very good analysis of a codebase by performing static code analysis. The ruleset can be customized per language and also based on organizational policies. GitHub Actions makes it very easy to automate workflows. Combining the power of GitHub Actions and SonarCloud, we get up-to-date insights about our code in an automated manner. I hope you found this post useful.

Until next time, Code with Passion and Strive for Excellence.

Scaling .Net Core containers with Event Driven Workloads

Background 

Due to the COVID-19 pandemic, many developer communities and user groups have been forced to conduct their regular sessions virtually. This has provided a great opportunity for organizers to rope in speakers from different parts of the world, and for speakers to present at community events across the globe, which might not have been possible with physical events.

I have been speaking at local community events in Singapore as well as other parts of Asia for the past 3-4 years. Recently, I got the opportunity to speak at a virtual meetup across the globe, for the Microsoft Cloud South Florida user group.

It started off with a tweet from Julie Lerman saying that she was getting multiple requests for speaking opportunities but could not fulfil them all. She suggested that organizers could extend the opportunities to others who might be interested and available to speak. I thought it was a good opportunity and replied to her tweet. The thread got picked up by Dave Noderer, and we managed to set up a virtual meetup in no time.

Scaling .Net Core Containers with Event Driven Workloads

I have presented the topic of autoscaling containers using KEDA on multiple occasions in the past at different meetups and events in Asia. I also have a 3-part series about this on my recently launched YouTube channel. The duration of the meetup was 90 minutes, which provided me with an opportunity to do a deep dive into some of the topics that are not possible in a 45-minute or 1-hour session.

The application I used in the demo is a dummy event management application called Tech Talks. There is an ASP.Net Core Web API which exposes a method to generate random events. These events are pumped into a RabbitMQ queue. We have a .Net Core executable which consumes these messages in batches. It is this consumer which we use to showcase the autoscaling capabilities using an upcoming project called Kubernetes-based Event Driven Autoscaling (KEDA).




During the session, I demonstrated the following features (a sketch of a KEDA scaler definition for the consumer follows the list):
  • Containerize .Net Core Web API and executable using Dockerfile 
  • Build and Publish docker images to a private container registry (Azure Container Registry) 
  • Use Docker Compose to build multiple services
  • Use YAML files to describe Kubernetes deployments 
  • Provision an AKS cluster using an idempotent PowerShell script
  • Deploy RabbitMQ cluster using Helm charts 
  • Deploy application containers to Kubernetes 
  • Auto scale RabbitMQ consumer using KEDA 
  • Extend the scaling capabilities to serverless Azure Container Instances (ACI) using Virtual Node
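
The heart of the KEDA setup is a ScaledObject that tells KEDA which deployment to scale and which RabbitMQ queue to watch. Below is a minimal sketch using the current keda.sh API and hypothetical names; the demo may use an older KEDA (v1) API version with slightly different fields:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-consumer-scaler
spec:
  scaleTargetRef:
    name: rabbitmq-consumer          # hypothetical consumer deployment name
  minReplicaCount: 0                 # scale down to zero when the queue is empty
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        queueName: techtalks-events  # hypothetical queue name
        hostFromEnv: RABBITMQ_HOST   # AMQP connection string read from the consumer's environment
        mode: QueueLength
        value: "20"                  # target number of messages per replica
```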

By the end of the session, we had extended the containers to be autoscaled onto serverless Azure Container Instances (ACI) using the virtual node.
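
Scheduling pods onto the virtual node (and therefore onto ACI) essentially comes down to adding a node selector and tolerations to the pod spec. The fragment below follows the standard AKS virtual node setup rather than the exact demo manifests:

```yaml
# Fragment of a deployment pod spec targeting the AKS virtual node (ACI)
spec:
  containers:
    - name: rabbitmq-consumer                    # hypothetical container name
      image: techtalksacr.azurecr.io/rabbitmq-consumer:latest
  nodeSelector:
    kubernetes.io/role: agent
    beta.kubernetes.io/os: linux
    type: virtual-kubelet
  tolerations:
    - key: virtual-kubelet.io/provider
      operator: Exists
    - key: azure.com/aci
      effect: NoSchedule
```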


YouTube video recording

The recording of this talk is now available on YouTube


Slides

The slides used during the session are available online.

Source code

The source code is available in the GitHub repository.

Conclusion

The session provided me with an opportunity to speak across the globe for the first time. I like to attend in-person events, as they help a great deal in networking with people. In a virtual event, you sometimes feel like you are talking to a screen, and it is difficult to gauge the reaction of the audience.

One of the benefits of a virtual event is that we can focus more on the content delivery without getting distracted, which can sometimes happen in an in-person event. Depending on which platform or communication tool is used (YouTube live stream, MS Teams, Zoom, etc.), questions and answers can be handled separately. Another great advantage of a virtual event is the ability to record it and share it on platforms like YouTube. People who could not attend due to timezone differences or emergencies can find these recordings useful.

Until next time, Code with Passion and Strive for Excellence.

How to improve code quality using GitHub Super Linter

Background

As developers, we work with multiple programming languages like C#, Java, Python, JavaScript, SQL, etc. Apart from the mainstream programming languages, we also work with different file types like XML, JSON and YAML. Each of these languages and file types has its own styles and conventions. As a language or format matures, standards and best practices develop around it over a period of time.

Nowadays, many developers are full-stack or polyglot developers. As part of the microservices style of development, they might work on a web or JavaScript-based front end, a middle tier based on Java, C#, Go, Python or some other programming language, and then a SQL or NoSQL backend. Each of these components can use a different language, and each language will have its own style. It can be very difficult for new or even experienced developers to follow all the best practices for all the languages at the same time.

Linter to the rescue

A linter is a piece of software or an add-on which helps to identify issues in a file with respect to rules or conventions. It is a static code analysis tool which helps to automate the process of catching common errors, bugs and style-related issues. Some Integrated Development Environments have built-in linters for the common programming languages. Visual Studio, for example, can suggest changes to classes and methods, and external tools and extensions like ReSharper can also help. One of the most popular code editors, Visual Studio Code, has many extensions which are specific to a particular language.

I tried searching for "linter" in the Visual Studio Code Marketplace and, as of this writing, there are more than 200 linter extensions available.


All these linters help in making sure an individual developer can follow the rules and styles correctly in their development environment. Things get tricky when we work in teams and multiple developers are working on the same project, since each developer can have their own opinion. To avoid having multiple styles in the same codebase, it is necessary to standardize the rules across the whole team.

These rules can then be checked automatically as part of the automated build process. All the modern Continuous Integration (CI) systems like Azure DevOps, Jenkins, Bamboo, TeamCity, Travis CI etc. allow us to perform static code analysis.

GitHub Super Linter

GitHub recently announced what they call the Super Linter. GitHub also allows us to trigger certain actions based on conditions like source code check-in to a master, main or feature branch. These are called GitHub Actions.

The Super Linter is a collection of more than 30 linters for some of the most commonly used programming languages. With one GitHub Action, we can scan the whole codebase and identify any issues in a single go. If the team has its own set of predefined rules for a particular language, we can override the defaults with the team's or organization's rules.

In the video below, we can see how the GitHub Super Linter can be configured for your repository and triggered using a GitHub Action.
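
For reference, the workflow mirrors the basic example from the Super Linter documentation (this is a sketch, not the exact file from the video). Team-specific rule files can be placed under the .github/linters folder, which the Super Linter reads by default, to override the defaults with organizational rules:

```yaml
# .github/workflows/linter.yml
name: Lint Code Base
on:
  push:
    branches-ignore: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Lint code base
        uses: github/super-linter@v3
        env:
          VALIDATE_ALL_CODEBASE: false                # lint only the files changed in this push/PR
          DEFAULT_BRANCH: master
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}   # lets the linter report its status on the PR
```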



Conclusion

As demonstrated in the video, the GitHub Super Linter can be a great tool for automating static code analysis. It can easily help to standardize coding practices across multiple languages for teams and organizations. It is very easy to set up and can help a great deal with DevOps practices. I hope you find this useful.

Until next time, Code with Passion and Strive for Excellence.

DP-200 Implementing an Azure Data Solution Exam preparation guide

Background

Most of us are working from home due to the ongoing COVID-19 situation. This has personally saved me some travelling time, which I managed to utilize to prepare for Azure role-based certifications. In the past 3 months, I managed to clear the AZ-301, AZ-400 and DP-200 certification exams. Many people have asked me how I went about preparing for these certifications. This post and the accompanying YouTube video are about my preparation for the DP-200 exam.

DP-200 Implementing an Azure Data Solution Exam

In case you are new to Microsoft role-based certifications, have a look at this excellent post by Thomas Maurer about selecting the right exam. The DP-200 exam is focused on the Data Engineer role, falls under the Data & AI category of tests and is at the Associate level. I have been part of data engineering teams since late 2015 and have worked with big data technologies on the Hortonworks Data Platform as well as on Microsoft Azure. The exam consists of 3 main areas:

  • Implement data storage solutions (40-50%)
  • Manage and develop data processing (25-30%)
  • Monitor and optimize data solutions (30-35%)

Your skills are tested against the following core services from Microsoft Azure:
  • Azure Data Lake Storage (ADLS) Gen 2
  • Azure Data Factory
  • Azure Databricks
  • Azure SQL
  • Cosmos DB
  • Azure Stream Analytics
  • Azure Key Vault
  • Azure Monitor
Since the test is focused on implementing data solutions, we need to know the different options available for securing, managing, integrating and monitoring the solutions. I used the Linux Academy course for DP-200 to prepare for this exam. If you are new to Azure, I recommend going through the MS Learn paths related to DP-200. The following learning paths are available on MS Learn.

We need to understand batch processing, stream processing, structured and unstructured data, the different APIs supported by Cosmos DB and the different consistency levels. In terms of data integration, it is important to understand the capabilities of Azure Data Factory. For data processing, Azure Databricks, Synapse Analytics and Stream Analytics play a very important role. If you are not familiar with stream processing and real-time data integration, focus on understanding the different windowing mechanisms like the session window, sliding window, tumbling window, hopping window etc.

The questions are all multiple choice questions (MCQ) based on case studies. For some questions, there is more than one correct answer, and in such cases we are also required to sequence the steps in the correct order. An example could be a data integration process which requires 5 steps; we are required to sequence all the steps correctly. Practice is very important due to this requirement.

Refer to the YouTube video below for more details about how I prepared for this exam.


Conclusion

It is very important to practice as much as possible. The questions are not straightforward and involve selecting the right choices as well as making the right sequence of steps. I would also recommend using more than one reference material; do not rely on only one source of information. I prefer to combine eLearning courses from one of the platforms like Linux Academy, Pluralsight, Udemy etc. with MS Learn. I also do not recommend using dumps; in my opinion, dumps reduce the actual value of the certification. Please don't take shortcuts when learning a new technology.

Until next time, Code with Passion and Strive for Excellence.


Install and upgrade software on Windows using Chocolatey

Background

There are multiple ways of installing software packages. These can include common utilities such as 7-Zip or WinZip, browsers such as Google Chrome, Firefox, Brave etc. We also install editors like Notepad++ and Visual Studio Code, and terminals such as Fluent Terminal, cmder etc. The list can just go on and on. Things get complicated when you move from one operating system to another, like macOS or Linux. Let's add one more twist with the processor architecture or type: 32-bit or 64-bit, Intel or AMD processors.

The way we install software could vary based on many of the factors listed above. Even for technical people, it can become a challenge sometimes to identify the right version of the software to download and install. This is where a package manager can be quite handy.

Chocolatey

In this post, we will focus our attention on a package manager specific to the Windows operating system. A package manager helps to search for packages, install a package along with its dependencies, identify outdated packages, uninstall packages, pin the version of a piece of software, and much more.

I have been using Chocolatey to install and upgrade more than 75 pieces of software. I also managed to automate the setup of a new Windows machine using Chocolatey. The source code for this can be found in my GitHub repository.

Demo

In the YouTube video below, see Chocolatey in action: we use it to search for packages, list all the installed packages, find information about packages, and upgrade packages as well as Visual Studio Code extensions.


Conclusion

Using a package manager to install software can make our life much easier. We do not need to visit websites to look for the appropriate package, the dependencies get resolved automatically, and we can identify outdated packages easily. I hope you found this tip useful.

Until next time, Code with Passion and Strive for Excellence.

How to Manage Kubernetes Secrets with Azure Key Vault

Background

There are different ways in which we can manage environment-specific settings for containerized applications. Kubernetes provides ConfigMaps and Secrets as two options to manage environment configuration. ConfigMaps are good for storing key-value pairs in plain text. Sensitive information such as connection strings, usernames and passwords, certificates etc. should be stored in encrypted form; Kubernetes Secret objects store data in Base64-encoded form.

Extend Kubernetes Secrets by storing them externally in Azure Key Vault (AKV)

Storing secrets in encrypted form provides a first line of defense. As the popularity of Kubernetes increases, the tools in the surrounding ecosystem are also improving on a regular basis, and more and more vendors are providing extensions of their services to work with Kubernetes.

One such area is the way secrets are managed. In an enterprise scenario, we might use a secure key vault to store keys externally; Azure Key Vault (AKV) and HashiCorp Vault are examples of such vaults. In this demo we are going to use Azure Key Vault. AKV allows us to store:
  • Keys
  • Secrets
  • Certificates
The keys, secrets and certificates are stored in a secure manner, and we can grant very fine-grained access to different users.

Azure Kubernetes Service (AKS) is used to provision a managed Kubernetes cluster running Kubernetes version 1.18.2. We are also using Azure Container Registry (ACR) to store the Docker images for the application containers. The AKS cluster is created using a managed identity, which assigns an identity to the VMSS agent pool. We use this managed identity and grant it only the Get permission to retrieve the secrets stored in AKV.

The secrets from AKV are pulled when the pod is created as part of the Kubernetes deployment. We use the Secrets Store Container Storage Interface (CSI) driver, with the Azure Key Vault provider for the Secrets Store CSI driver specifying the Azure-related properties. The secrets are synced to a Kubernetes Secret object, mounted as a volume in the pod, and finally the data from the Kubernetes Secret is passed into environment variables used by the pod.
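
To make the flow concrete, here is a rough sketch of the two pieces involved: a SecretProviderClass that points at the vault and syncs an object into a Kubernetes Secret, and the pod spec fragment that mounts the CSI volume and reads the synced Secret into an environment variable. All names, and the exact API version, are illustrative rather than taken from the demo:

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
  name: techtalks-akv                            # hypothetical name
spec:
  provider: azure
  secretObjects:                                 # sync the mounted content into a Kubernetes Secret
    - secretName: techtalks-secrets
      type: Opaque
      data:
        - objectName: sql-connection-string
          key: SQL_CONNECTION_STRING
  parameters:
    useVMManagedIdentity: "true"                 # use the AKS managed identity granted Get on secrets
    userAssignedIdentityID: "11111111-2222-3333-4444-555555555555"   # client ID of the agent pool identity
    keyvaultName: "techtalks-kv"                 # replace with your Key Vault name
    tenantId: "00000000-0000-0000-0000-000000000000"                 # your AAD tenant ID
    objects: |
      array:
        - |
          objectName: sql-connection-string
          objectType: secret
---
# Fragment of the deployment pod spec
spec:
  containers:
    - name: techtalks-web                        # hypothetical container name
      image: techtalksacr.azurecr.io/techtalks-web:latest
      env:
        - name: SQL_CONNECTION_STRING
          valueFrom:
            secretKeyRef:
              name: techtalks-secrets            # the synced Kubernetes Secret
              key: SQL_CONNECTION_STRING
      volumeMounts:
        - name: akv-secrets
          mountPath: /mnt/secrets-store
          readOnly: true
  volumes:
    - name: akv-secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: techtalks-akv
```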

Demo

All the above functionality is demonstrated as part of the YouTube video on Integrating AKV with AKS.


Conclusion

The image below depicts the 5-step process to integrate AKV with AKS.


Integrate Azure Container Registry with AKS in 5 easy steps

Background

When we start working with Docker and Kubernetes, we need a container registry to publish our images. Most people start with the public Docker Hub registry, which is good for getting started. However, as we become more proficient with container images and orchestration using something like Kubernetes, we need enterprise-grade features, and public container registries do not provide all of them. In such a scenario, we need to choose a private container registry. In this post we will see how to integrate a private Azure Container Registry (ACR) with an Azure Kubernetes Service (AKS) cluster.

Advantages of using a private container registry with ACR

Similar to other private container registries, ACR provides the following features:

  • Support for Docker and Open Container Initiative (OCI) images
  • Simplify container lifecycle management
    • Build
    • Store
    • Secure
    • Scan
    • Replicate
  • Connect across environments
    • Azure Kubernetes Service (AKS)
    • Azure Redhat OpenShift
    • Azure services (App Service, Machine Learning, Azure Batch)


Integrate ACR with AKS using Admin User

In this YouTube video, I demonstrate how to integrate ACR with AKS in 5 easy steps.

Integrate ACR with AKS using Admin User


The 5 steps demonstrated in the video are as follows

2 steps to integrate ACR with AKS


We use the admin user to push images to the ACR registry using Docker login. The images are then pulled to the AKS cluster using the managed identity associated with the cluster. The managed identity is granted the AcrPull role when we create the AKS cluster using the --attach-acr flag of the az aks create command.
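
Once the cluster identity has the AcrPull role, a deployment can reference the image by its fully qualified ACR name without any imagePullSecrets. A minimal sketch with placeholder registry and image names:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: techtalks-api                       # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: techtalks-api
  template:
    metadata:
      labels:
        app: techtalks-api
    spec:
      containers:
        - name: techtalks-api
          # Fully qualified ACR image reference; pulled via the AKS managed identity,
          # so no imagePullSecrets are required.
          image: myregistry.azurecr.io/techtalks-api:v1
          ports:
            - containerPort: 80
```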

Authenticate ACR with AKS using Managed Identity


Integrate ACR with AKS using AAD identity

After I published the video on YouTube, Sergio Rodrigo shared a blog post about building and pulling Docker images to ACR. I replied to his tweet suggesting that the readers of his blog could benefit from the video version of my post. The tweet caught the eye of Steve Lasker, who is a PM on the ACR team at Microsoft. Steve suggested that instead of using the admin user to connect to the ACR registry to push images, there is a better way. By default, admin user access is disabled when we create a new ACR registry.

We can make use of our own identity in Azure Active Directory (AAD) to authenticate with ACR. When we log in to ACR using the AAD identity, a token is generated and our local Docker config file is updated with it. We can push images using this token, which eliminates the need to enable the admin user for ACR.

Based on this suggestion I created an updated video.

The change in authentication method is highlighted in the diagram below


Conclusion

Private container registries are helpful in preserving the intellectual property of an organization; we do not have to publish the organization's IP to a publicly available container registry. They also help in improving the security posture by providing role-based access control (RBAC): we can separate activities such as who can push images to the registry and who can pull them.

I hope these two videos were helpful in improving the reader's understanding of integrating ACR with AKS.

Until next time, Code with Passion and Strive for Excellence.



YouTube – Autoscaling containers with KEDA on AKS

Background

Over the last few months, it has become difficult to dedicate time to blogging. I also wanted to explore the option of doing interactive videos. As a result, I have started a YouTube channel and posted a few videos so far. First of all, I started with a 3-part series on autoscaling containers with Kubernetes-based Event Driven Autoscaling (KEDA).

KEDA playlist

During the last 6 months or so, I happened to deliver multiple sessions at different community events about event-driven autoscaling of applications using containers and Kubernetes. KEDA is an interesting project. I built, or rather refactored, one of my sample applications to run with KEDA.

In this example, I built a .Net Core producer which generates a configurable number of messages and pushes them to a RabbitMQ queue; it is developed as a .Net Core API project. On the consumer side, we can configure the .Net consumer to pick up a batch of messages and process them. This is where the event-driven autoscaling feature of KEDA comes into the picture: based on the number of messages present in the queue, KEDA scales the consumer instances.

The producer, consumer and RabbitMQ broker are all deployed onto a Kubernetes cluster, which is a managed cluster provisioned using Azure Kubernetes Service (AKS). In the 3 parts, we cover the following topics:

Part 1 – Provision AKS cluster

We look at all the software that I use in order to develop and deploy the application. We also look at the generic PowerShell script which automates the AKS cluster provisioning.


Part 2 – Deploy application containers

In this video, we look at how the code interacts with the RabbitMQ broker from the producer and the consumer side. We build the Docker images using Docker Compose and publish them to the public Docker Hub container registry.
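
As a rough sketch (service names, image names and Dockerfile paths are illustrative, not necessarily those in the repo), the Compose file that builds both images looks something like this; running docker-compose build followed by docker-compose push then builds and publishes them to Docker Hub:

```yaml
# docker-compose.yml
version: "3.7"
services:
  rabbitmq-producer:
    image: example/rabbitmq-producer:latest         # replace "example" with your Docker Hub user
    build:
      context: .
      dockerfile: TechTalksProducer/Dockerfile      # hypothetical path to the Web API Dockerfile
    ports:
      - "8080:80"
  rabbitmq-consumer:
    image: example/rabbitmq-consumer:latest
    build:
      context: .
      dockerfile: TechTalksConsumer/Dockerfile      # hypothetical path to the consumer Dockerfile
```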


Part 3 – KEDA autoscale in action

In the final part of the series, we look at the KEDA architecture, the steps to deploy KEDA using Helm, and how to autoscale the application based on the number of messages in the queue.


Conclusion

I hope that the video series provides a much better option compared to taking screenshots and putting them in a blog post. It also gives me a better opportunity to express myself and provides another medium for sharing knowledge. I hope you like the series. In case you find the content useful, please share it and subscribe to my YouTube channel.

The complete source code for the series is available in my GitHub repo named pd-tech-fest-2019. I am very proud to say that this is also one of the official samples listed by the KEDA project.

Until next time, Code with Passion and Strive for Excellence.
