DP-201 Designing an Azure Data Solution Exam Preparation Guide

 Background

As part of the Azure role-based certifications, the DP-201 Designing an Azure Data Solution exam is the second of the two exams required for the Azure Data Engineer Associate certification. I had cleared the DP-200 exam earlier. Yesterday, I cleared DP-201 and achieved the Associate certification.

DP-201 Designing an Azure Data Solution Exam

This exam focuses on assessing the candidate in the following areas:

  • Design Azure Data Storage Solutions (40-45%)
  • Design Data Processing Solutions (25-30%)
  • Design for Data Security and Compliance (25-30%)

The primary skills are tested against the following core services from Azure:

  • Azure Cosmos DB
  • Azure Synapse Analytics
  • Azure Data Lake Storage
  • Azure Data Factory
  • Azure Stream Analytics
  • Azure Databricks
  • Azure Blob Storage

Note: The contents of the exam changed as of 31st July 2020. One notable exclusion from the list of services is Azure SQL Database.

The test focuses on assessing the candidate's design skills. Deep knowledge of the code is not required; what matters is making the right design choices when selecting the services. Design options related to batch processing and stream processing are important, as is understanding the different options for disaster recovery and high availability.

All the questions are multiple-choice questions (MCQ). There are 2 case studies at the beginning of the test, consisting of 9 questions. Within each case study we can go back and forth between the questions and revise the answers, but once a case study is marked as completed, we cannot go back and change the answers. The two case studies are followed by 30 questions covering the different services. Finally, there is one more case study with 3 questions towards the end. The trick here is that, unlike the 2 earlier case studies, once a question is answered we cannot go back to it.

Refer to the below YouTube video for more details about how I prepared for the test.


Visual Note Taking

During the preparation, I also started experimenting with a visual note-taking approach. Instead of taking notes in plain text, I started to make them more visual. Here are some examples of this approach.





I have published these visual notes to a GitHub repository in both PDF and OneNote formats. I hope people will find them useful in their preparations.

Conclusion

I found this test easier than DP-200. The focus of this test is on design skills, and it is important to understand the differences between the options available with each service. I hope you find this useful.

Until next time, Code with Passion and Strive for Excellence.

Hacktoberfest DevOps with GitHub Actions

 Background

In the month of October, DigitalOcean, in partnership with Intel and DEV, is celebrating the Hacktoberfest event across the globe. This is an annual event which aims to increase awareness of open source in communities all over the world, with meetups and events scheduled throughout the month. If you are an open-source contributor, you can get a free t-shirt by registering for the event and submitting at least 4 pull requests during October. You can find out more about the event on the website.

DevOps with GitHub Actions

The Hacktoberfest Singapore meetup was scheduled as a half-day event on Saturday, 10 October. I had the opportunity to present a topic on building DevOps pipelines with GitHub Actions. GitHub Actions allows us to automate workflows when certain events are triggered on our repository.

In this session, we demonstrated 3 different scenarios:

  • A simple workflow for linting the codebase with GitHub Super Linter
  • A workflow involving a third-party integration with SonarCloud for static code analysis
  • A CI/CD workflow for deploying a containerized app with Azure Container Registry (ACR) and Azure Kubernetes Service (AKS) (a minimal sketch of this workflow is shown below)
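
For reference, below is a minimal sketch of what the third workflow could look like. The registry name, image name, cluster name and resource group are placeholders, and the workflow assumes the Azure credentials and registry password are stored as repository secrets; it is not the exact workflow from the demo.

```yaml
name: ci-cd-acr-aks

on:
  push:
    branches: [ main ]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      # Build the container image and push it to Azure Container Registry
      - uses: azure/docker-login@v1
        with:
          login-server: myregistry.azurecr.io            # placeholder registry
          username: ${{ secrets.ACR_USERNAME }}
          password: ${{ secrets.ACR_PASSWORD }}
      - run: |
          docker build -t myregistry.azurecr.io/myapp:${{ github.sha }} .
          docker push myregistry.azurecr.io/myapp:${{ github.sha }}

      # Point the workflow at the AKS cluster and roll out the new image
      - uses: azure/aks-set-context@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
          cluster-name: my-aks-cluster                   # placeholder cluster
          resource-group: my-resource-group              # placeholder resource group
      - uses: azure/k8s-deploy@v1
        with:
          manifests: |
            k8s/deployment.yaml
            k8s/service.yaml
          images: myregistry.azurecr.io/myapp:${{ github.sha }}
```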

The recording of the session is available on YouTube


Slides

The slides used during the session are available online

Slideshare - https://www.slideshare.net/nileshgule/devops-with-github-actions

Speakerdeck - https://speakerdeck.com/nileshgule/devops-with-github-actions


Source code

The source code of the demo used during the session is available in the GitHub repository.

Conclusion

GitHub Actions provides an excellent option for automating workflows that run specific tasks when an event such as a code push or a release is triggered on the repository. The Marketplace gives third-party vendors the opportunity to provide actions for their products that integrate into these automated workflows.

Until next time, Code with Passion and Strive for Excellence.

Scaling .Net containers with Azure Kubernetes Service and Serverless ACI

 Background

Following my virtual session for the Microsoft Cloud South Florida user group last week, I was invited to deliver a similar virtual event, this time for the Dear Azure user group in India. I am thankful to Kasam Shaikh, the organizer of this event. The event was scheduled for Sunday afternoon, and given the relaxed time of 12:30 PM IST, we decided to have a 2-hour session. This was the first time I presented to the vibrant community in India.

Scaling .Net Containers with AKS and Serverless ACI

The theme of the session was similar to the last one; the main difference was the duration. We scheduled it for 2 hours, which gave me the opportunity to do a deep dive on some of the topics. We started off by looking at the components of the application, which consists of a Web API acting as the producer for RabbitMQ and a consumer built as a .Net Core executable. We went through the steps of building Docker images using Docker Compose and also looked at the benefits of Docker.

Next, we looked at private and public container registries. Kubernetes was the next logical step, and we started by looking at its main features.
RabbitMQ and KEDA were installed on the Kubernetes cluster and the demo application was deployed using Kubernetes manifest files. In the closing stages of the demo, we looked at different options for scaling workloads on a Kubernetes cluster, including manual scaling and Horizontal Pod Autoscaling (HPA) based on CPU or RAM utilization. There are cases when we need to scale not just on resource usage but on some external factor, and this is where we explored Kubernetes-based Event Driven Autoscaling (KEDA), which allows us to scale based on events.
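
As an illustration, a KEDA ScaledObject targeting the RabbitMQ consumer could look like the sketch below (using KEDA 2.x trigger syntax). The deployment name, queue name and the TriggerAuthentication holding the RabbitMQ connection details are placeholders, not the exact names used in the demo.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-consumer-scaler
spec:
  scaleTargetRef:
    name: rabbitmq-consumer          # placeholder: Deployment running the consumer
  minReplicaCount: 0                 # scale down to zero when the queue is empty
  maxReplicaCount: 30
  triggers:
    - type: rabbitmq
      metadata:
        queueName: event-queue       # placeholder queue name
        mode: QueueLength
        value: "50"                  # target number of messages per replica
      authenticationRef:
        name: rabbitmq-trigger-auth  # TriggerAuthentication holding the RabbitMQ host details
```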

Slight glitch

Usually, due to time constraints, I prepare the demo environment beforehand and show the relevant bits during the live demo. This time, since we had 2 hours at our disposal, I created the AKS cluster right at the start of the session. Most of the things I wanted to show worked fine, except for the scenario of scaling onto serverless Azure Container Instances (ACI). My backup cluster also had a problem. Lesson learnt: the next time I do a live cluster setup, the backup cluster needs to be tested thoroughly. I have done a similar demo at least 8-10 times in different forums; maybe I became a bit overconfident that it would work.

YouTube video recording


The recording of this session is available on YouTube


Slides

The slides used during the session are available online
Slideshare - https://www.slideshare.net/nileshgule/scaling-containers-with-aks-and-aci

Conclusion

It was a wonderful experience to present to the vibrant developer community in India. The questions asked during the session prompted me to make changes to my demo which will be helpful for future sessions.

Until next time, Code with Passion and Strive for Excellence.

Automate SonarCloud code scans using GitHub Actions

 Background

In an earlier post and YouTube video, I demonstrated how to automate the code linting process using the GitHub Super Linter and GitHub Actions. In this post, we will explore how to use GitHub Actions to automate static code analysis using SonarCloud.

SonarCloud

You might have heard about SonarQube, which offers scanners for different programming languages. SonarCloud is its cloud-hosted counterpart, a service which scans codebases for bugs, vulnerabilities and code smells. At the time of this writing, 24 mainstream programming languages are supported, including:

  • C#
  • Java
  • Python
  • JavaScript
  • TypeScript
  • Go
  • Kotlin and others

SonarCloud provides detailed analysis across multiple dimensions of the code. These are helpful in identifying common mistakes made by developers and ensuring that the code is of high quality. SonarCloud also gives an indication of how much time is required to fix all the reported issues and remove the technical debt. The different dimensions are:
  • Reliability
  • Security
  • Maintainability
  • Coverage
  • Duplications

SonarCloud also has Quality Gates and Quality Profiles. Quality Profiles can be used to refine the rules which are applied to the code while scanning the files, while Quality Gates define the conditions a project must meet before it is considered to pass.

Automate code scan with GitHub action

In the video below, we can see how to automate the code scan using the SonarCloud GitHub Action.
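
For reference, a minimal workflow using the official SonarCloud GitHub Action could look like the sketch below. It assumes the project and organization keys are defined in a sonar-project.properties file and that a SonarCloud token is stored as the SONAR_TOKEN repository secret; the exact setup is shown in the video.

```yaml
name: sonarcloud-scan

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  sonarcloud:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0             # full history gives SonarCloud better new-code information
      - name: SonarCloud Scan
        uses: SonarSource/sonarcloud-github-action@master
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}   # provided automatically by GitHub
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}     # token generated from the SonarCloud account
```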


Conclusion

SonarCloud offers a very good analysis of the codebase by performing static code analysis. The ruleset can be customized per language and based on organizational policies. GitHub Actions makes it very easy to automate workflows. By combining the power of GitHub Actions and SonarCloud, we get up-to-date insights about our code in an automated manner. I hope you found this post useful.

Until next time, Code with Passion and Strive for Excellence.

Scaling .Net Core containers with Event Driven Workloads

Background 

Due to the COVID-19 pandemic, many developer communities and user groups have been forced to conduct their regular sessions virtually. This has provided a great opportunity for organizers to rope in speakers from different parts of the world, and for speakers to present at community events across the globe, which might not have been possible with physical events.

I have been speaking at local community events in Singapore as well as other parts of Asia for the past 3-4 years. Recently, I got the opportunity to speak at a virtual meetup across the globe for the Microsoft Cloud South Florida user group.

It started off with a tweet from Julie Lerman saying that she was getting multiple requests for speaking opportunities but could not fulfil them all. She suggested that organizers could extend the opportunities to others who might be interested and available to speak. I thought it was a good opportunity and replied to her tweet. The thread got picked up by Dave Noderer and we managed to set up a virtual meetup in no time.

Scaling .Net Core Containers with Event Driven Workloads

I have presented the topic of autoscaling containers using KEDA on multiple occasions in the past at different meetups and events in Asia. I also have a 3-part series about this on my recently launched YouTube channel. The duration of the meetup was 90 minutes, which provided me with an opportunity to do a deep dive on some topics that is not possible in a 45-minute or 1-hour session.

The application I used in the demo is a dummy event management application called Tech Talks. There is an ASP.Net Core Web API which exposes a method to generate random events. These events are pumped into a RabbitMQ queue. We have a .Net Core executable which consumes these messages in batches. It is this consumer which we use to showcase the autoscaling capabilities using an upcoming project called Kubernetes-based Event Driven Autoscaling (KEDA).




During the session, I demonstrated the following features:
  • Containerize the .Net Core Web API and executable using Dockerfiles
  • Build and publish Docker images to a private container registry (Azure Container Registry)
  • Use Docker Compose to run multiple services together (a trimmed-down compose file is sketched after this list)
  • Use YAML files to describe Kubernetes deployments
  • Provision an AKS cluster using an idempotent PowerShell script
  • Deploy a RabbitMQ cluster using Helm charts
  • Deploy application containers to Kubernetes
  • Autoscale the RabbitMQ consumer using KEDA
  • Extend the scaling capabilities to serverless Azure Container Instances (ACI) using the Virtual Node
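
To give a feel for the Docker Compose step, a trimmed-down compose file for the demo could look like the sketch below. The service names, Dockerfile paths and ports are illustrative placeholders rather than the exact values from the Tech Talks repository.

```yaml
version: "3.7"

services:
  rabbitmq:
    image: rabbitmq:3-management          # broker with the management UI enabled
    ports:
      - "5672:5672"
      - "15672:15672"

  techtalks-producer:                     # placeholder name for the ASP.NET Core Web API
    build:
      context: .
      dockerfile: TechTalks.Producer/Dockerfile   # placeholder path
    ports:
      - "8080:80"
    depends_on:
      - rabbitmq

  techtalks-consumer:                     # placeholder name for the .NET Core console consumer
    build:
      context: .
      dockerfile: TechTalks.Consumer/Dockerfile   # placeholder path
    depends_on:
      - rabbitmq
```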

By the end of the session, we had scaled the containers out onto serverless Azure Container Instances (ACI) using the Virtual Node.


YouTube video recording

The recording of this talk is now available on YouTube


Slides

The slides used during the session are available online.

Source code

The source code is available in the GitHub repository.

Conclusion

The session provided me an opportunity to speak across the globe for the first time. I like to attend in-person events as they help a great deal with networking. In a virtual event, sometimes you feel like you are talking to a screen, and it is difficult to gauge the reaction of the audience.

One of the benefits of a virtual event is that we can focus more on content delivery without getting distracted, which can sometimes happen at an in-person event. Depending on which platform or communication tool is used (YouTube live stream, MS Teams, Zoom, etc.), questions and answers can be handled separately. Another great advantage of a virtual event is the ability to record it and share it on platforms like YouTube. People who could not attend due to timezone differences or emergencies can find these recordings useful.

Until next time, Code with Passion and Strive for Excellence.

How to improve code quality using GitHub Super Linter

 Background

As developers, we work with multiple programming languages like C#, Java, Python, JavaScript, SQL, etc. Apart from the mainstream programming languages, we also work with different file types like XML, JSON and YAML. Each of these languages and file types has its own styles and conventions. As a language or a format matures, standards and best practices develop around it over a period of time.

Nowadays, many developers are full-stack or polyglot developers. As part of the microservices style of development, they might work on a web or JavaScript-based front end, a middle tier based on Java, C#, Go, Python or some other programming language, and then a SQL or NoSQL backend. Each of these components can use a different language, and each language will have its own style. It can be very difficult for new or even experienced developers to follow all the best practices for all the languages at the same time.

Linter to the rescue

A linter is a piece of software or an add-on which helps to identify issues in a file with respect to a set of rules or conventions. It is a static code analysis tool which helps to automate the process of checking for common errors, bugs and style-related issues. Some Integrated Development Environments have built-in linters for the common programming languages. Visual Studio, for example, can suggest changes to classes and methods, and external tools and extensions like ReSharper can also help. One of the most popular code editors, Visual Studio Code, has many extensions which are specific to a particular language.

I tried searching for linters in the Visual Studio Code Marketplace and, as of this writing, there are more than 200 available.


All these linters help in making sure an individual developer can follow the rules and styles correctly in their development environment. Things get tricky when we work in teams and multiple developers are working on the same project. Each developer can have their own personal opinion. To avoid having multiple styles in the same codebase, it is necessary to standardize the rules across the whole team.

These rules can then be checked automatically as part of the build process. All the modern Continuous Integration (CI) systems like Azure DevOps, Jenkins, Bamboo, TeamCity, Travis CI, etc. allow us to perform static code analysis.

GitHub Super Linter

GitHub recently announced what they call the Super Linter. GitHub also allows us to trigger certain actions based on conditions like source code being checked in to a master, main or feature branch. These are called GitHub Actions.

The Super Linter is a collection of more than 30 linters for some of the most commonly used programming languages. With one GitHub Action, we can scan the whole codebase and identify issues in a single go. If the team has its own set of predefined rules for a particular language, we can customize the defaults to use the team's or organization's rules.

In the video below, we can see how GitHub Super Linter can be configured for your repository and triggered using a GitHub Action.
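
For reference, a minimal workflow along these lines could look like the sketch below. The branch names and the choice to lint only changed files are assumptions for illustration; the exact configuration is walked through in the video.

```yaml
name: lint-code-base

on:
  push:
    branches-ignore: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0                       # full history so the linter can work out which files changed
      - name: Run GitHub Super Linter
        uses: github/super-linter@v3
        env:
          VALIDATE_ALL_CODEBASE: false         # lint only the files changed in this push
          DEFAULT_BRANCH: main                 # assumes the default branch is named main
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```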



Conclusion

As demonstrated in the video, GitHub Super Linter can be a great tool for automating static code analysis. It can help standardize coding practices across multiple languages for teams and organizations. It is very easy to set up and can help a great deal with DevOps practices. I hope you find this useful.

Until next time, Code with Passion and Strive for Excellence.

DP-200 Implementing an Azure Data Solution Exam preparation guide

 Background

Most of us are working from home due to the ongoing COVID-19 situation. Personally, this has saved me some travelling time, and I managed to utilize some of it to prepare for Azure role-based certifications. In the past 3 months, I cleared the AZ-301, AZ-400 and DP-200 certification exams. Many people have asked me how I went about preparing for these certifications. This post and the accompanying YouTube video are about my preparation for the DP-200 exam.

DP-200 Implementing an Azure Data Solution Exam

In case you are new to Microsoft role-based certifications, have a look at this excellent post by Thomas Maurer about selecting the right exam. The DP-200 exam is focused on the Data Engineer role and falls under the Data & AI category of tests at the Associate level. I have been part of data engineering teams since late 2015 and have worked with Big Data technologies on the Hortonworks Data Platform as well as on Microsoft Azure. The exam consists of 3 main areas:

  • Implement data storage solutions (40-45%)
  • Manage and develop data processing (25-30%)
  • Monitor and optimize data solutions (30-35%)

Your skills are tested against the following core services from Microsoft Azure:
  • Azure Data Lake Storage (ADLS) Gen 2
  • Azure Data Factory
  • Azure Databricks
  • Azure SQL
  • Cosmos DB
  • Azure Stream Analytics
  • Azure Key Vault
  • Azure Monitor

Since the test is focused on implementing data solutions, we need to know the different options available for securing, managing, integrating and monitoring those solutions. I used the Linux Academy course for DP-200 to prepare for this exam. If you are new to Azure, I recommend going through the learning paths related to DP-200 which are available on MS Learn.

We need to understand batch processing, stream processing, structured and unstructured data, the different APIs supported by Cosmos DB and the different consistency levels. In terms of data integration, it is important to understand the capabilities of Azure Data Factory. For data processing, Azure Databricks, Synapse Analytics and Stream Analytics play a very important role. If you are not familiar with stream processing and real-time data integration, focus on understanding the different windowing mechanisms like the session window, sliding window, tumbling window and hopping window.

The questions are all multiple-choice questions (MCQ) based on case studies. For some questions, there is more than one correct answer, and in such cases we are also required to put the steps in the correct order. An example could be a data integration process which requires 5 steps; we are required to sequence all of them correctly. Practice is very important because of this requirement.

Refer to the below YouTube video for more details about how I prepared for this exam.


Conclusion

It is very important to practice as much as possible. The questions are not straightforward and involve selecting the right choices as well as putting the steps in the right sequence. I would also recommend using more than one reference material; do not rely on only one source of information. I prefer to combine eLearning courses from one of the platforms like Linux Academy, Pluralsight, Udemy, etc. with MS Learn. I also do not recommend using dumps; in my opinion, dumps reduce the actual value of the certification. Please don't take shortcuts when learning new technology.

Until next time, Code with Passion and Strive for Excellence.
