Data Engineering with Azure Databricks and Spark

The objective of this course is to take individuals with no prior experience in data engineering from a beginner to an advanced level, using tools such as SQL, Apache Spark, and Azure Databricks. We will provide Azure subscriptions for all the lab sessions in this course.

Intermediate | 60 Days | Weekends
  • SQL Fundamentals
    • Introduction to SQL Statements
    • SELECT
    • FROM
    • WHERE
    • GROUP BY
    • HAVING
    • ORDER BY
    • LIMIT
    • Aliases
    • NULL
  • SQL JOINS
    • INNER JOIN
    • FULL OUTER JOIN
    • LEFT JOIN
    • RIGHT JOIN
    • CROSS JOIN
    • Subqueries
    • Aggregate Functions
    • NULL Values
    • UNION
    • INTERSECT
    • EXCEPT
  • Advanced SQL
    • Window Functions: ROW_NUMBER, RANK, AVG, COUNT, etc.
    • PARTITION BY
    • Common Table Expressions (CTE)
    • Nested Queries
    • Indexes, Constraints
    • Triggers
    • Stored Procedures
    • Transactions
  • Azure Fundamentals
    • Create an Azure Account
    • Azure Portal
    • Azure Services
    • Hands-On Lab
  • Azure Databricks
    • Introduction to Azure Databricks
    • Creating Azure Databricks Service
    • Databricks User Interface Overview
    • Azure Databricks Architecture Overview
    • Hands-On Lab
  • Azure Databricks Clusters
    • Azure Databricks Cluster Types
    • Azure Databricks Cluster Configuration
    • Creating Azure Databricks Cluster
    • Azure Databricks Pool
    • Azure Databricks Cluster Policy
    • Hands-On Lab
  • Databricks Notebooks
    • Working with different programming languages in Azure Databricks Notebooks such as Python, R, SQL, and Scala
    • Managing and organizing notebooks using folders, tags, and version control
    • Collaborating with other team members on notebooks using shared notebooks, Git, and Databricks Repos
    • Using Databricks Connect to run code from local IDEs and notebooks on Databricks clusters
    • Configuring and managing Databricks clusters to optimize performance and scalability
    • Using Delta Lake to manage large-scale data lakes and improve data reliability and performance
    • Integrating Azure Databricks with other Azure services such as Azure Blob Storage, Azure Data Factory, and Azure Machine Learning
    • Understanding and working with Databricks Jobs to schedule and automate data engineering workflows
    • Monitoring and logging Databricks workloads using the Databricks workspace and Azure Monitor
    • Hands-On Lab
  • Accessing Azure Data Lake from Databricks
    • Introduction to Azure Data Lake
    • Accessing Data Lake
    • Creating Azure Data Lake Storage Gen2
    • Azure Storage Explorer Overview
    • Access Azure Data Lake using Access Keys
    • Access Azure Data Lake using SAS Token
    • Access Azure Data Lake using Service Principal
    • Cluster Scoped Authentication
    • Access Azure Data Lake using Credential Passthrough
    • Hands-On Lab
  • Securing Access to Azure Data Lake
    • Overview of Securing Secrets
    • Creating Azure Key Vault
    • Creating Secret Scope
    • Databricks Secrets Utility
    • Using Secrets to Access Azure Data Lake in Notebooks
    • Using the Secrets Utility in Clusters
    • Hands-On Lab
  • Mounting Data Lake to Databricks
    • Databricks File System
    • Databricks Mount Overview
    • Mounting Azure Data Lake Storage Gen2
    • Hands-On Lab
  • Introduction to Spark
    • Overview of Spark
    • Spark Architecture and Components
    • Spark RDDs, DataFrames, and Datasets
    • Spark SQL
    • SQLContext
    • Spark Streaming and Structured Streaming
    • Azure HDInsight and Spark clusters
    • Spark on Azure Databricks
    • Hands-On Lab
  • Data Ingestion with Spark
    • Understanding data ingestion
    • Reading data into Spark: Spark data sources, reading files (CSV, JSON, Parquet), and connecting to databases and other data stores
    • Handling structured and unstructured data: parsing, cleaning, transforming, and enriching data with Spark transformations and actions; handling missing and null values; schema inference and enforcement
    • Streaming data ingestion: working with Spark Streaming, Structured Streaming, and Azure Event Hubs to ingest and process real-time data
    • Data ingestion optimization: partitioning, bucketing, caching, and minimizing shuffles; using Spark SQL and DataFrames for efficient data processing; tuning Spark configurations for performance
    • Data ingestion monitoring and management: monitoring and troubleshooting Spark jobs, using Azure monitoring and management tools, and scaling data ingestion with Spark on Azure
    • Data quality and data governance: schema enforcement, data profiling, and data validation techniques
    • Security and compliance: implementing data security and compliance measures such as data encryption, access control, and auditing in Spark data ingestion pipelines
    • Hands-On Lab
  • Data Transformation with Spark
    • Introduction to Spark data processing
    • Using Spark SQL for data transformation
    • DataFrames and Datasets in Spark
    • Transforming data with Spark RDDs
    • ETL processing with Spark
    • Using Spark and Azure Data Lake Storage
    • Spark Streaming for real-time data processing
    • Optimization techniques for Spark data processing
    • Managing and monitoring Spark clusters in Azure
    • Hands-On Lab
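
The SQL modules above cover filtering, grouping, joins, NULL handling, CTEs, and window functions. As a rough preview of that material, here is a minimal sketch that uses Python's built-in sqlite3 module so it runs without any Azure resources. The customers and orders tables, their rows, and the helper function names are hypothetical, SQLite's dialect differs in places from Spark SQL, and the window function query assumes SQLite 3.25 or later.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL NOT NULL
        );
        INSERT INTO customers (id, name) VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Chen');
        INSERT INTO orders (id, customer_id, amount) VALUES
            (10, 1, 120.0), (11, 1, 30.0), (12, 2, 45.0), (13, 2, 80.0);
        """
    )
    return conn


def big_spenders(conn: sqlite3.Connection, min_total: float = 100.0) -> list:
    # INNER JOIN + GROUP BY + HAVING + ORDER BY:
    # customers whose combined order amount exceeds min_total, largest first.
    return conn.execute(
        """
        SELECT c.name, SUM(o.amount) AS total
        FROM customers AS c
        INNER JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        HAVING SUM(o.amount) > ?
        ORDER BY total DESC
        """,
        (min_total,),
    ).fetchall()


def order_counts(conn: sqlite3.Connection) -> list:
    # LEFT JOIN keeps customers with no orders; COUNT(o.id) skips the NULL
    # rows, and COALESCE turns the NULL sum into 0.
    return conn.execute(
        """
        SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


def largest_order_per_customer(conn: sqlite3.Connection) -> list:
    # CTE + ROW_NUMBER() window function: number each customer's orders
    # by amount (PARTITION BY customer_id) and keep the first one.
    return conn.execute(
        """
        WITH ranked AS (
            SELECT customer_id, id AS order_id, amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id ORDER BY amount DESC
                   ) AS rn
            FROM orders
        )
        SELECT customer_id, order_id, amount
        FROM ranked
        WHERE rn = 1
        ORDER BY customer_id
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(big_spenders(conn))                # [('Ana', 150.0), ('Ben', 125.0)]
    print(order_counts(conn))                # [('Ana', 2, 150.0), ('Ben', 2, 125.0), ('Chen', 0, 0)]
    print(largest_order_per_customer(conn))  # [(1, 10, 120.0), (2, 13, 80.0)]
```

The query text itself should carry over to Spark SQL in a Databricks notebook largely unchanged; the ? parameter placeholder, however, is specific to Python's sqlite3 driver.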

In this course, you will learn how to perform data engineering with Microsoft Azure Databricks, SQL, and Apache Spark, which will enable you to boost the performance of big data analytics applications.

What you'll learn

  • You will learn how to build a real-world data project using Azure Databricks and Spark Core.
  • You will acquire professional level data engineering skills in Azure Databricks, Delta Lake, Spark Core, Azure Data Lake Gen2 and Azure Data Factory (ADF)
  • You will learn how to create notebooks, dashboards, clusters, cluster pools and jobs in Azure Databricks
  • You will learn how to ingest and transform data using PySpark in Azure Databricks
  • You will learn how to transform and analyse data using Spark SQL in Azure Databricks
  • You will learn about Data Lake and Lakehouse architectures, and how to implement a Lakehouse solution using Delta Lake.
  • You will learn how to create Azure Data Factory pipelines to execute Databricks notebooks
  • You will learn how to create Azure Data Factory triggers to schedule pipelines as well as monitor them.
  • You will gain the Azure Databricks and Data Factory skills needed for the Azure Data Engineer Associate certification exam (DP-203), although passing the exam is not the primary objective of the course.
  • You will learn how to connect to Azure Databricks from Power BI to create reports

$450 (installments available)
Starts September 9, 2023

What's included

  • Certificate
  • 13 Modules
  • Live Classes
  • Lifetime access