Offer

Practical Data Factory

Instructor
Address
Kanalvej 7, 2800 Kongens Lyngby

9,000 DKK

Not in stock

Status

Details

As cloud platforms expand in scale and breadth, there is a growing need for an orchestration tool that can bridge the gaps between distributed services. Azure Data Factory provides this glue, pulling services together into a coherent data preparation and transformation pipeline. However, many people make the leap from on-premises SSIS and use Data Factory in the same way. This will only get you so far: successful Data Factory developers write less code, reuse components and harness the emerging Data Flow technologies.

This two-day course takes the Data Factory novice through the fundamentals before moving on to building code-efficient, agile orchestration solutions. We will look at some of the most common scenarios, including pulling on-premises data into the cloud, hosting SSIS packages and communicating with Web APIs.

Prerequisites:

  • An understanding of data processing, either ETL or ELT, either on-premises or in a big data environment
  • A laptop and access to an Azure subscription

Module 1: Introduction

  • Data Factory in Context
  • ADF Terms & Concepts
    • Object Types
    • Behind the Scenes
    • Usage Scenarios
  • Building a basic pipeline
  • Source Control in ADF
    • Setting up Repos
    • Best Working Practice

Module 2: Building a Data Loading Pipeline

  • The Self-Hosted Integration Runtime
    • Installing the SHIR
    • Monitoring the SHIR
    • Creating SHIR Datasets
    • Using the SHIR in a Pipeline Activity
  • Data Factory Control Flow
  • Building an ADF Metadata Store

Module 3: Production Data Factory

  • Monitoring & Alerts in Data Factory
    • Different Monitoring Screens in ADF
    • Using Alerts
  • Building Error Workflows
    • Common Error Patterns
    • Reusable Pipelines
  • The ADF DevOps Story
    • ADF Application Lifecycle
    • Common Deployment Patterns
    • Azure DevOps & Data Factory

Module 4: Common ETL Patterns

  • Working with Databricks Notebooks
    • The Databricks Activities
    • Widgets & Parameters
    • Cluster Selection
  • Bridging the gap with Azure Functions
    • Using Azure Functions in common workflows

Module 5: Transformations using Data Flows

  • Introducing Mapping Data Flows
    • Data Flow Architecture
    • Data Flow Functions
  • Optimising Mapping Data Flows
    • Spark Execution Internals
    • Activity Optimisations
    • Data Flow Monitoring
  • Introducing Wrangling Data Flows
    • Basic Wrangling Functionality
    • Crash Course in Power Query M code
  • Selecting the right Data Flow engine

Module 6: Extending Data Factory

  • Working with SSIS and Data Factory
    • Setting Up the SSIS Runtime
    • Automating SSIS Runtime Uptime
    • Deploying SSIS Packages
  • Extending Data Factory with Custom Activities
    • Hosting Options
    • Example Scenario
    • Automating Batch Pool Uptime

Additional information

Duration

2 days