Details
As cloud platforms expand in scale and breadth, there is a growing need for an orchestration tool that can bridge the gaps between distributed services. Azure Data Factory provides this glue, pulling services together into a coherent data preparation and transformation pipeline. However, many people make the leap from on-premises SSIS and use Data Factory in the same way. This will only get you so far: successful Data Factory developers write less code, reuse components and harness the emerging Data Flow technologies.
This two-day course takes the Data Factory novice through the fundamentals, then on to building code-efficient, agile orchestration solutions. We will look at some of the most common scenarios, including pulling on-premises data into the cloud, hosting SSIS packages and communicating with Web APIs.
Prerequisites:
- An understanding of data processing, whether ETL or ELT, either on-premises or in a big data environment.
- A laptop and access to an Azure subscription
Module 1: Introduction
- Data Factory in Context
- ADF Terms & Concepts
- Object Types
- Behind the Scenes
- Usage Scenarios
- Building a basic pipeline
- Source Control in ADF
- Setting up Repos
- Best Working Practice
Module 2: Building a Data Loading Pipeline
- The Self-Hosted Integration Runtime (SHIR)
- Installing the SHIR
- Monitoring the SHIR
- Creating SHIR Datasets
- Using the SHIR in a Pipeline Activity
- Data Factory Control Flow
- Building an ADF Metadata Store
Module 3: Production Data Factory
- Monitoring & Alerts in Data Factory
- Different Monitoring Screens in ADF
- Using Alerts
- Building Error Workflows
- Common Error Patterns
- Reusable Pipelines
- The ADF DevOps Story
- ADF Application Lifecycle
- Common Deployment Patterns
- Azure DevOps & Data Factory
Module 4: Common ETL Patterns
- Working with Databricks Notebooks
- The Databricks Activities
- Widgets & Parameters
- Cluster Selection
- Bridging the gap with Azure Functions
- Using Azure Functions in common workflows
Module 5: Transformations using Data Flows
- Introducing Mapping Data Flows
- Data Flow Architecture
- Data Flow Functions
- Optimising Mapping Data Flows
- Spark Execution Internals
- Activity Optimisations
- Data Flow Monitoring
- Introducing Wrangling Data Flows
- Basic Wrangling Functionality
- Crash Course in Power Query M code
- Selecting the right Data Flow engine
Module 6: Extending Data Factory
- Working with SSIS and Data Factory
- Setting Up the SSIS Runtime
- Automating SSIS Runtime Uptime
- Deploying SSIS Packages
- Extending Data Factory with Custom Activities
- Hosting Options
- Example Scenario
- Automating Batch Pool Uptime