Details
In this course, we’ll quickly cover the fundamentals of data integration pipelines before going much deeper into the Azure resources.
Within a typical Azure data platform solution for any enterprise-grade data analytics or data science workload, an umbrella resource is needed to trigger, monitor, and handle the control flow for transforming datasets, with the goal of producing actionable data insights. Those requirements are met by deploying Azure Data Integration pipelines, delivered using Azure Synapse Analytics or Azure Data Factory. In this course, Paul Andrew will show you how to create rich, dynamic data pipelines and apply these orchestration resources in production, using scaled architecture design patterns, best practices, data mesh principles, and the latest metadata-driven frameworks. We will take a deep dive into the services, covering how to build custom activities and complex pipelines, and how to approach hierarchical design patterns for enterprise-grade deployments. Across a complete set of 12 modules, based on real-world experience, we will take you through how to implement data integration pipelines in production and deliver advanced orchestration patterns.
If that’s not enough learning for you, a set of hands-on labs will also be available for you to work through at your own pace. You will leave this course with new skills, ideas, and a much deeper understanding of the resources for your future data platform projects.
Prerequisites
If you’ve never used Azure Data Integration Pipelines before, in either Azure Data Factory or Azure Synapse Analytics, but you’re a fast learner, that’s OK! However, please watch Paul’s complete 1-hour introduction session, recorded as part of a recent community meetup: https://mrpaulandrew.com/2021/08/23/an-introduction-to-azure-data-integration-pipelines/
Agenda
The following outlines the complete agenda and module breakdown for this course.
Module 1: Pipeline Fundamentals
- The History of Azure Orchestration
- Synapse Analytics vs Data Factory
- Integration Components
- Common Activities
- Execution Dependencies
Module 2: Integration Runtime Design Patterns
- Compute Types
  - Azure
  - Self-Hosted
  - SSIS
- Patterns & Configuration
Module 3: Data Transformation
- Data Flows
- Power Query Injection
- Spark Configuration
- Use Cases
Module 4: Dynamic Pipelines
- Expressions & Interpolation
- Dynamic Content Chains
- Metadata Driven
- Orchestration Framework – procfwk.com
Module 5: Execution Parallelism
- Control Flow Scale Out
- Concurrency Limitations
- Internal vs External Activities
- Decoupling Pipeline Workloads
Module 6: Pipeline Extensibility
- Azure Batch Service
  - Tasks
  - Compute Pools
  - Scaling
- Pipeline Custom Activities
Module 7: VNet Integration
- Private Endpoints
- Managed VNets
- Firewall Bypass
Module 8: Security
- Managed Identities vs Service Principals
- Azure Key Vault Backing
- Pipeline Access & Permissions
Module 9: Monitoring & Alerting
- Portal Monitoring
- Log Analytics & Kusto Queries
- Operational Dashboards
- Advanced Alerting
Module 10: CI/CD
- Source Control vs Developer UI
- Basic ARM Template Deployments
- Advanced Deployment Patterns
Module 11: Solution Testing
- Development Time Validation
- Test Coverage
- NUnit Tests
Module 12: Final Thoughts
- Running Costs
- Conclusions
- Best Practices