Details
The data landscape has changed drastically in the past few years, and what it means to build data platforms has evolved alongside it. The tools, languages and architectural patterns have all shifted to take advantage of emerging technologies, cloud scalability and advancements in Data Science… but how do you evolve alongside it?
The typical problems of “Big Data” are becoming more mainstream:
- Data volumes are growing
- There is a boom in sensor & event data
- Exotic data types are now part of many third party integrations
- Business opportunities need an immediate reaction and cannot wait on a release cycle measured in months
We have been building Modern Data Warehouses for the last few years, adapting as tools came and went. The range of architectural choices is vast and constantly changing, but we have a simplified reference architecture, and we want to teach you to master it.
Simon Whiteley uses his consultancy experience and close relationships with Microsoft to deliver the most up-to-date Data Lakehouse course available. He will show you what works, what does not work and, most importantly, how to build a Modern Data Lakehouse for your business. We use hands-on labs to make sure you’re getting the most out of the course, and you’ll build a solution to take away with you.
Prerequisites
- A background as a Microsoft data professional
- A basic understanding of Synapse Analytics
- A laptop with a subscription to Azure
Agenda
We aim to deliver the latest and greatest of Synapse Analytics. As the service is still in preview, the content of this agenda may be updated.
Lakehouse Architectures
- Designing Data Lakes
- Synapse Overview
- Big Data Processing Techniques
Synapse Setup & Config
- Provisioning Studio
- Connecting to the Lake
- Linked Services & Workspace
- Security, Access & Monitoring
- The Synapse Metastore
Orchestration & Data Acquisition
- Data Factory Overview
- Connecting & Loading Data
- Data Driven Orchestration
On-demand SQL Pools
- On-demand SQL Pools Overview
- Performance & Cost
- Supported T-SQL
- Querying & Creating Files in the Lake
- Use Cases & Scenarios
Provisioned SQL Pools
- MPP Architecture
- Tables & Distribution
- Performance Tuning
- Workload Management
- Successful Patterns
Spark Pools & PySpark Basics
- Spark Pools Overview
- Reading / Writing Data
- The Languages
- PySpark Deep Dive
- Design Patterns
Power BI & Integrations
- Power BI Development
- Workspace Integrations
- Data Access Patterns