Snowflake --> ClickHouse Migration with dbt


About

This hands-on lab guides data engineers through a complete, production-realistic migration from Snowflake to ClickHouse Cloud using a 50-million-row NYC Taxi dataset. The course is structured in three parts. In Part 1 you run and critically analyze an existing Snowflake workload: seven representative queries covering VARIANT colon-path access, LATERAL FLATTEN, MERGE, QUALIFY, and Snowflake Streams. In Part 2, the bulk of the course, you work through five structured design worksheets covering every architectural decision that has no direct Snowflake analogue: choosing between MergeTree engine variants, designing sparse primary indexes via ORDER BY, mapping Snowflake's type system to ClickHouse's fixed-width integers and DateTime64, and translating dbt incremental models from Snowflake's MERGE strategy to ClickHouse's delete+insert pattern. Every decision is recorded in a migration plan that the implementation in Part 3 validates directly.
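
To make the type and index mapping concrete, here is a minimal sketch of what one worksheet decision might produce. The table, columns, and ORDER BY key below are illustrative stand-ins, not the lab's actual schema:

```sql
-- Hypothetical ClickHouse target table. Snowflake NUMBER and
-- TIMESTAMP_NTZ columns map to fixed-width integers and DateTime64;
-- ORDER BY defines the sparse primary index, a decision with no
-- Snowflake analogue.
CREATE TABLE trips
(
    trip_id        UInt64,
    pickup_at      DateTime64(3),
    dropoff_at     DateTime64(3),
    pickup_zone_id UInt16,
    fare_amount    Decimal(9, 2)
)
ENGINE = MergeTree
ORDER BY (pickup_zone_id, pickup_at);
```

The dbt translation follows the same pattern. Assuming the dbt-clickhouse adapter and an illustrative model name, a MERGE-based incremental model becomes a delete+insert one roughly like this:

```sql
-- Hypothetical dbt incremental model; stg_trips and the column
-- names are placeholders, not the lab's actual project.
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='trip_id',
    engine='MergeTree',
    order_by='(pickup_zone_id, pickup_at)'
) }}

select * from {{ ref('stg_trips') }}
{% if is_incremental() %}
where pickup_at > (select max(pickup_at) from {{ this }})
{% endif %}
```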

In Part 3, you provision a ClickHouse Cloud service with Terraform, load the 50 million rows via a resumable Python batch script with retry-safe deduplication, and run dbt to build a full analytics layer of staging views, incremental fact tables, and dimension tables using ClickHouse-native engines and ORDER BY keys. Beyond the core pipeline, you create an in-memory dictionary for O(1) zone lookups and a Refreshable Materialized View on a 3-minute refresh cycle: two ClickHouse-native objects that replace Snowflake Tasks and dimension JOINs and have no dbt equivalent. The lab concludes with a scripted producer cutover that stops the Snowflake data producer, closes the migration gap with a delta pass, verifies row-count parity, and starts a live ClickHouse producer, followed by a benchmark comparing all seven original queries across both systems.
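
For a flavor of those two ClickHouse-native objects, here is a hedged sketch using illustrative names and an assumed recent ClickHouse version (refreshable materialized views are a newer feature); the lab's actual DDL may differ:

```sql
-- Hypothetical dictionary over a taxi_zones table; LAYOUT(HASHED())
-- keeps it in memory for O(1) key lookups.
CREATE DICTIONARY taxi_zones_dict
(
    zone_id   UInt64,
    zone_name String
)
PRIMARY KEY zone_id
SOURCE(CLICKHOUSE(TABLE 'taxi_zones'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);

-- dictGet replaces a dimension JOIN with a direct lookup:
SELECT dictGet('taxi_zones_dict', 'zone_name', toUInt64(pickup_zone_id)) AS zone
FROM trips
LIMIT 10;

-- A refreshable materialized view recomputes its result on a fixed
-- schedule, standing in for a Snowflake Task:
CREATE MATERIALIZED VIEW zone_hourly_stats
REFRESH EVERY 3 MINUTE
ENGINE = MergeTree
ORDER BY (zone_id, hour)
AS SELECT
    pickup_zone_id AS zone_id,
    toStartOfHour(pickup_at) AS hour,
    count() AS trips,
    avg(fare_amount) AS avg_fare
FROM trips
GROUP BY zone_id, hour;
```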