ClickHouse Schema Migrations
Overview
Section titled “Overview”ClickHouse schema changes are managed through embedded, versioned migrations that follow the same model as PostgreSQL. The runner is wired into the same CLI commands.
| Scenario | What to use |
|---|---|
| Fresh install | frameworks cluster init clickhouse creates databases, applies the baseline schema, and runs migrations |
| Upgrading | frameworks cluster migrate applies pending migrations from pkg/database/sql/clickhouse/migrations/ |
cluster init clickhouse and cluster migrate are idempotent. Each migration
records itself in a per-database _migrations ledger; re-running is a no-op.
Migration structure
Section titled “Migration structure”Migrations are organized by target database, release version, and phase:
pkg/database/sql/clickhouse/migrations/└── periscope/ └── v0.2.31/ ├── expand/ │ ├── 001_add_filename_to_artifacts.sql │ └── 002_add_viewer_node_tracking.sql ├── postdeploy/ │ └── 001_rebuild_artifact_projection.sql └── contract/ └── 001_drop_legacy_artifact_view.sqlThe first path segment after clickhouse/migrations/ must match a configured
ClickHouse database name (today: periscope). The CLI embeds these files,
applies only migrations whose database exists in the manifest, and records
applied files in that database’s _migrations table.
Reconciliation model
Section titled “Reconciliation model”The baseline schema in pkg/database/sql/clickhouse/<db>.sql is the cumulative
current state at the latest release tag. Migrations advance the schema forward
in subsequent releases.
On a fresh install, frameworks cluster init clickhouse:
- Creates the configured databases.
- Applies the baseline schema (DDL is
IF NOT EXISTS, safe to re-run). - Runs every embedded migration up to the cluster’s resolved channel version.
The migration SQL is idempotent (also
IF NOT EXISTS), so DDL re-application no-ops on objects already created by the baseline; the_migrationsledger row is recorded by the same step.
On upgrades, frameworks cluster migrate runs only the migrations whose
ledger row is missing. There is no blind ledger pre-seed: every recorded
migration was produced by an actual SQL apply.
Upgrade procedure
Section titled “Upgrade procedure”- Check release notes for required ClickHouse DDL.
- Confirm the phase. Routine rolling upgrades apply
expandmigrations before deploying new binaries.postdeploymigrations cover heavyMODIFY COLUMNrewrites orALTER TABLE … UPDATE/DELETEmutations and run after binaries are live.contractmigrations are explicit cleanup with--yesrequired. - Apply migrations through the CLI:
frameworks cluster migrate validateframeworks cluster migrate --phase expandvalidate is fast and safe to run anywhere — it checks the embedded migration
tree against the layout and per-phase safety rules.
- Dry-run first to see what will change:
frameworks cluster migrate --phase expand --dry-runDry-run runs the role under ansible-playbook --check --diff. The role
explicitly skips every mutating step (ledger table create, SQL apply, ledger
insert) and only reports pending migrations.
- Apply for real:
frameworks cluster migrate --phase expand --to-version v0.5.0--to-version is optional; when omitted the cluster’s resolved channel
version is used.
- Deploy services after expand migrations are in place:
frameworks cluster upgrade --all- Run any required postdeploy documented in the release notes:
frameworks cluster migrate --phase postdeploy --to-version v0.5.0Ledger details
Section titled “Ledger details”The role creates <db>._migrations per ClickHouse database with this shape:
CREATE TABLE IF NOT EXISTS _migrations ( version LowCardinality(String), phase LowCardinality(String), seq UInt32, filename String, checksum FixedString(64), applied_at DateTime64(3) DEFAULT now64()) ENGINE = ReplacingMergeTree(applied_at)ORDER BY (version, phase, seq);ReplacingMergeTree collapses retried-insert duplicates on merge. Reads use
argMax(checksum, applied_at) and surface uniqExact(checksum) so a crashed
mid-write that left two distinct checksums under the same key is flagged as a
hard failure, not silently resolved.
If a migration file’s content changes after it has been applied, the role fails on checksum mismatch. Treat applied migrations as immutable.
Writing migrations
Section titled “Writing migrations”Guidelines for contributors:
IF NOT EXISTSis mandatory on everyCREATE TABLE / VIEW / DICTIONARY. This is what makes idempotent re-apply safe across both fresh installs (where the baseline already created the object) and existing clusters. The validator rejects migrations missing this guard.- Prefer expand-only changes in routine upgrades — add columns/tables, avoid drops, avoid type rewrites.
- Use
Nullablefor new columns — old binaries ignore them, rollback is safe. - Update the baseline schema in the same release: edit
pkg/database/sql/clickhouse/<db>.sqlso a fresh install ends in the same state as an upgraded one. The migration runs both on fresh installs (no-op against the freshly applied baseline) and on upgrades (the actual schema change). - One migration per logical change — easier to track and debug.
- Pick the correct phase:
expand: additive, safe alongside old + new binaries.postdeploy: heavyMODIFY COLUMNrewrites,ALTER TABLE … UPDATE/DELETEmutations, materialized-view rebuilds.contract: drops, renames, post-cleanup.
- Large historical rewrites (long
ALTER … UPDATE, MV rebuilds, replay from Kafka) need release runbooks with progress and verification, not incidental DDL inside a migration.
Example expand migration:
ALTER TABLE periscope.artifact_events ADD COLUMN IF NOT EXISTS filename Nullable(String) AFTER internal_name;
ALTER TABLE periscope.artifact_state_current ADD COLUMN IF NOT EXISTS filename Nullable(String) AFTER internal_name;Single-node vs replicated
Section titled “Single-node vs replicated”The default schema is single-node. If you run a replicated topology, every
migration that creates or alters a Replicated* table must use ON CLUSTER '{cluster}':
ALTER TABLE periscope.artifact_events ON CLUSTER '{cluster}' ADD COLUMN IF NOT EXISTS filename Nullable(String) AFTER internal_name;The role is single-node aware today; replicated support is a future role var. Until then, replicated topologies need to apply migrations on each shard themselves.
Rollback
Section titled “Rollback”- Nullable columns are safe to leave — older service versions simply won’t populate them.
- Don’t drop columns during rollback unless explicitly required.
- Re-deploying an older service version is usually sufficient.
- Do not rely on DDL rollback for normal recovery — keep expand changes compatible, redeploy the previous binary, and leave extra columns in place until the contract release.
Troubleshooting
Section titled “Troubleshooting”“Unknown column” errors:
Code: 16. DB::Exception: Unknown column 'filename' in column listCause: Service deployed before migration was applied. Fix: Apply the migration, then restart the service.
Check ledger contents:
SELECT version, phase, seq, filename, checksum, applied_atFROM periscope._migrationsORDER BY version, phase, seqFORMAT Vertical;Check if a column exists:
SELECT name, type FROM system.columnsWHERE database = 'periscope' AND table = 'artifact_events' AND name = 'filename';