Skip to content

Data Architecture

Two databases, split by workload:

  • PostgreSQL: Source of truth for entity state, configuration, and billing. Organized into service-specific schemas.
  • ClickHouse: High-volume time-series analytics and event logs.

PostgreSQL schemas are owned by services. Operators should treat these as service-private storage, not as a public integration API.

ServiceOwnsCanonical schema
CommodoreUsers, auth tokens, wallet identities, streams, stream keys, clips, DVR, VOD, and push targets.pkg/database/sql/schema/commodore.sql
QuartermasterTenants, clusters, nodes, service registry, bootstrap tokens, cluster access, and ingress declarations.pkg/database/sql/schema/quartermaster.sql
PurserBilling tiers, subscriptions, usage records, prepaid balances, payments, x402 nonces, and invoices.pkg/database/sql/schema/purser.sql
FoghornMedia artifact registry, artifact placement, node output snapshots, maintenance state, and processing jobs.pkg/database/sql/schema/foghorn.sql
PeriscopePostgreSQL coordination state for analytics/billing jobs. High-volume analytics data is in ClickHouse.pkg/database/sql/schema/periscope.sql
NavigatorPublic TLS certificates, ACME accounts, internal CA state, and issued internal certificates.pkg/database/sql/schema/navigator.sql
SkipperKnowledge-base documents, conversations, crawl state, usage, reports, baselines, and staged posts.pkg/database/sql/schema/skipper.sql
Listmonk / ChatwootSupplementary application schemas for email and support/chat integrations.pkg/database/sql/schema/listmonk.sql, pkg/database/sql/schema/chatwoot.sql

Real-time stream, viewer, node, artifact, storage, routing, federation, API, and processing events live in ClickHouse. Current-state tables such as stream, node, artifact, and viewer-session state are derived from event streams and optimized for analytics/query paths.

This page is intentionally not a full table catalog. The repository SQL is the contract used by the CLI/provisioner and local development stack.

StoreSourceNotes
PostgreSQL / YugabyteDB baseline schemaspkg/database/sql/schema/*.sqlEmbedded by pkg/database/sql/embed.go and applied by frameworks cluster init / frameworks cluster provision.
ClickHouse analytics schemapkg/database/sql/clickhouse/periscope.sqlDefines event logs, current-state tables, materialized views, and rollups.
Static seed datapkg/database/sql/seeds/static/purser_tiers.sqlProduction reference data such as billing tiers.
Demo seed datapkg/database/sql/seeds/demo/*.sqlLocal/demo data only; separate from demo-mode API fallbacks.

The SQL files are regular enough for tooling to generate a table and materialized-view index from CREATE TABLE IF NOT EXISTS and CREATE MATERIALIZED VIEW IF NOT EXISTS statements. For column-level or constraint-level reference material, prefer generated output from a SQL parser or database introspection over hand-maintained Markdown.

ClickHouse stores append-only event logs plus derived rollups:

  • Raw events: stream lifecycle, viewer sessions, QoE samples, node metrics, routing decisions, federation events, storage events, processing events, API requests, and acquisition events.
  • Current state: stream, node, artifact, and viewer-session snapshots optimized for fast reads.
  • Rollups: 5-minute, hourly, and daily aggregates used for dashboards, billing, and capacity reporting.

Billing-critical rollups include cluster_id and origin_cluster_id in their grouping keys so cross-cluster usage can be attributed correctly. See docs/architecture/cross-cluster-billing.md for the settlement model.


Most public GraphQL list fields in the Control Plane and Analytics API use cursor-based pagination (first/after and last/before).

  • Default Limit: 50
  • Max Limit: 500
  • Response Metadata: Total count is returned in connection totalCount fields.

Some internal service calls and infrastructure loaders use larger batch sizes or service-specific defaults when the caller is building topology or cache state instead of serving a public page.

  • Tenant ID: Required for tenant-scoped API and service queries. Authentication middleware puts tenant_id in context; resolvers and service clients reject tenant-scoped reads when that context is missing.
  • Service Boundaries: Services connect only to their own Postgres schema. Cross-service access happens via gRPC (and a few HTTP endpoints where explicitly defined), never direct DB links.

Client IP addresses are stored in ClickHouse for internal analytics but redacted from API responses:

LayerBehavior
ClickHouseRaw client IP stored in event fields such as connection_addr, host, and client_ip
GraphQL APIReturns null for connectionAddr and host fields
SignalmanRedacts host before WebSocket/subscription broadcast

Exposed: Geographic aggregates, viewer counts, session data, service/node IPs (infrastructure visibility)

Redacted: Raw viewer IP addresses, connection addresses

Viewer geographic data goes through two transformations before storage:

  1. GeoIP Lookup: Client IP → country, city, raw lat/lon (via MMDB database)
  2. H3 Bucketing: Raw lat/lon → H3 hexagonal cell → cell centroid coordinates

The platform uses Uber’s H3 spatial indexing at resolution 5:

ResolutionApprox. Hexagon Area
5~252 km²

What this means:

  • Coordinates stored/exposed are cell centroids, not actual viewer locations
  • Viewers within the same ~252 km² hexagon map to identical coordinates
  • Sufficient for regional analytics; prevents precise location tracking

Geographic fields in ClickHouse:

FieldTypeDescription
country_codeFixedString(2)ISO country code
cityStringCity name from GeoIP
latitudeFloat64H3 cell centroid (normalized)
longitudeFloat64H3 cell centroid (normalized)

Bandwidth is tracked at multiple granularities:

LevelTableUse Case
Sessionviewer_connection_eventsPer-viewer bytes transferred
Streamstream_analytics_dailyDaily per-stream totals
Tenanttenant_viewer_dailyBilling rollups
Nodenode_metrics_samplesInfrastructure capacity