Data Architecture
Two databases, split by workload:
- PostgreSQL: Source of truth for entity state, configuration, and billing. Organized into service-specific schemas.
- ClickHouse: High-volume time-series analytics and event logs.
Service Ownership
Section titled “Service Ownership”PostgreSQL schemas are owned by services. Operators should treat these as service-private storage, not as a public integration API.
| Service | Owns | Canonical schema |
|---|---|---|
| Commodore | Users, auth tokens, wallet identities, streams, stream keys, clips, DVR, VOD, and push targets. | pkg/database/sql/schema/commodore.sql |
| Quartermaster | Tenants, clusters, nodes, service registry, bootstrap tokens, cluster access, and ingress declarations. | pkg/database/sql/schema/quartermaster.sql |
| Purser | Billing tiers, subscriptions, usage records, prepaid balances, payments, x402 nonces, and invoices. | pkg/database/sql/schema/purser.sql |
| Foghorn | Media artifact registry, artifact placement, node output snapshots, maintenance state, and processing jobs. | pkg/database/sql/schema/foghorn.sql |
| Periscope | PostgreSQL coordination state for analytics/billing jobs. High-volume analytics data is in ClickHouse. | pkg/database/sql/schema/periscope.sql |
| Navigator | Public TLS certificates, ACME accounts, internal CA state, and issued internal certificates. | pkg/database/sql/schema/navigator.sql |
| Skipper | Knowledge-base documents, conversations, crawl state, usage, reports, baselines, and staged posts. | pkg/database/sql/schema/skipper.sql |
| Listmonk / Chatwoot | Supplementary application schemas for email and support/chat integrations. | pkg/database/sql/schema/listmonk.sql, pkg/database/sql/schema/chatwoot.sql |
Real-time stream, viewer, node, artifact, storage, routing, federation, API, and processing events live in ClickHouse. Current-state tables such as stream, node, artifact, and viewer-session state are derived from event streams and optimized for analytics/query paths.
Schema Sources
Section titled “Schema Sources”This page is intentionally not a full table catalog. The repository SQL is the contract used by the CLI/provisioner and local development stack.
| Store | Source | Notes |
|---|---|---|
| PostgreSQL / YugabyteDB baseline schemas | pkg/database/sql/schema/*.sql | Embedded by pkg/database/sql/embed.go and applied by frameworks cluster init / frameworks cluster provision. |
| ClickHouse analytics schema | pkg/database/sql/clickhouse/periscope.sql | Defines event logs, current-state tables, materialized views, and rollups. |
| Static seed data | pkg/database/sql/seeds/static/purser_tiers.sql | Production reference data such as billing tiers. |
| Demo seed data | pkg/database/sql/seeds/demo/*.sql | Local/demo data only; separate from demo-mode API fallbacks. |
The SQL files are regular enough for tooling to generate a table and materialized-view index from
CREATE TABLE IF NOT EXISTS and CREATE MATERIALIZED VIEW IF NOT EXISTS statements. For
column-level or constraint-level reference material, prefer generated output from a SQL parser or
database introspection over hand-maintained Markdown.
Analytics Shape
Section titled “Analytics Shape”ClickHouse stores append-only event logs plus derived rollups:
- Raw events: stream lifecycle, viewer sessions, QoE samples, node metrics, routing decisions, federation events, storage events, processing events, API requests, and acquisition events.
- Current state: stream, node, artifact, and viewer-session snapshots optimized for fast reads.
- Rollups: 5-minute, hourly, and daily aggregates used for dashboards, billing, and capacity reporting.
Billing-critical rollups include cluster_id and origin_cluster_id in their grouping keys so
cross-cluster usage can be attributed correctly. See docs/architecture/cross-cluster-billing.md
for the settlement model.
Data Access Patterns
Section titled “Data Access Patterns”Pagination
Section titled “Pagination”Most public GraphQL list fields in the Control Plane and Analytics API use cursor-based pagination
(first/after and last/before).
- Default Limit: 50
- Max Limit: 500
- Response Metadata: Total count is returned in connection
totalCountfields.
Some internal service calls and infrastructure loaders use larger batch sizes or service-specific defaults when the caller is building topology or cache state instead of serving a public page.
Isolation
Section titled “Isolation”- Tenant ID: Required for tenant-scoped API and service queries. Authentication middleware puts
tenant_idin context; resolvers and service clients reject tenant-scoped reads when that context is missing. - Service Boundaries: Services connect only to their own Postgres schema. Cross-service access happens via gRPC (and a few HTTP endpoints where explicitly defined), never direct DB links.
Data Privacy
Section titled “Data Privacy”IP Address Handling
Section titled “IP Address Handling”Client IP addresses are stored in ClickHouse for internal analytics but redacted from API responses:
| Layer | Behavior |
|---|---|
| ClickHouse | Raw client IP stored in event fields such as connection_addr, host, and client_ip |
| GraphQL API | Returns null for connectionAddr and host fields |
| Signalman | Redacts host before WebSocket/subscription broadcast |
Exposed: Geographic aggregates, viewer counts, session data, service/node IPs (infrastructure visibility)
Redacted: Raw viewer IP addresses, connection addresses
Geographic Normalization (H3 Bucketing)
Section titled “Geographic Normalization (H3 Bucketing)”Viewer geographic data goes through two transformations before storage:
- GeoIP Lookup: Client IP → country, city, raw lat/lon (via MMDB database)
- H3 Bucketing: Raw lat/lon → H3 hexagonal cell → cell centroid coordinates
The platform uses Uber’s H3 spatial indexing at resolution 5:
| Resolution | Approx. Hexagon Area |
|---|---|
| 5 | ~252 km² |
What this means:
- Coordinates stored/exposed are cell centroids, not actual viewer locations
- Viewers within the same ~252 km² hexagon map to identical coordinates
- Sufficient for regional analytics; prevents precise location tracking
Geographic fields in ClickHouse:
| Field | Type | Description |
|---|---|---|
country_code | FixedString(2) | ISO country code |
city | String | City name from GeoIP |
latitude | Float64 | H3 cell centroid (normalized) |
longitude | Float64 | H3 cell centroid (normalized) |
Bandwidth Tracking
Section titled “Bandwidth Tracking”Bandwidth is tracked at multiple granularities:
| Level | Table | Use Case |
|---|---|---|
| Session | viewer_connection_events | Per-viewer bytes transferred |
| Stream | stream_analytics_daily | Daily per-stream totals |
| Tenant | tenant_viewer_daily | Billing rollups |
| Node | node_metrics_samples | Infrastructure capacity |