Ansible-driven provisioning

The CLI provisions, validates, and updates infrastructure by invoking ansible-playbook under the hood via the apenella/go-ansible v2 library. All role content lives in a single collection at ansible/collections/ansible_collections/frameworks/infra; external community content (Postgres, Redis, ClickHouse, Prometheus, community.sops, Docker, and common modules) is pinned in ansible/requirements.yml and installed into ansible/.cache/ by make ansible-galaxy-install. Caddy is installed by the in-tree frameworks.infra.caddy role rather than an external Galaxy role.

The role and playbook inventory changes as provisioning grows. Treat the tree as code-owned, not docs-owned:

| Surface | Source of truth |
| --- | --- |
| External Ansible content | ansible/requirements.yml |
| Ansible config paths | ansible/ansible.cfg |
| Playbooks invoked by the CLI | ansible/playbooks/ |
| In-tree collection and roles | ansible/collections/ansible_collections/frameworks/infra/ |
| CLI role selection and execution | cli/pkg/provisioner/ and cli/pkg/ansiblerun/ |

Use make ansible-check to verify playbook syntax and make ansible-galaxy-install to install pinned external content.

The CLI’s subcommands map onto role tags:

| CLI | Ansible tags | Flags |
| --- | --- | --- |
| cluster provision | install,configure,service,validate | |
| cluster provision --dry-run | install,configure,service | --check --diff |
| cluster doctor | (observed-state survey, no role run) | |
| cluster upgrade | install,configure,service,validate | |
| cluster upgrade --dry-run | install,configure,service | --check --diff |
| cluster init | init; Postgres/Yugabyte also run baseline schema and expand migrations | |
| cluster seed | seed; ClickHouse seeds only with --demo and a periscope database | |
| cluster migrate | migrate for the selected schema phase | |
| cluster migrate --dry-run | migrate for the selected schema phase | --check --diff |
| cluster restart | restart | |
| cluster drift | (observed-state survey, no role run) | |

cluster drift and cluster doctor / cluster status are observed-state surveys built on top of the detect + health layers. They run direct port / HTTP / SQL probes from the CLI host rather than invoking a role, because “is the service reachable and healthy?” is a question SSH-free network probes answer correctly and faster than a full Ansible play.
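As a rough illustration of the kind of SSH-free probe described above, the following sketch checks whether a TCP port accepts connections. The function name `port_open` and its timeout default are hypothetical and not taken from the CLI, which implements its probes in Go; HTTP and SQL checks would be layered on top of a reachability test like this one.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only answers "is something listening?",
    which is the cheapest of the probe types mentioned in the text.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

A probe like this needs no inventory, no SSH session, and no Python on the target, which is why it can answer reachability questions faster than a play.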

For a role-level diff of what would change on apply, use cluster provision --dry-run: it invokes ansible-playbook with --check --diff per service, so the full Ansible diff (file contents, package versions, unit changes) is surfaced inline in the plan.

Tags are passed to ansible-playbook via -t, so only the tasks carrying the selected tags run. A role whose tasks short-circuit (e.g. the install tasks on an already-installed host) therefore completes near-instantly.
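The mapping from subcommand to tags and flags can be sketched as a small argument builder. This is a minimal illustration, not the CLI's implementation (which drives ansible-playbook through go-ansible): `ROLE_TAGS`, `DRY_RUN_SUBCOMMANDS`, `playbook_argv`, and the playbook and inventory arguments are hypothetical names. The tag sets and the `--check --diff` pairing come from the table above, and `-i`, `-t`, `--check`, and `--diff` are standard ansible-playbook options.

```python
# Tag sets copied from the subcommand table; the dict itself is illustrative.
ROLE_TAGS = {
    "provision": ["install", "configure", "service", "validate"],
    "upgrade": ["install", "configure", "service", "validate"],
    "init": ["init"],
    "seed": ["seed"],
    "migrate": ["migrate"],
    "restart": ["restart"],
}

# Only these subcommands are documented with a --dry-run variant.
DRY_RUN_SUBCOMMANDS = {"provision", "upgrade", "migrate"}


def playbook_argv(subcommand, playbook, inventory, dry_run=False):
    """Build an ansible-playbook argument list for one CLI subcommand."""
    tags = list(ROLE_TAGS[subcommand])
    argv = ["ansible-playbook", "-i", inventory, playbook]
    if dry_run:
        if subcommand not in DRY_RUN_SUBCOMMANDS:
            raise ValueError(f"{subcommand} has no documented dry-run mode")
        # Per the table, dry runs drop the validate tag and add check mode.
        tags = [t for t in tags if t != "validate"]
        argv += ["--check", "--diff"]
    argv += ["-t", ",".join(tags)]
    return argv
```

For example, `playbook_argv("provision", "site.yml", "hosts.ini", dry_run=True)` yields an argument list ending in `--check --diff -t install,configure,service`.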

community.sops v2.3.0 is loaded as an Ansible vars plugin. Files named group_vars/<group>.sops.yml or host_vars/<host>.sops.yml are decrypted transparently when Ansible loads group/host vars. The CLI forwards SOPS_AGE_KEY_FILE into the ansible-playbook subprocess environment; the Go code does not decrypt anything before handoff.
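The handoff can be pictured with the sketch below, which forwards the key file path into a child environment and recognizes the two file name patterns mentioned above. Both helpers (`playbook_env`, `is_sops_vars_file`) and the choice to pass through only `PATH` and `HOME` are assumptions made for illustration; the real CLI may forward a different set of variables.

```python
import os
from pathlib import PurePosixPath


def playbook_env(parent_env=None):
    """Return a child environment for the ansible-playbook subprocess.

    Only forwarding is shown: the key file path is passed through untouched
    and decryption is left to the vars plugin inside Ansible.
    """
    parent = dict(os.environ if parent_env is None else parent_env)
    # Assumption: a minimal pass-through set, chosen for the example only.
    env = {k: parent[k] for k in ("PATH", "HOME") if k in parent}
    key_file = parent.get("SOPS_AGE_KEY_FILE")
    if key_file:
        env["SOPS_AGE_KEY_FILE"] = key_file
    return env


def is_sops_vars_file(path):
    """Match group_vars/<group>.sops.yml and host_vars/<host>.sops.yml."""
    p = PurePosixPath(path)
    return p.parent.name in ("group_vars", "host_vars") and p.name.endswith(".sops.yml")
```

Keeping decryption inside Ansible means plaintext secrets never pass through the calling process; it only needs to know where the age key lives.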

The CLI is the same binary locally and in CI. GitOps pipelines usually have two jobs:

  • Pull requests run frameworks cluster provision --gitops-dir . --cluster <name> --dry-run.
  • Approved merges run frameworks cluster provision --gitops-dir . --cluster <name>.

CI must provide SSH access, SOPS/age material when encrypted files are used, and the same manifest source operators use locally. Keep workflow YAML in the GitOps repo, where branch names, environments, approvals, and cache paths can match that repository’s policy.
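The two-job split can be sketched as a single function that picks the command line from the triggering event. The event names `pull_request` and `merge` and the function `provision_command` are hypothetical; the flags are the ones shown in the bullets above.

```python
def provision_command(cluster, event, gitops_dir="."):
    """Build the frameworks command line for a CI job.

    Pull requests get a dry run; approved merges apply for real.
    The event names are placeholders for whatever the CI system reports.
    """
    argv = [
        "frameworks", "cluster", "provision",
        "--gitops-dir", gitops_dir,
        "--cluster", cluster,
    ]
    if event == "pull_request":
        argv.append("--dry-run")
    elif event != "merge":
        raise ValueError(f"unexpected CI event: {event!r}")
    return argv
```

Because both jobs share every argument except `--dry-run`, the plan reviewed on the pull request is computed from the same manifest source as the apply that follows the merge.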

The CLI’s collection-install step is idempotent and keyed on sha256(requirements.yml): when the file is unchanged, the ansible-galaxy call is skipped entirely.
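A minimal sketch of this kind of hash-keyed skip follows, assuming the digest of the last successful install is kept in a stamp file. The stamp file, the function names, and the `install` callback are hypothetical; the source only states that the step is keyed on the SHA-256 of requirements.yml.

```python
import hashlib
from pathlib import Path


def requirements_digest(requirements):
    """Return the hex SHA-256 digest of the requirements file's bytes."""
    return hashlib.sha256(Path(requirements).read_bytes()).hexdigest()


def install_if_changed(requirements, stamp, install):
    """Call install() only when requirements changed since the last run.

    Returns True if install() ran and False if the call was skipped.
    """
    digest = requirements_digest(requirements)
    stamp = Path(stamp)
    if stamp.exists() and stamp.read_text().strip() == digest:
        return False  # unchanged input: skip the installer
    install()
    # Record the digest only after install() returns without raising,
    # so a failed install is retried on the next run.
    stamp.write_text(digest + "\n")
    return True
```

Hashing the file contents rather than comparing modification times keeps the check stable across fresh checkouts, where timestamps change but content does not.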

  • make provision-hello — end-to-end smoke against localhost with the hello role.
  • make ansible-check — syntax-check every playbook under ansible/playbooks/.
  • make ansible-lint — ansible-lint production profile on the collection.
  • make ansible-yamllint — yamllint on role + playbook YAML.
  • make ansible-test — the above, chained.
  • molecule test -s default inside any role directory — full converge + idempotence with Docker driver.

Edge-node provisioning (edge provision, edge deploy) runs through the frameworks.infra.edge meta-role, which composes frameworks.infra.mistserver, frameworks.infra.helmsman, and an edge-shaped Caddy install (plus an optional vmagent) in Linux systemd, macOS launchd, or docker mode. Use native mode for production edge nodes: Foghorn’s edge release reconciler can roll native Helmsman, MistServer, and Caddy components over the existing Helmsman stream, including MistServer in-place reloads. Docker mode remains available for local or constrained deployments, but Docker edge nodes are not auto-updated by the release reconciler yet; update them manually with frameworks edge update or a compose pull/up workflow. Preflight probes and post-apply HTTPS verification stay in Go so operators see fast-fail messages inline with the role output.