# Hosted secrets-backend pick: follow-up to ADR-034
- Status: Open
- Owner: backbone maintainers
- Tracking issue: #285
- Source ADR: ADR-034 (Secrets and Config Management Strategy)
This is the working document for the deferred decision in ADR-034: which secrets
backend backs the hosted plane. ADR-034 codified the contract (env vars on the
container; pydantic-settings reads them; rotation owners and cadence) but
deliberately left the backend implementation unselected. This file lists what
needs to happen, in what order, and against what criteria, so a human or a future
agent can pick up the work cold.
## When to start
Start when any of these triggers occurs:
- Managed-hosting phase unparks. See `docs/plan/phase-managed-hosting.md`. When that phase resumes (Apache 2.0 review complete, design partner committed, multi-tenancy hardened), this work is on the critical path before the first hosted deploy.
- The Phase 6 §6.13 deployment ADR is being authored. The platform decision (Hetzner / DigitalOcean / AWS K8s / Compose-only) comes out of §6.13. The secrets-backend pick is downstream of it but should land in the same wave, so self-hosters get one consistent ops story instead of two.
- An external constraint forces it sooner: for example, a design partner needs a staging deploy, or a security review demands per-secret audit logs that GitHub Actions secrets cannot provide.
If none of the above is true, do not start. ADR-034's whole point is that the deferral is safe: the application code is identical regardless of backend choice, so picking early just locks in an opinion before the constraints are knowable.
## Decision criteria (from ADR-034)
The backend must satisfy all three:
- Platform fit. Match the hosting platform decided in §6.13 (don't pick AWS Secrets Manager if hosting lands on Hetzner; don't pick Hetzner vault if hosting lands on AWS).
- Per-environment scoping. Support dev / staging / prod isolation without manual file shuffling. One namespace per environment, shared across tenants, is the bare minimum; per-tenant scoping is a bonus, not a requirement.
- Rotation without redeploy. Either (a) a sidecar re-renders the env on rotation and signals the app process, or (b) the rotation event triggers a controlled rolling restart. If the only rotation path is "edit secret, redeploy app," the backend fails this criterion.
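Option (a) puts part of the work on the application: a sidecar cannot rewrite the environment of an already-running process, so in practice it re-renders a file and the app re-reads it when signalled. A minimal sketch of the app side, assuming a hypothetical sidecar that writes `KEY=VALUE` lines to a rendered env file and then sends `SIGHUP`. The file format, the signal, and the `ReloadableEnv` name are illustrative assumptions, not part of ADR-034's contract, and the sketch is POSIX-only.

```python
import os
import signal
import tempfile
import threading
from typing import Dict, Optional


class ReloadableEnv:
    """Hypothetical holder for values re-read from a sidecar-rendered env file.

    Assumes the sidecar writes KEY=VALUE lines and sends SIGHUP after each
    rotation. POSIX only: SIGHUP does not exist on Windows.
    """

    def __init__(self, path: str) -> None:
        self._path = path
        # RLock, because the SIGHUP handler runs in the main thread and may
        # interrupt get() while the main thread already holds the lock.
        self._lock = threading.RLock()
        self._values: Dict[str, str] = {}
        self.reload()

    def reload(self) -> None:
        values: Dict[str, str] = {}
        with open(self._path, encoding="utf-8") as fh:
            for raw in fh:
                line = raw.strip()
                # Skip blank lines, comments, and anything that is not KEY=VALUE.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
        # Swap the whole mapping at once so readers never see a half-parsed file.
        with self._lock:
            self._values = values

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._values.get(key)

    def install_sighup_handler(self) -> None:
        # Python only allows installing handlers from the main thread.
        signal.signal(signal.SIGHUP, lambda signum, frame: self.reload())


if __name__ == "__main__":
    # Simulate one rotation: render, rotate, signal ourselves, read again.
    with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
        fh.write("POC_TEST_VALUE=hello\n")
        path = fh.name
    env = ReloadableEnv(path)
    env.install_sighup_handler()
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("POC_TEST_VALUE=rotated\n")
    signal.raise_signal(signal.SIGHUP)  # stands in for the sidecar's signal
    print(env.get("POC_TEST_VALUE"))
    os.unlink(path)
```

`signal.raise_signal` runs the Python-level handler before it returns, which keeps the simulation deterministic. A real sidecar would signal the app's PID from its own process instead.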
## Candidates
These are the four candidates ADR-034 captured. Change the list only when a new constraint warrants it (e.g., AWS goes off the table because hosting lands on Hetzner). Do not re-rank the candidates silently; record why.
| Candidate | Best fit when | Pricing model | Rotation story |
|---|---|---|---|
| Hetzner vault | Hetzner-hosted K8s; minimum vendor surface | Operator-run; infra cost only | DIY — write the runbook ourselves |
| Doppler | Cross-cloud; want a managed UI for editing across dev/stg/prd | Per-seat SaaS | Built-in; webhook on rotation |
| 1Password Connect | Team already on 1Password Business | Per-seat 1P + self-hosted Connect server (free) | Connect server pulls; rolling restart |
| AWS Secrets Manager | Hosting on AWS; want IAM-scoped access + native rotation hooks | Per-secret + per-API-call usage | Native rotation Lambdas; KMS-backed |
## Steps
Do them in order. Each step has a verification check; if the check fails, stop and investigate before moving on.
### Step 1: Confirm the platform from §6.13
- Action. Read the §6.13 deployment ADR (look for `ADR-0XX` titled "Deployment ADR — Compose, images, K8s path" in `docs/DECISIONS.md`, or check the `phase-6-admin-ops.md` ADR table for the assigned number).
- Verify. The platform (Hetzner K8s / DO K8s / AWS EKS / Compose-only / etc.) is named explicitly in that ADR's Decision section. If §6.13 is not yet merged, this work blocks on it; do not pick a backend speculatively.
### Step 2: Score the four candidates
- Action. For each candidate in the table above, write a one-paragraph fit assessment against the three decision criteria. Do this in a scratch document or as comments on the tracking issue, not yet in an ADR: the candidates ranked second through fourth are alternatives, not the decision.
- Verify. Each candidate has a clear pass/fail/maybe per criterion, with the reason in writing. At least one candidate is a clear pass on all three; if none is, the criteria themselves need to be re-examined (escalate).
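Recording the verdicts as data makes the "at least one clear pass" rule mechanical, while the reasons stay in the written assessments. A minimal sketch; the candidate names and verdicts below are placeholders, not assessments of the real candidates.

```python
from enum import Enum
from typing import Dict, List


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    MAYBE = "maybe"


# The three decision criteria from ADR-034.
CRITERIA = ("platform_fit", "per_environment_scoping", "rotation_without_redeploy")


def clear_passes(scores: Dict[str, Dict[str, Verdict]]) -> List[str]:
    """Return the candidates that PASS every criterion, in input order.

    Raises ValueError if any candidate lacks a verdict, so an incomplete
    scorecard cannot silently produce a winner.
    """
    passing = []
    for candidate, verdicts in scores.items():
        missing = [c for c in CRITERIA if c not in verdicts]
        if missing:
            raise ValueError(f"{candidate}: no verdict for {', '.join(missing)}")
        if all(verdicts[c] is Verdict.PASS for c in CRITERIA):
            passing.append(candidate)
    return passing


# Placeholder scorecard for illustration only.
placeholder_scores = {
    "candidate A": {"platform_fit": Verdict.PASS, "per_environment_scoping": Verdict.PASS,
                    "rotation_without_redeploy": Verdict.MAYBE},
    "candidate B": {"platform_fit": Verdict.PASS, "per_environment_scoping": Verdict.PASS,
                    "rotation_without_redeploy": Verdict.PASS},
    "candidate C": {"platform_fit": Verdict.FAIL, "per_environment_scoping": Verdict.PASS,
                    "rotation_without_redeploy": Verdict.PASS},
}

if __name__ == "__main__":
    leaders = clear_passes(placeholder_scores)
    if leaders:
        print("Clear passes:", ", ".join(leaders))
    else:
        print("No clear pass on all three criteria: re-examine the criteria and escalate.")
```

`MAYBE` deliberately counts as not passing, since the step asks for a clear pass on all three criteria.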
### Step 3: POC the leading candidate
- Action. Stand up the leading candidate against a staging or scratch cluster. Provision a single test secret (`POC_TEST_VALUE=hello`), wire it into a temporary `Settings` field, deploy a minimal container, and confirm the env var arrives. Then rotate the secret without redeploying the container and confirm the new value is picked up within the documented SLA (sidecar reload window, scheduled rolling restart, etc.).
- Verify. Three checks: (a) initial provisioning works end-to-end without ad-hoc manual steps; (b) rotation propagates within the SLA; (c) the backend's audit log shows both the rotation event and the container's read events. If any check fails, fall back to the second-ranked candidate and repeat.
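Check (b) is easier to judge with a measured number than by eye. A minimal polling sketch: `read_value` is an assumption standing in for however the POC reads the value the running app currently holds, for example a temporary debug endpoint on the POC container (hypothetical, and not something to ship). A fresh `kubectl exec ... printenv` does not work for the sidecar path, because it shows the container's start-up environment rather than what the app process reloaded.

```python
import time
from typing import Callable


def wait_for_rotation(
    read_value: Callable[[], str],
    expected: str,
    sla_seconds: float,
    poll_interval: float = 1.0,
) -> float:
    """Poll read_value() until it returns `expected`.

    Returns the seconds elapsed until the new value was observed, or raises
    TimeoutError once `sla_seconds` has passed without seeing it.
    """
    start = time.monotonic()
    deadline = start + sla_seconds
    while True:
        if read_value() == expected:
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            raise TimeoutError(f"new value not observed within {sla_seconds}s")
        time.sleep(poll_interval)
```

Start the poll immediately after triggering the rotation in the backend, and keep the returned duration alongside the audit-log evidence for check (c).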
### Step 4: Make the decision
- Action. Pick. Either:
  - Amend ADR-034 by adding a new section "Hosted backend selection (resolved YYYY-MM-DD)" under the Decision section; OR
  - File a follow-up ADR (use the next free number; check the bottom of `docs/DECISIONS.md`) titled "Hosted secrets backend selection" that supersedes the deferred portion of ADR-034.

  Prefer amending if the rest of ADR-034 stands unchanged; prefer a new ADR if the picked backend forces other contract changes (e.g., an additional config layer, new naming conventions for backend-specific keys).
- Verify. The chosen ADR records the picked backend, the date, and the rejected candidates with one-line rationales, and the retrospective backlog table in `phase-managed-hosting.md` links to it. That table already links to ADR-034 today; if a new ADR is created, update the link to point to the resolution.
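Finding the next free ADR number can be scripted. A minimal sketch, assuming identifiers in `docs/DECISIONS.md` use the three-digit `ADR-NNN` form seen in ADR-034; placeholders such as `ADR-0XX` are ignored because they do not have three digits after the prefix. It takes the highest number mentioned anywhere in the file, so still confirm against the bottom of the file as the step says.

```python
import re
from pathlib import Path

# Three or more digits after "ADR-"; placeholders like "ADR-0XX" do not match.
ADR_ID = re.compile(r"\bADR-(\d{3,})\b")


def next_free_adr(decisions_text: str) -> str:
    """Return the identifier one above the highest ADR number in the text."""
    numbers = [int(n) for n in ADR_ID.findall(decisions_text)]
    return f"ADR-{max(numbers, default=0) + 1:03d}"


if __name__ == "__main__":
    decisions = Path("docs/DECISIONS.md")
    if decisions.exists():
        print(next_free_adr(decisions.read_text(encoding="utf-8")))
```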
### Step 5: Author the rotation runbook
- Action. Create `docs/ops/secret-rotation.md`. For each rotation-eligible secret in `src/climate_lama/config.py` (`APP_SECRET_KEY`, `API_KEY_PEPPER`, the password component of `DATABASE_URL`, `MINIO_ACCESS_KEY` / `MINIO_SECRET_KEY`, the OIDC client secret, the SMTP password), document: who rotates it, the cadence (default ≥90 days from ADR-034), the exact command sequence in the chosen backend, and the rollout mechanism (sidecar reload vs. rolling restart).
- Verify. A new contributor can rotate any one of the listed secrets by following the runbook step by step without asking questions in chat. Cross-check by handing it to someone who has not been involved in this work.
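Runbook coverage can also be guarded in CI so a secret does not silently drop out of it. A minimal sketch: it only checks that each named variable appears as a whole word in the runbook, not that the four required fields are filled in. The OIDC client secret and SMTP password are left out because this document does not give their variable names; add them once confirmed against `src/climate_lama/config.py`.

```python
import re
import sys
from pathlib import Path
from typing import Iterable, List

# Variable names listed in Step 5. The OIDC client secret and SMTP password
# are intentionally absent: their names are not given in this document.
ROTATION_ELIGIBLE = (
    "APP_SECRET_KEY",
    "API_KEY_PEPPER",
    "DATABASE_URL",
    "MINIO_ACCESS_KEY",
    "MINIO_SECRET_KEY",
)


def missing_from_runbook(
    runbook_text: str, names: Iterable[str] = ROTATION_ELIGIBLE
) -> List[str]:
    """Return the names that never appear as whole words in the runbook."""
    return [
        name
        for name in names
        if not re.search(rf"\b{re.escape(name)}\b", runbook_text)
    ]


if __name__ == "__main__":
    runbook = Path("docs/ops/secret-rotation.md")
    if runbook.exists():
        missing = missing_from_runbook(runbook.read_text(encoding="utf-8"))
        if missing:
            sys.exit(f"secret-rotation.md does not cover: {', '.join(missing)}")
```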
### Step 6: Wire deploy templates to the backend
- Action. Update the Helm chart, Compose file, or whatever §6.13 settled on to source env vars from the chosen backend (`externalSecrets` / sidecar / init container / etc.). Remove any temporary plaintext secret-injection paths used during the POC.
- Verify. A clean deploy from scratch into a non-prod environment boots successfully with all secrets sourced from the backend, and running `git grep` for hardcoded secret values in the deploy templates returns nothing.
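`git grep` finds values you already know. A complementary heuristic, sketched here under stated assumptions, flags lines where a known secret name is assigned a literal instead of a reference (Compose `$VAR` / `${VAR}` interpolation or Helm `{{ ... }}` templating). It only understands single-line `KEY: value` and `KEY=value` forms, so Kubernetes `name:` / `value:` pairs need a separate check, and it errs toward flagging (a `DATABASE_URL` that interpolates only its password is still reported). The key list mirrors the names in this document and is an assumption about what the templates contain; the paths in the usage comment are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import List, Tuple

# Names from this document; extend as the templates grow.
SECRET_KEYS = (
    "APP_SECRET_KEY",
    "API_KEY_PEPPER",
    "DATABASE_URL",
    "MINIO_ACCESS_KEY",
    "MINIO_SECRET_KEY",
    "POC_TEST_VALUE",
)

# Matches "KEY: value", "KEY=value" and "- KEY=value"; group 2 is the value.
ASSIGNMENT = re.compile(
    r"\b(" + "|".join(map(re.escape, SECRET_KEYS)) + r")\s*[:=]\s*(\S.*)$"
)


def literal_secret_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs that assign a literal to a secret key."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = ASSIGNMENT.search(line)
        if not match:
            continue
        value = match.group(2).strip().strip("'\"")
        # Empty values and references (Compose $VAR / ${VAR}, Helm {{ ... }}) are fine.
        if not value or value.startswith(("$", "{{")):
            continue
        hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python scan_templates.py deploy/compose.yaml deploy/values.yaml
    found = False
    for arg in sys.argv[1:]:
        for lineno, line in literal_secret_lines(Path(arg).read_text(encoding="utf-8")):
            print(f"{arg}:{lineno}: {line}")
            found = True
    if found:
        sys.exit(1)
```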
### Step 7: Update parked-phase pointers
- Action. Edit `docs/plan/phase-managed-hosting.md`: the retrospective backlog row for #102 already points at ADR-034; add a second line or update it to point at the resolution (ADR amendment or follow-up ADR). Also update the `phase-6-admin-ops.md` §6.14 entry to mark the deferred-pick portion as resolved.
- Verify. No stale "deferred" or "TBD" references to the secrets-backend decision remain in `docs/plan/`. Run `git grep -i "secrets backend.*defer\|defer.*secrets backend"`; it should return only the historical text inside ADR-034 itself.
## Out of scope (do NOT bundle into this work)
- Per-tenant secrets isolation. That's a multi-tenancy concern, not a backend concern. If the picked backend supports it natively, great; if not, do not invent it. File a separate issue.
- SSO / OIDC for the backend's admin UI. Useful operationally but not on the critical path for the first hosted deploy.
- Migrating dev contributors away from `.env` files. ADR-034 explicitly keeps `.env` as the dev-loop source. Don't change that here.
- CI secret store changes. GitHub Actions secrets stay as the CI source per ADR-034. Only the hosted plane is in scope.
## Definition of done
- Decision recorded in `docs/DECISIONS.md` (amended ADR-034 or new ADR)
- `docs/ops/secret-rotation.md` runbook merged
- Deploy templates source secrets from the chosen backend
- Parked-phase pointers updated to reference the resolution
- Tracking issue closed with the resolution ADR number in the closing comment