Security
How Failkit is built.
Last updated: 2026-04-28
Draft pending counsel review and SOC 2 audit. This page describes the platform’s actual posture. Compliance attestations will be linked here as they complete; the absence of an attestation today does not mean the underlying control is missing.
1. Per-tenant isolation
Every Failkit customer gets their own Kubernetes namespace, their own Postgres database, and their own subdomain (<customer>.failkit.com). Workloads do not share processes, network policies, or DNS records with other tenants.
- Compute: dedicated namespace with NetworkPolicy denying cross-namespace traffic by default.
- Database: dedicated CNPG-managed Postgres database with a tenant-scoped role; cross-database access is not granted.
- Defense-in-depth: table-level Row-Level Security (RLS) policies enforce an app.current_tenant GUC even within a tenant’s own DB. If a code path forgets to scope a query, RLS blocks it.
- Identity: tenant admins are issued in the tenant’s own DB; the management plane mints short-lived bootstrap tokens (5-minute TTL, replay-protected) to seed them.
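To make the bootstrap-token properties above concrete, here is a minimal sketch of one way a 5-minute, single-use token could be minted and redeemed. It is an illustration under stated assumptions, not Failkit's implementation: the HMAC-SHA256 signing scheme, the claim names (tenant, exp, jti), the in-memory set of seen nonces, and the function names mint_token and redeem_token are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

TTL_SECONDS = 5 * 60  # 5-minute TTL, as stated in the text


def mint_token(key: bytes, tenant: str, now: Optional[float] = None) -> str:
    """Return a signed token carrying the tenant, an expiry, and a one-time nonce."""
    issued = time.time() if now is None else now
    claims = {
        "tenant": tenant,
        "exp": issued + TTL_SECONDS,
        "jti": secrets.token_hex(16),  # hypothetical one-time identifier
    }
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def redeem_token(key: bytes, token: str, seen: set, now: Optional[float] = None) -> str:
    """Check signature, expiry, and single use; return the tenant on success."""
    current = time.time() if now is None else now
    body, _, sig = token.rpartition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if current >= claims["exp"]:
        raise ValueError("expired")
    if claims["jti"] in seen:
        raise ValueError("replayed")
    seen.add(claims["jti"])  # remember the nonce so a second use is rejected
    return claims["tenant"]
```

In a real deployment the set of seen nonces would need to live in shared, durable storage and only needs to retain entries for the TTL window; the in-memory set here is only for illustration.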
Enterprise customers can opt into dedicated tenancy: a separate cluster on isolated infrastructure with no multi-tenant components.
2. Encryption
In transit
- All public traffic uses TLS 1.2+ with modern cipher suites. Certificates are issued by Let’s Encrypt via DNS-01 ACME and renewed automatically.
- Replication agents communicate with the management plane over mTLS using a per-tenant client certificate signed by an internal CA.
- Internal pod-to-pod traffic is constrained by Calico NetworkPolicies; sensitive paths additionally use mTLS via the service mesh.
At rest
- Postgres data is encrypted at rest using cloud-provider KMS (AES-256). Backups are encrypted with the same key envelope.
- Replicated VM payloads are encrypted by the agent before leaving Customer’s environment, using keys stored in Customer’s own cloud-provider KMS. Failkit cannot decrypt those payloads.
- Object-store buckets used as replication targets sit in Customer’s own cloud account, under Customer’s own IAM policies.
3. Access control
Customer access
- Tenant portals support email + password by default (bcrypt-hashed with a cost factor of 12).
- Professional and Enterprise tiers include SSO via SAML 2.0 and OIDC.
- Role-based access control inside the tenant: owner, admin, operator, auditor. The audit-only role (auditor) cannot trigger drills or modify protection groups.
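A role check along these lines can be sketched as a small permission table. The four role names come from the list above, and the only rule taken from this page is that the auditor role cannot trigger drills or modify protection groups. The action names and every other role-to-permission assignment below are assumptions made for illustration.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    ADMIN = "admin"
    OPERATOR = "operator"
    AUDITOR = "auditor"


# Hypothetical action names; only the auditor restrictions are taken from the text.
PERMISSIONS = {
    Role.OWNER: {"read_reports", "trigger_drill", "modify_protection_group", "manage_users"},
    Role.ADMIN: {"read_reports", "trigger_drill", "modify_protection_group", "manage_users"},
    Role.OPERATOR: {"read_reports", "trigger_drill"},
    Role.AUDITOR: {"read_reports"},  # read-only: no drills, no protection-group changes
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the role's permission set includes the action (deny by default)."""
    return action in PERMISSIONS.get(role, set())
```

Denying by default means an action missing from the table is refused for every role, which is the safer failure mode when new actions are added.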
Failkit employee access
- Failkit staff cannot access Customer Data without an explicit, time-bound support request from Customer.
- Production access is granted via short-lived (8-hour) credentials brokered by an SSO-fronted bastion. All access is logged.
- Production secrets are managed in HashiCorp Vault; no long-lived credentials are stored on workstations.
- Hardware MFA (FIDO2) is mandatory for all employees with production access.
4. Replication & drill data path
The replication path is designed so Failkit’s management plane never holds Customer VM payloads:
- The on-prem agent reads VM blocks from the source hypervisor (vSphere or Hyper-V).
- Blocks are encrypted client-side and uploaded directly to the customer-controlled object-store bucket (S3, Azure Blob, GCS).
- The agent reports metadata to failkit.com: byte counts, checksums, and the observed RPO. The bytes themselves never traverse Failkit infrastructure.
- During a drill, the orchestrator instructs the customer’s cloud account to spin up VMs from the replicated blocks. Drill VMs run on Customer’s own compute, billed to Customer’s own cloud account.
A single Failkit incident cannot expose Customer VM data, because we don’t hold it.
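The metadata report in the steps above could look roughly like the following sketch, which derives a byte count, a checksum, and an observed RPO from an already-encrypted block without including the block itself. The field names, the use of SHA-256 for the checksum, and the definition of observed RPO as the lag between the last source write and successful replication are assumptions, not the agent's documented wire format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BlockReport:
    byte_count: int
    sha256: str          # checksum of the ciphertext, not the plaintext
    rpo_seconds: float   # assumed definition: replication time minus last source write


def build_report(ciphertext: bytes, last_write_ts: float, replicated_ts: float) -> BlockReport:
    """Summarize an encrypted block; the payload bytes are never part of the report."""
    return BlockReport(
        byte_count=len(ciphertext),
        sha256=hashlib.sha256(ciphertext).hexdigest(),
        rpo_seconds=replicated_ts - last_write_ts,
    )


def to_wire(report: BlockReport) -> str:
    """Serialize the report as JSON for sending to the management plane."""
    return json.dumps(asdict(report), sort_keys=True)
```

Because the report is built only from the length and digest of the ciphertext plus two timestamps, nothing in the serialized output can be used to reconstruct the VM data.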
5. Auditability
- Every drill produces a signed PDF report with timestamp, RPO/RTO measured, VM-by-VM boot result, and a SHA-256 hash chained to the prior report. Signatures use Ed25519 with a per-tenant key.
- Reports are retained per the tenant’s configured retention policy (default 7 years for Enterprise, 1 year otherwise) and are downloadable directly from the portal.
- All administrative actions (user changes, policy changes, drill triggers) emit immutable audit events stored alongside the tenant’s own DB; auditors can be granted read-only access.
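The hash chaining of drill reports can be illustrated with a short sketch. It covers only the SHA-256 chaining and omits the Ed25519 signature, which is not available in the Python standard library. The all-zero genesis value, the canonical JSON encoding of a report, and the function names are assumptions made for this example.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical previous-hash value for the first report


def chain_hash(report: dict, prev_hash: str) -> str:
    """Hash the previous link together with a canonical encoding of the report."""
    payload = json.dumps(report, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


def append_report(chain: list, report: dict) -> None:
    """Append a report, linking it to the hash of the entry before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"report": report, "prev": prev, "hash": chain_hash(report, prev)})


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != chain_hash(entry["report"], prev):
            return False
        prev = entry["hash"]
    return True
```

An auditor holding only the most recent hash can detect tampering with any earlier report, since changing one entry changes every hash after it.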
6. Vendor & sub-processor list
The list of sub-processors and the data each handles is on the Privacy page.
7. Compliance roadmap
- SOC 2 Type I: target Q4 2026. Auditor selection in progress.
- SOC 2 Type II: target H1 2027.
- ISO 27001: target 2027 following SOC 2 Type II.
- HIPAA: a Business Associate Agreement (BAA) is available for Enterprise on request. The platform’s data path is compatible (PHI never traverses Failkit infrastructure), but the BAA is signed per customer.
- FedRAMP / IL-4: not on the near-term roadmap. Contact us if you need it.
8. Incident response
Failkit maintains a documented incident-response runbook. We will notify affected customers within 24 hours of confirming a security incident affecting their data, with a written postmortem within 14 days. Status of ongoing incidents is published at status.failkit.com (planned).
9. Reporting a vulnerability
We welcome coordinated disclosure. Email [email protected] with details. A PGP key fingerprint will be published here once the inbox is signed off. We commit to:
- Acknowledging receipt within 2 business days.
- Triaging within 5 business days.
- Not pursuing legal action against good-faith researchers who follow this process.
Out of scope: denial-of-service testing, social engineering of Failkit employees, and physical attacks on Failkit facilities or vendors.