Automate secret rotation in Kubernetes, then get out of the way!

Márk Sági-Kazár

2023-05-12 @ Open Source Summit NA 2023


Márk Sági-Kazár

Open Source Tech Lead @ Cisco

CNCF Ambassador


Once upon a time…

Your secrets WILL be compromised…


What are you going to do about it?

Why is secret rotation important?

  • Maintain security of sensitive information
  • Meet compliance requirements
  • Reduce the risk of a data breach

Challenges of secret rotation

  • Complexity
  • Time-consuming and error-prone process
  • Disruption of service availability

Secret rotation should be…

  • possible
  • automated
  • periodic

Secret rotation flow

    sequenceDiagram
    actor Operator
    participant Provider as Secret provider
    participant Store as Secret store
    participant Deploy as ???
    participant Production

    Deploy->>Store: Watch for changes
    activate Deploy
    Operator->>Provider: Generate new secret
    Provider-->>Operator: Return new secret
    Operator->>Store: Rotate secret in store
    Store-->>Deploy: Notice secret change
    deactivate Deploy
    Deploy->>Production: Deploy new secret
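The flow above can be sketched as a small in-memory simulation. Everything here is a stand-in: `SecretStore`, `Deployer` and `generate_secret` are hypothetical names, and the `Deployer` plays the part of the "???" participant by polling a version counter rather than using a real watch.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class SecretStore:
    """In-memory stand-in for a secret store with versioned values."""
    value: str = ""
    version: int = 0

    def rotate(self, new_value: str) -> None:
        # "Rotate secret in store": replace the value and bump the version.
        self.value = new_value
        self.version += 1


@dataclass
class Deployer:
    """Plays the '???' role: notices store changes and deploys them."""
    store: SecretStore
    seen_version: int = 0
    deployed: list = field(default_factory=list)

    def poll(self) -> bool:
        """Deploy the current secret if the store version has moved on."""
        if self.store.version == self.seen_version:
            return False
        self.seen_version = self.store.version
        self.deployed.append(self.store.value)  # "Deploy new secret"
        return True


def generate_secret() -> str:
    """Stand-in for the secret provider generating a new credential."""
    return secrets.token_urlsafe(32)


store = SecretStore()
deployer = Deployer(store)

new_secret = generate_secret()  # Operator -> Provider
store.rotate(new_secret)        # Operator -> Store
changed = deployer.poll()       # Store -> Deploy -> Production
```

Once the operator's part is automated too, nobody has to be in the loop, which is the point of the talk's title.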

Secret rotation in Kubernetes

⚠️ Plug the holes first! ⚠️

  • Turn on encryption at rest
  • Configure least-privilege access to Secrets

Official guide: Good practices for Kubernetes Secrets
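Both bullets can be illustrated with manifests built as plain Python dicts. This is a sketch, not a hardening recipe: the key name `key1`, the Role name `read-app-secrets` and the choice of the `aescbc` provider are assumptions, and managed clusters usually expose encryption at rest through their own settings instead of a file passed to `--encryption-provider-config`.

```python
import base64
import os


def encryption_config(key_name="key1"):
    """EncryptionConfiguration that encrypts Secrets with a fresh AES-CBC key."""
    key = base64.b64encode(os.urandom(32)).decode()  # 32 random bytes, base64
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [{
            "resources": ["secrets"],
            "providers": [
                # The first provider is used for writes.
                {"aescbc": {"keys": [{"name": key_name, "secret": key}]}},
                # identity stays last so existing plaintext data is readable.
                {"identity": {}},
            ],
        }],
    }


def read_only_secret_role(namespace, secret_names):
    """Role that allows `get` on the named Secrets only (no list, no watch)."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "read-app-secrets", "namespace": namespace},
        "rules": [{
            "apiGroups": [""],
            "resources": ["secrets"],
            "resourceNames": list(secret_names),
            "verbs": ["get"],
        }],
    }
```

Leaving `list` and `watch` out of the Role matters because either verb exposes the contents of every Secret in the namespace.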

Deploying secrets to Kubernetes

  • External Secrets Operator (ESO)
    • Synchronize secrets from an external store to Kubernetes
    • Mount secrets as usual (env var, file)
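A minimal sketch of what this looks like with ESO, built as Python dicts and printed as JSON (which `kubectl apply -f -` accepts). The resource names, the remote key `prod/db` and the one-hour refresh interval are made up for illustration, and `external-secrets.io/v1beta1` is assumed as the API version; newer ESO releases may serve a different one.

```python
import json


def external_secret(name, store_name, remote_key, properties, refresh="1h"):
    """ExternalSecret that copies selected properties into a Secret `name`."""
    return {
        "apiVersion": "external-secrets.io/v1beta1",  # assumed API version
        "kind": "ExternalSecret",
        "metadata": {"name": name},
        "spec": {
            "refreshInterval": refresh,  # how often ESO re-reads the store
            "secretStoreRef": {"name": store_name, "kind": "ClusterSecretStore"},
            "target": {"name": name},    # name of the resulting Kubernetes Secret
            "data": [
                {"secretKey": prop,
                 "remoteRef": {"key": remote_key, "property": prop}}
                for prop in properties
            ],
        },
    }


def env_from_secret(secret_name, key, env_name):
    """Container env entry that mounts one Secret key 'as usual'."""
    return {
        "name": env_name,
        "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
    }


manifest = external_secret("db-credentials", "vault", "prod/db", ["password"])
print(json.dumps(manifest, indent=2))
```

The workload itself stays unaware of ESO: it only references the resulting Secret, for example via `env_from_secret("db-credentials", "password", "DB_PASSWORD")`.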


Triggering workload rollout

    sequenceDiagram
    participant Store as Secret store
    participant ExternalSecrets as External secrets
    participant Kubernetes
    participant Reloader

    ExternalSecrets->>Store: Watch for changes
    Reloader->>Kubernetes: Watch for changes
    Store-->>ExternalSecrets: Notice secret change
    ExternalSecrets->>Kubernetes: Deploy new secret
    Kubernetes-->>Reloader: Notice secret change
    Reloader->>Kubernetes: Trigger workload rollout
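The last step works because any change to a Deployment's Pod template starts a rolling update. The sketch below shows that general technique with a checksum annotation; it is not a description of how Reloader is implemented, and the annotation key `example.com/secret-checksum` is hypothetical. If the Reloader in the diagram is Stakater's, annotating the workload with `reloader.stakater.com/auto: "true"` is usually all that is needed.

```python
import hashlib
import json

CHECKSUM_ANNOTATION = "example.com/secret-checksum"  # hypothetical key


def secret_checksum(data: dict) -> str:
    """Stable SHA-256 over the Secret's data, independent of key order."""
    payload = json.dumps(data, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def rollout_patch(data: dict) -> dict:
    """Patch body for a Deployment; the Pod template changes iff data changes."""
    return {
        "spec": {"template": {"metadata": {"annotations": {
            CHECKSUM_ANNOTATION: secret_checksum(data),
        }}}}
    }
```

Applying the same patch twice is a no-op, so the controller only rolls the workload when the Secret content actually differs.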

What could possibly go wrong?

Who knows, so monitor everything


Watch out for high-cardinality labels (drop the metrics and labels you don’t need)
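Besides metrics, the status of each ExternalSecret can be checked directly. A minimal sketch, assuming the object has been fetched as a dict and carries a `Ready` condition and a `refreshTime` in its status; `max_age` is an arbitrary threshold you would derive from your own refresh interval.

```python
from datetime import datetime, timedelta, timezone


def is_ready(es: dict) -> bool:
    """True if the ExternalSecret reports a Ready condition with status True."""
    for cond in es.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False


def is_stale(es: dict, now: datetime, max_age: timedelta) -> bool:
    """True if the last successful refresh is missing or older than max_age."""
    refreshed = es.get("status", {}).get("refreshTime")
    if not refreshed:
        return True
    # Kubernetes timestamps end in "Z"; normalize for datetime.fromisoformat.
    ts = datetime.fromisoformat(refreshed.replace("Z", "+00:00"))
    return now - ts > max_age


def needs_alert(es: dict, now: datetime, max_age: timedelta) -> bool:
    """Alert when the sync is failing or has silently stopped."""
    return not is_ready(es) or is_stale(es, now, max_age)
```

Checking staleness as well as readiness catches the case where synchronization has quietly stopped but the last reported condition still looks healthy.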

Changes take effect with a delay

  1. Change some configuration ✏️
  2. Wait until the next secret sync period 🤞
  3. Hope nothing breaks 🙏

Solution: create (and modify) test secrets at the same time.

Cascading effect of an outage

Requirement: Use store validation.

  1. Provider goes down for a long time (i.e. hours) ❌
  2. Store validation reaches a backoff of hours ⏳
  3. Secret synchronization essentially stops 😱

Solution: Bump every (Cluster)SecretStore after an outage.
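To see why a long outage keeps hurting after the provider is back, it helps to simulate exponential backoff. The base delay, factor and cap below are illustrative assumptions, not the values ESO or its controller framework actually use.

```python
def backoff_schedule(outage_seconds, base=1.0, factor=2.0, cap=4 * 3600):
    """Simulate retries during an outage that starts at t=0.

    Returns the failed attempt times (seconds since the outage began) and how
    long after recovery the next attempt is still scheduled to wait.
    """
    elapsed, delay, attempts = 0.0, base, []
    while elapsed < outage_seconds:
        attempts.append(elapsed)           # this attempt fails: provider is down
        elapsed += delay                   # sleep until the next attempt
        delay = min(delay * factor, cap)   # grow the delay, up to the cap
    return attempts, elapsed - outage_seconds


attempts, wait_after_recovery = backoff_schedule(3 * 3600)
print(f"{len(attempts)} failed attempts, "
      f"next retry {wait_after_recovery / 3600:.2f}h after recovery")
```

With these numbers a three-hour outage leaves the store idle for roughly another hour and a half after the provider recovers, which is the gap that bumping every store closes.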

To sum up ESO

  • Understand how (and when) changes will take effect
  • Monitor and alert for failures

Kubernetes without secrets 😱

Access secret store directly

  • Integrated into the application


  • “Inject” secrets into the application

Secret injection in Kubernetes

  • Inject a custom init into Pods using a mutating admission webhook
  • Get secrets from secret store in the custom init
  • Inject secrets as environment variables


Bank-Vaults

  • Started at Banzai Cloud
  • Vault Swiss Army knife

Bank-Vaults secret injection

  • Secret references: vault:path/to/secret#KEY
  • Mutating webhook
    • Detect secret references
    • Mutate Pods
  • Custom init replaces secret references with actual values
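What the custom init does can be sketched in a few lines. This handles only the `vault:path/to/secret#KEY` form shown above and is not Bank-Vaults' actual implementation; `read_secret` is a hypothetical callback standing in for a Vault client.

```python
import re

# Matches "vault:<path>#<key>"; other reference forms are not handled here.
REF = re.compile(r"^vault:(?P<path>[^#]+)#(?P<key>.+)$")


def resolve_env(env: dict, read_secret) -> dict:
    """Return a copy of env with secret references replaced by real values.

    `read_secret(path)` must return a dict of key/value pairs for that path.
    """
    resolved = {}
    for name, value in env.items():
        match = REF.match(value)
        if match is None:
            resolved[name] = value  # plain value, pass through untouched
            continue
        data = read_secret(match["path"])
        resolved[name] = data[match["key"]]
    return resolved
```

The init would then exec the original entrypoint with the resolved environment, so the real values never have to be stored in the Pod spec or in a Kubernetes Secret.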


Secret changes currently do not take effect (i.e. they do not trigger a workload reload).

Risks and mitigations

Risk: Secret store is a SPOF

Mitigation: Maintain a cluster-local instance

Risk: Webhook is a SPOF

Mitigation: Configure webhook according to best practices
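A few of those practices can be checked mechanically. The sketch below lints a MutatingWebhookConfiguration held as a dict; the five-second threshold and the particular set of checks are assumptions for illustration, not an official list.

```python
NAMESPACE_NAME_LABEL = "kubernetes.io/metadata.name"


def excludes_namespace(selector: dict, namespace: str) -> bool:
    """True if the namespaceSelector explicitly skips the given namespace."""
    for expr in selector.get("matchExpressions", []):
        if (expr.get("key") == NAMESPACE_NAME_LABEL
                and expr.get("operator") == "NotIn"
                and namespace in expr.get("values", [])):
            return True
    return False


def webhook_findings(config: dict, own_namespace: str) -> list:
    """Return human-readable findings for each webhook in the configuration."""
    findings = []
    for hook in config.get("webhooks", []):
        name = hook.get("name", "<unnamed>")
        # The API default is 10 seconds; a slow webhook delays every Pod create.
        if hook.get("timeoutSeconds", 10) > 5:
            findings.append(f"{name}: timeoutSeconds is above 5")
        # Fail vs Ignore is a trade-off; the choice should be deliberate.
        if "failurePolicy" not in hook:
            findings.append(f"{name}: failurePolicy is not set explicitly")
        # The webhook must not block the Pods that serve it.
        if not excludes_namespace(hook.get("namespaceSelector", {}), own_namespace):
            findings.append(f"{name}: does not exclude namespace {own_namespace}")
    return findings
```

Running several webhook replicas across nodes is the other half of the mitigation, but that lives in the Deployment rather than in this configuration.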


Bank-Vaults Roadmap

  • Moving to a new GitHub organization
  • Workload reload on secret change
  • Support for more providers
  • Secret synchronization between providers
  • Your desired feature (submit a new feature request)


Final thoughts

It seems wisest to assume the worst from the beginning…and let anything better come as a surprise.

Jules Verne

Thank you

Any questions?