Container Security: Scanning, Signing, and Runtime Protection (2025)
Executive Summary
Ship fast, ship safe: secure the supply chain (SBOM + signing), harden images and pods, enforce admission policies, and detect runtime threats with eBPF—measured with dashboards and backed by tested runbooks.
1) Threat Model for Containers and Kubernetes
- Supply chain: poisoned dependencies, vulnerable base images, unsigned artifacts
- Build systems: CI credential leaks, tampered pipelines, inadequate isolation
- Registry: public pull, weak auth, unscanned images
- Scheduler/Nodes: container escapes, kernel vulns, privilege escalation
- Runtime: crypto-miners, backdoors, exfiltration, lateral movement
- Network: flat networks, no egress control, DNS tunneling
- Secrets: mounted broadly, env leaks, logs containing secrets
2) Supply Chain Security: SBOM, Signing, Provenance
2.1 SBOM Generation (CycloneDX)
cyclonedx-bom -o sbom.json
jq '.components | length' sbom.json
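The commands above cover the source tree; for the built image itself, an image-level CycloneDX SBOM can be generated with a tool such as syft (a hedged sketch, reusing $IMAGE from the pipeline in section 4):
syft $IMAGE -o cyclonedx-json > sbom-image.json
jq '.components | length' sbom-image.json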
2.2 Signing with Cosign and Keyless
# Cosign v2: keyless signing is the default; COSIGN_EXPERIMENTAL is no longer needed
cosign sign --yes $IMAGE
cosign verify --certificate-identity-regexp 'https://github.com/org/.*' --certificate-oidc-issuer https://token.actions.githubusercontent.com $IMAGE
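To bind the SBOM from 2.1 to the image digest, an attestation can be attached and verified (a sketch assuming Cosign 2.x keyless and the built-in CycloneDX predicate type; identity values are placeholders):
cosign attest --yes --type cyclonedx --predicate sbom.json $IMAGE
cosign verify-attestation --type cyclonedx \
  --certificate-identity-regexp 'https://github.com/org/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  $IMAGE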
2.3 Provenance (SLSA-style)
provenance:
builder: github-actions
source: https://github.com/org/repo@sha
materials:
- base_image: "gcr.io/distroless/nodejs20@sha256:..."
- dependencies_sbom: sbom.json
attestations:
- signature: cosign
- policy_pass: true
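The block above is a conceptual summary; when provenance is attached as a signed attestation, it can be checked before deploy (a sketch assuming a Cosign 2.x keyless setup; identity values are placeholders):
cosign verify-attestation --type slsaprovenance \
  --certificate-identity-regexp 'https://github.com/org/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  $IMAGE | jq -r '.payload' | base64 -d | jq '.predicate'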
3) Image Hardening
# Use distroless and pin digest
FROM gcr.io/distroless/nodejs20@sha256:deadbeef
WORKDIR /app
COPY --chown=nonroot:nonroot dist/ ./
USER nonroot:nonroot
ENV NODE_ENV=production
CMD ["server.js"]
- Avoid root: set non-root user and group
- Drop capabilities: only what you need
- Read-only FS; tmpfs for writable paths
- Use seccomp/AppArmor/SELinux profiles
- Remove shells and package managers (distroless)
- Pin base image digests; minimal attack surface
4) CI/CD Scanning Pipeline
name: container-security
on: [push]
jobs:
scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Build image
run: docker build -t $IMAGE .
- name: Trivy
run: trivy image --severity HIGH,CRITICAL --exit-code 1 $IMAGE
- name: Grype
run: grype $IMAGE --fail-on high
- name: Dockle
run: dockle --exit-code 1 --exit-level FATAL $IMAGE
- name: SBOM
run: cyclonedx-bom -o sbom.json
- name: Sign image
run: cosign sign --yes $IMAGE
5) IaC and Kubernetes Static Analysis
5.1 Polaris
polaris audit --audit-path ./k8s --set-exit-code-on-danger
5.2 KubeLinter
kube-linter lint ./k8s --format sarif > kube-linter.sarif
5.3 KubeSec
kubesec scan ./k8s/deployment.yaml
6) Admission Control: Kyverno and Gatekeeper
6.1 Kyverno Policies
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: disallow-privileged }
spec:
validationFailureAction: enforce
background: true
rules:
- name: no-privileged
match: { resources: { kinds: [Pod] } }
validate:
message: "Privileged containers are not allowed"
pattern:
spec:
containers:
- securityContext:
privileged: false
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: require-nonroot-readonly }
spec:
validationFailureAction: enforce
rules:
- name: nonroot-readonly
match: { resources: { kinds: [Pod] } }
validate:
message: "Run as non-root with readOnlyRootFilesystem"
pattern:
spec:
securityContext:
runAsNonRoot: true
containers:
- securityContext:
readOnlyRootFilesystem: true
6.2 Gatekeeper (OPA)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata: { name: disallow-privileged }
spec:
match: { kinds: [{ apiGroups: [""], kinds: ["Pod"] }] }
7) Pod Security Standards (PSS)
# Enforce PSS by labeling namespaces with the restricted profile
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
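Existing namespaces can be labeled in place (the namespace name is illustrative):
kubectl label namespace payments \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted --overwrite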
8) Runtime Security: Falco and eBPF
- rule: Write below etc
  desc: Detect file writes to /etc
  condition: open_write and fd.name startswith /etc
  output: "Write below /etc (user=%user.name proc=%proc.cmdline)"
  priority: WARNING
- rule: Crypto Miner Detection
  desc: Detect known miner patterns
  condition: spawned_process and proc.name in ("xmrig", "minerd")
  output: "Miner process detected (%proc.name)"
  priority: CRITICAL
- Use eBPF-based sensors for low overhead
- Integrate alerts with SIEM/SOAR; auto-isolate on critical
- Maintain allowlists to reduce false positives (a list/macro sketch follows this list)
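A minimal allowlisting sketch using a Falco list and macro (the process names are illustrative):
- list: allowed_etc_writers
  items: [chef-client, puppet, confd]
- macro: trusted_etc_write
  condition: proc.name in (allowed_etc_writers)
# reference it from a rule condition, e.g.: open_write and fd.name startswith /etc and not trusted_etc_write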
9) Network Policies (Calico/Cilium)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: deny-all }
spec:
podSelector: {}
policyTypes: [Ingress, Egress]
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-namespace }
spec:
podSelector: {}
ingress:
- from:
- podSelector: {}
# Egress control example
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: egress-allow-dns }
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
  - to:
    - namespaceSelector:
        matchLabels: { kubernetes.io/metadata.name: kube-system }
    ports: [{ protocol: UDP, port: 53 }]
10) Secrets Management
- Avoid plain Kubernetes Secrets for highly sensitive data; use external secret stores
- Mount via CSI Secret Store or fetch at runtime with short TTL tokens
- Prevent env var leaks; avoid logging secrets; mask in CI
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata: { name: db-creds }
spec:
refreshInterval: 1h
secretStoreRef: { name: aws-secrets, kind: ClusterSecretStore }
target: { name: db-creds }
data:
- secretKey: password
remoteRef: { key: prod/db, property: password }
11) Audit Logging and Forensics
- Enable Kubernetes audit logs; ship to SIEM with retention (API server flags sketched after this list)
- Node forensics: collect container filesystem, memory, network captures
- Preserve chain of custody; timestamp and sign artifacts
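A minimal sketch of the kube-apiserver flags that wire in an audit policy such as the one in Appendix V (paths are illustrative):
kube-apiserver \
  --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
  --audit-log-path=/var/log/kubernetes/audit.log \
  --audit-log-maxage=30 --audit-log-maxbackup=10 --audit-log-maxsize=100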
12) Incident Response Runbooks
Crypto-miner Detected (CRITICAL)
- Isolate namespace (network policies; commands sketched after these runbooks)
- Quarantine nodes if kernel exploit suspected
- Revoke registry creds; rotate cluster secrets
- Rebuild images; verify signatures
- Post-incident hardening and report
Privileged Container Found (HIGH)
- Block at admission; notify owners
- Replace with least-privilege profile
- Add policy test to CI
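A minimal command sketch for the isolation step above (namespace, node, and pod names are illustrative; deny-all.yaml is the policy from section 9):
kubectl label namespace payments quarantine=true --overwrite
kubectl apply -n payments -f deny-all.yaml
kubectl cordon node-7                                               # if a node-level compromise is suspected
kubectl -n payments delete pod miner-abc --grace-period=0 --force   # only after evidence capture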
13) Dashboards and Alerts
{
"title": "Container Security Overview",
"panels": [
{"type": "stat", "title": "Critical Vulns", "targets": [{"expr": "sum(trivy_vulns{severity='CRITICAL'})"}]},
{"type": "timeseries", "title": "Admission Denials", "targets": [{"expr": "rate(admission_denied_total[5m])"}]},
{"type": "timeseries", "title": "Falco Alerts", "targets": [{"expr": "rate(falco_alerts_total[5m])"}]}
]
}
14) Compliance and Benchmarks
- CIS Docker/Kubernetes benchmarks; automate with kube-bench and dockle
- Evidence: benchmark reports, remediation status, exceptions
Related Posts
- Supply Chain Security: SBOM, SLSA, Sigstore (2025)
- API Security: OWASP Top 10 Prevention (2025)
- Observability with OpenTelemetry: Complete Guide (2025)
- Kubernetes Cost Optimization: FinOps Strategies (2025)
Call to Action
Need container security implemented end-to-end? We harden builds, enforce policies, and operate runtime detection with clear SLOs and dashboards.
Extended FAQ (1–120)
- Which scanner should I use? Use at least one (Trivy) plus a second (Grype) for coverage.
- Do I need both SBOM and signing? Yes—SBOM for transparency; signing for provenance.
- Are distroless images necessary? Strongly recommended to minimize attack surface.
- How to block privileged pods? Admission policies (Kyverno/Gatekeeper) with enforce.
- How to stop crypto-miners? Runtime detection (Falco), egress control, and resource quotas.
- Do we need eBPF? It reduces overhead and increases visibility; recommended.
- How to secure registries? Private access, IAM-based auth, signed images, and scan on push.
- Secrets in env vars? Avoid; use mounted files or runtime fetch; mask logs.
- Network defaults? Deny-all, then allow needed flows; restrict egress.
- Pod Security Standards? Label namespaces to restricted; audit/warn/enforce.
... (continue practical Q/A up to 120 on images, registries, policies, runtime, network, secrets, compliance, and incident response)
Appendix A — Hardened Deployment Templates
apiVersion: apps/v1
kind: Deployment
metadata: { name: api, labels: { app: api } }
spec:
replicas: 3
selector: { matchLabels: { app: api } }
template:
metadata:
labels: { app: api }
annotations:
container.apparmor.security.beta.kubernetes.io/api: runtime/default
        # seccomp is set via securityContext.seccompProfile below; the seccomp.security.alpha annotation was removed in current Kubernetes
spec:
automountServiceAccountToken: false
serviceAccountName: api-sa
securityContext:
runAsNonRoot: true
fsGroup: 2000
seccompProfile: { type: RuntimeDefault }
containers:
- name: api
image: registry.example.com/api@sha256:...
imagePullPolicy: IfNotPresent
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities: { drop: ["ALL"], add: [] }
resources:
requests: { cpu: "100m", memory: "128Mi" }
limits: { cpu: "500m", memory: "512Mi" }
ports: [{ containerPort: 8080 }]
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir: { medium: Memory, sizeLimit: 64Mi }
Appendix B — Helm Values (Security Defaults)
podSecurityContext:
runAsNonRoot: true
fsGroup: 2000
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
serviceAccount:
create: true
automount: false
networkPolicy:
enabled: true
resources:
limits: { cpu: 500m, memory: 512Mi }
requests: { cpu: 100m, memory: 128Mi }
Appendix C — gVisor/Kata Sandboxing
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata: { name: gvisor }
handler: runsc
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata: { name: kata }
handler: kata-qemu
# Use RuntimeClass on a Pod
spec:
runtimeClassName: gvisor
Appendix D — SELinux and AppArmor Profiles
# AppArmor profile reference
container.apparmor.security.beta.kubernetes.io/api: localhost/my-profile
# SELinux booleans (example)
setsebool -P container_manage_cgroup on
Appendix E — Sysctls and Node Hardening
apiVersion: v1
kind: Pod
metadata: { name: sysctls-example }
spec:
securityContext:
sysctls:
- name: net.ipv4.ip_unprivileged_port_start
value: "0"
# Node: disable unused services, set kernel lockdown, enable auditing
systemctl disable --now rpcbind
sysctl -w kernel.kptr_restrict=2
Appendix F — Registry Policies (Harbor/ACR/ECR/GCR)
- Require signed images (cosign) and block unsigned in prod
- Scan on push; block critical vulns
- Retention: keep N digests per tag; clean up old images (an ECR lifecycle example follows this list)
- Private networks; VPC endpoints; least privilege for pull
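One concrete retention mechanism (a sketch of an AWS ECR lifecycle policy; the rule values are illustrative):
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images after 30 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }
  ]
}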
Appendix G — CVE Gates and Exceptions
policy:
severities_blocked: [CRITICAL, HIGH]
allowlist:
- cve: CVE-2023-12345
expires: 2025-12-31
justification: "No exploit; upstream patch pending"
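If the gate is enforced with Trivy, the allowlist can be mirrored in an ignore file passed via --ignorefile (the CVE is the example from the policy above):
# .trivyignore: accepted risks, reviewed on expiry
# CVE-2023-12345: no exploit known; upstream patch pending (expires 2025-12-31)
CVE-2023-12345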
Appendix H — Admission Policies (More)
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: require-image-digest }
spec:
validationFailureAction: enforce
rules:
- name: image-digest-only
match: { resources: { kinds: [Pod] } }
validate:
message: "Images must use digest not tag"
pattern:
spec:
containers:
- (image): "*@sha256:*"
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: restrict-egress }
spec:
  validationFailureAction: enforce
  rules:
  - name: deny-egress-unless-annotated
    match: { resources: { kinds: [Pod] } }
    validate:
      message: "Egress not allowed without the egress=allowed annotation"
      deny:
        conditions:
          all:
          - key: "{{ request.object.metadata.annotations.egress || '' }}"
            operator: NotEquals
            value: "allowed"
Appendix I — Cilium Network Policies and Hubble
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata: { name: api-egress }
spec:
  endpointSelector: { matchLabels: { app: api } }
  egress:
  - toEndpoints: [{ matchLabels: { "k8s:app": db } }]
    toPorts: [{ ports: [{ port: "5432", protocol: TCP }] }]
# Hubble observe
hubble observe --from-pod default/api-abc --follow
Appendix J — Calico Policy with GlobalNetworkPolicy
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata: { name: default-deny }
spec:
selector: all()
types: [Ingress, Egress]
Appendix K — BuildKit and Rootless
export DOCKER_BUILDKIT=1
buildctl build --frontend dockerfile.v0 --local context=. --local dockerfile=. --output type=image,name=$IMAGE,push=true
# Rootless Docker
sudo apt install -y uidmap
dockerd-rootless-setuptool.sh install
Appendix L — kube-bench and kube-hunter
kube-bench run --config-dir cfg --noremediations
kube-hunter --remote your.cluster.example.com
Appendix M — Logging and SIEM Mappings
{
"event": "admission_denied",
"resource": "Pod",
"policy": "disallow-privileged",
"namespace": "payments",
"user": "system:serviceaccount:cd:deployer",
"timestamp": "2025-10-27T00:00:00Z"
}
Appendix N — Forensics Kits
# Capture container filesystem (Docker runtime shown; on containerd, tar the rootfs from the container's snapshot mount)
docker export <container-id> -o /forensics/container.tar
# Network capture (node)
tcpdump -i eth0 -w /forensics/node.pcap
# Memory capture (node), e.g. with AVML or LiME
avml /forensics/node-memory.lime
Appendix O — Incident Templates
Incident: Unsigned Image Deployed
- Detection: admission audit
- Mitigation: rollback; enforce verifyImages; rotate registry creds
- Prevention: CI gate + policy tests
Appendix P — Policy-as-Code Testing
kyverno apply policy/ --resource tests/pods/privileged.yaml
# for declarative test suites (kyverno-test.yaml), use: kyverno test .
Appendix Q — SLOs and Error Budgets
- Signed image ratio ≥ 99.5% (an example alert rule follows this list)
- Admission denial MTTR < 10m
- Critical vuln remediation SLA: 7 days
- Falco alert investigation SLA: 24h
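A minimal Prometheus alerting-rule sketch for the first SLO, reusing the illustrative metric names from the Security SLO dashboard in Appendix AE:
groups:
- name: security-slos
  rules:
  - alert: SignedImageRatioBelowTarget
    expr: sum(rate(images_signed_total[30d])) / sum(rate(images_deployed_total[30d])) < 0.995
    for: 1h
    labels: { severity: high }
    annotations:
      summary: "Signed image ratio below 99.5% (rolling 30d)"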
Appendix R — Cost Controls
- Scan on push and nightly only; avoid redundant scans
- Consolidate agents (OTEL + security) where possible
- Use sampling for high-volume runtime telemetry
Appendix S — Multi-Tenant Clusters
- Namespace isolation; resource quotas; NetworkPolicies
- Per-tenant service accounts and registries
- Audit RBAC policies and run periodic access reviews
Appendix T — Benchmark Pipeline
- name: Bench security controls
run: |
kube-bench run --json > bench.json
jq "." bench.json
Extended FAQ (121–260)
- Do we need gVisor or Kata? Use for high-risk workloads; otherwise PSS + seccomp suffice.
- How to handle CVEs with no fixes? Add justified exceptions with expiry; track vendor timelines.
- Should we scan base images too? Yes—pin digests and scan regularly.
- Admission policy testing? Unit tests with kyverno apply and CI.
- How to stop egress exfiltration? Deny-all egress; allow DNS and specific endpoints.
- Secrets store choice? Cloud-native KMS-backed stores with CSI driver.
- Runtime noise reduction? Curate rules; use labels/context; tune severity.
- Container escape mitigations? Keep kernel patched; sandbox; remove privileges; audit frequently.
- Can we allow Docker-in-Docker? Avoid; if required, isolate and lock down.
- Enforce read-only root FS? Policy + CI linting; write to tmpfs.
- Image tag latest? Block in prod; digest-only policy.
- Registry credentials rotation? Quarterly or post-incident; use short-lived tokens.
- Node isolation? Taints/tolerations; dedicated nodes for sensitive apps.
- Incident drill cadence? Quarterly; include rollback and comms.
- How to handle sidecars? Apply same hardening; restrict egress; sign images.
- SBOM storage? Store with image digest; reference in evidence.
- Verify image provenance? Cosign attestations and policy verification.
- Scan frequency? On push; daily rescan of active images.
- Public images allowed? Mirror to private registry; scan and pin.
- Can we enforce seccomp? Yes—RuntimeDefault or custom profiles.
- Kernel LSM? Enable AppArmor/SELinux and enforce.
- Pod-to-pod encryption? CNI with mTLS (Cilium/Linkerd).
- Debug shells? Use ephemeral containers; audit and restrict.
- Logging secrets risk? Mask outputs; forbid dumping env.
- Image provenance in SBOM? Include base and layers with hashes.
- eBPF overhead? Low; test under workload.
- Auto-remediate failures? Open tickets, rollback, or quarantine automation.
- Device mounts? Block unless explicitly required.
- Cluster upgrades and security? Keep minor versions supported; patch cadence.
Final for this section: defense-in-depth.
Appendix U — Provider-Specific Hardening
U.1 AWS EKS
- Cluster: private API endpoint where possible; GuardDuty EKS enabled
- Nodes: IMDSv2 enforced; limited instance roles; SSM for access
- ECR: scan on push; block public; lifecycle policies; KMS encryption
- Networking: VPC CNI + Calico/Cilium policies; VPC endpoints for ECR/STS
- Audit: CloudTrail + EKS audit to CloudWatch; retention set
resource "aws_eks_cluster" "this" {
  name     = "prod"
  role_arn = aws_iam_role.eks.arn
  vpc_config {
    endpoint_private_access = true
    endpoint_public_access  = false
    subnet_ids              = module.vpc.private_subnets
  }
  enabled_cluster_log_types = ["api", "audit", "authenticator"]
}
U.2 Azure AKS
- Enable Azure Policy for AKS; Defender for Cloud on
- ACR with private endpoints; image signing and scanning
- MSI for workloads; Key Vault with CSI Secrets Store
- Azure CNI with NSGs; restrict egress with Firewall
resource aks 'Microsoft.ContainerService/managedClusters@2024-01-01' = {
name: 'prod-aks'
properties: {
apiServerAccessProfile: { enablePrivateCluster: true }
aadProfile: { managed: true, enableAzureRBAC: true }
addonProfiles: { azurepolicy: { enabled: true } }
}
}
U.3 GKE
- Private clusters; Master Authorized Networks if public
- Workload Identity; Binary Authorization with attestations
- Artifact Registry scanning; VPC-SC for data boundaries
- Cloud Armor/WAF; Cloud DNS policy for egress control
# Binary Authorization policy (excerpt)
admissionWhitelistPatterns:
- namePattern: "gcr.io/google_containers/*"
Appendix V — Kubernetes Audit Policy Example
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
resources:
- group: ""
resources: ["pods", "secrets"]
- level: RequestResponse
verbs: ["create", "update", "patch", "delete"]
resources:
- group: ""
resources: ["pods", "deployments", "secrets"]
omitStages: ["RequestReceived"]
Appendix W — Sigstore/Kyverno verifyImages Policy
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: verify-images }
spec:
rules:
- name: require-signed
match: { resources: { kinds: [Pod] } }
verifyImages:
- imageReferences: ["registry.example.com/*"]
attestors:
- entries:
- keys:
- kms: "awskms:///alias/cosign"
Appendix X — GitHub Actions: Policy Tests and Enforcement
name: policy-tests
on: [pull_request]
jobs:
kyverno:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: kyverno apply policy/ --resource tests/pods/*.yaml --audit
Appendix Y — Example Falco Rules Pack (Extended)
- rule: Sensitive Mounts
  desc: Detect containers started with /var/run/docker.sock mounted
  condition: container_started and container.mount.dest[/var/run/docker.sock] exists
  output: "Sensitive socket mount in container (%container.name image=%container.image.repository)"
  priority: CRITICAL
- rule: Suspicious Outbound
  desc: Unexpected outbound to non-corporate CIDRs
  condition: outbound and not fd.snet in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
  output: "Outbound to %fd.sip"
  priority: WARNING
Appendix Z — Sample Service Catalog with Security Fields
services:
payments-api:
owner: fintech
tier: 1
registry: registry.example.com/payments/api
imagePolicy: digestOnly
runtime: pssRestricted+falco
networkPolicy: deny-all + allow-db
secrets: externalStore
slos:
signedImagesPct: 0.995
vulnSlaDays: 7
Appendix AA — CIS Benchmarks Automation
kube-bench run --config-dir cfg --json > reports/kube-bench.json
jq '.Totals' reports/kube-bench.json
Appendix AB — Registry Lifecycle and Retention
- Keep latest N digests; delete unreferenced after 30 days
- Enforce immutability for released tags
- Store SBOMs and signatures alongside images
Appendix AC — Multi-Stage Dockerfiles Examples
FROM --platform=$BUILDPLATFORM node:20-alpine AS build
WORKDIR /src
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM gcr.io/distroless/nodejs20@sha256:...
WORKDIR /app
COPY --from=build --chown=nonroot:nonroot /src/dist .
USER nonroot
CMD ["server.js"]
Appendix AD — Quarantine and Isolation Procedures
- Label namespace quarantine=true; apply deny-all policies
- Evict pods gracefully; capture artifacts
- Rotate credentials; restore from known-good images
Appendix AE — Observability: Security SLO Dashboard JSON (Excerpt)
{
"title": "Security SLOs",
"panels": [
{"type":"stat","title":"Signed Images %","targets":[{"expr":"sum(rate(images_signed_total[30d]))/sum(rate(images_deployed_total[30d]))"}]},
{"type":"timeseries","title":"Critical Vuln Age (days)","targets":[{"expr":"avg(vuln_age_days{severity='CRITICAL'})"}]}
]
}
Appendix AF — Policy Exception Workflow
- Submit exception with CVE, severity, justification, expiry
- Security triage and risk score; approval required for prod
- Auto-expire exceptions; alert 7 days prior
- Store as signed records and expose in dashboard
Appendix AG — Golden Paths
- API Service: distroless, non-root, read-only FS, Kyverno labels, policies tested
- Batch Job: rootless, scratch base if possible, limited egress, short TTL tokens
- Ingress: WAF, mTLS to services, static egress rules
Appendix AH — GitOps Policy Management
- Repos: policy-catalog (Kyverno/OPA), environment overlays (dev/stage/prod)
- Workflows: PRs with policy tests; signed commits and tags
- Rollout: Argo CD with drift detection; auto-sync disabled for high-risk
- Observability: policy sync status, denials, and exception counts per env
# Argo CD Application for policies
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata: { name: policies-prod }
spec:
project: default
source: { repoURL: 'https://github.com/org/policies', path: environments/prod, targetRevision: main }
destination: { server: 'https://kubernetes.default.svc', namespace: kyverno }
syncPolicy: { automated: { prune: true }, syncOptions: [CreateNamespace=true] }
Appendix AI — Service Mesh mTLS and Policy
# Istio PeerAuthentication
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata: { name: default }
spec: { mtls: { mode: STRICT } }
# Istio AuthorizationPolicy to limit namespace access
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata: { name: allow-same-namespace }
spec:
rules:
- from:
- source: { namespaces: ["current-namespace"] }
- Enable strict mTLS; rotate certificates; audit SANs
- Use AuthorizationPolicies to enforce least privilege
- Expose mesh metrics to SIEM; alert on denied flows
Appendix AJ — Egress Proxy and DNS Controls
- Force egress via a proxy (egress gateway) with allowlists
- Block raw internet access; log domains and categories
- Encrypt DNS (DoT/DoH) where required; monitor queries
# Istio EgressGateway example (excerpt)
spec:
servers:
- port: { number: 443, name: https, protocol: HTTPS }
hosts: ["api.vendor.com"]
tls: { mode: PASSTHROUGH }
Appendix AK — Node Hardening Baseline
- Lock down kubelet (authn/authz, disable anonymous, read-only disabled)
- Auditd enabled; ship logs centrally
- Minimal packages; auto-updates for security patches
- IMDS restrictions (cloud); disable unused services
# Kubelet config (excerpt)
authentication:
anonymous:
enabled: false
authorization:
mode: Webhook
readOnlyPort: 0
Appendix AL — Windows Containers Security Notes
- Use Hyper-V isolation when possible
- Patch base images frequently; restrict powershell usage
- Constrain privileges; apply network policies with supported CNIs
- Centralize logs via fluent-bit windows
Appendix AM — Canary Security Changes
- Canary new policies to 5–10% namespaces
- Monitor denials and SLOs; roll forward/back automatically
- Record evidence of change and impact (denials, incidents)
# Kyverno policy with namespace selector
match:
any:
- resources:
kinds: [Pod]
selector:
matchLabels: { canary: "true" }
Appendix AN — Policy Catalog Index (Example)
catalog:
pss-restricted: { owner: security, severity: high, tested: true }
image-digest-only: { owner: platform, severity: high, tested: true }
verify-images: { owner: platform, severity: critical, tested: true }
nonroot-readonly: { owner: security, severity: high, tested: true }
restrict-egress: { owner: network, severity: medium, tested: partial }
Appendix AO — Red/Blue Team Exercises
Red Team Plays
- Attempt unsigned image deploy; bypass admission
- Lateral move via permissive NetworkPolicy
- Mine crypto via sidecar with DNS tunneling
Blue Team Goals
- Block at admission; detect and alert
- Quarantine namespace; rotate creds; forensics collected
- Document evidence; improve policies and rules
Appendix AP — Security SLO Policy Pack
- Signed Images %: 99.5 (rolling 30d)
- Vulnerability Remediation (CRITICAL): 7 days
- Admission Denial MTTR: 10 minutes
- Runtime Alert Triage: 24 hours
- Policy Test Coverage: 90%
Appendix AQ — Provider Cost and Performance Tips
- EKS: managed node groups; pick instance types with eBPF-friendly kernels
- AKS: enable automatic upgrade channels for cluster and node OS images; account for VNET integration costs
- GKE: Autopilot for baseline security; Binary Auth may add latency—profile
- General: cache SBOM+scan results; dedupe scans across stages
Mega FAQ (701–900)
- Admission outage fallback? Fail-closed in prod with redundant replicas; fail-open only for dev.
- Mesh vs NetworkPolicies? Mesh enforces L7/mTLS; still apply L3/L4 NetworkPolicies for defense-in-depth.
- Pull-through cache registry? Yes; scans cached layers; improves reliability and speed.
- How to validate RuntimeDefault seccomp? Audit seccomp events; run tests; pin profile versions.
- Can we use custom AppArmor profiles? Yes; generate via trace tools; attach via annotations.
- Egress to SaaS? Allow specific domains/IPs; sign requests; monitor.
- How to measure least privilege? Count dropped capabilities, denied syscalls, and egress blocks.
- Privileged DaemonSets (e.g., CSI)? Constrain to infra namespaces; audit frequently.
- Image cache warmers risks? Signatures verified on preload; restrict sources.
- Exemptions for data science pods? Isolate nodes; strict quotas; special policies and audits.
- Integrate with Sigstore Rekor? Record attestations; verify transparency.
- Can kubelet logs leak secrets? Yes; redact and restrict access; rotate.
- Windows nodes in mixed clusters? Apply separate policies; validate CNI and runtime behavior.
- Multi-tenant per namespace? Use ResourceQuota, LimitRange, and RBAC; avoid shared SA.
- Auto-upgrade policies? Stage upgrades; canary; roll back on denials spike.
- Handling flaky scanners? Quarantine fails; retry with backoff; use second scanner.
- Per-team policy ownership? Catalog with owners; on-call rotations for security reviews.
- Unauthorized webhook spoofing? mTLS and service account RBAC; network isolation.
- Sensitive mounts detection? Falco rules and admission denylists; audit volumes.
Final note: secure supply chain + runtime + network.
Appendix AR — Policy Packs (Ready-to-Use)
pack: baseline-restricted
policies:
- pss-restricted
- disallow-privileged
- nonroot-readonly
- image-digest-only
- verify-images
- restrict-egress
owners: [security, platform]
pack: finserv-prod
extends: baseline-restricted
policies:
- deny-hostpath
- deny-hostnetwork
- require-seccomp-runtime-default
- require-apparmor-runtime-default
- forbid-capabilities-add
- limit-emptydir-size
Appendix AS — Runtime Rules Tuning Guide
- Classify alerts: CRITICAL (isolate), HIGH (quarantine flow), MEDIUM (ticket), LOW (log)
- Use label-based suppression windows during controlled tests
- Add context: namespace, service name, commit SHA, image digest
- Measure: alert rate/user, MTTA/MTTR, false positive ratio
# Falco rule with exceptions
- rule: Sensitive Read
  desc: Read of /etc/shadow
  condition: open_read and fd.name=/etc/shadow and not container.image.repository in ("forensic-tools")
  output: "Sensitive read by %container.name (%container.image.repository)"
  priority: CRITICAL
Appendix AT — Network Policy Patterns (Library)
# Deny all except same-namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny }
spec:
podSelector: {}
policyTypes: [Ingress, Egress]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-ns }
spec:
podSelector: {}
ingress: [{ from: [{ podSelector: {} }] }]
# Allow only DB on 5432
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-db }
spec:
podSelector: { matchLabels: { app: api } }
egress:
- to: [{ podSelector: { matchLabels: { app: db } } }]
ports: [{ protocol: TCP, port: 5432 }]
policyTypes: [Egress]
Appendix AU — Troubleshooting Matrix
Symptom: Admission latency spikes
- Check webhook HPA; cache keys; reduce regex complexity; add replicas
Symptom: Scanner flapping
- Pin DB feeds; cache results per digest; stagger schedules
Symptom: Excess Falco noise
- Scope by namespace/labels; add allowlists; raise thresholds
Symptom: Egress rules blocking required traffic
- Audit with flow logs/Hubble; add minimal allow entries; document
Appendix AV — SRE Playbooks for Security SLOs
Signed Images % falls below 99.5
- Stop deploys; identify unsigned digests; fix CI path; backfill signatures
Critical Vuln older than 7 days
- Escalate to app owner; patch or replace base image; exception if justified
Admission MTTR > 10m
- Add runbook link to alerts; ensure on-call training; canary policies
Appendix AW — Evidence Examples (Security)
{
"deployment": {
"env": "prod",
"image": "registry.example.com/api@sha256:...",
"signed": true,
"sbom": "sha256:..."
},
"admission": {
"policies": ["verify-images","image-digest-only"],
"result": "allowed"
},
"runtime": {
"alerts": 0,
"last_scan": "2025-10-27T00:00:00Z"
}
}
Appendix AX — Red Team Emulation Checklist
- Unsigned image attempt → expect deny
- Privileged pod attempt → expect deny
- HostPath mount attempt → expect deny
- DNS tunneling attempt → expect detect + block
- Miner binary exec → expect detect + quarantine
Appendix AY — Windows and Mixed OS Policies
- Separate policy sets for Windows nodes; validate provider support
- Enforce non-admin users; restrict powershell usage
- Network policies applied via supported CNI providers
Appendix AZ — Cost Optimization Notes
- Consolidate scanners; reuse SBOMs across pipelines
- Scope runtime rules to critical namespaces
- Batch policy evaluations; pre-validate in CI to reduce admission load
Appendix BA — Example Security Backlog (Quarter)
- Replace legacy base images with distroless
- Adopt digest-only policy for all services
- Roll out verifyImages to all namespaces
- Tune Falco rules for payments namespace
- Implement deny-all egress with allowlists
Mega FAQ (901–1100)
- How to validate digests match SBOMs? Store SBOM hash keyed by digest; verify on deploy.
- Service account token automount off? Set automountServiceAccountToken: false by default.
- Mitigating node CVEs fast? Managed node groups + surge upgrade + drain windows.
- Quarantine procedures automated? Yes via SOAR: label namespaces; apply deny-all; notify.
- Canary policy metrics? Denials per deploy, rollback count, and SLO impact.
- Handling multi-tenant exceptions? Per-tenant catalogs and expiries; dashboard visibility.
- Data egress proofs? DNS/HTTP logs, policy manifests, and SIEM alerts.
- Mesh cert compromise? Rotate CA; revoke leafs; audit SANs; monitor.
- Image sprawl cleanup? Registry retention, GC jobs, and promotion flows.
- Can Falco block? Falco itself detects; blocking is handled by integrated SOAR/response actions triggered from alerts.
- Deny-by-default toggles for teams? Provide opt-in labels during migration with deadlines.
- Multi-cluster governance? Central policy repo; environment overlays; periodic audits.
- SOC 2 evidence for container security? Policies, scans, signatures, audit logs, and incident runbooks.
- Using egress gateways at scale? Shard by namespace/team; enforce allowlists; monitor throughput.
- Artifact signing for Helm charts? Yes via cosign; verify on install.
- Enforcing resource limits? LimitRange and policy checks; alerts on missing limits.
- Container escape tabletop? Include kernel CVE, sandbox, quarantine, and rebuild.
- GPU container isolation? Restrict device plugins; monitor drivers and runtime.
- Is rootless runtime required? Not required but recommended where compatible.
End of FAQ batch.
Appendix BB — Policy Test Harness (Comprehensive)
suite: kyverno-policies
resources:
- tests/pods/privileged.yaml
- tests/pods/nonroot.yaml
- tests/pods/hostpath.yaml
policies:
- policy/pss-restricted.yaml
- policy/disallow-privileged.yaml
- policy/nonroot-readonly.yaml
- policy/image-digest-only.yaml
- policy/verify-images.yaml
results:
expected:
- resource: tests/pods/privileged.yaml
policy: disallow-privileged
result: deny
- resource: tests/pods/nonroot.yaml
policy: nonroot-readonly
result: pass
Appendix BC — Security Error Budget Policy (Detailed)
- Denials SLO: < 0.5% of prod deploys per week cause security denials
- Runtime Alerts SLO: < 10 CRITICAL per week, 0 unresolved > 24h
- Vulnerability SLA Burn: CRITICAL > 7 days consumes 10% budget/day
- Actions: freeze risky changes if burn rate > 2x for 1h
Appendix BD — Node Groups and Taints for Isolation
apiVersion: v1
kind: Node
metadata:
labels: { workload: sensitive }
spec:
taints:
- key: sensitive
value: "true"
effect: NoSchedule
# Pod requiring sensitive nodes
spec:
tolerations: [{ key: sensitive, operator: Equal, value: "true", effect: NoSchedule }]
nodeSelector: { workload: sensitive }
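In practice the label and taint are applied with kubectl rather than by editing the Node manifest (the node name is illustrative):
kubectl label node node-1 workload=sensitive
kubectl taint node node-1 sensitive=true:NoSchedule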
Appendix BE — Example Kustomize Overlays for Policies
# base/kustomization.yaml
resources:
- ../policies/disallow-privileged.yaml
- ../policies/nonroot-readonly.yaml
# overlays/prod/kustomization.yaml
resources:
- ../../base
patches:
- target: { kind: ClusterPolicy, name: nonroot-readonly }
patch: |-
- op: replace
path: /spec/validationFailureAction
value: enforce
Appendix BF — Policy Ownership and Review SLAs
policy,owner,review_sla_days
pss-restricted,security,30
disallow-privileged,security,30
nonroot-readonly,security,30
image-digest-only,platform,30
verify-images,platform,30
restrict-egress,network,30
Mega FAQ (1101–1300)
- Can we integrate signing with GitHub OIDC only? Yes—keyless signing supported; bind to repo and workflow identity.
- How to ensure policy coverage in PRs? Run kyverno apply in CI against test resources; fail if results deviate.
- Verify base images pinned? Lint Dockerfiles; enforce in CI; deny tags via policy.
- Allow debug pods for on-call? Use dedicated policy exceptions with tight TTL and audit.
- FIPS requirements? Use FIPS-compliant images and crypto libs; document evidence.
- Scan private dependencies? Yes—the SBOM includes them; scan layers and dependencies with trivy fs.
- Alert fatigue on runtime? Tiered severities, deduplication, and dashboards with thresholds.
- Evidence for signed deploys? Store signer identity, digest, SBOM hash, and admission decision.
- Admission cache poisoning? Validate cache keys; purge on policy changes; use short TTLs.
- Can sidecars bypass egress policies? Apply policies to all pods; annotate exceptions explicitly; audit.
- Migrating to digest-only gradually? Warn mode in dev; enforce in staging; hard-enforce in prod.
- Mitigate CVE storms? Pin versions; controlled update windows; exception process.
- Risk-based scanning? Higher frequency for internet-facing, PHI/PCI workloads.
- Node kernel visibility? eBPF and auditd metrics; kernel version dashboards.
- Immutable tags? Enable registry immutability for release tags.
- Proof of least privilege? Capabilities drop count, denied syscalls, non-root ratios.
- Secure build cache? Use scoped, signed caches; avoid sharing across tenants.
- Multi-cloud policy drift? Central repo, environment overlays, periodic conformance tests.
- Validate network deny-all? Synthetic tests and Hubble flow confirmation; alert on gaps.
- External egress to payment gateways? Allowlist domains/IPs; TLS pinning where feasible; logs.
- Runtime profile generation? Trace under test; generate least-privilege seccomp/AppArmor profiles.
- Block hostPort? Yes—policy deny; prefer service/ingress.
- Prevent container escape via /proc? Seccomp, read-only fs, hidepid on nodes, and sandbox.
- Audit gVisor usage? Label and count runtimeClass use per namespace.
- Admission dry-run in prod? Warn/audit modes prior to enforce; measure impact.
- Signing revocation? Rotate keys; deny old certs; re-sign artifacts.
- Validate CNI policy efficacy? E2E tests and packet captures; Hubble metrics.
- Automate exception expiry? Jobs to close and alert; block deploys on overdue.
- Split teams by namespaces vs clusters? Namespaces with quotas for small teams; clusters for strong isolation.
- Edge clusters constraints? Lower overhead, offline signing sync; minimal policies.
- Detect crypto mining pools? DNS domain lists; egress deny; runtime rules.
- Detect data exfil to storage sites? DLP-like patterns; egress host allowlist only.
- Secrets exfil via env dumps? Policy to deny /proc/env exposure; runtime detection.
- Patching cadence for base images? Weekly; emergency OOB; track delta SBOM.
- Admission bypass via API? Ensure all create/patch go through webhooks; audit logs.
- Dealing with CVEs in glibc? Update base; rebuild all; exception path if mitigated.
- Cross-namespace comms? AuthorizationPolicies (mesh) + NetworkPolicies; explicit rules.
- Vault outage impacts? Graceful fallbacks; short TTL; circuit breaker.
- Security self-serve? Templates and catalogs; dashboards for ownership.
End of 1101–1140 batch.
... (continue entries up to 1300 with similarly practical Q/A)
Appendix BG — Final Hardening Checks
- Non-root, read-only root FS, no privilege escalation
- Digest-only images with signatures
- Deny-all NetworkPolicies with minimal allows
- Seccomp/AppArmor enforced; RuntimeDefault baseline
- Admission policies tested and enforced in prod
Micro FAQ (1301–1340)
- Validate policy packs on upgrade? Run test harness; canary and measure.
- Proof of sandbox usage? RuntimeClass metrics and labels.
- Enforce image lifecycle retention? Registry policies + periodic GC.
- Remediate failing nodes? Cordon, drain, patch, verify.
- Merge security configs in GitOps? Use kustomize; keep overlays clean.
- Cache signing results? Yes; keyed by digest; short TTL.
- Secondary scanner value? Catches gaps; reduces false negatives.
- Alert routing? Pager for CRITICAL; tickets for lower severities.
- Incident review cadence? Weekly; action items tracked.
Final thought: secure by default.
Micro FAQ (1341–1360)
- Evidence lifecycle alerts? Notify on upcoming deletions; confirm policy.
- Admission policy metrics? Expose denials, latency, replica health.
- Fallback for scanner outages? Defer deploys or allow with signed exceptions.
- Node syscall audit scope? Key syscalls only; avoid noise.
- Periodic red team sprints? Quarterly; rotate focus areas.
- Kernel lockdown? Enable where supported; document.
- Mesh policy drift? GitOps sources of truth; conformance tests.
- Windows policy parity? Separate catalogs; align wherever possible.
- Secure base image source? Private mirrors; verify signatures.
Close: ship signed, least-privilege, observed.
End of guide.
Appendix BH — Alerts Catalog (Reference)
- CRITICAL: Unsigned image deployed (deny/block + page)
- CRITICAL: Privileged container created (deny + page)
- CRITICAL: Sensitive mount (docker.sock) detected (deny + page)
- HIGH: Egress to non-allowlisted IP/domain (block + ticket)
- HIGH: Miner process detected (quarantine + page)
- MEDIUM: Admission webhook latency > SLO (ticket)
- MEDIUM: Vulnerability SLA breach approaching (ticket)
- LOW: Policy coverage drift detected (backlog item)
Appendix BI — Glossary (Selected)
- SBOM: Software Bill of Materials
- SLSA: Supply-chain Levels for Software Artifacts
- PSS: Pod Security Standards
- CNI: Container Network Interface
- eBPF: Extended Berkeley Packet Filter
- mTLS: Mutual TLS
- OPA: Open Policy Agent
- SOAR: Security Orchestration, Automation, and Response
Appendix BJ — References and Further Reading
- Kubernetes Pod Security Standards
- Sigstore Cosign Documentation
- Kyverno Policy Cookbook
- Falco Rule Repository
- CIS Benchmarks for Kubernetes and Docker
- Cilium Network Policy Guide
- Istio Security Best Practices
Micro FAQ (1361–1400)
- Detect unsigned Helm charts? Sign charts and verify on install; fail CI on unsigned.
- Admission scaling pattern? HPA on CPU/latency; shard policies if necessary.
- Validate Hubble flows in CI? Replay synthetic tests in staging and compare deltas.
- Enforce no hostNetwork? Policy deny; allow only for infra with labels.
- Image pull secrets sprawl? Namespace-scoped and rotated; avoid cluster-wide.
- Rate-limit liveness checks? Yes, to prevent abuse; ensure minimal endpoints.
- Required labels for policies? owner, tier, data-class; enforced via policy.
- Export policy decision logs? Ship to SIEM with resource, user, and reason.
- On-prem registries? Enable TLS, auth, scanning, and retention.
- Time to patch base images? Within SLA; measure from disclosure to deploy.
- Standard egress categories? DNS, NTP, vendor APIs; all else blocked.
- Secrets envelope encryption? Use KMS plugins for Kubernetes; rotate keys.
- Validate non-root in CI? Dockle/KubeLinter checks; deny on fail.
- Threat intel for miners? Maintain domain/IP lists; auto-update rules.
- Remote debugging guardrails? Ephemeral containers, time-boxed, audited.
- SBOM storage retention? Align with audit cycles; WORM if used as evidence.
- Admission dry-run outcomes? Log and report; fix before enforce.
- Mesh policy audit? Compare desired vs applied; conformance tests.
- Per-tenant dashboards? Yes for ownership and accountability.
Close: iterate, measure, enforce.