Caching Strategies: Redis, Memcached, CDN Patterns (2025)
Caching is one of the highest-ROI performance tools when applied carefully. This guide provides concrete patterns and pitfalls.
At a glance
- Choose consistent keys; sensible TTLs; prevent stampedes; partition hot keys
- Select strategy per use case: write-through/around/back; read-through; negative caching
Stampede prevention
- Request coalescing; jittered TTL; soft TTL + background refresh; locks
Invalidation
- Event-driven invalidation; tag-based; versioned keys; fallbacks
Observability
- Hit ratio, latency, key size distributions, eviction rates; sample payloads safely
CDN
- Cache-control headers; immutable assets; signed URLs; edge functions
FAQ
Q: Is Redis better than Memcached?
A: Redis offers richer data types and persistence; Memcached is a simple, fast in-memory cache; choose by feature needs and ops.
Executive Summary
This guide provides a production-focused blueprint for caching in 2025: Redis/Memcached/CDN/edge, cache patterns (cache-aside, write-through, write-back, refresh-ahead), invalidation strategies, stampede prevention, metrics/observability, HA/DR, and cost modeling.
Caching Fundamentals
Patterns
- Cache-Aside: app reads from cache; on miss, load from source and populate
- Write-Through: writes go to cache and source synchronously
- Write-Back (Write-Behind): write to cache, flush to source asynchronously
- Refresh-Ahead: refresh items before TTL expires for hot keys
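For concreteness, a minimal cache-aside sketch in TypeScript, assuming a connected node-redis v4 client (`redis`) and a hypothetical `loadOrder` source-of-truth loader:
// Cache-aside: check the cache first; on miss, load from the source and populate.
type Order = { id: string; amount: number }
async function getOrder(id: string): Promise<Order> {
  const key = `order:v1:${id}`
  const hit = await redis.get(key)
  if (hit) return JSON.parse(hit) as Order                   // cache hit
  const order = await loadOrder(id)                          // miss: read the source of truth
  await redis.set(key, JSON.stringify(order), { EX: 300 })   // populate with a TTL
  return order
}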
TTL and Eviction
- TTL per key or per namespace
- Evictions: LRU/LFU/Random; size-based limits
- Soft TTL (serve stale) vs Hard TTL (strict expiry)
Keys, Namespacing, Versioning
function key(ns: string, id: string, ver = 'v1'){ return `${ns}:${ver}:${id}` }
- Namespaces per tenant/service/route
- Version bump to invalidate entire namespace on deploy
Consistency and Invalidation Patterns
- Source of truth: database/object store
- Invalidate on write; publish invalidation events
- Patterns: time-based TTL, explicit delete, version key, write-through update
// Write normally; bump the namespace version only on invalidation. Readers derive
// keys from the current version, so one INCR atomically orphans the old namespace.
async function setWithVersion(k: string, v: string, ttl: number){
  await redis.set(k, v, { EX: ttl });
}
const bumpVersion = (ns: string) => redis.incr(`${ns}:version`);
Stampede/Dogpile Prevention
// single-flight: only one loader per key
const inflight = new Map<string, Promise<any>>();
export async function cached<T>(k: string, loader: () => Promise<T>, ttl = 300){
const cv = await redis.get(k); if (cv) return JSON.parse(cv) as T;
if (inflight.has(k)) return inflight.get(k)! as Promise<T>;
const p = loader().then(async v => { await redis.set(k, JSON.stringify(v), { EX: ttl }); inflight.delete(k); return v; })
.catch(e => { inflight.delete(k); throw e; });
inflight.set(k, p); return p;
}
// mutex lock
async function withLock(lockKey: string, fn: () => Promise<void>, ttlMs = 2000){
  const token = crypto.randomUUID()  // unique token so we never delete someone else's lock
  const ok = await redis.set(lockKey, token, { NX: true, PX: ttlMs })
  if (!ok) return
  // Unlock only if we still hold the lock (see the safe-unlock Lua script in Appendix N)
  try { await fn() } finally { if (await redis.get(lockKey) === token) await redis.del(lockKey) }
}
Hot Key Mitigation
- Shard key with suffixes; client-side consistent hashing
- Use a local in-process cache for ultra-hot items
- Cap TTL and use probabilistic early refresh
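A sketch of the suffix-sharding idea, assuming a node-redis v4 client; reads pick a random replica while writes fan out to all of them (the shard count is illustrative):
const SHARDS = 8
// Reads spread load across N copies of the hot value
const readHot = (k: string) => redis.get(`${k}:${Math.floor(Math.random() * SHARDS)}`)
// Writes must update every copy so no reader sees a missing shard
async function setHot(k: string, v: string, ttl: number){
  await Promise.all(Array.from({ length: SHARDS }, (_, i) => redis.set(`${k}:${i}`, v, { EX: ttl })))
}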
Negative and Partial Caching
- Negative caching: cache not-found (404) for short TTL to prevent backend hits
- Partial responses: cache fragments (e.g., GraphQL fields) with keys
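A negative-caching sketch: not-found results are stored as a sentinel with a short TTL so repeated lookups for missing IDs skip the backend (`loadUser`, the sentinel, and the TTLs are illustrative):
const NOT_FOUND = '__nf__' // sentinel marking a cached miss
async function findUser(id: string) {
  const key = `user:v1:${id}`
  const hit = await redis.get(key)
  if (hit === NOT_FOUND) return null        // cached 404: skip the backend
  if (hit) return JSON.parse(hit)
  const user = await loadUser(id)           // hypothetical backend loader
  if (!user) { await redis.set(key, NOT_FOUND, { EX: 30 }); return null }
  await redis.set(key, JSON.stringify(user), { EX: 300 })
  return user
}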
Stale-While-Revalidate (SWR)
export async function swr<T>(k: string, loader: () => Promise<T>, ttl = 60, swrTtl = 300){
  const v = await redis.get(k)
  if (v) {
    // Entries are written with EX = ttl + swrTtl. Once the remaining TTL drops
    // below swrTtl the fresh window has passed: refresh in the background.
    const remaining = await redis.ttl(k)
    if (remaining < swrTtl) loader().then(nv => redis.set(k, JSON.stringify(nv), { EX: ttl + swrTtl })).catch(() => {})
    return JSON.parse(v) as T
  }
  const nv = await loader()
  await redis.set(k, JSON.stringify(nv), { EX: ttl + swrTtl })
  return nv
}
Rate Limiting Backed by Caches
import { RateLimiterRedis } from 'rate-limiter-flexible'
const rl = new RateLimiterRedis({ storeClient: redis, keyPrefix: 'rl', points: 60, duration: 60 })
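A possible Express middleware wiring for the limiter above; `consume` rejects once a key's points are spent within the window:
app.use(async (req, res, next) => {
  try { await rl.consume(req.ip ?? 'anonymous'); next() }   // 1 point per request
  catch { res.set('Retry-After', '60').sendStatus(429) }    // budget spent: back off
})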
Probabilistic Data Structures
// Bloom filter for existence checks (use RedisBloom module in prod)
// HyperLogLog for approximate unique counts
await redis.pfadd('hll:users', userId)
const approx = await redis.pfcount('hll:users')
Redis: Topologies and Persistence
- Standalone for dev, Sentinel for HA failover, Cluster for sharding
- Persistence: AOF (append-only) vs RDB snapshots; combine for durability
- TLS and ACLs; network policies; isolate from public internet
# redis.conf snippets
aof-use-rdb-preamble yes
maxmemory 4gb
maxmemory-policy allkeys-lfu
port 0              # disable the plaintext listener when serving TLS on 6379
tls-port 6379
# Sentinel
sentinel monitor mymaster 10.0.1.10 6379 2
Redis Pub/Sub and Streams
// Pub/Sub invalidation
await redis.publish('cache:invalidate', key)
// Streams for events
await redis.xadd('events', '*', 'type', 'order_created', 'order_id', orderId)
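On the consuming side, each app instance might subscribe and drop its local copy; a sketch assuming node-redis v4 (subscribers need a duplicated connection) and an in-process `localCache` map:
const sub = redis.duplicate()   // Pub/Sub requires a dedicated connection in node-redis v4
await sub.connect()
await sub.subscribe('cache:invalidate', (key) => {
  localCache.delete(key)        // the writer already deleted the Redis entry
})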
Memcached Basics
- LRU eviction; no persistence; simple strings; multi-get support
- Use for ephemeral app-level caching where durability not required
import memjs from 'memjs'
const mc = memjs.Client.create()
await mc.set(key, Buffer.from(JSON.stringify(v)), { expires: 300 })
CDN Caching (CloudFront/Cloudflare/Fastly)
Cache-Control: public, max-age=600, s-maxage=1200, stale-while-revalidate=300
ETag: "abc123"
Vary: Accept-Encoding, Accept-Language
- Signed URLs/cookies to protect private content
- Invalidate on deploy; use versioned asset names
# CloudFront invalidation
aws cloudfront create-invalidation --distribution-id D123 --paths "/app/*"
Reverse Proxies: NGINX/Varnish
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=STATIC:100m inactive=60m use_temp_path=off;
server {
location / {
proxy_cache STATIC;
proxy_cache_key "$scheme$request_method$host$request_uri";
proxy_cache_valid 200 302 10m;
proxy_cache_valid 404 1m;
add_header X-Cache-Status $upstream_cache_status;
proxy_pass http://app;
}
}
sub vcl_backend_response {
set beresp.ttl = 10m;
if (beresp.status == 404) { set beresp.ttl = 60s; }
}
Edge Caching and API Caching
- Use Cloudflare Workers/CloudFront Functions for header manipulation and SWR
- Cache API GET responses with short TTL, vary by auth/tenant
// Cache API responses by tenant
function apiKey(tenant: string, path: string){ return `api:${tenant}:${path}` }
GraphQL Caching
- Persisted queries; cache query+variables signature
- Per-field caching and dataloaders to batch backend calls
const cacheKey = `gql:${hash(query+JSON.stringify(variables))}`
Database and Materialized Views
CREATE MATERIALIZED VIEW mv_orders_1h AS
SELECT date_trunc('hour', created_at) AS h, SUM(amount) AS revenue
FROM orders WHERE created_at >= now() - interval '7 days'
GROUP BY 1;
- Refresh policies aligned to freshness SLAs; invalidate dependent caches
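One way to align both layers, as a sketch: a scheduled job refreshes the view, then bumps a version key so dependent cache entries rebuild lazily (`db.query` and the key name are illustrative):
async function refreshOrdersRollup(){
  await db.query('REFRESH MATERIALIZED VIEW CONCURRENTLY mv_orders_1h')
  await redis.incr('orders:version')  // readers derive keys from this version, so old entries are orphaned
}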
Application Caches
// In-process LRU cache for a small hot set (lru-cache v10+ uses a named export)
import { LRUCache } from 'lru-cache'
const lru = new LRUCache<string, any>({ max: 5000, ttl: 1000 * 60 })
Multi-Tenant Namespacing and Security
function tenantKey(tenant: string, k: string){ return `${tenant}:${k}` }
- TLS to cache servers; ACL roles; network isolation; avoid PII storage
Observability: Metrics and Dashboards
import client from 'prom-client'
const hits = new client.Counter({ name: 'cache_hits_total', help: 'hits', labelNames: ['cache'] })
const misses = new client.Counter({ name: 'cache_misses_total', help: 'misses', labelNames: ['cache'] })
const backends = new client.Counter({ name: 'backend_requests_total', help: 'backend' })
sum(rate(cache_hits_total[5m])) / (sum(rate(cache_hits_total[5m])) + sum(rate(cache_misses_total[5m])))
{
"title": "Cache Ops",
"panels": [
{"type":"stat","title":"Hit Ratio","targets":[{"expr":"sum(rate(cache_hits_total[5m]))/(sum(rate(cache_hits_total[5m]))+sum(rate(cache_misses_total[5m])))"}]},
{"type":"timeseries","title":"Backend Offload","targets":[{"expr":"sum(rate(backend_requests_total[5m]))"}]}
]
}
OTEL Traces for Cache Layers
span.addEvent('cache.get', { key: k })
span.addEvent('cache.miss', { key: k })
span.addEvent('backend.fetch', { ms: 45 })
HA/DR and Autoscaling
- Redis Sentinel or managed services (ElastiCache/Memorystore/Azure Cache)
- Redis Cluster for sharding; reshard on growth
- Autoscale based on memory usage, CPU, and latency
- Set proper maxmemory and eviction policy
Cost Modeling
provider,tier,memory_gb,usd_month
elasticache,cache.r6g.large,12.3,110
memorystore,standard-1,12.0,120
azure,standard_c3,12.0,115
- Improve hit ratio to offload backends; tune TTL and keys
Runbooks and SOPs
Stampede Event
- Identify hot key; enable single-flight; prewarm cache; increase TTL; add SWR
Hit Ratio Drop
- Review keys; increase TTL; reduce fragmentation; cache partials; add negative caching
Latency Spike
- Inspect network and CPU; adjust maxmemory-policy; scale up/out; reduce serialization overhead
Related Posts
- AWS Architecture Patterns: Well-Architected Framework (2025)
- API Security: OWASP Top 10 Prevention Guide (2025)
- ClickHouse Analytics Database Performance Guide (2025)
Call to Action
Need help optimizing caching? We design, benchmark, and operate caches at scale with robust observability and cost controls.
Extended FAQ (1–150)
- Cache size? Size to hold the hot working set; monitor hit ratio.
- TTL length? Balance freshness vs offload; start with minutes for dynamic content.
- LRU vs LFU? LFU for frequency-heavy workloads; LRU is simpler.
- When to use negative caching? For expensive lookups that frequently miss.
- SWR for APIs? Yes; serve stale while refreshing in the background.
- Stampede edges? Jitter TTLs; request coalescing; locks.
- Hot key? Shard and prewarm; local LRU.
- Redis persistence? AOF + RDB for durability; test failover.
- Sentinel vs Managed? Managed for lower ops burden; Sentinel for DIY.
- CDN ETag or versioned assets? Versioned assets preferred; ETags for validation.
Appendix A — Reference Implementations by Language
A.1 Node.js (Express/Fastify)
import express from 'express'
import { createClient } from 'redis'
import { LRUCache } from 'lru-cache'
const app = express()
app.use(express.json()) // required for req.body in the POST handler below
const redis = createClient({ url: process.env.REDIS_URL, socket: { tls: true } })
await redis.connect()
const localCache = new LRUCache<string, any>({ max: 5000, ttl: 60_000 })
function k(ns: string, id: string, ver = 'v1'){ return `${ns}:${ver}:${id}` }
async function getCached<T>(key: string, loader: () => Promise<T>, ttl = 120){
const lc = localCache.get(key); if (lc) return lc as T
const rv = await redis.get(key); if (rv) { const v = JSON.parse(rv) as T; localCache.set(key, v); return v }
const value = await loader();
await redis.set(key, JSON.stringify(value), { EX: ttl });
localCache.set(key, value)
return value
}
app.get('/products/:id', async (req, res) => {
const id = req.params.id
const key = k('product', id)
const data = await getCached(key, async () => fetchProductFromDB(id))
res.json(data)
})
app.post('/products/:id', async (req, res) => {
const id = req.params.id
const body = req.body
await updateProductInDB(id, body)
await Promise.all([
redis.del(k('product', id)),
redis.publish('cache:invalidate', k('product', id))
])
res.sendStatus(204)
})
// SWR helper with probabilistic early refresh
function shouldRefresh(ttlRemaining: number){
const p = Math.exp(-ttlRemaining / 30) // refresh more likely near expiry
return Math.random() < p
}
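Wiring `shouldRefresh` into a read path might look like this sketch; requests near expiry occasionally trigger a background reload while still returning the cached value:
async function getWithEarlyRefresh<T>(key: string, loader: () => Promise<T>, ttl = 120){
  const cached = await redis.get(key)
  if (cached) {
    if (shouldRefresh(await redis.ttl(key))) {
      // fire-and-forget; refresh errors must not affect this request
      loader().then(v => redis.set(key, JSON.stringify(v), { EX: ttl })).catch(() => {})
    }
    return JSON.parse(cached) as T
  }
  const v = await loader()
  await redis.set(key, JSON.stringify(v), { EX: ttl })
  return v
}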
A.2 Python (FastAPI/Django)
import os, json
from redis import asyncio as aioredis  # aioredis was merged into redis-py as redis.asyncio
from fastapi import FastAPI
app = FastAPI()
# Use a rediss:// URL to enable TLS
redis = aioredis.from_url(os.getenv('REDIS_URL'), encoding='utf-8', decode_responses=True)
async def cache_get_or_set(key: str, loader, ttl: int = 120):
v = await redis.get(key)
if v: return json.loads(v)
data = await loader()
await redis.set(key, json.dumps(data), ex=ttl)
return data
@app.get('/users/{uid}')
async def user(uid: str):
key = f'user:v1:{uid}'
return await cache_get_or_set(key, lambda: load_user(uid))
A.3 Go (Gin/Fiber)
var rdb = redis.NewClient(&redis.Options{Addr: os.Getenv("REDIS_ADDR"), TLSConfig: &tls.Config{InsecureSkipVerify: false}})
func CacheGetOrSet(ctx context.Context, key string, ttl time.Duration, loader func() (any, error)) (any, error) {
if val, err := rdb.Get(ctx, key).Result(); err == nil {
var v any; json.Unmarshal([]byte(val), &v); return v, nil
}
v, err := loader(); if err != nil { return nil, err }
b, _ := json.Marshal(v); rdb.Set(ctx, key, string(b), ttl)
return v, nil
}
A.4 Java (Spring Boot)
@EnableCaching
@SpringBootApplication
public class App {}
@Service
public class ProductService {
@Cacheable(value = "product", key = "#id", cacheManager = "redisCacheManager")
public Product getProduct(String id) { return repo.load(id); }
}
A.5 .NET (ASP.NET Core)
builder.Services.AddStackExchangeRedisCache(options => { options.Configuration = redisConn; });
public class CachedService {
private readonly IDistributedCache _cache;
public CachedService(IDistributedCache cache){ _cache = cache; }
public async Task<T> GetOrSet<T>(string key, Func<Task<T>> loader, TimeSpan ttl){
var v = await _cache.GetStringAsync(key);
if (v != null) return JsonSerializer.Deserialize<T>(v)!;
var data = await loader();
await _cache.SetStringAsync(key, JsonSerializer.Serialize(data), new DistributedCacheEntryOptions{ AbsoluteExpirationRelativeToNow = ttl });
return data;
}
}
A.6 Rust (Actix)
let client = redis::Client::open(redis_url).unwrap();
let mut con = client.get_connection().unwrap();
redis::cmd("SET").arg(&key).arg(&payload).arg("EX").arg(ttl).execute(&mut con);
Appendix B — Redis Configuration Cookbook
# Memory and eviction
maxmemory 12gb
maxmemory-policy allkeys-lfu
# Persistence
appendonly yes
appendfsync everysec
save 900 1 300 10 60 10000
# Security
requirepass ${REDIS_PASSWORD}
aclfile /etc/redis/users.acl
protected-mode yes
# TLS (port 0 disables the plaintext listener when TLS is served on 6379)
port 0
tls-port 6379
tls-cert-file /etc/redis/tls/tls.crt
tls-key-file /etc/redis/tls/tls.key
tls-ca-cert-file /etc/redis/tls/ca.crt
# Cluster
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
Appendix C — Cloudflare Workers/Edge Examples
export default {
async fetch(req, env, ctx) {
const url = new URL(req.url)
const key = `edge:v1:${url.pathname}`
let v = await env.CACHE_KV.get(key)
if (v) return new Response(v, { headers: { 'Cache-Control': 'public, max-age=60, stale-while-revalidate=300' }})
const origin = await fetch(`https://origin.example.com${url.pathname}`)
v = await origin.text()
ctx.waitUntil(env.CACHE_KV.put(key, v, { expirationTtl: 60 }))
return new Response(v, origin)
}
}
Appendix D — Benchmarks and Load Testing
# load.js (run with: k6 run load.js)
import http from 'k6/http'
import { sleep, check } from 'k6'
export const options = { vus: 200, duration: '5m' }
export default function(){
const r = http.get('https://api.example.com/products/42')
check(r, { 'status 200': (res) => res.status === 200 })
sleep(1)
}
Measure:
- P50/P95 latency
- Backend offload (origin QPS vs edge QPS)
- Cache hit ratio (layer-specific)
- CPU/mem for caches
Appendix E — Advanced Patterns
Soft TTL + Background Refresh
type Entry<T> = { data: T, hardExp: number, softExp: number }
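A sketch of how this Entry shape drives soft TTL: past softExp the value is served stale while a background refresh runs; past hardExp it counts as a miss (times in ms, node-redis v4 assumed):
async function getSoft<T>(key: string, loader: () => Promise<T>, softMs = 60_000, hardMs = 300_000): Promise<T> {
  const make = (data: T): Entry<T> => ({ data, softExp: Date.now() + softMs, hardExp: Date.now() + hardMs })
  const raw = await redis.get(key)
  const entry: Entry<T> | null = raw ? JSON.parse(raw) : null
  if (entry && Date.now() < entry.hardExp) {
    if (Date.now() >= entry.softExp) {
      // Stale but servable: refresh in the background and keep serving
      loader().then(d => redis.set(key, JSON.stringify(make(d)), { PX: hardMs })).catch(() => {})
    }
    return entry.data
  }
  const d = await loader()
  await redis.set(key, JSON.stringify(make(d)), { PX: hardMs })
  return d
}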
Jittered Expiration
const jitter = (ttl: number) => Math.floor(ttl * (0.9 + Math.random()*0.2))
Request Coalescing with Abort
const inFlight = new Map<string, { p: Promise<any>, c: AbortController }>()
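A coalescing sketch on top of that registry: concurrent callers share one in-flight load, and outstanding loads can be aborted (the loader is assumed to accept an AbortSignal):
async function coalesced<T>(key: string, load: (signal: AbortSignal) => Promise<T>): Promise<T> {
  const existing = inFlight.get(key)
  if (existing) return existing.p as Promise<T>              // join the in-flight load
  const c = new AbortController()
  const p = load(c.signal).finally(() => inFlight.delete(key))
  inFlight.set(key, { p, c })
  return p
}
const abortAll = () => inFlight.forEach(({ c }) => c.abort())  // e.g., on shutdown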
Appendix F — Security and Compliance
- Do not store PII in caches; if unavoidable, encrypt at rest and in transit.
- Rotate credentials; short-lived tokens; scoped ACLs.
- Audit logs of cache administrative commands.
Appendix G — Observability Playbook
# Layered hit ratio
sum by(layer) (rate(cache_hits_total[5m])) / (sum by(layer) (rate(cache_hits_total[5m])) + sum by(layer) (rate(cache_misses_total[5m])))
# OTEL semantic attributes
cache.system: redis|memcached|cdn
cache.op: get|set|del
cache.hit: true|false
Appendix H — HA/DR Scenarios
- Primary node failure: sentinel/managed failover within 3–10s.
- Region outage: active-active with global traffic manager; eventual consistency for caches.
- Cold start: prewarm hot keys via job.
Appendix I — Cost Optimization
- Prefer LFU to retain long-tail hotset.
- Compress large JSON blobs or store fields separately.
- Use short TTL for low-repeat endpoints.
- Offload at CDN/edge when possible.
Appendix J — API Gateway and GraphQL
// Apollo Server persisted queries + cache
// REST: vary by auth scope/tenant
Appendix K — Database Integration
-- PostgreSQL: refresh MV and bump version key
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_orders_1h;
Appendix L — NGINX/Varnish Advanced
# NGINX: flag authenticated requests so they skip the cache
map $http_authorization $skip_cache { default 1; "" 0; }
# Varnish (vcl_recv): bypass the cache when an Authorization header is present
if (req.http.Authorization) { return (pass); }
Appendix M — Runbooks (Detailed)
Cache Stampede Mitigation
- Enable single-flight per key
- Roll out SWR for hot namespaces
- Increase TTL with jitter
- Warm keys via a background job
Hit Ratio Regression
- Compare key cardinality before/after deploy
- Inspect top-miss endpoints
- Add negative caching for common 404s
- Evaluate per-field caching in GraphQL
Latency Regression
- Inspect the network path to the cache
- Raise connection pool size and enable pipelining
- Reduce payload size; compress JSON
Extended FAQ (continued)
- How to choose TTLs per route? Model by freshness SLA and consumption patterns; start small, measure.
- Should I cache POST responses? Rarely; only if idempotent and safe; prefer GET.
- Can I cache authenticated content at the CDN? Yes, with signed cookies/headers and a cache key scoped to user/session when safe.
- How to avoid stale writes with write-behind? Enforce a durable queue, retries, and idempotency keys; monitor lag.
- How big can values be in Redis? Keep under 512KB ideally; split large docs into fragments.
- JSON vs MessagePack? MessagePack is smaller/faster; ensure language support.
- How to cache GraphQL? Persisted queries; per-field cache; dataloaders; federation-layer cache.
- Which Redis eviction policy? LFU for recency+frequency; validate with production traces.
- Should I cluster or scale up? Cluster at >75% memory or CPU saturation; also for multi-tenant isolation.
- TLS overhead? Negligible within a VPC; use keep-alive and pooling.
- Prevent thundering herd at the edge? Use SWR, serve-stale, and coalescing in the edge worker.
- Normalize keys? Yes; sort query params; lowercase host; consistent delimiters.
- Version invalidation? Bump the namespace version on deploy to wipe the entire space atomically.
- Can I rely only on TTL? Often; combine with explicit delete for critical writes.
- Rate limiting in cache? Token bucket with Redis scripts or libraries; per-tenant keys.
- Cache warming? Precompute hot pages on deploy; schedule jobs.
- What is stale-if-error? Serve stale content if the backend returns 5xx; mark and alert.
- What is dogpile protection? Mechanisms that stop multiple requests from regenerating the same item.
- Negative cache TTL? Short: typically 10–60s.
- Multi-region cache? Region-local caches with async replication if necessary; beware consistency.
- Cross-DC invalidation? Pub/Sub via a global bus; CRDT-style counters for idempotency.
- CDN and APIs? Cache GET with conservative TTL; vary by headers.
- How to log cache events? Emit structured logs with key namespace and hit/miss outcome.
- Cache poisoning risks? Strict key normalization and validation; signed keys if user-influenced.
- Avoid key collisions? Namespace + version + delimiter discipline; hash long parts.
- Per-user cache? Consider session-local caches; purge on logout or profile update.
- Redis Streams for invalidation? Yes; consumers per service; ensure delivery semantics.
- Async cache population? A background job consumes a queue and populates keys.
- Binary values? Store as base64 or raw buffers; mind size.
- ETag vs Last-Modified? ETag is stronger; Last-Modified is easier; support both.
- CDN private content? Use signed URLs and short TTL with origin auth.
- Mobile clients and caching? Cache-Control headers; respect offline behavior; service workers.
- Service Worker caching? Cache-first for static; network-first with fallback for dynamic.
- Vary pitfalls? It explodes cache cardinality; keep Vary minimal.
- Prefetching? Predictive prefetch for next pages when bandwidth is idle.
- Backpressure? Limit concurrent refreshers; queue overflow policies.
- Redis timeouts? Tune socket/connect timeouts; circuit-break on failures.
- Circuit breakers with cache? Trip if the backend is unhealthy; serve stale and degrade gracefully.
- Observability golden signals? Hit ratio, latency, errors, capacity, evictions, offload.
- Alert thresholds? Hit-ratio drop >10% for 10m; latency above the P95 SLO; evictions > N/min.
- What is the one rule of caching? There are two hard things in CS: cache invalidation and naming things.
Appendix N — Redis Lua Scripts (Atomic Ops)
-- Rate limit: N requests per window
-- KEYS[1] = key, ARGV[1] = windowSeconds, ARGV[2] = limit
local current = redis.call('INCR', KEYS[1])
if tonumber(current) == 1 then
redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if tonumber(current) > tonumber(ARGV[2]) then
return {err = 'rate_limited'}
end
return current
-- Mutex with TTL
-- KEYS[1] = lockKey, ARGV[1] = token, ARGV[2] = ttl
if redis.call('SET', KEYS[1], ARGV[1], 'NX', 'PX', ARGV[2]) then
return 'OK'
else
return nil
end
-- Safe unlock
-- KEYS[1] = lockKey, ARGV[1] = token
if redis.call('GET', KEYS[1]) == ARGV[1] then
return redis.call('DEL', KEYS[1])
else
return 0
end
-- SWR gate: set a short-lived key to signal refresh-in-progress
-- KEYS[1] = swrKey, ARGV[1] = ttlSeconds
return redis.call('SET', KEYS[1], '1', 'NX', 'EX', ARGV[1])
Appendix O — Terraform Modules (Managed Redis)
# modules/elasticache/main.tf
variable "name" { type = string }
variable "node_type" { type = string }
variable "engine_version" { type = string }
variable "num_cache_nodes" { type = number }
resource "aws_elasticache_replication_group" "this" {
replication_group_id = var.name
engine = "redis"
engine_version = var.engine_version
node_type = var.node_type
automatic_failover_enabled = true
multi_az_enabled = true
transit_encryption_enabled = true
at_rest_encryption_enabled = true
parameter_group_name = "default.redis7"
num_cache_clusters = var.num_cache_nodes
}
output "primary_endpoint" { value = aws_elasticache_replication_group.this.primary_endpoint_address }
# modules/memorystore/main.tf
data "google_project" "this" {}
resource "google_redis_instance" "this" {
name = var.name
tier = "STANDARD_HA"
memory_size_gb = var.memory_gb
region = var.region
transit_encryption_mode = "SERVER_AUTHENTICATION"
}
# modules/azure-cache/main.tf
resource "azurerm_redis_cache" "this" {
name = var.name
location = var.location
resource_group_name = var.rg
capacity = 3
family = "C"
sku_name = "Standard"
minimum_tls_version = "1.2"
enable_non_ssl_port = false
}
Appendix P — Helm Chart Values (Redis)
# values.yaml
architecture: replication
auth:
enabled: true
password: ${REDIS_PASSWORD}
master:
persistence:
enabled: true
size: 20Gi
replica:
replicaCount: 2
persistence:
enabled: true
size: 20Gi
resources:
requests:
cpu: 500m
memory: 2Gi
limits:
cpu: 2
memory: 8Gi
Appendix Q — Kubernetes Manifests (Sentinel/Cluster)
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
spec:
replicas: 1
selector:
matchLabels: { app: redis }
template:
metadata:
labels: { app: redis }
spec:
containers:
- name: redis
image: redis:7
ports: [{ containerPort: 6379 }]
args: ["--appendonly", "yes", "--maxmemory", "4gb", "--maxmemory-policy", "allkeys-lfu"]
resources:
requests: { cpu: "500m", memory: "1Gi" }
limits: { cpu: "1", memory: "4Gi" }
---
apiVersion: v1
kind: Service
metadata:
name: redis
spec:
selector: { app: redis }
ports:
- name: redis
port: 6379
targetPort: 6379
Appendix R — CI/CD Pipelines (GitHub Actions)
name: cache-pipeline
on:
push: { branches: [ main ] }
jobs:
test-and-benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with: { node-version: '20' }
- run: npm ci
- run: npm run test
- name: Run k6
uses: grafana/k6-action@v0.3.1
with:
filename: k6/load.js
- name: Invalidate CDN
run: aws cloudfront create-invalidation --distribution-id ${{ secrets.DIST }} --paths "/app/*"
Appendix S — CDN Configuration Samples
{
"Behaviors": [
{
"PathPattern": "/api/*",
"AllowedMethods": ["GET", "HEAD"],
"CachedMethods": ["GET", "HEAD"],
"ForwardedValues": {
"QueryString": true,
"Headers": { "Quantity": 2, "Items": ["Authorization", "Accept-Language"] }
},
"MinTTL": 0, "DefaultTTL": 30, "MaxTTL": 300
}
]
}
# Cloudflare rules (pseudo)
rules:
- action: cache
conditions: [ path_matches:/images/* ]
ttl: 86400
- action: bypass_cache
conditions: [ header_present:Authorization ]
Appendix T — Service Worker and Workbox
// sw.js
importScripts('https://storage.googleapis.com/workbox-cdn/releases/6.5.4/workbox-sw.js')
self.addEventListener('install', (e) => { self.skipWaiting() })
self.addEventListener('activate', (e) => { e.waitUntil(self.clients.claim()) })
workbox.precaching.precacheAndRoute(self.__WB_MANIFEST || [])
workbox.routing.registerRoute(
({ request }) => request.destination === 'document',
new workbox.strategies.StaleWhileRevalidate({ cacheName: 'pages' })
)
Appendix U — Next.js Server-Side Caching Patterns
// app/api/products/[id]/route.ts
import { NextResponse } from 'next/server'
// `redis` and `db` are assumed to be shared module-level clients
export const revalidate = 60 // route-segment revalidation interval (ISR-style)
export async function GET(_: Request, { params }: { params: { id: string } }){
const id = params.id
const key = `product:v1:${id}`
const cached = await redis.get(key)
if (cached) return NextResponse.json(JSON.parse(cached), { headers: { 'X-Cache': 'HIT' } })
const data = await db.product.findUnique({ where: { id } })
await redis.set(key, JSON.stringify(data), { EX: 120 })
return NextResponse.json(data, { headers: { 'X-Cache': 'MISS' } })
}
Appendix V — Troubleshooting Matrix
Symptom: Low hit ratio
- Check key normalization; verify TTLs; add negative cache
Symptom: High Redis CPU
- Hot key; enable local LRU; shard key; increase network buffers
Symptom: Latency spikes
- Nagle disabled; pipeline enabled; pool size; TLS session reuse; co-location
Symptom: Excessive evictions
- Increase memory; LFU; reduce key cardinality; compress values
Appendix W — Prometheus Alerts
- alert: CacheHitRatioDrop
expr: (sum(rate(cache_hits_total[10m])) / (sum(rate(cache_hits_total[10m])) + sum(rate(cache_misses_total[10m])))) < 0.7
for: 15m
labels: { severity: warning }
annotations:
description: "Cache hit ratio below 70% for 15m"
- alert: CacheLatencyHigh
expr: histogram_quantile(0.95, sum(rate(cache_request_duration_seconds_bucket[5m])) by (le)) > 0.050
for: 10m
Appendix X — Full Grafana Dashboard JSON (Excerpt)
{
"title": "Caching Overview",
"panels": [
{ "type": "stat", "title": "Hit Ratio", "targets": [{ "expr": "sum(rate(cache_hits_total[5m]))/(sum(rate(cache_hits_total[5m]))+sum(rate(cache_misses_total[5m])))" }] },
{ "type": "timeseries", "title": "Redis CPU", "targets": [{ "expr": "avg(rate(process_cpu_seconds_total{job='redis'}[1m]))" }] },
{ "type": "timeseries", "title": "Evictions", "targets": [{ "expr": "sum(rate(redis_evicted_keys_total[5m]))" }] }
]
}
Appendix Y — Security Checklists
- TLS 1.2+ for all cache connections
- ACLs with least privilege; rotate tokens
- No secrets/PII in cache; if needed, encrypt values
- Audit command usage; disable dangerous commands where possible
- Private networks; firewall rules; deny public access
Appendix Z — SLOs and Error Budgets
- Hit ratio SLO: >= 85% for static assets, >= 65% for APIs
- P95 latency SLO: < 50ms for cache GET
- Availability SLO: 99.95%
- Error budget policies: slow-roll changes when burn rate > 2x over 1h
Long-form FAQ (51–180)
- What is SWR versus ISR? SWR: serve stale while revalidating; ISR: rebuild static pages on an interval in Next.js.
- Can I cache WebSocket data? Generally no; cache the REST endpoints feeding WebSocket publishes.
- Should I cache search results? Yes, with normalized queries; short TTL; respect personalization.
- Why normalize query params? Prevents duplicate keys causing low hit ratios.
- Is gzip worth it? Yes for large text; the CPU tradeoff is minimal at the edge.
- JSON compaction? Remove whitespace; consider binary formats.
- Hash keys? Hash long components to meet key-length limits and avoid PII.
- How to avoid cache poisoning? Validate inputs; rigid key construction; whitelist Vary headers.
- Use RedisJSON? When partial updates/reads are frequent; watch memory overhead.
- RedisGraph? Not for general caching; it is a specialized workload.
- Can LFU degrade? Yes, under scan-heavy workloads; tune lfu-decay-time; monitor.
- Memcached vs Redis latency? Comparable; Redis offers richer ops; test with your stack.
- Multi-tier caches? Edge → regional → app → in-process; ensure coherence or accept staleness.
- Write amplification? Batch writes; use a write-behind queue; compress payloads.
- Quotas per tenant? Namespace limits; monitor memory usage per tenant via key patterns.
- Key scans safe? Use SCAN with a small COUNT in maintenance windows; never KEYS in prod.
- Lua versus transactions? Lua for atomic multi-step logic; MULTI/EXEC for simpler sets.
- Redis Cluster resharding impact? Clients need a cluster-aware driver; expect temporary latency spikes.
- Eviction storms? Raise memory; use LFU; stagger expirations with jitter.
- CDN origin shield? Yes, to reduce origin load and cache misses.
- Origin auth with CDN? Use signed headers at the edge; verify at the origin.
- Cache invalidation bus? A Pub/Sub topic; consumers delete local/in-process entries.
- JSON-LD impact on caching? Treat it as a static asset with a long TTL; invalidate on content change.
- ESI with Varnish? Edge Side Includes allow partial page caching and composition.
- Hybrid rendering? Cache SSR output; hydrate the client with SWR for fresh data.
- Canary cache configs? Roll out to a subset of traffic; measure hit ratio and latency.
- Backfill cache after an incident? Run prewarm jobs for hot routes; monitor backend load.
- LZ4 vs zstd? zstd compresses better; LZ4 is faster; choose per route.
- Redis I/O threads? Enable if CPU-bound on networking; measure.
- Pipelining vs batching? Both reduce RTT; pipelining sends without waiting; batching groups ops.
- Multi-get pattern? Use MGET and fill the local cache; reduces per-key calls.
- Write coalescing? Aggregate frequent small writes in a buffer, then set.
- Cache bloat? Remove unused namespaces; expire old versions; report cardinality.
- Safe TTL increase? Yes; it decreases origin load; beware stale content.
- Cache transactional data? Usually no; it must maintain strict consistency.
- Eventual consistency acceptable? For content and most read-heavy APIs, yes, with guardrails.
- Dedup across tenants? If allowed, use a shared cache with tenant-aware identity.
- Priority-based eviction? Store priority as a score; evict low-priority first via a maintenance job.
- Rate limiter storage? Prefer Redis with Lua; accurate and atomic.
- Sliding window limits? Use sorted sets or leaky-bucket approximations.
- Geo-replication? Managed offerings provide it; evaluate write-latency impacts.
- Cache consistency testing? Replay writes; verify reads match the DB; chaos tests.
- Backoff on backend failures? Exponential backoff; serve stale; trip the circuit.
- Is binary safe in Redis? Yes; drivers support buffers.
- Max key length? Redis allows long keys; keep under ~1KB in practice.
- CDN compression? Enable Brotli; fall back to gzip.
- Cache-busting best practice? Content-hash filenames; never mutable URLs for static assets.
- Dynamic image resizing cache? Cache variants by width/quality; long TTL.
- Cache headers for APIs? Cache-Control with s-maxage and stale-while-revalidate.
- SDP and cache? Service Data Policy: ensure retention rules and PII handling.
- Signed cookies versus headers? Headers are simpler for APIs; cookies for web assets.
- Bot traffic? Rate-limit at the edge; bypass expensive dynamic routes.
- SSR and user-specific data? Split into a cached frame plus client-side fetch for private data.
- gRPC caching? Typically at the application level; limited CDN support.
- HTTP/3 QUIC? Improves edge performance; caching semantics are unchanged.
- HTTP/2 push deprecated? Yes; prefer prefetch/preload hints.
- Preconnect benefits? Reduces connection setup time to CDN/origin.
- Warm TLS sessions? Enable session resumption and keep-alive.
- Origin concurrency? Limit it to protect the DB; rely on the cache to queue demand.
- Idempotency keys? Essential for write-through and retryable ops.
- Key rotation? Versioned namespaces; scheduled cleanup of old versions.
- Payload encryption? Only when needed; understand the CPU overhead and key management.
- Hash collisions? Use robust hashes (SHA-256); include the namespace; low risk.
- Cache line alignment? Not applicable; optimize serialization instead.
- Redis modules? RedisBloom, RedisJSON, RediSearch when applicable; watch memory.
- Local cache coherence? Invalidate via Pub/Sub; cap TTLs; accept brief staleness.
- DB caching versus app caching? Combine them: materialized views plus app/edge caches.
- Blue/green cache migration? Run both; mirror writes; cut over when warm.
- Canary invalidation? Test purge strategies on a subset of routes.
- Cache layer ownership? The platform team owns infra; product teams own keys/policies.
- Tagged invalidation? Maintain a tag→keys mapping; purge by tag.
- Redis memory fragmentation? Restart off-peak; upgrade the allocator; measure.
- Worker pools? Size to CPU; avoid sync I/O; use async clients.
- Backpressure to clients? Return 429 with Retry-After when rate limited.
- Cacheable errors? Short TTL for 404/410; avoid caching 500s unless using stale-if-error.
- ETag weak vs strong? Strong for exact match; weak for semantically equivalent content.
- Mobile bandwidth saver? Longer TTL for large assets; respect the Save-Data header.
- Privacy mode? Skip caching when DNT or private browsing is detected (policy dependent).
- Feature flags in cache keys? Only when content differs; otherwise keep them out to avoid bloat.
- Structured logging fields? cache_layer, cache_op, key_ns, hit, latency_ms, size_bytes.
- Key compression? Hash long tails; store a mapping for debugging.
- Cache chooser? Policy-based: memory threshold, latency SLO, tenant priority.
- Request collapse across instances? Use Redis locks or a shared single-flight registry.
- How to track top keys? Keyspace notifications plus sampling; external telemetry.
- Eviction policy per namespace? Run separate instances or databases per class.
- Cache warmup duration? Measure QPS to hot keys; complete before peak traffic.
- Multi-tenant cost allocation? Track per-tenant bytes and ops; showback/chargeback.
- Do I need Ristretto/ARC? Try LFU first; specialized algorithms only if the workload demands them.
- Zstandard levels? Levels 3–6 for balanced performance.
- Redis latency percentile goals? P95 < 5–10ms intra-region.
- CDN stale-if-error value? 300–600s is typical.
- Does Vary: User-Agent make sense? Avoid unless critical; huge cardinality.
- Cache TTL in the DB? Store per-record freshness hints for app logic.
- Compress HTML? Yes; minify; ensure no layout shifts from inline CSS changes.
- Image formats? Prefer AVIF/WebP; negotiate via the Accept header, and Vary when necessary.
- COOP/COEP and caching? Security headers; caching behavior is unchanged.
- Cache debug endpoint? Internal only; returns hit/miss metrics per route.
- Per-route revalidation? HEAD with If-None-Match for cheap validation.
- Client hints? Use Accept-CH to guide asset variants.
- Serve stale during backend deploys? Yes; avoid spikes; purge incrementally once healthy.
- Time to live versus time to idle? TTI resets on access; Redis TTL is absolute unless app-managed.
- Redis SCAN schedule? Nightly off-peak; small COUNT batches.
- KV store limits? Know provider quotas for key count, size, throughput.
- CDN shielding layers? Edge → shield → origin; reduces origin load.
- Locality-aware routing? Send users to the nearest edge/region for latency.
- Key leaks in logs? Mask sensitive parts; never log values.
- IP-based caching? Avoid; unstable and privacy-sensitive.
- IPv6 differences? None specific to caching.
- Cache control for previews? No-store; bypass all caches; add X-Robots-Tag: noindex.
- RFC compliance? Honor RFC 7234 semantics for shared caches.
- Redis geo-redundancy? Active-active with CRDTs is not typical for caching; prefer per-region caches.
- KV service versus Redis? Simple KV (DynamoDB DAX/Cloudflare KV) for long-lived items; Redis for hot paths.
- Split large values? Chunk them; store an index; fetch partials.
- CDN functions for personalization? Edge compute injects headers, not full user data.
- Preconnect versus DNS-prefetch? Preconnect includes the TLS handshake; more impactful.
- CDN cache key normalization? Lowercase, strip tracking params, sort query params.
- Replacement-policy tests? Simulate memory pressure; compare LRU/LFU hit ratios.
- Cache layering anti-pattern? Too many tiers without observability; debugging complexity.
- Redis eviction telemetry? redis_evicted_keys_total, used_memory, keyspace_hits.
- Binary protocol benefits? Memcached's binary protocol is efficient; Redis RESP3 improved too.
- Cache warming on blue/green? Warm both; cut over when both are stable.
- Retry storms? Cap retries; jitter; circuit-break.
- CDN purge strategies? Prefix-purge cautiously; prefer versioned assets.
- PCI/GDPR concerns? No PAN/PII in caches; DSR workflows to purge.
- S3 + CloudFront OAC? Yes; private buckets with Origin Access Control; cache long.
- Headers to avoid in Vary? Authorization, Cookie (unless necessary), User-Agent.
- Max stale budget? Define per route; e.g., 5m for blogs, 30s for prices.
- Backend pressure forecast? Use offload metrics to predict capacity needs.
- Instrumentation overhead? Sample at 1–10% if volume is high.
- Final advice? Measure, iterate, and treat caching as a product with owners and SLOs.
Appendix AA — Client Configuration Guide (Advanced)
- TCP settings: keepalive, nodelay, backoff, pool size, pipeline length
- Serialization: JSON vs MessagePack vs Protobuf; field-level compression
- Retries: exponential backoff with jitter, max attempts, idempotency keys
- Timeouts: connect/read/write; set sane defaults (50–200ms)
- Circuit breakers: half-open probes; serve-stale on open
// Node Redis advanced
const client = createClient({
url: process.env.REDIS_URL,
socket: { reconnectStrategy: (retries) => Math.min(retries * 50, 1000), keepAlive: 5000, noDelay: true }
})
Appendix AB — Compression Strategies
- Content-aware: compress HTML/JSON; skip already-compressed (JPEG/MP4)
- Dictionary-based (zstd DICT) for repetitive JSON structures
- Per-route compression levels; monitor CPU cost
import { compress, decompress } from 'lzutf8'
const enc = (o: any) => Buffer.from(compress(JSON.stringify(o)))
const dec = (b: Buffer) => JSON.parse(decompress(b))
Appendix AC — Testing Harness
// Jest example: ensure cache hit on second call
it('caches product', async () => {
await redis.flushall()
const first = await getProduct('42')
const second = await getProduct('42')
expect(second).toEqual(first)
const hits = await prom.getSingleValue('cache_hits_total') // prom: hypothetical test helper over the metrics registry
expect(hits).toBeGreaterThan(0)
})
# enforce cache hit-ratio budget gates in CI
scripts/assert-metrics.sh --metric cache_hit_ratio --gte 0.75 --window 5m
Appendix AD — Migration Playbook
- Phase 0: observe baseline (origin-only metrics)
- Phase 1: introduce read-only cache for GETs; measure offload
- Phase 2: add SWR and negative caching; watch stampedes
- Phase 3: write-through for specific hot writes
- Phase 4: edge CDN policies and versioned assets
- Phase 5: per-field GraphQL cache and dataloaders
Appendix AE — Security Auditing
- Quarterly credential rotation; verify all services redeploy
- Pen-test cache poisoning via query param injection
- Verify ACL denies CONFIG/FLUSH on app users
Appendix AF — Runbook Deep Dives
Cache node memory saturation
- Action: raise maxmemory or reduce TTL; prioritize LFU
- Validate: evictions drop, hit ratio stable
CDN purge loop detected
- Action: throttle purges; switch to versioned assets; add backoff
- Validate: origin QPS normalizes
Appendix AG — Case Studies (Summarized)
E-commerce PDP: +35% hit ratio with SWR and negative 404s; P95 -42%
News homepage: edge composition with ESI; origin offload 78%
API pricing: per-tenant keys + LFU; stabilized hot keys; CPU -30%
Appendix AH — Governance and Ownership
- Product teams own cache keys and TTL policies per domain
- Platform SRE owns infra, SLOs, and observability
- Change management: canary cache config, rollback in < 10 minutes
Appendix AI — Full HTTP Header Cookbook
Cache-Control: public, max-age=120, s-maxage=600, stale-while-revalidate=300, stale-if-error=600
Surrogate-Control: max-age=600
ETag: "W/\"a1b2c3\""
Vary: Accept-Encoding, Accept-Language
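Applied in an app, the cookbook headers might be attached per route class; a minimal Express sketch (`renderPost` is illustrative):
app.get('/blog/:slug', (req, res) => {
  res.set({
    'Cache-Control': 'public, max-age=120, s-maxage=600, stale-while-revalidate=300, stale-if-error=600',
    'Vary': 'Accept-Encoding, Accept-Language'
  })
  res.send(renderPost(req.params.slug))
})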
Appendix AJ — Key Normalization Rules
- Lowercase hostnames
- Sort query params; drop tracking params (utm_*, fbclid, gclid)
- Collapse duplicate slashes; ensure trailing slash policy
- Strip fragments (#...)
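The rules above as one function; a sketch that turns a raw URL into a stable cache key (the tracking-param pattern is illustrative):
const TRACKING = /^(utm_|fbclid$|gclid$)/
function normalizeCacheKey(rawUrl: string): string {
  const u = new URL(rawUrl)
  u.hostname = u.hostname.toLowerCase()                  // lowercase host
  u.hash = ''                                            // strip fragments
  u.pathname = u.pathname.replace(/\/{2,}/g, '/')        // collapse duplicate slashes
  const params = [...u.searchParams.entries()]
    .filter(([k]) => !TRACKING.test(k))                  // drop tracking params
    .sort(([a], [b]) => a.localeCompare(b))              // sort for stability
  u.search = new URLSearchParams(params).toString()
  return u.toString()
}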
Appendix AK — SLA Matrix (Examples)
- Static assets: TTL 365d, SWR 30d, versioned filenames
- Blog pages: TTL 10m, SWR 60m, revalidate on publish
- Product detail: TTL 2m, SWR 10m, purge on update
- Search results: TTL 30s, SWR 2m, normalized params
Mega FAQ (181–260)
- Can I cache GraphQL mutations? Generally no; cache the derived read models instead.
- How to debug cache keys quickly? Expose an X-Cache-Key header in non-prod; log samples.
- Should I use consistent hashing? Yes, for client-side sharding and hot-key distribution.
- Can I mix Redis and Memcached? Yes; Memcached for ephemeral objects; Redis for rich features.
- How to prevent stale reads after a user update? Purge the per-user namespace; short TTL for per-user caches.
- Are distributed locks safe? Use Redlock carefully; prefer single-flight and idempotency.
- Best TTL for feature flags? Very short (5–30s), or subscribe to a change stream.
- Handle clock skew? Use server-side expiry (EX/PEX), not client clocks; avoid absolute timestamps.
- Throttle invalidations? Batch and debounce; avoid purge storms.
- Cache admin access? Separate credentials; audit; IP allowlist.
- Data residency? Region-local caches; avoid cross-border PII storage.
- Blue/green cache versions? Use versioned namespaces: v42 vs v43.
- Can I cache 302 redirects? Short TTL; ensure downstream behavior stays correct.
- When to bypass the cache? Admin endpoints, preview modes, personalized pages.
- Hash-busting without redeploy? Support a runtime alias map that points logical → hashed paths.
- What about queues as caches? Different purpose; use caches for random-access reads.
- Cache coherence with WebSockets? Push invalidation events to clients when keys change.
- Should I store JWTs in the cache? Prefer stateless; if needed, store a short-lived blacklist/allowlist.
- Handle GDPR deletion? Index keys by user ID; purge on DSR request.
- Cache images or generate on demand? Both: generate variants on first request and cache with a long TTL.
- Origin 500s and caching? Serve stale-if-error; alert and degrade gracefully.
- Can edge compute write to the origin cache? Yes, via an authenticated API; rate-limit writes.
- Track key age? Use TTL or store a timestamp alongside the value.
- Cache-thrashing detection? High set rate with a low hit ratio; investigate TTLs and key space.
- Split read/write clusters? Yes, at high scale; writes funnel to a subset; replicate to reads.
- Key tag strategy? Maintain a tag index; purge on tag changes (e.g., category).
- Cache schema evolution? Version bump; dual-read during migration; clean up old versions.
- Are Bloom filters production-safe? Yes, with a known false-positive rate; avoid for critical auth.
- HSTS impact on cache? Unrelated; it is a transport security policy.
- Cookies and shared caches? Cookies often disable caching; strip them where possible.
- Partial personalization? ESI or client-side personalization over a cached shell.
- Coalesce backend retries? Yes; centralize via a cache-aware client.
- Cache fragmentation across languages? Normalize serialization; define canonical JSON ordering.
- Are Redis pipelines ordered? Yes; responses arrive in request order.
- Use RESP3? If the driver supports it and the benefits are measured.
- Monitor key sizes? Sample and record the distribution; alert on outliers.
- Multi-tenant noisy neighbor? Quota per tenant; dedicated DB or cluster per tier.
- Cache warmers and rollbacks? A rollback should also revert warmer jobs to the previous version.
- Alternative stores (Hazelcast, Aerospike)? Viable; evaluate ops overhead and latency.
- Invalidate on cron? Avoid blind purges; tie them to content changes.
- Cache invalidation APIs? Provide an internal service with auth, audit, and rate limits.
- Audit schema for caches? Who set a key, when, size, TTL; for troubleshooting.
- Rolling restarts? Stagger them; preserve connection pools; monitor latency.
- Timeout budget? Distribute it among DNS/TLS/connect/request phases.
- TCP versus UNIX sockets? In Kubernetes, TCP; on a single host, UNIX sockets can reduce overhead.
- Multi-get fallback? On partial hits, fetch the misses in parallel and merge.
- Managing consistency with DB replicas? Staleness windows; read-after-write consistency patterns.
- Cacheable authz decisions? Short TTL, scoped to resource+user; invalidate on policy change.
- ETL precompute caches? Yes, for dashboards; refresh on a schedule.
- Shard by tenant vs hash? Tenant for isolation; hash for balance; hybrids are possible.
- Backpressure to the edge? Return 429 with Retry-After; protect the origin.
- Async delete failures? Retry with a DLQ; reconcile periodically.
- System limits? File descriptors, ephemeral ports; tune the kernel.
- NUMA impacts? Pin threads; measure only at extreme scale.
- Service meshes? mTLS adds a little latency; caches are unaffected.
- Lambda@Edge limits? Consider CloudFront Functions for lightweight header logic.
- CDN dedupe? Origin shield helps; edge POPs may still re-fetch.
- Payload canonicalization? Stable JSON field ordering for better compression and diffs.
- Prefetch on hover? Yes, for links; cap concurrency.
- Browser cache control? Short max-age with a longer s-maxage for shared caches.
- Headless CMS and caching? Purge on publish; webhook-triggered invalidation.
- Rate-limit buckets across regions? Choose regional isolation or global state, with tradeoffs.
- API gateway cache? Use cautiously; prefer app-level awareness.
- GraphQL persisted query store? Version and sign it; purge on schema change.
- CDN ACLs? Restrict purge APIs; RBAC.
- Origin auth rotation? Rotate secrets; test the edge → origin integration.
- Warm the service worker cache? Precache the shell; runtime-cache data.
- Device-specific variants? Use Client Hints; avoid full UA Vary.
- Cache high-cardinality metrics? Aggregate server-side; cache aggregates, not raw data.
- JSON-LD caches? Long TTL; purge on content update.
- Datadog/OTEL span attributes? cache.hit, cache.key_ns, cache.layer.
- Binary vs text protocols? Negligible difference for most workloads.
- Redis multi-tenant DB index? Yes; map tenants to DBs; beware the limits.
- Upgrade Redis 6→7? Plan with a replica; test AOF/RDB compatibility.
- AOF rewrite pauses? Monitor; schedule off-peak; tune auto-aof-rewrite.
- S3 as a blob-store sidecar? Store large blobs in S3; cache pointers in Redis.
- Live migration between providers? Dual-write plus backfill; flip reads when warm.
- Cache value checksums? Store CRC32/SHA-256 to detect corruption.
- Per-endpoint budgets? Define a cache SLO per route and monitor adherence.
- When not to cache? Highly dynamic, security-sensitive, or strictly consistent data.