API Gateway Rate Limiting Strategies in 2026: Ocelot, YARP, and Kong

By Elysiate · Updated Apr 3, 2026

api gateway · rate limiting · ocelot · yarp · kong · distributed systems

Level: advanced · ~17 min read · Intent: informational

Audience: backend engineers, platform engineers, security engineers, .NET teams

Prerequisites

  • basic familiarity with API gateways and HTTP middleware
  • working knowledge of distributed systems and caching
  • some experience with production traffic management

Key takeaways

  • Rate limiting protects backend services, improves fairness, and reduces abuse, but the right strategy depends on traffic shape, client identity, and deployment topology.
  • Ocelot, YARP, and Kong all support rate limiting well, but they differ in built-in features, distributed options, and how much custom implementation work teams must own.
  • Distributed storage, clear headers, monitoring, and well-chosen algorithms matter more in production than simply turning on a default limit.

FAQ

Why is rate limiting important at the API gateway?
The gateway is the ideal enforcement point because it can stop abusive or excessive traffic before it reaches backend services, while also applying consistent limits across routes, clients, and environments.
Which rate limiting algorithm is best?
There is no single best algorithm. Fixed windows are simple, sliding windows are fairer, token buckets handle bursts well, and leaky buckets smooth traffic. The right choice depends on your workload and client behavior.
When do I need distributed rate limiting?
You need distributed rate limiting when the gateway runs on multiple instances or nodes and you want limits enforced consistently across the whole deployment rather than per process.
Should I rate limit by IP, API key, or user?
In practice, many systems use a combination. IP-based limits help with DDoS and anonymous traffic, API key limits work well for integrations, and user-based limits are useful for fairness and account-level policy enforcement.
What is the biggest mistake teams make with rate limiting?
A common mistake is enabling simple rate limits without thinking through client identity, burst behavior, distributed consistency, observability, and the user experience of 429 responses.

Rate limiting is one of the most important controls an API gateway can enforce.

It protects backend services from overload, slows down abusive clients, reduces the cost of uncontrolled traffic, and helps ensure that one noisy consumer does not degrade the experience for everyone else. In a microservices environment, that matters even more because a badly managed spike at the edge can cascade into retries, queue buildup, service instability, and full platform incidents.

That is why rate limiting belongs at the gateway.

The gateway is where traffic first enters the platform, which makes it the natural place to apply:

  • shared request quotas,
  • per-client rules,
  • burst controls,
  • and defensive throttling patterns.

But good rate limiting is not just a number in a config file.

Teams need to decide:

  • what identity to rate limit against,
  • which algorithm matches the traffic pattern,
  • how limits are enforced across multiple gateway instances,
  • what headers and retry guidance clients receive,
  • and how rate limiting should interact with authentication, billing, service tiers, and DDoS protection.

This guide explains how to design API gateway rate limiting in 2026, how Ocelot, YARP, and Kong approach the problem, which algorithms matter most, how distributed enforcement changes the design, and what production patterns separate useful controls from fragile ones.

Executive Summary

API gateway rate limiting exists to control request volume before traffic reaches backend services.

It is useful for:

  • backend protection,
  • cost control,
  • abuse mitigation,
  • fairness between clients,
  • and maintaining service quality during spikes.

The main algorithm choices are:

  • fixed window for simplicity,
  • sliding window for smoother fairness,
  • token bucket for controlled bursts,
  • and leaky bucket for more stable traffic shaping.

The practical gateway differences are:

  • Ocelot is easy for basic .NET-centric gateway limits and route-level rules.
  • YARP is strong when teams want full control and are comfortable building on .NET rate limiting middleware.
  • Kong is strongest when teams want richer built-in policies, plugin-based controls, and mature distributed enforcement options.

For most production systems:

  • use gateway rate limiting with a distributed store when multiple instances are involved,
  • decide whether identity is based on IP, user, API key, or tenant,
  • include standard limit headers,
  • and monitor both allowed and blocked traffic continuously.

The hardest part is not enabling rate limiting. It is choosing limits that are protective without breaking legitimate traffic.

Who This Is For

This guide is for:

  • backend engineers running APIs behind a gateway,
  • platform teams managing shared traffic controls,
  • security teams reviewing abuse protection,
  • and .NET teams choosing between Ocelot, YARP, and Kong implementations.

It is especially useful if you are working on:

  • multi-instance gateways,
  • API key or tier-based limits,
  • edge protection,
  • or production rate limiting policies that need to hold up under real traffic.

Why Rate Limiting Matters

A lot of teams think about rate limiting only as an anti-abuse feature.

It is broader than that.

What Rate Limiting Protects

Backend Capacity

Without limits, one aggressive or broken client can flood internal services and cause:

  • latency spikes,
  • timeouts,
  • retry storms,
  • and wasted compute.

Fairness

Rate limits prevent a small subset of consumers from taking a disproportionate share of resources.

Cost

If your platform pays for requests, compute time, storage, or downstream third-party usage, uncontrolled API volume becomes a direct financial problem.

Security

Rate limiting helps reduce:

  • brute-force attempts,
  • credential stuffing,
  • endpoint probing,
  • and some layers of DDoS pressure.

Quality of Service

Well-designed limits create more stable traffic patterns and help preserve performance for well-behaved clients.

That is why rate limiting is as much an availability feature as a security feature.

Choosing the Right Algorithm

Different traffic patterns need different algorithms.

This is one of the most important design decisions.

1. Fixed Window

Fixed window is the simplest model.

A client gets a maximum number of requests in a fixed interval such as:

  • 100 requests per minute,
  • or 1,000 requests per hour.

Strengths

  • simple to understand
  • easy to implement
  • cheap to evaluate

Weaknesses

  • can allow bursts at window boundaries
  • can feel unfair when requests cluster at the end and start of adjacent windows

Fixed window is good for:

  • simple internal APIs
  • lower-complexity enforcement
  • early-stage systems that need easy observability
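
The boundary-burst weakness is easy to demonstrate. A minimal Python sketch (limits illustrative, not tied to any particular gateway):

```python
import math

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds, keyed by window index."""
    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counts = {}  # window index -> request count

    def allow(self, now):
        idx = math.floor(now / self.window)
        self.counts[idx] = self.counts.get(idx, 0) + 1
        return self.counts[idx] <= self.limit

limiter = FixedWindowLimiter(limit=100, window=60)

# 100 requests at t=59s and 100 more at t=60s are ALL allowed:
# 200 requests land within roughly one second across the window boundary.
burst = [limiter.allow(59.0) for _ in range(100)] + \
        [limiter.allow(60.0) for _ in range(100)]
print(sum(burst))  # 200 allowed
```

Both batches fall into different window indexes, so each gets its own full quota even though they are one second apart.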

2. Sliding Window

Sliding window tracks requests over a rolling period rather than a hard reset boundary.

Strengths

  • smoother fairness
  • fewer burst artifacts at window boundaries
  • better approximation of actual sustained rate

Weaknesses

  • more complex
  • more expensive to implement at scale

Sliding window is often better when fairness and burst control matter more than simplicity.
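
A popular middle ground is the sliding window counter, which approximates the rolling rate by weighting the previous fixed window by its remaining overlap. A Python sketch (this weighting is the widely used approximation, not an exact request log):

```python
import math

class SlidingWindowCounter:
    """Approximate sliding window: weight the previous fixed window's count by
    its overlap with the rolling window, then add the current window's count."""
    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counts = {}  # window index -> count

    def allow(self, now):
        idx = math.floor(now / self.window)
        elapsed = (now % self.window) / self.window      # fraction into current window
        prev = self.counts.get(idx - 1, 0)
        curr = self.counts.get(idx, 0)
        estimated = prev * (1 - elapsed) + curr          # weighted rolling estimate
        if estimated >= self.limit:
            return False
        self.counts[idx] = curr + 1
        return True

sw = SlidingWindowCounter(limit=100, window=60)
burst_allowed = sum(sw.allow(59.0) for _ in range(100))
print(burst_allowed)   # 100: the initial burst fits
print(sw.allow(60.0))  # False: the previous window still counts against the rate
```

Unlike the fixed window, a request just after the boundary still sees nearly the full weight of the previous window's burst, so the double-quota artifact disappears.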

3. Token Bucket

Token bucket is one of the most practical production algorithms.

A bucket fills at a steady rate, and requests consume tokens. Bursts are allowed as long as tokens are available.

Strengths

  • supports legitimate bursts
  • smooths long-term usage
  • good fit for real API traffic, which is often uneven

Weaknesses

  • more stateful than basic fixed windows
  • can be misunderstood if the refill model is not documented clearly

Token bucket is often the best default for APIs with spiky but valid traffic.
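
The refill model is worth pinning down precisely, since that is exactly what clients tend to misunderstand. A minimal lazy-refill sketch in Python (capacity and rate are illustrative):

```python
class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request consumes one."""
    def __init__(self, capacity, rate, now=0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)   # start full: an idle client may burst
        self.updated = now

    def try_consume(self, now, n=1):
        # Refill lazily based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

bucket = TokenBucket(capacity=20, rate=5)   # 5 req/s sustained, bursts of 20
print(all(bucket.try_consume(0.0) for _ in range(20)))  # True: burst drains the bucket
print(bucket.try_consume(0.0))                          # False: the 21st is rejected
print(bucket.try_consume(1.0))                          # True: one second refills 5 tokens
```

Capacity bounds the burst, rate bounds the sustained throughput, and the two can be tuned independently.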

4. Leaky Bucket

Leaky bucket smooths traffic by allowing requests into a queue and leaking them out at a fixed pace.

Strengths

  • stabilizes downstream load
  • smooths spikes aggressively

Weaknesses

  • less forgiving for bursty clients
  • can add queuing behavior that complicates user experience

Leaky bucket is often more useful for traffic shaping than for simple quota enforcement.
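
The rejecting variant, sometimes called "leaky bucket as a meter", skips the queue and simply refuses requests that would overflow. A Python sketch of that variant (parameters illustrative):

```python
class LeakyBucket:
    """Leaky bucket as a meter: the level rises by one per request and drains
    at `leak_rate` per second; a request that would overflow is rejected."""
    def __init__(self, capacity, leak_rate, now=0.0):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0
        self.updated = now

    def allow(self, now):
        # Drain the bucket for the time elapsed since the last request.
        self.level = max(0.0, self.level - (now - self.updated) * self.leak_rate)
        self.updated = now
        if self.level + 1 > self.capacity:
            return False
        self.level += 1
        return True

lb = LeakyBucket(capacity=10, leak_rate=2)
print(all(lb.allow(0.0) for _ in range(10)))  # True: the burst fills the bucket
print(lb.allow(0.0))                          # False: it would overflow
print(lb.allow(0.5))                          # True: half a second of leak frees one slot
```

Note how admission tracks the fixed leak rate rather than a quota window, which is why it shapes downstream load so smoothly.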

Ocelot Rate Limiting

Ocelot is one of the easiest ways for .NET teams to add basic route-level rate limiting at the gateway.

It is configuration-friendly and works well when the rate limiting model is relatively straightforward.

Basic Ocelot Configuration

{
  "Routes": [
    {
      "DownstreamPathTemplate": "/api/{everything}",
      "UpstreamPathTemplate": "/api/{everything}",
      "RateLimitOptions": {
        "ClientWhitelist": [],
        "EnableRateLimiting": true,
        "Period": "1m",
        "PeriodTimespan": 60,
        "Limit": 100
      }
    }
  ]
}

In this config, Period is the quota window, Limit is the request ceiling, and PeriodTimespan is how many seconds a client must wait after exceeding the limit, a detail that is often misread as the window length.

This is useful for:

  • simple per-route fixed-window enforcement,
  • getting a baseline control in place quickly,
  • and smaller gateway footprints.

Per-Client Rate Limiting in Ocelot

Ocelot identifies clients and shapes the rejection response through RateLimitOptions in the GlobalConfiguration section of ocelot.json rather than through code:

{
  "GlobalConfiguration": {
    "RateLimitOptions": {
      "DisableRateLimitHeaders": false,
      "ClientIdHeader": "X-Client-Id",
      "HttpStatusCode": 429,
      "QuotaExceededMessage": "API rate limit exceeded"
    }
  }
}

This works well when:

  • a client identifier is already available,
  • partner integrations use a stable ID,
  • or API keys are mapped to client-level quotas.

Ocelot Trade-Offs

Ocelot is strong for basic and medium-complexity use cases, but it becomes less attractive when:

  • rate limit logic must be deeply custom,
  • distributed consistency needs more advanced control,
  • or adaptive, tiered, and highly dynamic policies become central.

Custom Ocelot Rate Limiting

If the default policy is not enough, the counter logic can be replaced. The sketch below is illustrative rather than Ocelot's exact extension surface: the interface and rule types are simplified, and GetOrCreateCounterAsync is omitted.

public class CustomRateLimitProcessor : IRateLimitProcessor
{
    private readonly IMemoryCache _cache;
    private readonly ILogger<CustomRateLimitProcessor> _logger;

    public CustomRateLimitProcessor(
        IMemoryCache cache,
        ILogger<CustomRateLimitProcessor> logger)
    {
        _cache = cache;
        _logger = logger;
    }

    public async Task<RateLimitCounter> ProcessRequestAsync(
        ClientRequestIdentity identity,
        RateLimitRule rule)
    {
        var key = $"{identity.ClientId}:{rule.Endpoint}";
        var counter = await GetOrCreateCounterAsync(key, rule);

        if (counter.Count >= rule.Limit)
        {
            _logger.LogWarning(
                "Rate limit exceeded for {ClientId} on {Endpoint}",
                identity.ClientId, rule.Endpoint);

            throw new RateLimitExceededException(
                $"Rate limit of {rule.Limit} exceeded");
        }

        counter.Count++;
        // IMemoryCache is synchronous, so Set (not SetAsync) is the correct call.
        _cache.Set(key, counter, TimeSpan.FromSeconds(rule.PeriodTimespan));

        return counter;
    }
}

This allows:

  • per-client custom keys,
  • endpoint-aware quotas,
  • and more tailored logging or rejection behavior.

But as soon as rate limiting becomes code-heavy, some teams start preferring YARP because it is more naturally framework-like.

YARP Rate Limiting

YARP does not package rate limiting the same way as Ocelot.

Instead, it benefits from .NET's rate limiting middleware, which makes it flexible and powerful for teams comfortable writing gateway behavior in code.

Built-In Middleware Approach

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("api", opt =>
    {
        opt.Window = TimeSpan.FromMinutes(1);
        opt.PermitLimit = 100;
        opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
        opt.QueueLimit = 10;
    });
});

var app = builder.Build();

app.UseRateLimiter();

app.MapReverseProxy()
    .RequireRateLimiting("api");

This is a strong baseline because:

  • it integrates naturally with ASP.NET Core,
  • it keeps policy composition clear,
  • and it is easy to extend.

Per-Route YARP Policies

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("api", opt =>
    {
        opt.Window = TimeSpan.FromMinutes(1);
        opt.PermitLimit = 100;
    });

    options.AddFixedWindowLimiter("strict", opt =>
    {
        opt.Window = TimeSpan.FromMinutes(1);
        opt.PermitLimit = 10;
    });
});

Each policy is then attached per route. YARP's route configuration has a first-class RateLimiterPolicy field, so the mapping lives in the ReverseProxy config rather than in custom middleware:

"ReverseProxy": {
  "Routes": {
    "admin-route": {
      "ClusterId": "backend",
      "RateLimiterPolicy": "strict",
      "Match": { "Path": "/api/admin/{**catch-all}" }
    },
    "default-route": {
      "ClusterId": "backend",
      "RateLimiterPolicy": "api",
      "Match": { "Path": "/api/{**catch-all}" }
    }
  }
}

This makes YARP especially attractive when:

  • some routes need stricter protection,
  • auth-sensitive endpoints must be throttled differently,
  • or rate limit logic needs to respond to route shape or claims.

Token Bucket in YARP

YARP becomes especially strong when you want algorithm-level control. .NET already ships a token bucket in System.Threading.RateLimiting, so a per-client bucket needs configuration rather than a custom limiter class:

builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("token", httpContext =>
        RateLimitPartition.GetTokenBucketLimiter(
            // One bucket per client; fall back to IP for anonymous traffic.
            partitionKey: httpContext.Request.Headers["X-Client-Id"].FirstOrDefault()
                ?? httpContext.Connection.RemoteIpAddress?.ToString()
                ?? "anonymous",
            factory: _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,                             // burst capacity
                TokensPerPeriod = 10,                         // sustained refill
                ReplenishmentPeriod = TimeSpan.FromSeconds(1),
                AutoReplenishment = true,
                QueueLimit = 0
            }));
});

app.MapReverseProxy()
    .RequireRateLimiting("token");

This is where YARP's code-first model shines.

It lets you build:

  • custom client identity rules,
  • route-aware policies,
  • burst handling,
  • and integration with your broader ASP.NET Core telemetry and middleware stack.

YARP Trade-Offs

The price of this flexibility is that you often own more logic directly. That is excellent for strong teams, but less attractive if you want packaged policy behavior with minimal code.

Kong Rate Limiting

Kong is often the richest out-of-the-box option in this comparison.

Its plugin-based model makes rate limiting feel more like attachable policy than application code.

Basic Kong Rate Limiting

services:
  - name: api-service
    url: https://api.example.com
    routes:
      - name: api-route
        paths:
          - /api
        plugins:
          - name: rate-limiting
            config:
              minute: 100
              hour: 1000
              day: 10000
              policy: local

This is a strong fit when teams want:

  • route-level quota policies,
  • simple config-based control,
  • and less custom code inside the gateway.

Distributed Kong Rate Limiting

plugins:
  - name: rate-limiting
    config:
      minute: 100
      policy: redis
      redis_host: redis.example.com
      redis_port: 6379
      redis_password: secret
      redis_timeout: 2000
      redis_database: 1

This is one of Kong's strengths: distributed rate limiting is a natural part of the platform story rather than something teams must fully invent themselves.

Advanced Kong Options

plugins:
  - name: rate-limiting
    config:
      minute: 100
      hour: 1000
      policy: cluster
      limit_by: consumer
      hide_client_headers: false

This makes Kong attractive when you need:

  • consumer-based limits,
  • cluster-aware enforcement,
  • richer plugin-driven policy composition,
  • and stronger enterprise control surfaces.

Kong Trade-Offs

Kong is operationally heavier than Ocelot or YARP for many .NET teams, but it becomes more compelling when:

  • the environment is larger,
  • the platform is polyglot,
  • or rate limiting is part of a broader API governance strategy.

Distributed Rate Limiting

The moment you have more than one gateway instance, local counters stop being enough.

That is where distributed storage matters.

Why Distributed Rate Limiting Matters

Without shared state, each gateway instance enforces its own limit independently.

That means a client may effectively get 100 requests per minute per node instead of 100 requests per minute globally. With three gateway instances behind a load balancer, the real ceiling is 300.

That is a serious policy gap.
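
A toy model makes the gap concrete (a Python sketch assuming round-robin load balancing; function and parameter names are illustrative):

```python
def allowed_requests(node_count, per_limit, total_requests, shared):
    """Count how many of `total_requests` pass in one window when each of
    `node_count` gateway nodes enforces `per_limit` locally, versus one shared counter."""
    counters = [0] if shared else [0] * node_count
    allowed = 0
    for i in range(total_requests):
        # A load balancer spreads requests round-robin across nodes.
        c = 0 if shared else i % node_count
        if counters[c] < per_limit:
            counters[c] += 1
            allowed += 1
    return allowed

print(allowed_requests(3, 100, 500, shared=False))  # 300: limit multiplied by node count
print(allowed_requests(3, 100, 500, shared=True))   # 100: the intended policy
```

With local counters the effective limit scales with the fleet size, which is rarely what the policy author meant.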

Redis-Based Sliding Window Example

public class RedisRateLimiter : IRateLimiter
{
    private readonly IConnectionMultiplexer _redis;
    private readonly ILogger<RedisRateLimiter> _logger;

    public RedisRateLimiter(
        IConnectionMultiplexer redis,
        ILogger<RedisRateLimiter> logger)
    {
        _redis = redis;
        _logger = logger;
    }

    public async Task<RateLimitResult> CheckRateLimitAsync(
        string key,
        int limit,
        TimeSpan window)
    {
        var db = _redis.GetDatabase();
        var now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
        var windowStart = now - (long)window.TotalSeconds;

        // Drop entries that have aged out of the rolling window.
        await db.SortedSetRemoveRangeByScoreAsync(key, 0, windowStart);

        var count = await db.SortedSetCountAsync(key, windowStart, now);

        if (count >= limit)
        {
            return new RateLimitResult
            {
                Allowed = false,
                Remaining = 0,
                ResetAt = DateTimeOffset.FromUnixTimeSeconds(now + (long)window.TotalSeconds)
            };
        }

        // Unique member so concurrent requests in the same second are all counted.
        await db.SortedSetAddAsync(key, $"{now}:{Guid.NewGuid():N}", now);
        await db.KeyExpireAsync(key, window);

        return new RateLimitResult
        {
            Allowed = true,
            Remaining = limit - (int)count - 1,
            ResetAt = DateTimeOffset.FromUnixTimeSeconds(now + (long)window.TotalSeconds)
        };
    }
}

This is a strong pattern because it supports:

  • rolling-window fairness,
  • multiple gateway instances,
  • and shared limit state.

Lua for Atomicity

Distributed rate limiting often needs atomic operations so multiple concurrent requests do not race incorrectly.

That is why Redis plus Lua is such a common production pattern.

It makes:

  • removing expired entries,
  • checking the current count,
  • adding the current request,
  • and computing the reset time

happen as one atomic unit, so two concurrent requests cannot both read "99 of 100 used" and both be admitted.
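
The same steps can be sketched in process-local Python to show the shape of the script (a plain dict stands in for the Redis sorted set; the function name is illustrative, and in production the body runs as a single Lua script via EVAL):

```python
import time
import uuid

def check_sliding_window(store, key, limit, window, now=None):
    """Sliding-window log check. These four steps must execute atomically;
    here a dict stands in for the Redis sorted set (member -> timestamp score)."""
    now = time.time() if now is None else now
    entries = store.setdefault(key, {})
    # 1. remove entries that have fallen outside the window
    cutoff = now - window
    for member in [m for m, ts in entries.items() if ts <= cutoff]:
        del entries[member]
    # 2. check the count against the limit
    if len(entries) >= limit:
        oldest = min(entries.values())
        return False, oldest + window          # blocked; when the window frees up
    # 3. add the current request with a unique member
    entries[uuid.uuid4().hex] = now
    # 4. compute the reset time
    return True, now + window

store = {}
results = [check_sliding_window(store, "client-1", 3, 60, now=100.0)[0] for _ in range(4)]
print(results)  # [True, True, True, False]
```

In Redis the same four steps map to ZREMRANGEBYSCORE, ZCARD, ZADD, and a computed TTL, all inside one script.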

What Identity Should You Limit By?

This is often more important than the algorithm itself.

Common Identity Keys

IP Address

Good for:

  • anonymous traffic
  • basic DDoS defense
  • public unauthenticated endpoints

Weakness:

  • many users may share one IP
  • mobile and proxy environments make it imperfect

API Key

Good for:

  • partner integrations
  • service accounts
  • paid plans
  • quota-based billing

User ID

Good for:

  • authenticated user fairness
  • account-level usage enforcement

Tenant ID

Good for:

  • SaaS environments
  • org-level quota controls
  • enterprise plans

Endpoint

Good for:

  • route-sensitive protections
  • expensive or admin APIs
  • login and auth endpoints

In practice, strong systems often combine multiple dimensions, such as:

  • IP + endpoint for anonymous protection
  • user or API key for fairness
  • tenant for billing-tier enforcement
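
One way to express that combination is as a list of (key, limit) pairs that a request must pass in full. A Python sketch (field names and limits are illustrative):

```python
def rate_limit_keys(request):
    """Build the (key, limit-per-minute) pairs a request is checked against.
    `request` is a plain dict here; the fields and quotas are illustrative."""
    keys = [
        # Anonymous / network-level protection, always on.
        (f"ip:{request['ip']}:{request['path']}", 60),
    ]
    if request.get("api_key"):
        keys.append((f"key:{request['api_key']}", 1000))        # integration quota
    if request.get("user_id"):
        keys.append((f"user:{request['user_id']}", 300))        # per-user fairness
    if request.get("tenant_id"):
        keys.append((f"tenant:{request['tenant_id']}", 10000))  # org-level plan
    # A request must pass EVERY applicable limit to be admitted.
    return keys

req = {"ip": "203.0.113.9", "path": "/api/orders", "user_id": "u42", "tenant_id": "t7"}
for key, limit in rate_limit_keys(req):
    print(key, limit)
```

Checking the cheapest, broadest key first (usually IP) keeps the common rejection path inexpensive.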

Rate Limit Headers

Clients need clear feedback when rate limiting is active.

That is why headers matter.

Standard Headers to Return

  • X-RateLimit-Limit
  • X-RateLimit-Remaining
  • X-RateLimit-Reset
  • Retry-After when blocked

Example Middleware

public class RateLimitHeaderMiddleware
{
    private readonly RequestDelegate _next;
    private readonly IRateLimiter _rateLimiter;

    public RateLimitHeaderMiddleware(RequestDelegate next, IRateLimiter rateLimiter)
    {
        _next = next;
        _rateLimiter = rateLimiter;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // GetClientKey resolves API key, user, or IP; implementation omitted here.
        var key = GetClientKey(context);
        var result = await _rateLimiter.CheckRateLimitAsync(key, 100, TimeSpan.FromMinutes(1));

        // Indexer assignment avoids the duplicate-key exceptions Headers.Add can throw.
        context.Response.Headers["X-RateLimit-Limit"] = "100";
        context.Response.Headers["X-RateLimit-Remaining"] = result.Remaining.ToString();
        context.Response.Headers["X-RateLimit-Reset"] =
            result.ResetAt.ToUnixTimeSeconds().ToString();

        if (!result.Allowed)
        {
            context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
            context.Response.Headers["Retry-After"] =
                ((int)(result.ResetAt - DateTimeOffset.UtcNow).TotalSeconds).ToString();
            await context.Response.WriteAsync("Rate limit exceeded");
            return;
        }

        await _next(context);
    }
}

Why This Matters

Headers improve:

  • client retry behavior,
  • developer experience,
  • and transparency during throttling.

Poorly handled 429 responses often lead to worse client behavior, not better.
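
On the client side, the fix is to honor those headers. A minimal retry loop (a Python sketch; `send` stands in for any HTTP call returning status, headers, and body):

```python
import time

def call_with_retry(send, max_attempts=5):
    """Retry on 429, honoring Retry-After, with capped exponential backoff as a
    fallback. `send` is any callable returning (status, headers, body)."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)        # server-provided guidance
        else:
            delay = min(2 ** attempt, 30)     # exponential backoff, capped at 30s
        time.sleep(delay)
    raise RuntimeError("rate limited after max retries")
```

Without Retry-After, clients guess, and the usual guess is an immediate retry, which is exactly the behavior the limit was meant to stop.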

Advanced Patterns

Simple limits are often not enough for real production systems.

Tiered Rate Limiting

Different clients may have different quotas.

public class TieredRateLimiter
{
    public int GetLimitForTier(RateLimitTier tier)
    {
        return tier switch
        {
            RateLimitTier.Free => 100,
            RateLimitTier.Basic => 1000,
            RateLimitTier.Premium => 10000,
            RateLimitTier.Enterprise => 100000,
            _ => 100
        };
    }
}

This is useful for:

  • SaaS plans,
  • partner tiers,
  • and commercial API products.

Adaptive Rate Limiting

Sometimes limits should respond to system load.

public class AdaptiveRateLimiter
{
    private readonly ISystemLoadMonitor _loadMonitor;

    public AdaptiveRateLimiter(ISystemLoadMonitor loadMonitor)
    {
        _loadMonitor = loadMonitor;
    }

    public async Task<int> GetAdaptiveLimitAsync(string clientId)
    {
        // GetBaseLimit looks up the client's normal quota; implementation omitted.
        var baseLimit = GetBaseLimit(clientId);
        var systemLoad = await _loadMonitor.GetSystemLoadAsync();

        if (systemLoad > 0.8)
        {
            return (int)(baseLimit * 0.5);    // shed half the quota under heavy load
        }
        if (systemLoad > 0.6)
        {
            return (int)(baseLimit * 0.75);
        }

        return baseLimit;
    }
}

This can be useful in:

  • overload protection,
  • degraded-mode traffic shaping,
  • and cost-sensitive systems.

DDoS and Abuse Protection

Not all rate limiting is about fairness.

Some of it is about defense.

public class DdosProtectionRateLimiter
{
    public async Task<bool> IsDdosAttackAsync(string clientIp)
    {
        // GetRequestCountAsync and BlockIpAsync wrap your shared store; omitted here.
        var requestCount = await GetRequestCountAsync(clientIp, TimeSpan.FromSeconds(1));

        if (requestCount > 100)
        {
            // More than 100 requests in one second from one IP: block for 10 minutes.
            await BlockIpAsync(clientIp, TimeSpan.FromMinutes(10));
            return true;
        }

        return false;
    }
}

This kind of aggressive short-window protection is useful for:

  • auth endpoints,
  • anonymous APIs,
  • and suspicious traffic bursts.

Monitoring and Observability

Rate limiting should be measured continuously.

Otherwise teams end up with:

  • limits that are too weak,
  • limits that block legitimate users,
  • or no visibility into who is being throttled and why.

What to Track

Track:

  • total requests
  • blocked requests
  • block rate by route
  • block rate by client tier
  • top throttled clients
  • peak usage windows
  • retry behavior after 429s

Example Metrics

public class RateLimitMetrics
{
    private readonly Counter<long> _rateLimitHits;
    private readonly Counter<long> _rateLimitExceeded;

    public RateLimitMetrics(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("Gateway.RateLimiting");
        _rateLimitHits = meter.CreateCounter<long>("rate_limit.requests");
        _rateLimitExceeded = meter.CreateCounter<long>("rate_limit.rejected");
    }

    public void RecordRateLimitHit(string clientId)
    {
        // client_id is a high-cardinality tag; bucket or sample it in large fleets.
        _rateLimitHits.Add(1, new KeyValuePair<string, object?>("client_id", clientId));
    }

    public void RecordRateLimitExceeded(string clientId)
    {
        _rateLimitExceeded.Add(1, new KeyValuePair<string, object?>("client_id", clientId));
    }
}

This is how teams learn:

  • whether limits are too low,
  • whether abusive traffic is concentrated,
  • and whether premium tiers actually use their higher quotas.

Best Practices

1. Match the Algorithm to the Traffic

Do not choose an algorithm only because it is easy. Choose one that matches:

  • burst tolerance
  • fairness needs
  • client behavior
  • and operational complexity

2. Use Distributed Storage for Multi-Node Gateways

If multiple nodes exist, local counters are rarely enough.

3. Rate Limit by the Right Identity

IP is not enough for everything. User, API key, tenant, and endpoint often matter more.

4. Return Clear Headers and 429 Responses

Clients should be told:

  • how much quota remains,
  • when to retry,
  • and what happened.

5. Monitor Before and After Tuning

Rate limiting is not set-and-forget. It should be tuned against real usage.

6. Protect Sensitive Routes More Aggressively

Login, auth, admin, export, and expensive query endpoints often deserve their own stricter policies.

Common Mistakes to Avoid

Teams often make the same avoidable errors:

  • using local in-memory counters across multiple instances
  • choosing fixed windows where burst behavior matters a lot
  • rate limiting only by IP in authenticated systems
  • skipping headers and returning unhelpful 429 responses
  • applying one global limit to every route
  • ignoring rate limit telemetry
  • and setting limits before measuring real client behavior

The algorithm matters. The identity key matters. The deployment topology matters even more.

Practical Selection Advice

Choose Ocelot When

  • the system is .NET-centric
  • you want simple route-level limits
  • built-in gateway configuration is attractive
  • the traffic policy is not highly custom

Choose YARP When

  • you want deep control in .NET code
  • custom rate limit behavior matters
  • token bucket or advanced dynamic logic is needed
  • performance and middleware composition matter

Choose Kong When

  • you want richer built-in policy handling
  • distributed limits are central
  • the platform is broader than .NET
  • plugin-based API governance is part of the roadmap

Conclusion

Rate limiting is one of the most important controls an API gateway can provide because it protects backend services, preserves fairness, and reduces abuse before requests move deeper into the platform.

But good rate limiting is not only about picking a number.

It is about:

  • choosing the right algorithm,
  • enforcing limits with the right identity key,
  • making the rules consistent across instances,
  • exposing clear headers,
  • and tuning policies based on real traffic rather than guesswork.

Ocelot, YARP, and Kong can all support strong rate limiting strategies, but they do so in different ways:

  • Ocelot is simpler and configuration-friendly,
  • YARP is flexible and code-driven,
  • Kong is rich and policy-oriented.

The best choice depends on the platform you are actually operating, not just the feature list you like most.

A good gateway rate limiting system should feel predictable under normal traffic, firm under abuse, and measurable at all times.

That is the standard to aim for.

About the author

Elysiate publishes practical guides and privacy-first tools for data workflows, developer tooling, SEO, and product engineering.
