Rate Limiting
Rate limiting controls how many requests users can make to the API within a time window. It prevents abuse, ensures fair resource usage for everyone, and protects the infrastructure from overload.
Without proper rate limiting, a single user could overwhelm the system, rack up a large compute bill, enumerate and scrape public images, or even launch denial-of-service attacks. So while it may feel restrictive at times, it's a crucial part of keeping the service healthy and reliable for everyone.
Quick start
Rate limiting is enabled by default and requires no configuration for most deployments. The system automatically adjusts limits based on:
- Whether billing is enabled (self-hosted vs SaaS)
- User's subscription tier (Free/Starter/Pro)
- API key scope (read/write/admin)
For self-hosted instances without billing, rate limits are disabled entirely - you get unlimited requests.
How it works
The rate limiting system uses a token bucket algorithm backed by Cloudflare Workers KV. Here's the flow:
```mermaid
graph TB
    A[Incoming Request] --> B{Billing Enabled?}
    B -->|No| C[✓ Allow - Unlimited]
    B -->|Yes| D{Authenticated?}
    D -->|No| E[Check IP-based limit<br/>60 req/min]
    D -->|Yes| F[Get user's plan tier<br/>from cache/DB]
    F --> G[Calculate effective limit<br/>base × plan × scope]
    G --> H[Check KV counter]
    H --> I{Within limit?}
    I -->|Yes| J[✓ Allow + set headers]
    I -->|No| K[✗ Reject 429]
    E --> L{Within limit?}
    L -->|Yes| J
    L -->|No| K
    style C fill:#90EE904D
    style J fill:#90EE904D
    style K fill:#FFB6C64D
```

The system checks rate limits in this order:
1. If billing is disabled (`STRIPE_ENABLED` not set), skip all rate limiting
2. For authenticated requests, calculate `base_limit × plan_multiplier × scope_multiplier`
3. For unauthenticated requests, use IP-based limits
4. Check the counter in Workers KV
5. If under limit, increment the counter and allow the request
6. If over limit, return HTTP 429 with a `Retry-After` header
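The check order above can be sketched as a small, self-contained function. This is only an illustration: the in-memory `Map` stands in for Workers KV, and the `Req` shape and `decide` name are hypothetical, not the real middleware API.

```typescript
// Illustrative sketch of the check order; the real system uses Workers KV.
type Req = { ip: string; user?: { id: string; limit: number } };

const counters = new Map<string, number>(); // in-memory stand-in for the KV counter

function decide(req: Req, billingEnabled: boolean): { allowed: boolean; status: number } {
  if (!billingEnabled) return { allowed: true, status: 200 };       // billing off: unlimited
  const limit = req.user ? req.user.limit : 60;                     // plan limit vs 60/min IP default
  const key = req.user ? `user:${req.user.id}` : `public:${req.ip}`;
  const count = counters.get(key) ?? 0;                             // read the counter
  if (count >= limit) return { allowed: false, status: 429 };       // over limit: reject
  counters.set(key, count + 1);                                     // under limit: increment
  return { allowed: true, status: 200 };
}
```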
Components
The rate limiting system consists of three main parts:
Global middleware (packages/api/src/middleware/global-rate-limit.ts)
- Applied to all /v1/* API routes
- Handles both authenticated and public requests
- Sets rate limit headers on every response
Rate limit engine (packages/api/src/lib/rate-limit.ts)
- Core token bucket implementation
- Fail-open behavior (allows requests if KV is down)
- Returns limit, remaining, and reset time
Plan tier calculator (packages/api/src/lib/rate-limit-utils.ts)
- Fetches user's subscription tier
- Caches plan info for 5 minutes
- Handles grace periods and subscription changes
Default rate limits
API endpoints
All authenticated endpoints under /v1/* share a global rate limit based on your plan and API key scope.
| User Tier | API Key Scope | Requests per Minute |
|---|---|---|
| Self-hosted (no billing) | Any | Unlimited |
| Free | Read | 2,000 |
| Free | Write/Admin | 1,000 |
| Starter | Read | 20,000 |
| Starter | Write/Admin | 10,000 |
| Pro | Read | 200,000 |
| Pro | Write/Admin | 100,000 |
The calculation is: 1,000 (base) × plan_multiplier × scope_multiplier
Plan multipliers:
- Free: 1×
- Starter: 10×
- Pro: 100×
Scope multipliers:
- Read: 2× (monitoring tools and dashboards need higher limits)
- Write: 1×
- Admin: 1×
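The table above is just the product of these three factors. A worked sketch, with the constants copied from the multipliers listed here (the `effectiveLimit` function name and types are illustrative, not the actual implementation):

```typescript
// Sketch of the limit formula: base × plan_multiplier × scope_multiplier.
type Plan = "free" | "starter" | "pro";
type Scope = "read" | "write" | "admin";

const BASE_LIMIT = 1000; // base limit for authenticated endpoints
const PLAN_MULTIPLIER: Record<Plan, number> = { free: 1, starter: 10, pro: 100 };
const SCOPE_MULTIPLIER: Record<Scope, number> = { read: 2, write: 1, admin: 1 };

function effectiveLimit(plan: Plan, scope: Scope): number {
  return BASE_LIMIT * PLAN_MULTIPLIER[plan] * SCOPE_MULTIPLIER[scope];
}
```

For example, a Starter key with read scope gets 1,000 × 10 × 2 = 20,000 requests per minute, matching the table.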
Public endpoints
Unauthenticated requests to /v1/public/* are rate limited by IP address:
| Endpoint | Limit |
|---|---|
| Public profiles, image listings | 60 requests per minute |
CDN endpoints
Image serving has different limits based on the operation:
| Operation | Limit | Notes |
|---|---|---|
| Serving cached images | Unlimited | Original and preset variants |
| Generating new preset variants | 1,000 per minute per IP | First request for a variant |
| Generating custom transforms | 10 per minute per IP | On-the-fly transformations with query params |
| Custom variants per image | 100 total | Storage limit |
Configuration
All rate limit values are defined in packages/config/src/api.ts:
```typescript
export const RATE_LIMITS = {
  defaults: {
    authenticated: 1000, // Base limit for API endpoints
    public: 60,          // Base limit for public endpoints
  },
  scopeMultipliers: {
    read: 2,  // Read operations get 2× limit
    write: 1, // Write operations use base limit
    admin: 1, // Admin operations use base limit
  },
  planMultipliers: {
    free: 1,     // Free tier uses base rate
    starter: 10, // Starter gets 10× limit
    pro: 100,    // Pro gets 100× limit
  },
  cdn: {
    variantMiss: 1000, // CDN variant generation per IP
    window: 60_000,    // Time window in milliseconds
  },
  customTransform: {
    limit: 10,                // Custom transform generations per IP
    window: 60_000,           // Time window in milliseconds
    maxVariantsPerImage: 100, // Max cached custom variants
  },
  cache: {
    planTierTtl: 300, // Cache plan info for 5 minutes
    minBucketTtl: 60, // Minimum KV expiration
  },
};
```

To change limits, edit these values and redeploy. For example, to give free tier users more generous limits:
```typescript
planMultipliers: {
  free: 5, // Changed from 1 to 5 (5× base rate)
  starter: 10,
  pro: 100,
}
```

After changes, rebuild and deploy:
```shell
pnpm --filter @pixflare/config build
pnpm --filter @pixflare/api build
pnpm deploy:prod
```

Response headers
Every rate-limited response includes these headers:
```
X-RateLimit-Limit: 2000          # Your current limit
X-RateLimit-Remaining: 1847      # Requests left in window
X-RateLimit-Reset: 1704123456789 # Unix timestamp (ms) when the limit resets
```

When you exceed the limit, you'll get a 429 response:
```
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 2000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1704123456789
Retry-After: 42

{
  "code": "RATE_LIMITED",
  "message": "Too many requests. Please try again in 42 seconds."
}
```

The `Retry-After` header tells you exactly how many seconds to wait before retrying.
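A client can honor that header with a small retry helper. This is a client-side sketch, not part of the API; the `FetchLike` shape is simplified and injected so the helper stays testable, and production code would cap attempts and add backoff:

```typescript
// Retry once after a 429, waiting the number of seconds in Retry-After.
type MinimalResponse = { status: number; headers: { get(name: string): string | null } };
type FetchLike = (url: string) => Promise<MinimalResponse>;

async function fetchWithRetry(url: string, doFetch: FetchLike): Promise<MinimalResponse> {
  const res = await doFetch(url);
  if (res.status !== 429) return res;
  const waitSec = Number(res.headers.get("Retry-After") ?? "1"); // seconds to wait
  await new Promise((resolve) => setTimeout(resolve, waitSec * 1000));
  return doFetch(url); // single retry; real code should cap attempts / back off
}
```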
Architecture deep dive
Token bucket algorithm
Despite the component's name, the implementation is a fixed window counter:
- Each user/IP gets a bucket identified by a key: `ratelimit:global:{owner}:{scope}`
- Every request increments the counter in that bucket
- When the counter reaches the limit, requests are rejected
- After the time window expires, the counter resets to zero
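Those steps can be sketched as a single function over a KV-like store. The `KVLike` interface here is a simplification (real Workers KV writes are eventually consistent and take expiration options), and `checkWindow` is an illustrative name, not the actual engine API:

```typescript
// Fixed-window counter sketch over a simplified KV-like interface.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

interface WindowResult { allowed: boolean; remaining: number; resetAt: number }

async function checkWindow(kv: KVLike, key: string, limit: number, windowMs: number,
                           now = Date.now()): Promise<WindowResult> {
  const raw = await kv.get(key);
  let bucket = raw ? (JSON.parse(raw) as { count: number; resetAt: number }) : null;
  if (!bucket || now >= bucket.resetAt) {
    bucket = { count: 0, resetAt: now + windowMs };                    // window expired: reset to zero
  }
  if (bucket.count >= limit) {
    return { allowed: false, remaining: 0, resetAt: bucket.resetAt };  // at the limit: reject
  }
  bucket.count += 1;                                                   // under limit: increment
  await kv.put(key, JSON.stringify(bucket));
  return { allowed: true, remaining: limit - bucket.count, resetAt: bucket.resetAt };
}
```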
```mermaid
sequenceDiagram
    participant C as Client
    participant M as Middleware
    participant KV as Workers KV
    participant DB as Database
    C->>M: Request to /v1/images
    M->>DB: Get user's plan tier (if not cached)
    DB-->>M: Plan: Starter
    M->>M: Calculate limit<br/>1000 × 10 × 1 = 10,000 req/min
    M->>KV: Get counter for key
    KV-->>M: count: 5847, resetAt: 1704123456
    M->>M: 5847 < 10,000 ✓
    M->>KV: Increment counter to 5848
    M->>C: 200 OK + rate limit headers
```

Caching strategy
To minimize database queries, the system caches user plan tiers in Workers KV for 5 minutes. This means:
- First request: Queries database for plan tier
- Next 5 minutes: Uses cached value
- After 5 minutes: Cache expires, queries DB again
If the database is unavailable, the system falls back to free tier (most restrictive) to prevent abuse.
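That cache policy, including the free-tier fallback, can be sketched as follows. The `loadTier` callback is a hypothetical stand-in for the real database query, and the in-memory `Map` stands in for the KV cache:

```typescript
// Plan-tier cache sketch: 5-minute TTL, free-tier fallback on DB failure.
const PLAN_TTL_MS = 300_000; // mirrors cache.planTierTtl (300 s)
const planCache = new Map<string, { tier: string; expiresAt: number }>();

async function getPlanTier(
  userId: string,
  loadTier: (id: string) => Promise<string>, // hypothetical DB query
  now = Date.now(),
): Promise<string> {
  const hit = planCache.get(userId);
  if (hit && hit.expiresAt > now) return hit.tier;   // cached within the last 5 minutes
  try {
    const tier = await loadTier(userId);             // cache miss: query the database
    planCache.set(userId, { tier, expiresAt: now + PLAN_TTL_MS });
    return tier;
  } catch {
    return "free";                                   // DB unavailable: most restrictive tier
  }
}
```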
Fail-open behavior
If Workers KV is unavailable (network issues, service disruption), the rate limiter fails open:
- Logs an error for monitoring
- Allows the request through
- Returns optimistic rate limit headers
This prioritizes availability over strict enforcement. It's better to let legitimate traffic through during an outage than to block everyone.
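A minimal sketch of that fail-open wrapper, assuming an inner check that talks to KV; the `failOpen` name and `CheckResult` shape are illustrative, not the engine's actual types:

```typescript
// Fail-open wrapper: if the KV-backed check throws, log it and allow
// the request with optimistic headers.
interface CheckResult { allowed: boolean; limit: number; remaining: number }

async function failOpen(limit: number,
                        check: () => Promise<CheckResult>): Promise<CheckResult> {
  try {
    return await check();
  } catch (err) {
    console.error("Rate limit check failed, allowing request (fail-open behavior)", err);
    return { allowed: true, limit, remaining: limit }; // optimistic: full window left
  }
}
```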
Scope-based limits
API keys can have different scopes that affect rate limits:
Read scope
- GET requests, analytics queries, listing operations
- Gets 2× multiplier because monitoring tools and dashboards need frequent polling
- Example: 2,000 req/min for free tier
Write scope
- POST/PATCH requests, uploads, updates
- Uses base multiplier (1×)
- Example: 1,000 req/min for free tier
Admin scope
- DELETE operations, key management, settings changes
- Uses base multiplier (1×)
- Example: 1,000 req/min for free tier
Monitoring
Rate limit events are logged with context:
```typescript
// Successful requests (debug level)
logger.debug('Global rate limit check passed', {
  owner: 'user@example.com',
  scope: 'read',
  limit: 2000,
  remaining: 1847,
});

// Rate limit exceeded (warning level)
logger.warn('Global rate limit exceeded for authenticated user', {
  owner: 'user@example.com',
  scope: 'write',
  limit: 1000,
  resetAt: 1704123456789,
});

// System failures (error level)
logger.error('Rate limit check failed, allowing request (fail-open behavior)', error);
```

You can monitor these logs to:
- Track which users are hitting limits
- Identify potential abuse patterns
- Detect KV or database issues
- Measure API usage across tiers
Troubleshooting
Users reporting 429 errors
Check if they're actually over the limit:
- Look for their rate limit warnings in logs
- Verify their subscription tier is correct
- Check if they need a higher tier or different API key scope
Self-hosted instance enforcing limits unexpectedly
Make sure billing is disabled:
```shell
# Should NOT be set for self-hosted
echo $STRIPE_ENABLED
```

If `STRIPE_ENABLED` is set to anything (even an empty string), the system treats it as billing enabled. Unset it completely:
```shell
unset STRIPE_ENABLED
# or remove it from wrangler.toml / your environment
```

Limits not updating after subscription change
Plan tier info is cached for 5 minutes. Wait up to 5 minutes for the cache to expire, then the new limits will apply. You can verify the cached value by checking the logs for "Plan tier cache hit" messages.
Rate limits not working at all
Verify the global middleware is applied. Check packages/api/src/index.ts line 65:
```typescript
app.use('/v1/*', createGlobalRateLimitMiddleware());
```

This must be present for rate limiting to work.
Advanced topics
Batch operations
Batch requests (like fetching stats for 100 images) count as a single request, not per-item. This makes batching more efficient:
```
GET /v1/analytics/images/img1,img2,img3,...,img100
# Counts as 1 request, not 100
```

Race conditions
The rate limiter uses a read-modify-write pattern that isn't atomic. With very high concurrency (100+ simultaneous requests), users might slightly exceed their limit. This is an acceptable trade-off for using Workers KV, which provides low latency and high availability. For strict enforcement, you'd need Durable Objects, which adds significant latency.
Grace periods
When a subscription expires, users enter a grace period (configurable, typically 3-7 days) where they keep their plan tier. After the grace period, they're downgraded to free tier. The cached plan info respects grace periods automatically.
Custom rate limit keys
The global middleware uses these key formats:
- Authenticated:
ratelimit:global:{owner}:{scope} - Public:
ratelimit:global:public:{ip}
This means each user gets separate limits per scope. If you want to share limits across scopes, you'd need to modify the key generation in packages/api/src/middleware/global-rate-limit.ts.
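A hypothetical key builder matching those formats (the function name and signature are illustrative, not the actual middleware code):

```typescript
// Build the KV key for a request: per-owner-per-scope when authenticated,
// per-IP for public routes.
function rateLimitKey(owner: string | null, scope: string, ip: string): string {
  return owner !== null
    ? `ratelimit:global:${owner}:${scope}` // authenticated: per-owner, per-scope
    : `ratelimit:global:public:${ip}`;     // unauthenticated: per-IP
}
```

Dropping `scope` from the authenticated branch is the one-line change that would share a single limit across all of a user's scopes.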