Rate Limiting
Rate limiting controls how many requests users can make to the API within a time window. It prevents abuse, ensures fair resource usage for everyone, and protects the infrastructure from overload.
Without proper rate limiting, a single user could overwhelm the system, rack up a large compute bill, enumerate and scrape public images, or even launch denial-of-service attacks. So while it may feel restrictive at times, it's a crucial part of keeping the service healthy and reliable for everyone.
Quick start
Rate limiting is enabled by default and requires no configuration for most deployments. The system automatically adjusts limits based on:
- Whether billing is enabled (self-hosted vs SaaS)
- User's subscription tier (Free/Starter/Pro)
- API key scope (read/write/admin)
For self-hosted instances without billing, rate limits are disabled entirely - you get unlimited requests.
How it works
The rate limiting system uses a token bucket algorithm backed by Cloudflare Workers KV. Here's the flow:
```mermaid
graph TB
    A[Incoming Request] --> B{Billing Enabled?}
    B -->|No| C[✓ Allow - Unlimited]
    B -->|Yes| D{Authenticated?}
    D -->|No| E[Check IP-based limit<br/>60 req/min]
    D -->|Yes| F[Get user's plan tier<br/>from cache/DB]
    F --> G[Calculate effective limit<br/>base × plan × scope]
    G --> H[Check KV counter]
    H --> I{Within limit?}
    I -->|Yes| J[✓ Allow + set headers]
    I -->|No| K[✗ Reject 429]
    E --> L{Within limit?}
    L -->|Yes| J
    L -->|No| K
    style C fill:#90EE904D
    style J fill:#90EE904D
    style K fill:#FFB6C64D
```

The system checks rate limits in this order:
1. If billing is disabled (`STRIPE_ENABLED` not set), skip all rate limiting
2. For authenticated requests, calculate `base_limit × plan_multiplier × scope_multiplier`
3. For unauthenticated requests, use IP-based limits
4. Check the counter in Workers KV
5. If under limit, increment the counter and allow the request
6. If over limit, return HTTP 429 with a `Retry-After` header
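The check order above can be sketched as a small, self-contained function. This is only an illustration: the in-memory `Map` stands in for Workers KV, and the `Req` shape and `decide` name are hypothetical, not the real middleware API.

```typescript
// Illustrative sketch of the check order; the real system uses Workers KV.
type Req = { ip: string; user?: { id: string; limit: number } };

const counters = new Map<string, number>(); // in-memory stand-in for the KV counter

function decide(req: Req, billingEnabled: boolean): { allowed: boolean; status: number } {
  if (!billingEnabled) return { allowed: true, status: 200 };       // billing off: unlimited
  const limit = req.user ? req.user.limit : 60;                     // plan limit vs 60/min IP default
  const key = req.user ? `user:${req.user.id}` : `public:${req.ip}`;
  const count = counters.get(key) ?? 0;                             // read the counter
  if (count >= limit) return { allowed: false, status: 429 };       // over limit: reject
  counters.set(key, count + 1);                                     // under limit: increment
  return { allowed: true, status: 200 };
}
```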
Components
The rate limiting system consists of three main parts:
Global middleware (packages/api/src/middleware/global-rate-limit.ts)
- Applied to all /v1/* API routes
- Handles both authenticated and public requests
- Sets rate limit headers on every response
Rate limit engine (packages/api/src/lib/rate-limit.ts)
- Core token bucket implementation
- Fail-open behavior (allows requests if KV is down)
- Returns limit, remaining, and reset time
Plan tier calculator (packages/api/src/lib/rate-limit-utils.ts)
- Fetches user's subscription tier
- Caches plan info for 5 minutes
- Handles grace periods and subscription changes
Default rate limits
API endpoints
All authenticated endpoints under /v1/* share a global rate limit based on your plan and API key scope.
| User Tier | API Key Scope | Requests per Minute |
|---|---|---|
| Self-hosted (no billing) | Any | Unlimited |
| Free | Read | 2,000 |
| Free | Write/Admin | 1,000 |
| Starter | Read | 20,000 |
| Starter | Write/Admin | 10,000 |
| Pro | Read | 200,000 |
| Pro | Write/Admin | 100,000 |
The calculation is: 1,000 (base) × plan_multiplier × scope_multiplier
Plan multipliers:
- Free: 1×
- Starter: 10×
- Pro: 100×
Scope multipliers:
- Read: 2× (monitoring tools and dashboards need higher limits)
- Write: 1×
- Admin: 1×
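The table above is just the product of these three factors. A worked sketch, with the constants copied from the multipliers listed here (the `effectiveLimit` function name and types are illustrative, not the actual implementation):

```typescript
// Sketch of the limit formula: base × plan_multiplier × scope_multiplier.
type Plan = "free" | "starter" | "pro";
type Scope = "read" | "write" | "admin";

const BASE_LIMIT = 1000; // base limit for authenticated endpoints
const PLAN_MULTIPLIER: Record<Plan, number> = { free: 1, starter: 10, pro: 100 };
const SCOPE_MULTIPLIER: Record<Scope, number> = { read: 2, write: 1, admin: 1 };

function effectiveLimit(plan: Plan, scope: Scope): number {
  return BASE_LIMIT * PLAN_MULTIPLIER[plan] * SCOPE_MULTIPLIER[scope];
}
```

For example, a Starter key with read scope gets 1,000 × 10 × 2 = 20,000 requests per minute, matching the table.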
Public endpoints
Unauthenticated requests to /v1/public/* are rate limited by IP address:
| Endpoint | Limit |
|---|---|
| Public profiles, image listings | 60 requests per minute |
CDN endpoints
Image serving has different limits based on the operation:
| Operation | Limit | Notes |
|---|---|---|
| Serving cached images | Unlimited | Original and preset variants |
| Generating new preset variants | 1,000 per minute per IP | First request for a variant |
| Generating custom transforms | 10 per minute per IP | On-the-fly transformations with query params |
| Custom variants per image | 100 total | Storage limit |
Configuration
All rate limit values are defined in packages/config/src/api.ts:
```typescript
export const RATE_LIMITS = {
  defaults: {
    authenticated: 1000, // Base limit for API endpoints
    public: 60,          // Base limit for public endpoints
  },
  scopeMultipliers: {
    read: 2,  // Read operations get 2× limit
    write: 1, // Write operations use base limit
    admin: 1, // Admin operations use base limit
  },
  planMultipliers: {
    free: 1,     // Free tier uses base rate
    starter: 10, // Starter gets 10× limit
    pro: 100,    // Pro gets 100× limit
  },
  cdn: {
    variantMiss: 1000, // CDN variant generation per IP
    window: 60_000,    // Time window in milliseconds
  },
  customTransform: {
    limit: 10,                // Custom transform generations per IP
    window: 60_000,           // Time window in milliseconds
    maxVariantsPerImage: 100, // Max cached custom variants
  },
  cache: {
    planTierTtl: 300, // Cache plan info for 5 minutes
    minBucketTtl: 60, // Minimum KV expiration
  },
};
```

To change limits, edit these values and redeploy. For example, to give free tier users more generous limits:
```typescript
planMultipliers: {
  free: 5, // Changed from 1 to 5 (5× base rate)
  starter: 10,
  pro: 100,
}
```

After changes, rebuild and deploy:
```shell
pnpm --filter @pixflare/config build
pnpm --filter @pixflare/api build
pnpm deploy:prod
```

Response headers
Every rate-limited response includes these headers:
```
X-RateLimit-Limit: 2000          # Your current limit
X-RateLimit-Remaining: 1847      # Requests left in window
X-RateLimit-Reset: 1704123456789 # Unix timestamp (ms) when the limit resets
```

When you exceed the limit, you'll get a 429 response:
```
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 2000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1704123456789
Retry-After: 42

{
  "code": "RATE_LIMITED",
  "message": "Too many requests. Please try again in 42 seconds."
}
```

The `Retry-After` header tells you exactly how many seconds to wait before retrying.
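A client can honor that header with a small retry helper. This is a client-side sketch, not part of the API; the `FetchLike` shape is simplified and injected so the helper stays testable, and production code would cap attempts and add backoff:

```typescript
// Retry once after a 429, waiting the number of seconds in Retry-After.
type MinimalResponse = { status: number; headers: { get(name: string): string | null } };
type FetchLike = (url: string) => Promise<MinimalResponse>;

async function fetchWithRetry(url: string, doFetch: FetchLike): Promise<MinimalResponse> {
  const res = await doFetch(url);
  if (res.status !== 429) return res;
  const waitSec = Number(res.headers.get("Retry-After") ?? "1"); // seconds to wait
  await new Promise((resolve) => setTimeout(resolve, waitSec * 1000));
  return doFetch(url); // single retry; real code should cap attempts / back off
}
```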
Architecture deep dive
Token bucket algorithm
Despite the component's name, the implementation is a fixed window counter:
- Each user/IP gets a bucket identified by a key: `ratelimit:global:{owner}:{scope}`
- Every request increments the counter in that bucket
- When the counter reaches the limit, requests are rejected
- After the time window expires, the counter resets to zero
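Those steps can be sketched as a single function over a KV-like store. The `KVLike` interface here is a simplification (real Workers KV writes are eventually consistent and take expiration options), and `checkWindow` is an illustrative name, not the actual engine API:

```typescript
// Fixed-window counter sketch over a simplified KV-like interface.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

interface WindowResult { allowed: boolean; remaining: number; resetAt: number }

async function checkWindow(kv: KVLike, key: string, limit: number, windowMs: number,
                           now = Date.now()): Promise<WindowResult> {
  const raw = await kv.get(key);
  let bucket = raw ? (JSON.parse(raw) as { count: number; resetAt: number }) : null;
  if (!bucket || now >= bucket.resetAt) {
    bucket = { count: 0, resetAt: now + windowMs };                    // window expired: reset to zero
  }
  if (bucket.count >= limit) {
    return { allowed: false, remaining: 0, resetAt: bucket.resetAt };  // at the limit: reject
  }
  bucket.count += 1;                                                   // under limit: increment
  await kv.put(key, JSON.stringify(bucket));
  return { allowed: true, remaining: limit - bucket.count, resetAt: bucket.resetAt };
}
```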
```mermaid
sequenceDiagram
    participant C as Client
    participant M as Middleware
    participant KV as Workers KV
    participant DB as Database
    C->>M: Request to /v1/images
    M->>DB: Get user's plan tier (if not cached)
    DB-->>M: Plan: Starter
    M->>M: Calculate limit<br/>1000 × 10 × 1 = 10,000 req/min
    M->>KV: Get counter for key
    KV-->>M: count: 5847, resetAt: 1704123456
    M->>M: 5847 < 10,000 ✓
    M->>KV: Increment counter to 5848
    M->>C: 200 OK + rate limit headers
```

Caching strategy
To minimize database queries, the system caches user plan tiers in Workers KV for 5 minutes. This means:
- First request: Queries database for plan tier
- Next 5 minutes: Uses cached value
- After 5 minutes: Cache expires, queries DB again
If the database is unavailable, the system falls back to free tier (most restrictive) to prevent abuse.
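That cache policy, including the free-tier fallback, can be sketched as follows. The `loadTier` callback is a hypothetical stand-in for the real database query, and the in-memory `Map` stands in for the KV cache:

```typescript
// Plan-tier cache sketch: 5-minute TTL, free-tier fallback on DB failure.
const PLAN_TTL_MS = 300_000; // mirrors cache.planTierTtl (300 s)
const planCache = new Map<string, { tier: string; expiresAt: number }>();

async function getPlanTier(
  userId: string,
  loadTier: (id: string) => Promise<string>, // hypothetical DB query
  now = Date.now(),
): Promise<string> {
  const hit = planCache.get(userId);
  if (hit && hit.expiresAt > now) return hit.tier;   // cached within the last 5 minutes
  try {
    const tier = await loadTier(userId);             // cache miss: query the database
    planCache.set(userId, { tier, expiresAt: now + PLAN_TTL_MS });
    return tier;
  } catch {
    return "free";                                   // DB unavailable: most restrictive tier
  }
}
```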
Fail-open behavior
If Workers KV is unavailable (network issues, service disruption), the rate limiter fails open:
- Logs an error for monitoring
- Allows the request through
- Returns optimistic rate limit headers
This prioritizes availability over strict enforcement. It's better to let legitimate traffic through during an outage than to block everyone.
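A minimal sketch of that fail-open wrapper, assuming an inner check that talks to KV; the `failOpen` name and `CheckResult` shape are illustrative, not the engine's actual types:

```typescript
// Fail-open wrapper: if the KV-backed check throws, log it and allow
// the request with optimistic headers.
interface CheckResult { allowed: boolean; limit: number; remaining: number }

async function failOpen(limit: number,
                        check: () => Promise<CheckResult>): Promise<CheckResult> {
  try {
    return await check();
  } catch (err) {
    console.error("Rate limit check failed, allowing request (fail-open behavior)", err);
    return { allowed: true, limit, remaining: limit }; // optimistic: full window left
  }
}
```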
Scope-based limits
API keys can have different scopes that affect rate limits:
Read scope
- GET requests, analytics queries, listing operations
- Gets 2× multiplier because monitoring tools and dashboards need frequent polling
- Example: 2,000 req/min for free tier
Write scope
- POST/PATCH requests, uploads, updates
- Uses base multiplier (1×)
- Example: 1,000 req/min for free tier
Admin scope
- DELETE operations, key management, settings changes
- Uses base multiplier (1×)
- Example: 1,000 req/min for free tier
Monitoring
Rate limit events are logged with context:
```typescript
// Successful requests (debug level)
logger.debug('Global rate limit check passed', {
  owner: 'user@example.com',
  scope: 'read',
  limit: 2000,
  remaining: 1847,
});

// Rate limit exceeded (warning level)
logger.warn('Global rate limit exceeded for authenticated user', {
  owner: 'user@example.com',
  scope: 'write',
  limit: 1000,
  resetAt: 1704123456789,
});

// System failures (error level)
logger.error('Rate limit check failed, allowing request (fail-open behavior)', error);
```

You can monitor these logs to:
- Track which users are hitting limits
- Identify potential abuse patterns
- Detect KV or database issues
- Measure API usage across tiers
Troubleshooting
Users reporting 429 errors
Check if they're actually over the limit:
- Look for their rate limit warnings in logs
- Verify their subscription tier is correct
- Check if they need a higher tier or different API key scope
Self-hosted instance enforcing limits unexpectedly
Make sure billing is disabled:
```shell
# Should NOT be set for self-hosted
echo $STRIPE_ENABLED
```

If `STRIPE_ENABLED` is set to anything (even an empty string), the system treats it as billing enabled. Unset it completely:
```shell
unset STRIPE_ENABLED
# or remove it from wrangler.toml / your environment
```

Limits not updating after subscription change
Plan tier info is cached for 5 minutes. Wait up to 5 minutes for the cache to expire, then the new limits will apply. You can verify the cached value by checking the logs for "Plan tier cache hit" messages.
Rate limits not working at all
Verify the global middleware is applied. Check packages/api/src/index.ts line 65:
```typescript
app.use('/v1/*', createGlobalRateLimitMiddleware());
```

This must be present for rate limiting to work.
Advanced topics
Batch operations
Batch requests (like fetching stats for 100 images) count as a single request, not per-item. This makes batching more efficient:
```
GET /v1/analytics/images/img1,img2,img3,...,img100
# Counts as 1 request, not 100
```

Race conditions
The rate limiter uses a read-modify-write pattern that isn't atomic. With very high concurrency (100+ simultaneous requests), users might slightly exceed their limit. This is an acceptable trade-off for using Workers KV, which provides low latency and high availability. For strict enforcement, you'd need Durable Objects, which adds significant latency.
Grace periods
When a subscription expires, users enter a grace period (configurable, typically 3-7 days) where they keep their plan tier. After the grace period, they're downgraded to free tier. The cached plan info respects grace periods automatically.
Custom rate limit keys
The global middleware uses these key formats:
- Authenticated:
ratelimit:global:{owner}:{scope} - Public:
ratelimit:global:public:{ip}
This means each user gets separate limits per scope. If you want to share limits across scopes, you'd need to modify the key generation in packages/api/src/middleware/global-rate-limit.ts.
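A hypothetical key builder matching those formats (the function name and signature are illustrative, not the actual middleware code):

```typescript
// Build the KV key for a request: per-owner-per-scope when authenticated,
// per-IP for public routes.
function rateLimitKey(owner: string | null, scope: string, ip: string): string {
  return owner !== null
    ? `ratelimit:global:${owner}:${scope}` // authenticated: per-owner, per-scope
    : `ratelimit:global:public:${ip}`;     // unauthenticated: per-IP
}
```

Dropping `scope` from the authenticated branch is the one-line change that would share a single limit across all of a user's scopes.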