As part of the Multi-API Routing & Fallback Chains pattern for geocoding pipelines, rate limiting is the control layer that keeps your batch jobs within contractual quota boundaries without sacrificing normalization throughput. This page explains how to implement that control layer for Python-based address processing workloads, from quota inventory through token-bucket algorithms, centralized async gates, jittered backoff, and circuit breakers.
Geocoding and address normalization pipelines routinely ingest tens or hundreds of thousands of records per execution. Commercial mapping providers, open-source geocoders, and municipal data portals all enforce strict throughput caps to preserve infrastructure stability. When batch jobs exceed these thresholds, pipelines trigger HTTP 429 responses, incur punitive overage fees, or face temporary IP-level bans. Effective rate limiting is not merely about slowing down requests; it is about maximizing throughput within contractual boundaries while maintaining predictable latency and graceful degradation.
Prerequisites
Before deploying rate-limiting logic into a production pipeline, confirm these foundations are in place:
Understanding your provider’s exact enforcement mechanism is critical. Some APIs use fixed-window counters, others implement sliding windows, and a few rely on token-bucket models. Your rate limiter should slightly under-provision against the provider’s actual enforcement to avoid edge-case throttling.
Step 1 — Inventory Quotas and Calculate Concurrency Ceilings
Start by extracting the strictest rate limit across your target providers. Convert requests-per-minute (RPM) to a safe concurrency ceiling:
max_concurrency = floor(RPM / 60) - 1
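Always reserve one slot for retries, health checks, and unexpected latency spikes; that is the "- 1" in the formula. The formula implicitly assumes each request occupies a connection for about one second, so floor(RPM / 60) approximates how many requests can be in flight at once. If a provider enforces a 100 RPM limit, floor(100 / 60) gives a theoretical maximum of 1 concurrent request, and subtracting the reserved slot leaves 0, which means dispatching serially. For higher RPM tiers, scale concurrency proportionally but never exceed floor(RPM / 60). This guardrail prevents queue saturation and ensures your pipeline respects the provider’s sliding window boundaries.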
| Provider tier | RPM | Safe concurrency | Reserve slots |
|---|---|---|---|
| Free / sandbox | 60 | 0 | Use serial dispatch |
| Standard | 300 | 4 | 1 (shared by retries and health checks) |
| Business | 1 200 | 19 | 1 (shared by retries and health checks) |
| Enterprise | 6 000 | 99 | 1 (shared by retries and health checks) |
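The arithmetic behind the table can be sketched as a small helper. This is a minimal illustration rather than part of any library: the function names `safe_concurrency` and `safe_max_rate`, and the 5% default margin, are assumptions chosen to match the formula in this step and the advice to under-provision slightly against the provider’s enforcement.

```python
def safe_concurrency(rpm: int) -> int:
    """Return floor(rpm / 60) - 1, clamped at 0 (0 means dispatch serially)."""
    return max(0, rpm // 60 - 1)


def safe_max_rate(rpm: int, margin_percent: int = 5) -> int:
    """Under-provision the limiter by a safety margin (assumed default: 5%)."""
    return rpm * (100 - margin_percent) // 100


# Reproduce the tiers from the table above.
for tier, rpm in [("Free / sandbox", 60), ("Standard", 300),
                  ("Business", 1200), ("Enterprise", 6000)]:
    print(f"{tier}: concurrency={safe_concurrency(rpm)}, "
          f"limiter max_rate={safe_max_rate(rpm)}")
```

Integer arithmetic is used on purpose so the results do not depend on floating-point rounding.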
Step 2 — Select the Optimal Throttling Algorithm
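For batch geocoding, fixed-window counters are generally inadequate because they reset at fixed boundaries: a client can spend a full window’s quota just before the reset and another just after it, producing a burst of up to twice the limit that a provider enforcing a sliding window will reject. Implement a token bucket or leaky bucket algorithm instead.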
Token buckets allow controlled bursts while maintaining long-term averages, which aligns well with address normalization workloads that experience variable payload sizes. When a batch contains a mix of simple street addresses and complex rural coordinates with multiple lookup attempts, the bucket absorbs the initial processing load without exhausting the quota. The bucket refills at a steady rate, guaranteeing that your pipeline never exceeds the provider’s sustained throughput limit.
For pipelines that need perfectly smooth output — such as when feeding a downstream provider that enforces strict per-second ceilings — a leaky bucket drains at a fixed rate regardless of burst arrival, preventing any instantaneous spike.
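To make the token-bucket behaviour concrete, here is a minimal, non-blocking sketch. It is an illustration under stated assumptions, not the implementation used by aiolimiter or any provider: the class name `TokenBucket`, the `try_acquire` method, and the injectable `clock` parameter are all hypothetical, and the burst allowance of 10 is an arbitrary example value.

```python
import time
from typing import Callable


class TokenBucket:
    """Minimal token bucket: `capacity` is the burst size, `refill_rate` is tokens/second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False instead of blocking."""
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# A 300 RPM quota expressed as 5 tokens/second with a burst allowance of 10.
bucket = TokenBucket(capacity=10, refill_rate=300 / 60)
admitted = sum(bucket.try_acquire() for _ in range(15))
print(f"{admitted} of 15 immediate requests admitted")
```

A leaky bucket would differ mainly in that it releases queued requests at the fixed drain rate instead of admitting a burst up front.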
Step 3 — Implement a Centralized Rate Limiter
Decouple rate limiting from individual request handlers. A shared limiter instance gates all outbound calls, ensuring that concurrent workers respect the global throughput budget regardless of how many tasks are spawned.
Centralising the limiter prevents the “thundering herd” problem, where multiple workers independently decide to send requests simultaneously. By funnelling all outbound traffic through a single gatekeeper, you maintain deterministic pacing and simplify quota tracking. This pattern pairs naturally with building async geocoding requests in Python, where the dispatcher itself is already structured around a shared event loop.
```python
import asyncio

from aiolimiter import AsyncLimiter

# Per-provider limiters: configure one per geocoding service
PROVIDER_LIMITERS: dict[str, AsyncLimiter] = {
    "primary": AsyncLimiter(max_rate=300, time_period=60),
    "secondary": AsyncLimiter(max_rate=120, time_period=60),
    "fallback": AsyncLimiter(max_rate=60, time_period=60),
}

# Global connection cap (illustrative value): size it from the summed
# per-provider concurrency ceilings and what your infrastructure can support
GLOBAL_SEMAPHORE = asyncio.Semaphore(24)
```
Step 4 — Integrate with Async Dispatchers
Wire the centralised limiter into your async HTTP dispatcher. Pair AsyncLimiter with asyncio.Semaphore to enforce both rate limits and connection concurrency simultaneously.
```python
import asyncio

import aiohttp
from aiolimiter import AsyncLimiter
from tenacity import (
    retry,
    retry_if_exception,
    stop_after_attempt,
    wait_exponential,
    wait_random,
)

PROVIDER_LIMITERS: dict[str, AsyncLimiter] = {
    "primary": AsyncLimiter(max_rate=300, time_period=60),
    "secondary": AsyncLimiter(max_rate=120, time_period=60),
}
GLOBAL_SEMAPHORE = asyncio.Semaphore(24)


def _is_retryable(exc: BaseException) -> bool:
    """Retry timeouts, connection errors, 429s, and 5xx; fail fast on other 4xx."""
    if isinstance(exc, aiohttp.ClientResponseError):
        return exc.status == 429 or exc.status >= 500
    return isinstance(exc, (aiohttp.ClientError, asyncio.TimeoutError))


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=2, max=30) + wait_random(0, 2),
    retry=retry_if_exception(_is_retryable),
    reraise=True,  # surface the original exception instead of tenacity.RetryError
)
async def fetch_geocode(
    session: aiohttp.ClientSession,
    address: str,
    provider: str = "primary",
    base_url: str = "https://api.provider.example/v1/geocode",
) -> dict:
    """Fetch geocoding result for a single address string.

    Applies per-provider rate limiting and global connection capping
    before issuing the HTTP request. Retries on 429 after honouring
    the Retry-After header, and retries on transient network errors
    with jittered exponential backoff.

    Args:
        session: Shared aiohttp ClientSession.
        address: Raw address string to geocode.
        provider: Key into PROVIDER_LIMITERS.
        base_url: Geocoding endpoint for this provider.

    Returns:
        Parsed JSON response dict from the provider.

    Raises:
        aiohttp.ClientResponseError: On non-retryable HTTP errors.
    """
    limiter = PROVIDER_LIMITERS[provider]
    async with limiter:
        async with GLOBAL_SEMAPHORE:
            async with session.get(
                base_url,
                params={"q": address},
                timeout=aiohttp.ClientTimeout(total=10),
            ) as response:
                if response.status == 429:
                    # Retry-After may be an HTTP-date rather than seconds;
                    # fall back to 2 s here (edge case 4 parses both forms).
                    try:
                        retry_after = float(response.headers.get("Retry-After", 2))
                    except ValueError:
                        retry_after = 2.0
                    await asyncio.sleep(retry_after)
                    raise aiohttp.ClientResponseError(
                        response.request_info,
                        response.history,
                        status=429,
                        message="Rate limited",
                    )
                response.raise_for_status()
                return await response.json()


async def geocode_batch(
    addresses: list[str],
    provider: str = "primary",
) -> list[dict]:
    """Geocode a batch of addresses concurrently with rate limiting.

    Args:
        addresses: List of raw address strings.
        provider: Provider key for rate limiter selection.

    Returns:
        List of results in input order. Because gather is called with
        return_exceptions=True, a failed lookup appears as an exception
        object in place of its result dict.
    """
    async with aiohttp.ClientSession() as session:
        tasks = [
            fetch_geocode(session, addr, provider=provider)
            for addr in addresses
        ]
        return await asyncio.gather(*tasks, return_exceptions=True)
```
Vectorized pandas variant
For pipelines that read from a DataFrame rather than a plain list, avoid calling asyncio.run per row inside .apply. Extract the address column, run the async batch dispatcher once, and assign the results in bulk:
```python
import asyncio

import pandas as pd


def geocode_dataframe(df: pd.DataFrame, address_col: str = "address") -> pd.DataFrame:
    """Add geocoding results to a DataFrame using the async batch dispatcher.

    Args:
        df: Input DataFrame containing raw addresses.
        address_col: Column name holding address strings.

    Returns:
        DataFrame with a new 'geocode_result' column.
    """
    addresses: list[str] = df[address_col].tolist()
    results = asyncio.run(geocode_batch(addresses))
    df = df.copy()
    df["geocode_result"] = results
    return df
```
Provider Parameter Reference
| Parameter | Description | Recommended value |
|---|---|---|
| max_rate | Tokens granted per time_period | Provider RPM minus 5% safety margin |
| time_period | Window length in seconds | 60 for RPM quotas; 1 for RPS quotas |
| ClientTimeout.total | Per-request wall-clock timeout | 10 s for standard providers; 20 s for rural/international records |
| stop_after_attempt | Maximum attempt count (first try plus retries) | 4 (3 retries) |
| wait_exponential.min | Minimum backoff in seconds | 2 |
| wait_exponential.max | Maximum backoff cap in seconds | 30 |
| wait_random | Jitter range added to the exponential wait | 0–2 s (additive jitter, not full jitter) |
| Semaphore count | Max concurrent open connections | floor(RPM / 60) - 1 summed across providers |
Edge Cases
1. Provider enforces per-second RPS alongside per-minute RPM
Some providers stack two distinct quota windows: a per-second burst ceiling (e.g. 5 RPS) and a per-minute sustained cap (e.g. 120 RPM). Stack two limiters in sequence:
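```python
from aiolimiter import AsyncLimiter

RPS_LIMITER = AsyncLimiter(max_rate=4, time_period=1)     # 4 RPS: leave 1 headroom
RPM_LIMITER = AsyncLimiter(max_rate=115, time_period=60)  # 115 RPM: leave 5 headroom


async def dispatch_with_stacked_limits() -> None:
    """Pass both quota windows, then take a connection slot, before sending."""
    async with RPS_LIMITER:
        async with RPM_LIMITER:
            async with GLOBAL_SEMAPHORE:  # defined in Step 3
                # issue request
                ...
```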
2. Batch contains malformed addresses that generate slow provider responses
Rural routes and non-standard addresses (e.g. addresses without a street number) often cause providers to perform deeper fuzzy matching, increasing latency. A fixed ClientTimeout will surface these as asyncio.TimeoutError. Capture them separately and route to a fallback chain rather than retrying on the same provider:
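```python
import asyncio

import aiohttp
from tenacity import retry_if_not_exception_type, stop_after_attempt

# Rebuild fetch_geocode's retry policy so a timeout is never retried on the
# same provider: TimeoutError propagates at once, other errors retry once.
_fetch_no_timeout_retry = fetch_geocode.retry_with(
    stop=stop_after_attempt(2),
    retry=retry_if_not_exception_type(asyncio.TimeoutError),
    reraise=True,
)


async def fetch_with_timeout_guard(
    session: aiohttp.ClientSession,
    address: str,
) -> dict:
    """Attempt geocoding; raises TimeoutError for caller to route elsewhere."""
    return await _fetch_no_timeout_retry(session, address)
```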
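On the caller’s side, routing a timed-out address to the next provider might look like the sketch below. It assumes each provider is represented by an async callable that takes an address and returns a dict. The name `geocode_with_fallback` and the two stub coroutines are hypothetical stand-ins, not part of the pipeline above.

```python
import asyncio
from typing import Awaitable, Callable

GeocodeFn = Callable[[str], Awaitable[dict]]


async def geocode_with_fallback(
    address: str, providers: list[tuple[str, GeocodeFn]]
) -> dict:
    """Try each provider in order, moving on only when one times out."""
    for name, geocode in providers:
        try:
            result = await geocode(address)
        except asyncio.TimeoutError:
            continue  # slow on this provider: hand the address to the next one
        return {"provider": name, **result}
    raise asyncio.TimeoutError(f"all providers timed out for {address!r}")


# Stub coroutines standing in for real provider calls (hypothetical).
async def slow_primary(address: str) -> dict:
    raise asyncio.TimeoutError


async def quick_secondary(address: str) -> dict:
    return {"lat": 0.0, "lon": 0.0}


result = asyncio.run(geocode_with_fallback(
    "RR 2 Box 14", [("primary", slow_primary), ("secondary", quick_secondary)]
))
print(result)
```

Errors other than timeouts are deliberately left to propagate, so the retry policy of each provider call stays in charge of them.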
3. Distributed workers sharing a quota across processes
When running workers across multiple processes or containers, asyncio-native limiters operate per-process and cannot enforce a shared ceiling. Use a Redis-backed token bucket with an atomic Lua decrement, or centralise dispatch in a single async orchestrator that all workers submit addresses to via a queue. Tracking per-provider spend across processes is covered in API quota tracking and cost management.
4. Provider returns Retry-After as an HTTP date instead of seconds
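RFC 6585 defines the 429 status code, and the Retry-After header it may carry is specified in RFC 9110 (section 10.2.3) as either a non-negative integer number of seconds or an HTTP-date string. Parse both: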
```python
import email.utils
import time


def parse_retry_after(header_value: str) -> float:
    """Parse Retry-After header as seconds from now.

    Handles both integer-seconds and HTTP-date formats.

    Args:
        header_value: Raw Retry-After header string.

    Returns:
        Float seconds to wait before retrying.
    """
    try:
        return float(header_value)
    except ValueError:
        retry_ts = email.utils.parsedate_to_datetime(header_value).timestamp()
        return max(0.0, retry_ts - time.time())
```
5. Provider quota shared across API keys in the same account
Some providers count usage at the account level, not per key. Rotating API keys does not bypass the quota. Detect this by monitoring whether 429s persist across key rotations and switch to genuine provider diversification via dynamic provider selection based on region.
Performance and Vectorization
For most address normalization workloads, the optimal concurrency point is at floor(RPM / 60) - 1 per provider, as derived above. At this setting, CPython’s GIL is not a bottleneck because the async event loop is IO-bound, not CPU-bound.
For pipelines exceeding 10 000 records per run:
- Chunk batches: Split the input DataFrame into chunks of 500–1 000 addresses. Submit each chunk as a separate asyncio.gather call. This limits peak memory and gives you natural checkpoints.
- Connection pooling: Share a single aiohttp.ClientSession across all tasks in a batch. Creating a new session per request wastes TCP handshake time and depletes ephemeral ports at high concurrency.
- Avoid asyncio.run in a loop: Call asyncio.run once per batch, not once per address. The overhead of creating and destroying an event loop per record eliminates most of the concurrency benefit.
- Pin a connector with limit: Pass aiohttp.TCPConnector(limit=MAX_CONNECTIONS) to the session constructor to set a hard cap on open connections independent of the semaphore. This prevents the OS from running out of file descriptors.
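The chunking advice above can be sketched as follows. This is a minimal illustration that assumes a per-address coroutine is passed in. The names `chunked`, `geocode_in_chunks`, and `fake_geocode` are hypothetical, and the stub does no network I/O.

```python
import asyncio
from typing import Awaitable, Callable, Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def geocode_in_chunks(
    addresses: list[str],
    geocode_one: Callable[[str], Awaitable[dict]],
    chunk_size: int = 500,
) -> list:
    """Run one asyncio.gather per chunk inside a single event loop."""
    results: list = []
    for chunk in chunked(addresses, chunk_size):
        chunk_results = await asyncio.gather(
            *(geocode_one(addr) for addr in chunk), return_exceptions=True
        )
        results.extend(chunk_results)  # natural checkpoint: persist here if needed
    return results


async def fake_geocode(address: str) -> dict:
    """Stand-in for a real rate-limited provider call."""
    await asyncio.sleep(0)
    return {"address": address}


addresses = [f"{n} Main St" for n in range(1200)]
results = asyncio.run(geocode_in_chunks(addresses, fake_geocode, chunk_size=500))
print(len(results))
```

Only one chunk’s worth of tasks exists at a time, and asyncio.run is still called once for the whole batch.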
A 300-RPM provider with 4 concurrent workers sustains roughly 280 successful geocodes per minute after accounting for retry overhead and network jitter — approximately 16 800 addresses per hour per provider.
Troubleshooting
429s persist despite limiter being active
The limiter is per-process. If multiple processes share the same API key, they each maintain independent token buckets and the combined throughput exceeds the provider ceiling. Move to a Redis token bucket or a single-process orchestrator with a queue.
Semaphore blocks indefinitely
If GLOBAL_SEMAPHORE is never released, some task acquired it and exited without releasing it. This typically happens when the semaphore is acquired manually with acquire() and an exception skips the matching release(). Always wrap request logic inside the async with block rather than acquiring and releasing manually.
Retry count exhausted before Retry-After expires
When a provider sets Retry-After: 60 and your stop_after_attempt allows only 3 retries with a 30-second max backoff, you will exhaust retries before the provider resets its window. Increase max in wait_exponential to at least the provider’s maximum Retry-After value, or parse the header explicitly and await asyncio.sleep(retry_after) before re-enqueueing.
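A quick way to check whether a retry policy can outlast a given Retry-After is to add up the waits. The sketch below uses a simplified model of a doubling backoff clamped between a minimum and a maximum. It ignores jitter and is not a reproduction of tenacity’s internals. The function names and parameters are hypothetical.

```python
def backoff_waits(attempts: int, multiplier: float = 1.0,
                  minimum: float = 2.0, maximum: float = 30.0) -> list[float]:
    """Waits between attempts for a doubling schedule clamped to [minimum, maximum].

    Simplified model: the n-th wait is multiplier * 2**(n - 1) before clamping.
    """
    return [
        max(minimum, min(multiplier * 2 ** (n - 1), maximum))
        for n in range(1, attempts)  # attempts - 1 waits separate the attempts
    ]


def covers_retry_after(attempts: int, retry_after: float, **schedule: float) -> bool:
    """True if the cumulative backoff reaches the provider's Retry-After."""
    return sum(backoff_waits(attempts, **schedule)) >= retry_after


print(backoff_waits(4))
print(covers_retry_after(4, retry_after=60))
print(covers_retry_after(8, retry_after=60, maximum=60.0))
```

Under this model, four attempts with a 30-second cap spend only a few seconds waiting in total, which is why the cap alone is not the deciding factor: the attempt count matters as well.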
aiohttp.ServerDisconnectedError under high concurrency
Some providers close the connection when the client holds it open between rate-limited pauses. Set aiohttp.TCPConnector(keepalive_timeout=30) and reduce GLOBAL_SEMAPHORE by 20% until errors stop.
Queue depth grows unboundedly during sustained 429 periods
If all providers throttle simultaneously, tasks pile up in the event loop’s pending queue. Implement a backpressure limit: cap the number of in-flight asyncio.Task objects using a second semaphore or a bounded asyncio.Queue. When the queue is full, suspend ingestion and flush pending records to durable storage (Parquet or object storage) before resuming.
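A bounded asyncio.Queue gives this backpressure almost for free, because put() suspends the producer while the queue is full. The sketch below is a minimal illustration with a placeholder in place of the real request. The function name, worker count, and queue size are assumptions, and flushing to durable storage is left out.

```python
import asyncio


async def run_with_backpressure(
    addresses: list[str], max_pending: int = 100, workers: int = 4
) -> tuple[list[dict], int]:
    """Feed workers through a bounded queue; return results and peak queue depth."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_pending)
    results: list[dict] = []
    peak_depth = 0

    async def worker() -> None:
        while True:
            address = await queue.get()
            try:
                await asyncio.sleep(0)  # placeholder for the rate-limited request
                results.append({"address": address})
            finally:
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    for address in addresses:
        await queue.put(address)  # suspends ingestion while the queue is full
        peak_depth = max(peak_depth, queue.qsize())
    await queue.join()  # wait until every queued address has been processed
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results, peak_depth


results, peak = asyncio.run(
    run_with_backpressure([f"{n} Elm St" for n in range(1000)], max_pending=50)
)
print(len(results), peak)
```

The number of pending records never exceeds max_pending, regardless of how long the workers are stalled by throttling.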
FAQ
How do I choose between a token bucket and a leaky bucket for geocoding?
Use a token bucket when your address batches have variable burst patterns — it absorbs short spikes while enforcing a long-term average. Use a leaky bucket when you need perfectly smooth output pacing, such as when a downstream provider enforces a strict per-second ceiling with no burst tolerance.
Should I apply rate limiting per provider or globally across all providers?
Per-provider. Each geocoding service enforces its own quota independently. Use separate AsyncLimiter instances keyed by provider name, and add a single global concurrency semaphore sized to the total number of connections your infrastructure can support.
What is full jitter and why does it outperform equal jitter for retries?
Full jitter draws the retry delay from the interval [0, cap] uniformly, whereas equal jitter adds randomness only to the upper half. Full jitter spreads retry storms across a wider window, reducing the probability of synchronized spikes that re-trigger 429s.
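The two strategies differ only in how much of the delay is randomised. The sketch below is illustrative: the function names, the base of 1 second, and the 30-second cap are assumptions, and the ceiling for each attempt is taken to be the capped exponential value.

```python
import random


def full_jitter(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay drawn uniformly from [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * 2 ** attempt)
    return random.uniform(0.0, ceiling)


def equal_jitter(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Half the ceiling is fixed; only the other half is randomised."""
    ceiling = min(cap, base * 2 ** attempt)
    return ceiling / 2 + random.uniform(0.0, ceiling / 2)


random.seed(7)  # seeded only to make the demo repeatable
for attempt in range(4):
    print(attempt, round(full_jitter(attempt), 2), round(equal_jitter(attempt), 2))
```

Note that the Step 4 dispatcher adds a small random offset to a deterministic exponential wait, which is neither of these. If you use tenacity, its wait_random_exponential strategy is the closer match to full jitter.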
When should I open a circuit breaker versus just retrying?
Open the circuit when error rate exceeds a threshold (e.g. >50% over a 30-second window) or when consecutive failures exceed a count limit. Retrying into an already-overloaded provider amplifies its degradation. The circuit breaker lets the provider recover before traffic resumes.
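As a rough illustration of the consecutive-failure variant, a breaker can be as small as the sketch below. The class name, method names, thresholds, and injectable clock are all hypothetical. An error-rate window, as in the 50% example above, would need additional bookkeeping.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Open after `max_failures` consecutive failures; allow traffic again after `cooldown` seconds."""

    def __init__(self, max_failures: int = 5, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Closed circuit: allow. Open circuit: allow only once the cooldown has elapsed."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.cooldown

    def record_success(self) -> None:
        """A success closes the circuit and resets the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count the failure; open (or re-open) the circuit at the threshold."""
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self._clock()  # (re)start the cooldown


breaker = CircuitBreaker(max_failures=3, cooldown=30.0)
for _ in range(3):
    breaker.record_failure()
print(breaker.allow_request())
```

In a dispatcher, you would check allow_request() before acquiring the limiter and route the address to the fallback chain while the circuit is open.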
How do I track per-provider rate limit headroom in a multi-worker setup?
Use a Redis token bucket shared across worker processes — each worker atomically decrements the token count via a Lua script. Alternatively, centralise dispatch in a single async orchestrator process and use asyncio-native limiters that need no cross-process coordination.
Related
- Multi-API Routing & Fallback Chains: parent section covering the full architecture for resilient provider routing, quota-aware dispatch, and graceful degradation.
- Building Async Geocoding Requests in Python: covers structuring concurrent HTTP calls with aiohttp and asyncio, including session lifecycle and dispatcher patterns that rate limiting integrates into.
- Implementing Fallback Chains for Failed Lookups: explains how to cascade exhausted or throttled requests to secondary providers without stalling the pipeline.
- API Quota Tracking and Cost Management: covers tracking spend and quota consumption across providers in real time, complementing the per-provider limiter pattern here.
- Dynamic Provider Selection Based on Region: describes routing decisions based on address geography, which affects which provider’s rate limit headroom you need to preserve.