Rate Limiting Strategies for Geocoding Batch Jobs

As part of the Multi-API Routing & Fallback Chains pattern for geocoding pipelines, rate limiting is the control layer that keeps your batch jobs within contractual throughput boundaries without sacrificing normalization throughput. This page explains how to implement that control layer — from quota inventory through token-bucket algorithms, centralized async gates, jittered backoff, and circuit breakers — for Python-based address processing workloads.

Geocoding and address normalization pipelines routinely ingest tens or hundreds of thousands of records per execution. Commercial mapping providers, open-source geocoders, and municipal data portals all enforce strict throughput caps to preserve infrastructure stability. When batch jobs exceed these thresholds, pipelines trigger HTTP 429 responses, incur punitive overage fees, or face temporary IP-level bans. Effective rate limiting is not merely about slowing down requests; it is about maximizing throughput within contractual boundaries while maintaining predictable latency and graceful degradation.


Figure: Rate limiting pipeline for batch geocoding. An address batch enters a quota inventory stage (RPM → concurrency ceiling), passes through a token bucket gate (AsyncLimiter) and an async dispatcher with a semaphore (max connections), then reaches the provider API. A 429 response feeds back into a jittered backoff block that re-enters the token bucket.

Prerequisites

Before deploying rate-limiting logic into a production pipeline, confirm these foundations are in place:

Understanding your provider’s exact enforcement mechanism is critical. Some APIs use fixed-window counters, others implement sliding windows, and a few rely on token-bucket models. Your rate limiter should slightly under-provision against the provider’s actual enforcement to avoid edge-case throttling.

Step 1 — Inventory Quotas and Calculate Concurrency Ceilings

Start by extracting the strictest rate limit across your target providers. Convert requests-per-minute (RPM) to a safe concurrency ceiling:

max_concurrency = floor(RPM / 60) - 1

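Always reserve one slot for retries, health checks, and unexpected latency spikes. If a provider enforces a 100 RPM limit, floor(100 / 60) gives a theoretical maximum of 1 concurrent request, and subtracting the reserved slot leaves a safe ceiling of 0, so dispatch serially. For higher RPM tiers, scale concurrency proportionally but never exceed floor(RPM / 60). The formula implicitly treats each request as taking about one second, so this ceiling is a guardrail against queue saturation rather than the rate enforcement itself, and it helps keep your pipeline inside the provider’s sliding window boundaries.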

| Provider tier | RPM | Safe concurrency | Reserve slots |
|---|---|---|---|
| Free / sandbox | 60 | 0 | Use serial dispatch |
| Standard | 300 | 4 | 1 retry + 1 health |
| Business | 1 200 | 19 | 1 retry + 1 health |
| Enterprise | 6 000 | 99 | 1 retry + 1 health |
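The formula and the tier values above can be checked with a few lines of Python. This is a minimal sketch: the function name `safe_concurrency`, the `reserve` parameter, and the `TIERS` mapping are illustrative names, not part of any library.

```python
def safe_concurrency(rpm: int, reserve: int = 1) -> int:
    """Return floor(RPM / 60) minus the reserved slots, never below zero."""
    return max(0, rpm // 60 - reserve)


# RPM values taken from the tier table; the tier keys are illustrative
TIERS = {"free": 60, "standard": 300, "business": 1200, "enterprise": 6000}
CEILINGS = {tier: safe_concurrency(rpm) for tier, rpm in TIERS.items()}
```

A result of 0, as for the 60 RPM tier, means the provider should be called serially rather than through a concurrent worker pool.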

Step 2 — Select the Optimal Throttling Algorithm

For batch geocoding, fixed-window counters are generally inadequate because they reset at arbitrary intervals, causing burst spikes that violate provider SLAs. Implement a token bucket or leaky bucket algorithm instead.

Token buckets allow controlled bursts while maintaining long-term averages, which aligns well with address normalization workloads that experience variable payload sizes. When a batch contains a mix of simple street addresses and complex rural coordinates with multiple lookup attempts, the bucket absorbs the initial processing load without exhausting the quota. The bucket refills at a steady rate, guaranteeing that your pipeline never exceeds the provider’s sustained throughput limit.

For pipelines that need perfectly smooth output — such as when feeding a downstream provider that enforces strict per-second ceilings — a leaky bucket drains at a fixed rate regardless of burst arrival, preventing any instantaneous spike.
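To make the refill behaviour concrete, here is a minimal, single-threaded token bucket sketch. It is an illustration of the algorithm only, not how aiolimiter is implemented internally; the class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for this example.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/s up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # largest burst the bucket can absorb
        self._clock = clock
        self._tokens = capacity     # start full so an initial burst is allowed
        self._last = clock()

    def _refill(self) -> None:
        # Credit tokens for the time elapsed since the last check
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False instead of blocking."""
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

For a 300 RPM quota the refill rate would be 5 tokens per second. Setting `capacity` to 1 removes the burst allowance and approximates the evenly spaced pacing described for the leaky bucket.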

Step 3 — Implement a Centralized Rate Limiter

Decouple rate limiting from individual request handlers. A shared limiter instance gates all outbound calls, ensuring that concurrent workers respect the global throughput budget regardless of how many tasks are spawned.

Centralising the limiter prevents the “thundering herd” problem, where multiple workers independently decide to send requests simultaneously. By funnelling all outbound traffic through a single gatekeeper, you maintain deterministic pacing and simplify quota tracking. This pattern pairs naturally with building async geocoding requests in Python, where the dispatcher itself is already structured around a shared event loop.

import asyncio
from aiolimiter import AsyncLimiter

# Per-provider limiters — configure one per geocoding service
PROVIDER_LIMITERS: dict[str, AsyncLimiter] = {
    "primary":   AsyncLimiter(max_rate=300, time_period=60),
    "secondary": AsyncLimiter(max_rate=120, time_period=60),
    "fallback":  AsyncLimiter(max_rate=60,  time_period=60),
}

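# Global connection cap. By the Step 1 formula these three ceilings sum to
# 4 + 1 + 0 = 5; 24 is an illustrative value, so size it to your own tiers
# and to the connections your infrastructure can support.
GLOBAL_SEMAPHORE = asyncio.Semaphore(24)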

Step 4 — Integrate with Async Dispatchers

Wire the centralised limiter into your async HTTP dispatcher. Pair AsyncLimiter with asyncio.Semaphore to enforce both rate limits and connection concurrency simultaneously.

import asyncio
import aiohttp
from aiolimiter import AsyncLimiter
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    wait_random,
    retry_if_exception_type,
)


PROVIDER_LIMITERS: dict[str, AsyncLimiter] = {
    "primary":   AsyncLimiter(max_rate=300, time_period=60),
    "secondary": AsyncLimiter(max_rate=120, time_period=60),
}
GLOBAL_SEMAPHORE = asyncio.Semaphore(24)


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=2, max=30) + wait_random(0, 2),
    retry=retry_if_exception_type((aiohttp.ClientError, asyncio.TimeoutError)),
    reraise=True,  # surface the original exception instead of tenacity.RetryError
)
async def fetch_geocode(
    session: aiohttp.ClientSession,
    address: str,
    provider: str = "primary",
    base_url: str = "https://api.provider.example/v1/geocode",
) -> dict:
    """Fetch geocoding result for a single address string.

    Applies per-provider rate limiting and global connection capping
    before issuing the HTTP request. Retries on 429 after honouring
    the Retry-After header, and retries on transient network errors
    with jittered exponential backoff.

    Args:
        session:   Shared aiohttp ClientSession.
        address:   Raw address string to geocode.
        provider:  Key into PROVIDER_LIMITERS.
        base_url:  Geocoding endpoint for this provider.

    Returns:
        Parsed JSON response dict from the provider.

    Raises:
        aiohttp.ClientResponseError: On non-retryable HTTP errors.
    """
    limiter = PROVIDER_LIMITERS[provider]

    async with limiter:
        async with GLOBAL_SEMAPHORE:
            async with session.get(
                base_url,
                params={"q": address},
                timeout=aiohttp.ClientTimeout(total=10),
            ) as response:
                if response.status == 429:
                    retry_after = int(response.headers.get("Retry-After", 2))
                    await asyncio.sleep(retry_after)
                    raise aiohttp.ClientResponseError(
                        response.request_info,
                        response.history,
                        status=429,
                        message="Rate limited",
                    )
                response.raise_for_status()
                return await response.json()


async def geocode_batch(
    addresses: list[str],
    provider: str = "primary",
) -> list[dict | BaseException]:
    """Geocode a batch of addresses concurrently with rate limiting.

    Args:
        addresses: List of raw address strings.
        provider:  Provider key for rate limiter selection.

    Returns:
        List in input order. Each entry is a geocoding result dict, or the
        exception raised for that address (gather uses return_exceptions=True).
    """
    async with aiohttp.ClientSession() as session:
        tasks = [
            fetch_geocode(session, addr, provider=provider)
            for addr in addresses
        ]
        return await asyncio.gather(*tasks, return_exceptions=True)

Vectorized pandas variant

For pipelines that read from a DataFrame rather than a plain list, do not call asyncio.run inside a .apply call, which would create one event loop per row. Collect the addresses in bulk and run the async dispatcher once:

import asyncio
import pandas as pd


def geocode_dataframe(df: pd.DataFrame, address_col: str = "address") -> pd.DataFrame:
    """Add geocoding results to a DataFrame using the async batch dispatcher.

    Args:
        df:           Input DataFrame containing raw addresses.
        address_col:  Column name holding address strings.

    Returns:
        DataFrame with a new 'geocode_result' column.
    """
    addresses: list[str] = df[address_col].tolist()
    results = asyncio.run(geocode_batch(addresses))
    df = df.copy()
    df["geocode_result"] = results
    return df

Provider Parameter Reference

| Parameter | Description | Recommended value |
|---|---|---|
| max_rate | Tokens granted per time_period | Provider RPM minus 5% safety margin |
| time_period | Window length in seconds | 60 for RPM quotas; 1 for RPS quotas |
| ClientTimeout.total | Per-request wall-clock timeout | 10 s for standard providers; 20 s for rural/international records |
| stop_after_attempt | Maximum attempt count | 4 (1 initial call + 3 retries) |
| wait_exponential.min | Minimum backoff in seconds | 2 |
| wait_exponential.max | Maximum backoff cap in seconds | 30 |
| wait_random | Jitter range added to the exponential delay | 0–2 s (additive jitter) |
| Semaphore count | Max concurrent open connections | floor(RPM / 60) - 1 summed across providers |

Edge Cases

1. Provider enforces per-second RPS alongside per-minute RPM

Some providers stack two distinct quota windows: a per-second burst ceiling (e.g. 5 RPS) and a per-minute sustained cap (e.g. 120 RPM). Stack two limiters in sequence:

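from aiolimiter import AsyncLimiter

RPS_LIMITER = AsyncLimiter(max_rate=4, time_period=1)     # 4 RPS: leaves 1 of headroom
RPM_LIMITER = AsyncLimiter(max_rate=115, time_period=60)  # 115 RPM: leaves 5 of headroom


# Hypothetical wrapper: async with is only valid inside a coroutine
async def send_with_stacked_limits() -> None:
    """Wait on both quota windows, then take a connection slot."""
    async with RPS_LIMITER:
        async with RPM_LIMITER:
            async with GLOBAL_SEMAPHORE:  # defined in Step 3
                # issue request
                ...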

2. Batch contains malformed addresses that generate slow provider responses

Rural routes and non-standard addresses (e.g. addresses without a street number) often cause providers to perform deeper fuzzy matching, increasing latency. A fixed ClientTimeout will surface these as asyncio.TimeoutError. Capture them separately and route to a fallback chain rather than retrying on the same provider:

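import asyncio

import aiohttp
from tenacity import retry, retry_if_not_exception_type, stop_after_attempt


# Assumes fetch_geocode's own retry tuple excludes asyncio.TimeoutError.
# As written in Step 4 it retries timeouts itself, so a timeout is retried
# on the same provider before this guard ever sees it.
@retry(
    stop=stop_after_attempt(2),
    retry=retry_if_not_exception_type(asyncio.TimeoutError),
    reraise=True,  # once attempts run out, raise the last error itself
    # TimeoutError goes to fallback; other errors retry
)
async def fetch_with_timeout_guard(
    session: aiohttp.ClientSession,
    address: str,
) -> dict:
    """Attempt geocoding; raises TimeoutError for caller to route elsewhere."""
    return await fetch_geocode(session, address)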

3. Distributed workers sharing a quota across processes

When running workers across multiple processes or containers, asyncio-native limiters operate per-process and cannot enforce a shared ceiling. Use a Redis-backed token bucket with an atomic Lua decrement, or centralise dispatch in a single async orchestrator that all workers submit addresses to via a queue. Tracking per-provider spend across processes is covered in API quota tracking and cost management.

4. Provider returns Retry-After as an HTTP date instead of seconds

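RFC 6585 defines the 429 status and notes that the response may carry a Retry-After header. The header itself is specified in RFC 9110 (Section 10.2.3) as either a non-negative integer number of seconds or an HTTP-date string. Parse both: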

import email.utils
import time


def parse_retry_after(header_value: str) -> float:
    """Parse Retry-After header as seconds from now.

    Handles both integer-seconds and HTTP-date formats.

    Args:
        header_value: Raw Retry-After header string.

    Returns:
        Float seconds to wait before retrying.
    """
    try:
        return float(header_value)
    except ValueError:
        retry_ts = email.utils.parsedate_to_datetime(header_value).timestamp()
        return max(0.0, retry_ts - time.time())

5. Provider quota shared across API keys in the same account

Some providers count usage at the account level, not per key. Rotating API keys does not bypass the quota. Detect this by monitoring whether 429s persist across key rotations and switch to genuine provider diversification via dynamic provider selection based on region.

Performance and Vectorization

For most address normalization workloads, the optimal concurrency point is at floor(RPM / 60) - 1 per provider, as derived above. At this setting, CPython’s GIL is not a bottleneck because the async event loop is IO-bound, not CPU-bound.

For pipelines exceeding 10 000 records per run:

  • Chunk batches: Split the input DataFrame into chunks of 500–1 000 addresses. Submit each chunk as a separate asyncio.gather call. This limits peak memory and gives you natural checkpoints.
  • Connection pooling: Share a single aiohttp.ClientSession across all tasks in a batch. Creating a new session per request wastes TCP handshake time and depletes ephemeral ports at high concurrency.
  • Avoid asyncio.run in a loop: Call asyncio.run once per batch, not once per address. The overhead of creating and destroying an event loop per record eliminates most of the concurrency benefit.
  • Pin a connector with limit: Pass aiohttp.TCPConnector(limit=MAX_CONNECTIONS) to the session constructor to set a hard cap on open connections independent of the semaphore. This prevents the OS from running out of file descriptors.
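The chunking advice above could be sketched as follows. This is a minimal illustration under stated assumptions: `chunked`, `run_in_chunks`, and the `on_checkpoint` callback are hypothetical names, and `worker` stands in for a coroutine such as fetch_geocode already bound to a shared session.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterator, Sequence
from typing import Optional


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def run_in_chunks(
    addresses: Sequence[str],
    worker: Callable[[str], Awaitable[object]],
    chunk_size: int = 500,
    on_checkpoint: Optional[Callable[[int], None]] = None,
) -> list[object]:
    """Process addresses chunk by chunk, preserving input order."""
    results: list[object] = []
    for chunk in chunked(addresses, chunk_size):
        # One gather per chunk bounds the number of live tasks
        chunk_results = await asyncio.gather(
            *(worker(address) for address in chunk),
            return_exceptions=True,
        )
        results.extend(chunk_results)
        if on_checkpoint is not None:
            on_checkpoint(len(results))  # e.g. persist progress here
    return results
```

Calling `asyncio.run(run_in_chunks(addresses, worker))` once per batch keeps a single event loop alive for every chunk, in line with the advice above about not calling asyncio.run in a loop.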

A 300-RPM provider with 4 concurrent workers sustains roughly 280 successful geocodes per minute after accounting for retry overhead and network jitter — approximately 16 800 addresses per hour per provider.

Troubleshooting

429s persist despite limiter being active

The limiter is per-process. If multiple processes share the same API key, they each maintain independent token buckets and the combined throughput exceeds the provider ceiling. Move to a Redis token bucket or a single-process orchestrator with a queue.

Semaphore blocks indefinitely

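If GLOBAL_SEMAPHORE is never released, some task acquired it with a manual acquire() call and exited, typically through an unhandled exception, before reaching release(). An async with block releases the semaphore even when the body raises, so always wrap request logic inside the context manager rather than acquiring and releasing manually. Also check for long sleeps inside the block: a task that awaits a Retry-After delay while holding the semaphore keeps its slot occupied for the whole pause.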

Retry count exhausted before Retry-After expires

When a provider sets Retry-After: 60 and your stop_after_attempt allows only 3 retries with a 30-second max backoff, you can exhaust retries before the provider resets its window. Raising max in wait_exponential helps only if the waits actually grow to reach it within the allowed attempts, so the more reliable fix is to parse the header explicitly and await asyncio.sleep(retry_after) before re-enqueueing, or to raise the attempt limit.

aiohttp.ServerDisconnectedError under high concurrency

Some providers close the connection when the client holds it open between rate-limited pauses. Set aiohttp.TCPConnector(keepalive_timeout=30) and reduce GLOBAL_SEMAPHORE by 20% until errors stop.

Queue depth grows unboundedly during sustained 429 periods

If all providers throttle simultaneously, tasks pile up in the event loop’s pending queue. Implement a backpressure limit: cap the number of in-flight asyncio.Task objects using a second semaphore or a bounded asyncio.Queue. When the queue is full, suspend ingestion and flush pending records to durable storage (Parquet or object storage) before resuming.
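One way to sketch the bounded-queue form of backpressure is shown below. It is an assumption-laden illustration: `dispatch_with_backpressure`, `max_pending`, and `num_workers` are hypothetical names, `worker` stands in for a rate-limited coroutine such as fetch_geocode bound to a session, and the flush to durable storage is left out.

```python
import asyncio
from collections.abc import Awaitable, Callable, Sequence


async def dispatch_with_backpressure(
    addresses: Sequence[str],
    worker: Callable[[str], Awaitable[object]],
    max_pending: int = 100,
    num_workers: int = 4,
) -> list[object]:
    """Run `worker` over addresses with at most `max_pending` items queued."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_pending)
    results: list[object] = [None] * len(addresses)

    async def consume() -> None:
        while True:
            index, address = await queue.get()
            try:
                results[index] = await worker(address)
            except Exception as exc:  # keep the failure in the result slot
                results[index] = exc
            finally:
                queue.task_done()

    consumers = [asyncio.create_task(consume()) for _ in range(num_workers)]
    try:
        for index, address in enumerate(addresses):
            # put() suspends the producer whenever the queue is full
            await queue.put((index, address))
        await queue.join()  # wait until every queued item is processed
    finally:
        for task in consumers:
            task.cancel()
        await asyncio.gather(*consumers, return_exceptions=True)
    return results
```

Because `put()` suspends ingestion when the queue is full, memory use stays bounded during a sustained 429 period; the point where `put()` would block is also a natural place to flush pending records to durable storage.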

FAQ

How do I choose between a token bucket and a leaky bucket for geocoding?

Use a token bucket when your address batches have variable burst patterns — it absorbs short spikes while enforcing a long-term average. Use a leaky bucket when you need perfectly smooth output pacing, such as when a downstream provider enforces a strict per-second ceiling with no burst tolerance.

Should I apply rate limiting per provider or globally across all providers?

Per-provider. Each geocoding service enforces its own quota independently. Use separate AsyncLimiter instances keyed by provider name, and aggregate a global concurrency semaphore over the total connections your infrastructure can support.

What is full jitter and why does it outperform equal jitter for retries?

Full jitter draws the retry delay from the interval [0, cap] uniformly, whereas equal jitter adds randomness only to the upper half. Full jitter spreads retry storms across a wider window, reducing the probability of synchronized spikes that re-trigger 429s.
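The difference between the two strategies is easiest to see side by side. The sketch below is illustrative only: the function names, the 0-based `attempt` convention, and the default `base` and `max_delay` values are assumptions for this example.

```python
import random
from typing import Optional


def backoff_cap(attempt: int, base: float = 1.0, max_delay: float = 30.0) -> float:
    """Exponential ceiling for a given retry attempt (0-based)."""
    return min(max_delay, base * 2 ** attempt)


def full_jitter(
    attempt: int,
    base: float = 1.0,
    max_delay: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Draw the delay uniformly from [0, cap]."""
    cap = backoff_cap(attempt, base, max_delay)
    return (rng or random).uniform(0.0, cap)


def equal_jitter(
    attempt: int,
    base: float = 1.0,
    max_delay: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Keep half the cap fixed and randomize only the upper half."""
    half = backoff_cap(attempt, base, max_delay) / 2
    return half + (rng or random).uniform(0.0, half)
```

Note that the Step 4 decorator combines wait_exponential with wait_random(0, 2), which adds a small bounded jitter on top of a deterministic delay rather than drawing from the whole interval. tenacity's wait_random_exponential is the strategy that corresponds to full jitter.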

When should I open a circuit breaker versus just retrying?

Open the circuit when error rate exceeds a threshold (e.g. >50% over a 30-second window) or when consecutive failures exceed a count limit. Retrying into an already-overloaded provider amplifies its degradation. The circuit breaker lets the provider recover before traffic resumes.
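A minimal sketch of the consecutive-failure variant is shown below. It does not implement the error-rate window, and its half-open state is simplified (it admits any request once the recovery period has passed). The class name `CircuitBreaker`, its parameters, and the injectable `clock` are assumptions for this example.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Open after N consecutive failures; allow a probe after a cool-down."""

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    def allow_request(self) -> bool:
        """Return True if a request may be sent to the provider."""
        if self._opened_at is None:
            return True
        # Half-open: permit probes once the cool-down has elapsed
        return self._clock() - self._opened_at >= self.recovery_seconds

    def record_success(self) -> None:
        """Close the circuit and reset the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure; open (or re-open) the circuit at the threshold."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

In a dispatcher, a call to `allow_request()` before each provider request and a call to `record_success()` or `record_failure()` afterwards would be enough to route traffic to the fallback chain while the circuit is open.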

How do I track per-provider rate limit headroom in a multi-worker setup?

Use a Redis token bucket shared across worker processes — each worker atomically decrements the token count via a Lua script. Alternatively, centralise dispatch in a single async orchestrator process and use asyncio-native limiters that need no cross-process coordination.