Step-by-Step Guide to CASS Address Validation

Submit cleaned records to a USPS-licensed CASS engine, evaluate the returned DPV flags, and route every outcome — this guide is part of the USPS CASS Certification Guidelines and covers the exact Python pipeline that takes raw input to a deliverability-confirmed output.


Pipeline at a Glance

The five-stage flow below shows how a raw address moves from ingestion to a geocoded, DPV-confirmed record (or a routed exception). Each stage maps directly to a section in this guide.

CASS Address Validation Pipeline Five sequential stages: Normalize Schema, Select CASS Endpoint, Submit and Retry, Evaluate DPV Flags, Geocode and Route. 1. Normalize Schema 2. Select CASS Endpoint 3. Submit & Retry 4. Evaluate DPV Flags 5. Geocode & Route Y/D → downstream S/M → review queue

Step 1 — Pre-Validation Schema Normalization

Raw address data rarely arrives CASS-ready. Before submitting to any certified engine, strip control characters, collapse whitespace, enforce consistent casing, and map fields to the USPS Publication 28 standard schema: address_line1, address_line2, city, state, zip_code.

Inconsistent delimiters, trailing punctuation, and mixed-case abbreviations all produce false negatives. As part of the broader Core Address Parsing & Standardization discipline, the normalization layer is the foundation everything else sits on. A lightweight pre-filter that rejects records with missing state codes or ZIP lengths outside 5–9 digits protects against batch rejections and reduces API spend.

import re
from typing import Optional

_ZIP_RE = re.compile(r"^\d{5}(?:-\d{4})?$")
_STATE_CODES = frozenset([
    "AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN",
    "IA","KS","KY","LA","ME","MD","MA","MI","MN","MS","MO","MT","NE","NV",
    "NH","NJ","NM","NY","NC","ND","OH","OK","OR","PA","RI","SC","SD","TN",
    "TX","UT","VT","VA","WA","WV","WI","WY","DC","PR","GU","VI","AS","MP",
])

def normalize_for_cass(raw: dict) -> Optional[dict]:
    """
    Sanitize a raw address dict and map to USPS Publication 28 field names.
    Returns None when mandatory fields are absent or structurally invalid.
    """
    line1 = re.sub(r"[^\x20-\x7E]", "", raw.get("address1", "")).strip().upper()
    city  = re.sub(r"[^\x20-\x7E]", "", raw.get("city",  "")).strip().upper()
    state = re.sub(r"[^A-Za-z]", "", raw.get("state", "")).strip().upper()
    zip_  = re.sub(r"[^\d-]", "", raw.get("zip", "")).strip()

    if not line1 or not city:
        return None
    if state not in _STATE_CODES:
        return None
    if not _ZIP_RE.match(zip_):
        return None

    return {
        "address_line1": line1,
        "address_line2": re.sub(r"[^\x20-\x7E]", "", raw.get("address2", "")).strip().upper(),
        "city":  city,
        "state": state,
        "zip_code": zip_,
    }

Step 2 — Select a USPS-Certified Validation Endpoint

CASS compliance requires a licensed, annually recertified provider. The USPS maintains a public registry at USPS PostalPro, and only engines on that list may legally return DPV-confirmed results. Review the USPS CASS Certification Guidelines for a summary of what each certification tier covers and how to verify a vendor’s current status.

When selecting an endpoint, prioritize these capabilities:

Capability Why it matters
Batch size 1,000–50,000 per request Avoids excessive HTTP overhead on large jobs
Granular DPV codes (Y, D, S, M) Enables automated routing without human review of every record
LACSLink® and SuiteLink® coverage Converts rural route and highway-contract addresses to USPS-preferred forms
Retry-After header on 429 responses Enables safe exponential backoff without hard-coded waits
Idempotent request keys Lets you safely retry failed chunks without duplicating records

Step 3 — Implement the Validation Pipeline

The script below submits addresses in chunks, handles rate-limit responses with exponential backoff, and normalizes the provider response into a stable output schema. Replace CASS_ENDPOINT and API_KEY with your certified provider’s credentials.

import requests
import time
import logging
from typing import Any

logging.basicConfig(level=logging.INFO, format="%(asctime)s | %(levelname)s | %(message)s")
logger = logging.getLogger(__name__)

CASS_ENDPOINT: str = "https://api.your-cass-provider.com/v1/validate"
API_KEY:       str = "YOUR_API_KEY"
CHUNK_SIZE:    int = 100
MAX_RETRIES:   int = 3
BASE_DELAY:  float = 2.0


def _backoff(attempt: int) -> float:
    """Exponential backoff capped at 30 s."""
    return min(BASE_DELAY * (2 ** attempt) + 0.5, 30.0)


def _map_result(addr_result: dict[str, Any]) -> dict[str, Any]:
    """Normalize a provider response item to a stable internal schema."""
    dpv_code: str = addr_result.get("dpv_confirmation_code", "")
    return {
        "original":      addr_result.get("input_address"),
        "standardized":  addr_result.get("standardized_address"),
        "zip4":          addr_result.get("zip4"),
        "carrier_route": addr_result.get("carrier_route"),
        "dpv_status":    dpv_code,
        "is_deliverable": dpv_code in ("Y", "D"),
    }


def validate_cass_batch(addresses: list[dict[str, str]]) -> list[dict[str, Any]]:
    """
    Submit addresses in CHUNK_SIZE batches to the CASS endpoint.
    Handles 429 rate-limit headers and transient network errors with
    exponential backoff. Logs and skips any chunk that exhausts retries.
    """
    session = requests.Session()
    session.headers.update({
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type":  "application/json",
        "Accept":        "application/json",
    })

    results: list[dict[str, Any]] = []

    for chunk_start in range(0, len(addresses), CHUNK_SIZE):
        chunk   = addresses[chunk_start : chunk_start + CHUNK_SIZE]
        payload = {"addresses": chunk}

        for attempt in range(MAX_RETRIES):
            try:
                resp = session.post(CASS_ENDPOINT, json=payload, timeout=30)
                resp.raise_for_status()
                data = resp.json()
                results.extend(_map_result(r) for r in data.get("validated_addresses", []))
                break

            except requests.exceptions.HTTPError as exc:
                status = exc.response.status_code if exc.response is not None else 0
                if status == 429:
                    wait = float(
                        exc.response.headers.get("Retry-After", _backoff(attempt))
                        if exc.response is not None else _backoff(attempt)
                    )
                    logger.warning("Rate limited — retrying in %.1f s", wait)
                    time.sleep(wait)
                    continue
                logger.error("HTTP %s — skipping chunk %d", status, chunk_start // CHUNK_SIZE)
                break

            except requests.exceptions.RequestException as exc:
                wait = _backoff(attempt)
                logger.warning("Request failed (attempt %d/%d): %s — retrying in %.1f s",
                               attempt + 1, MAX_RETRIES, exc, wait)
                time.sleep(wait)
        else:
            logger.error("Chunk %d exhausted all retries — skipped.", chunk_start // CHUNK_SIZE)

    return results

Vectorized usage with pandas — pass the normalized frame rows directly and attach results back as new columns:

import pandas as pd

def validate_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """
    Run CASS validation over a DataFrame with columns matching the
    USPS Publication 28 schema and return the frame with DPV columns appended.
    """
    records = df.to_dict(orient="records")
    validated = validate_cass_batch(records)
    result_df = pd.DataFrame(validated)
    return df.join(result_df[["zip4", "carrier_route", "dpv_status", "is_deliverable"]])

Step 4 — Evaluate DPV Flags and Route Exceptions

The DPV confirmation code is the most operationally significant field in the CASS response. It signals whether a given address corresponds to an actual USPS deliverable location.

Code Meaning Recommended action
Y Address confirmed, fully deliverable Pass to downstream systems immediately
D Default match — building confirmed, unit/suite missing Enrich via SuiteLink or flag for optional user correction
S Secondary information missing (e.g., apartment required) Trigger a secondary enrichment pass or prompt the data owner
M Missing primary number — address cannot be confirmed Send to manual review queue or dead-letter store

Note that the USPS does not return a standalone N code — undeliverable addresses surface as M (missing primary) or produce an empty response body. Some vendor extensions add N as a proprietary extension; treat it the same as M in your routing logic.

from enum import Enum

class DPVRoute(Enum):
    DOWNSTREAM  = "downstream"
    ENRICH      = "enrich"
    REVIEW      = "review"

def route_by_dpv(dpv_code: str) -> DPVRoute:
    """Return the processing route for a given DPV confirmation code."""
    if dpv_code == "Y":
        return DPVRoute.DOWNSTREAM
    if dpv_code in ("D", "S"):
        return DPVRoute.ENRICH
    return DPVRoute.REVIEW

Step 5 — Geocoding Enrichment and Fallback Routing

Once CASS validation returns a Y or D code, append spatial coordinates using your geocoding provider. Many CASS vendors bundle geocoding in their response; if your pipeline separates the two steps, chain the standardized address into a geocoding API call and cache successful lookups under a composite key of standardized_line1 + zip4 to avoid redundant requests.

For records that reach the REVIEW route, a three-tier fallback prevents silent data loss:

  1. Fuzzy match against your internal CRM or ERP address table using normalized Levenshtein distance on the street component.
  2. Cross-reference with a third-party dataset (OpenStreetMap Nominatim, commercial parcel data) for independent deliverability confirmation.
  3. Quarantine records that still cannot be resolved — store them in a dead-letter queue with the original input, the DPV response, and a timestamp for periodic human review.

For teams building more complex multi-provider workflows, the Implementing Fallback Chains for Failed Lookups guide covers how to wire this routing logic into a provider-agnostic fallback chain.


Edge Cases and Failure Modes

PO Boxes and Rural Routes Return M When Misformatted

PO Boxes and Rural Routes follow their own Publication 28 rules. RR 2 BOX 15 must appear in address_line1 exactly as USPS expects — not embedded in a secondary-unit field or fused with a street address. Misformatted entries produce M codes even though they represent valid delivery points. The Handling PO Boxes and Rural Routes guide covers the exact regex patterns to detect and reformat these before submission.

_PO_BOX_RE  = re.compile(r"^P\.?\s*O\.?\s*BOX\s+(\d+)$", re.IGNORECASE)
_RURAL_RE   = re.compile(r"^RR\s+(\d+)\s+BOX\s+(\d+)$", re.IGNORECASE)

def classify_address_type(line1: str) -> str:
    """Return 'po_box', 'rural_route', or 'street' for routing decisions."""
    if _PO_BOX_RE.match(line1):
        return "po_box"
    if _RURAL_RE.match(line1):
        return "rural_route"
    return "street"

State/ZIP Mismatches Cause Silent Overrides

If the state code in a record does not match the ZIP prefix, a CASS engine may silently correct the state without flagging the change. Always log the state_corrected flag returned by the provider and route mismatches to a reconciliation queue for data-quality audits.

Unicode Characters in Legacy Exports

Legacy CRM exports in Windows-1252 or ISO-8859-1 produce silent byte corruption when fed directly to a CASS engine that expects UTF-8. Handling Special Characters in Global Address Data covers the chardet-based detection and NFKC normalization approach that eliminates this at the ingestion boundary before records ever reach the normalization step described in Step 1.


Integration Note

This pipeline sits inside the broader USPS CASS Certification Guidelines workflow, which describes vendor selection criteria, annual recertification cycles, and monitoring protocols. The normalization step (Step 1) feeds directly from the regex-based pre-processing patterns in Regex Patterns for US Address Parsing, and the fallback routing in Step 5 connects naturally to Implementing Fallback Chains for Failed Lookups for teams that route unresolved records across multiple geocoding providers.