As part of the Core Address Parsing & Standardization pipeline, the Coding Accuracy Support System (CASS) is the USPS compliance framework that transforms raw, inconsistent US address input into deterministic, deliverability-confirmed records. Where other standardization steps clean and parse, CASS is the authoritative validation layer: it assigns ZIP+4 extensions, Delivery Point Validation (DPV) codes, and Carrier Route identifiers that downstream mailing, logistics, and geocoding systems depend on.
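Prerequisites
- Python 3.9 or later (the code below uses built-in generic annotations such as list[dict])
- httpx>=0.27, pydantic>=2.0, and tenacity>=8.0 for the primary implementation, plus pandas for the DataFrame variant
- An API endpoint and key from a CASS-certified vendor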
Production Workflow
CASS compliance is a deterministic sequence. Skipping or reordering stages causes DPV mismatches or certification test failures.
Step 1 — Ingest and sanitize
Pull raw records from source systems (CRM, ERP, web forms, legacy databases). At the ingestion boundary:
- Detect and transcode non-UTF-8 encodings (chardet or explicit codec declarations cover the Windows-1252 and ISO-8859-1 export artifacts common in legacy systems).
- Strip non-printable characters and normalize whitespace to a single space.
- Reject records missing a primary number and street name before they enter the normalization queue.
Use streaming parsers (polars scan, chunked pandas iteration, or generator-based CSV reads) for high-volume ingestion to avoid memory bottlenecks.
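The ingestion rules above can be sketched with the standard library alone. This is a minimal illustration, not a production reader: it substitutes a UTF-8-then-Windows-1252 fallback for chardet, assumes the CSV has an address_line_1 column, and uses a deliberately simple, hypothetical regex for the "primary number plus street name" rejection rule. PO Box and Rural Route lines, which lack a leading primary number, would need their own branch.

```python
import codecs
import csv
import re
from typing import Iterable, Iterator

# Hypothetical rejection rule: a leading primary number followed by a street token.
_PRIMARY_AND_STREET = re.compile(r"^\d+[A-Za-z0-9-]*\s+\S+")
_WHITESPACE = re.compile(r"\s+")


def detect_encoding(sample: bytes) -> str:
    """Return 'utf-8' if the sample is valid UTF-8, else fall back to Windows-1252."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the end of the sample.
        decoder.decode(sample, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"


def sanitize(value: str) -> str:
    """Drop non-printable characters and collapse whitespace runs to a single space."""
    kept = "".join(ch for ch in value if ch.isprintable() or ch.isspace())
    return _WHITESPACE.sub(" ", kept).strip()


def iter_clean_records(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Stream CSV rows, sanitize each field, and drop rows lacking number + street."""
    for row in csv.DictReader(lines):
        clean = {k: sanitize(v or "") for k, v in row.items() if k is not None}
        if _PRIMARY_AND_STREET.match(clean.get("address_line_1", "")):
            yield clean
```

To stream a real file, read a few kilobytes of raw bytes for detect_encoding, then open the file with open(path, newline="", encoding=...) and pass the file object to iter_clean_records so rows are processed one at a time.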
Step 2 — Pre-normalize against Publication 28
Convert colloquial inputs into forms the CASS engine can match against its reference tables:
- Map directional variants to the Publication 28 standard abbreviations: NORTH→N, SOUTHWEST→SW.
- Standardize street suffix variants to the Publication 28 standard abbreviations: STREET→ST, AVENUE→AVE, BOULEVARD→BLVD.
- Enforce canonical secondary-unit designators: #123→APT 123, 123B→APT B.
- Validate city–state–ZIP triads; flag cross-state ZIP discrepancies for the DPV routing queue.
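A minimal sketch of the directional and suffix rules, assuming a simple "number, [predirectional], name, suffix, [postdirectional]" token layout. The lookup tables are small illustrative subsets of the Publication 28 lists (the full suffix table has hundreds of variants), and the positional guards are a heuristic: they keep "100 West St" intact but will not handle every street whose name is itself a directional or suffix word.

```python
import re

# Illustrative subsets of the Publication 28 tables, not the complete lists.
DIRECTIONALS = {
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "NORTHEAST": "NE", "NORTHWEST": "NW", "SOUTHEAST": "SE", "SOUTHWEST": "SW",
}
DIRECTIONAL_FORMS = set(DIRECTIONALS) | set(DIRECTIONALS.values())
SUFFIXES = {
    "STREET": "ST", "STR": "ST", "AVENUE": "AVE", "AV": "AVE", "AVEN": "AVE",
    "BOULEVARD": "BLVD", "BOULV": "BLVD", "ROAD": "RD", "DRIVE": "DR", "LANE": "LN",
}


def normalize_street_line(line: str) -> str:
    """Uppercase, drop periods/commas, and apply standard directional/suffix forms."""
    tokens = re.sub(r"[.,]", "", line.upper()).split()
    # Predirectional: only when number + directional + name + suffix are all present.
    if len(tokens) >= 4 and tokens[1] in DIRECTIONALS:
        tokens[1] = DIRECTIONALS[tokens[1]]
    suffix_idx = len(tokens) - 1
    # Postdirectional: a trailing directional means the suffix sits one token earlier.
    if len(tokens) >= 4 and tokens[-1] in DIRECTIONAL_FORMS:
        tokens[-1] = DIRECTIONALS.get(tokens[-1], tokens[-1])
        suffix_idx -= 1
    if suffix_idx >= 2 and tokens[suffix_idx] in SUFFIXES:
        tokens[suffix_idx] = SUFFIXES[tokens[suffix_idx]]
    return " ".join(tokens)
```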
CASS only processes domestic US records. If your pipeline receives mixed-country data, apply International Address Format Standardization to route non-US records to appropriate regional parsers before reaching this step.
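Branching before this step can be as simple as partitioning on the country field. The sketch below assumes records are dicts with an optional country key that defaults to "US" (mirroring the RawAddress default in the implementation below) and treats a few common spellings as domestic; the alias set is a hypothetical starting point, not an exhaustive list.

```python
from typing import Iterable

# Hypothetical alias set; extend it to match what your source systems emit.
US_ALIASES = {"US", "USA", "UNITED STATES", "UNITED STATES OF AMERICA"}


def split_by_country(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Return (domestic, international); only the first list goes to CASS."""
    domestic: list[dict] = []
    international: list[dict] = []
    for rec in records:
        # Missing country is treated as US; dots are dropped so "U.S.A." matches.
        country = (rec.get("country") or "US").strip().upper().replace(".", "")
        (domestic if country in US_ALIASES else international).append(rec)
    return domestic, international
```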
For inputs that include PO Boxes or Rural Routes, enforce Publication 28 canonical forms before submission — PO BOX <n> and RR <n> BOX <n>. The Handling PO Boxes and Rural Routes guide covers the extraction patterns and edge cases specific to those address types.
Step 3 — Validate and append (DPV, ZIP+4, Carrier Route)
Route the standardized payload to your CASS vendor endpoint. The engine returns:
| Field | Meaning |
|---|---|
| dpv_code | Deliverability verdict: Y, D, S, or M (see table below) |
| zip4 | Four-digit ZIP extension (e.g. 1234) |
| carrier_route | Delivery route code (e.g. C001 city, R001 rural, B001 PO Box) |
| dpv_footnotes | Supplementary flags: vacant, seasonal, military, throwback |
| standardized_line1 | CASS-corrected primary address line |
| standardized_city | USPS-preferred city name |
| standardized_state | Two-letter state abbreviation |
| standardized_zip | Corrected five-digit ZIP |
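Parse dpv_footnotes in addition to dpv_code: the footnotes pin down the recovery path more precisely than the code alone. N1 (building confirmed, secondary number missing) calls for secondary-unit enrichment, whereas M1 or M3 (primary number missing or invalid) means there is no building-level match to enrich, so the record needs review or a fallback provider.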
Step 4 — Route by DPV code
| DPV Code | Meaning | Action |
|---|---|---|
| Y | Exact match: primary and secondary confirmed deliverable | Write to output |
| D | Default: building confirmed, unit not verified | Write to output; flag for secondary-unit enrichment |
| S | Secondary missing: building exists, unit absent or ambiguous | Route to manual review queue |
| M | Primary missing: no match for primary number + street | Route to fallback or discard |
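The routing table translates directly into a lookup. This sketch assumes the four-code set used throughout this page and sends a missing code (for example, the None the implementation below returns on a schema mismatch or API failure) or an unrecognized code to manual review; the queue names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Route(str, Enum):
    OUTPUT = "output"
    OUTPUT_AND_ENRICH = "output_secondary_enrichment"
    MANUAL_REVIEW = "manual_review"
    FALLBACK = "fallback"


# Mirrors the DPV routing table; queue names are hypothetical.
_DPV_ROUTES = {
    "Y": Route.OUTPUT,
    "D": Route.OUTPUT_AND_ENRICH,
    "S": Route.MANUAL_REVIEW,
    "M": Route.FALLBACK,
}


def route_by_dpv(dpv_code: Optional[str]) -> Route:
    """Map a DPV code to its destination; unknown or missing codes go to review."""
    if not dpv_code:
        return Route.MANUAL_REVIEW
    return _DPV_ROUTES.get(dpv_code.strip().upper(), Route.MANUAL_REVIEW)
```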
For S and M codes, consider routing through a multi-API fallback chain before discarding the record — a second geocoding provider may resolve ambiguous addresses that the CASS engine cannot confirm against its reference tables.
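A fallback chain is an ordered list of resolvers tried until one succeeds. The sketch keeps providers abstract: each is a hypothetical callable that returns a standardized line or None, and any exception is treated as "no answer" so one provider's outage does not stop the chain. Returning the provider name keeps downstream audits clear about where a record was resolved.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple

logger = logging.getLogger("cass_fallback")

# Hypothetical provider contract: raw address in, standardized line (or None) out.
Provider = Callable[[str], Optional[str]]


def resolve_with_fallback(
    address: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (provider_name, result) from the first provider that resolves the address."""
    for name, provider in providers:
        try:
            result = provider(address)
        except Exception:  # a failing provider should not abort the chain
            logger.warning("provider %s failed for %r", name, address, exc_info=True)
            continue
        if result is not None:
            return name, result
    return None, None
```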
Step 5 — Write validated records
Enforce strict typing in the output schema:
- zip4 as VARCHAR(4), never numeric (leading zeros are valid)
- dpv_code as a categorical enum
- carrier_route as VARCHAR(4)
Store a deterministic hash of the original input (an idempotency key) alongside the CASS response for reconciliation. This enables efficient deduplication on re-runs and simplifies debugging when upstream schema changes produce unexpected DPV regressions.
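To make the typing rules concrete, here is a sketch that uses SQLite from the standard library as a stand-in for your warehouse (an assumption; column types and constraint syntax differ by database). The CHECK constraints reject out-of-set DPV codes and malformed ZIP+4 values, VARCHAR(4) gives the column text affinity so "0123" keeps its leading zero, and keying on the input hash makes re-runs overwrite rather than duplicate. The table name cass_results is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS cass_results (
    input_hash         TEXT PRIMARY KEY,  -- idempotency key from the raw input
    dpv_code           TEXT NOT NULL CHECK (dpv_code IN ('Y', 'D', 'S', 'M')),
    zip4               VARCHAR(4) CHECK (zip4 IS NULL OR length(zip4) = 4),
    carrier_route      VARCHAR(4),
    standardized_line1 TEXT NOT NULL
)
"""


def upsert_result(conn: sqlite3.Connection, row: dict) -> None:
    """Insert or overwrite one CASS result keyed by its input hash."""
    conn.execute(
        "INSERT OR REPLACE INTO cass_results "
        "VALUES (:input_hash, :dpv_code, :zip4, :carrier_route, :standardized_line1)",
        row,
    )
```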
The Step-by-Step Guide to CASS Address Validation provides the exact API call sequences and error-handling routines for each vendor endpoint.
Primary Code Implementation
```python
"""
cass_pipeline.py — Production CASS validation with async batching and Pydantic v2 validation.
Requires: httpx>=0.27, pydantic>=2.0, tenacity>=8.0
"""
import hashlib
import logging
import uuid
from typing import Literal, Optional

import httpx
from pydantic import BaseModel, Field, ValidationError
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential

logger = logging.getLogger("cass_pipeline")


class RawAddress(BaseModel):
    address_line_1: str
    address_line_2: Optional[str] = None
    city: str
    state: str
    zip: str
    country: str = "US"

    def input_hash(self) -> str:
        """Stable idempotency key for deduplication and reconciliation."""
        raw = f"{self.address_line_1}|{self.address_line_2}|{self.city}|{self.state}|{self.zip}"
        return hashlib.sha256(raw.encode()).hexdigest()[:16]


class CASSResponse(BaseModel):
    dpv_code: Literal["Y", "D", "S", "M"]
    dpv_footnotes: Optional[str] = None
    zip4: Optional[str] = Field(default=None, min_length=4, max_length=4)
    carrier_route: Optional[str] = None
    standardized_line1: str
    standardized_city: str
    standardized_state: str
    standardized_zip: str

    @property
    def is_deliverable(self) -> bool:
        return self.dpv_code in ("Y", "D")


def _is_retryable(exc: BaseException) -> bool:
    """Retry transport errors, 429s, and 5xx responses; fail fast on other 4xx."""
    if isinstance(exc, httpx.HTTPStatusError):
        status = exc.response.status_code
        return status == 429 or status >= 500
    return isinstance(exc, httpx.TransportError)


# reraise=True re-raises the last httpx exception instead of tenacity.RetryError,
# so the except clause in validate_address_batch actually sees exhausted retries.
@retry(
    retry=retry_if_exception(_is_retryable),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    reraise=True,
)
async def _post_batch(
    client: httpx.AsyncClient,
    api_endpoint: str,
    api_key: str,
    addresses: list[dict],
    request_id: str,
) -> list[dict]:
    """POST a single batch; raises an httpx.HTTPError once retries are exhausted."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-Request-ID": request_id,
    }
    payload = {"addresses": addresses, "options": {"cass": True, "dpv": True}}
    resp = await client.post(api_endpoint, json=payload, headers=headers, timeout=30.0)
    resp.raise_for_status()
    return resp.json().get("results", [])


async def validate_address_batch(
    client: httpx.AsyncClient,
    addresses: list[RawAddress],
    api_endpoint: str,
    api_key: str,
    chunk_size: int = 500,
) -> list[tuple[RawAddress, Optional[CASSResponse]]]:
    """
    Validate a list of RawAddress records via CASS and return paired results.
    Returns a list of (input, CASSResponse | None) tuples.
    None indicates a schema mismatch, a truncated batch, or an API error that
    persisted after retries.
    """
    results: list[tuple[RawAddress, Optional[CASSResponse]]] = []
    for offset in range(0, len(addresses), chunk_size):
        chunk = addresses[offset : offset + chunk_size]
        request_id = str(uuid.uuid4())
        payload = [a.model_dump() for a in chunk]
        try:
            raw_results = await _post_batch(client, api_endpoint, api_key, payload, request_id)
        except httpx.HTTPError as exc:
            status = exc.response.status_code if isinstance(exc, httpx.HTTPStatusError) else None
            logger.error(
                "CASS batch failed: status=%s error=%r request_id=%s chunk_offset=%d",
                status,
                exc,
                request_id,
                offset,
            )
            results.extend((addr, None) for addr in chunk)
            continue
        # zip() would silently drop inputs if the vendor truncated the batch.
        if len(raw_results) != len(chunk):
            logger.error(
                "CASS result count mismatch: sent=%d received=%d request_id=%s",
                len(chunk),
                len(raw_results),
                request_id,
            )
            results.extend((addr, None) for addr in chunk)
            continue
        for addr, item in zip(chunk, raw_results):
            try:
                cass = CASSResponse.model_validate(item)
                results.append((addr, cass))
                logger.info(
                    "dpv=%s zip4=%s route=%s hash=%s",
                    cass.dpv_code,
                    cass.zip4,
                    cass.carrier_route,
                    addr.input_hash(),
                )
            except ValidationError as exc:
                logger.warning("Schema mismatch for %s: %s", addr.input_hash(), exc)
                results.append((addr, None))
    return results
```
Vectorized pandas variant
```python
import asyncio

import httpx
import pandas as pd

from cass_pipeline import RawAddress, validate_address_batch

CASS_COLUMNS = [
    "dpv_code", "dpv_footnotes", "zip4", "carrier_route",
    "standardized_line1", "standardized_city", "standardized_state",
    "standardized_zip",
]


async def validate_dataframe(
    df: pd.DataFrame,
    api_endpoint: str,
    api_key: str,
) -> pd.DataFrame:
    """
    Accepts a DataFrame with columns matching RawAddress fields.
    Returns the original DataFrame with CASS result columns appended.
    Read ZIP columns as strings (e.g. dtype=str) so leading zeros survive.
    """
    # NaN is a float, which pydantic rejects for Optional[str]; convert it to None.
    clean = df.astype(object).where(df.notna(), None)
    addresses = [RawAddress(**row) for row in clean.to_dict(orient="records")]
    async with httpx.AsyncClient() as client:
        pairs = await validate_address_batch(client, addresses, api_endpoint, api_key)

    records = []
    for addr, cass in pairs:
        row = {col: (getattr(cass, col) if cass is not None else None) for col in CASS_COLUMNS}
        # Keep the input hash for failed records too, so they can be reconciled later.
        row["cass_hash"] = addr.input_hash()
        records.append(row)

    cass_df = pd.DataFrame(records, columns=[*CASS_COLUMNS, "cass_hash"])
    return pd.concat([df.reset_index(drop=True), cass_df], axis=1)


# Usage: enriched = asyncio.run(validate_dataframe(df, api_endpoint, api_key))
```
USPS DPV Footnote Reference
The dpv_footnotes string carries one or more two-character codes. The most operationally significant:
| Code | Meaning | Recommended action |
|---|---|---|
| AA | Input address matched to the ZIP+4 file | Proceed |
| A1 | Input address not matched to the ZIP+4 file | Route to fallback or manual review; audit upstream |
| BB | Entire address DPV confirmed (code Y) | Write to output |
| CC | DPV confirmed, but the submitted secondary number was not matched | Verify or drop the secondary number |
| N1 | High-rise address confirmed but missing its secondary number (apt/unit) | Route to secondary-unit enrichment |
| M1 | Primary number missing | Manual review queue |
| M3 | Primary number invalid | Manual review queue |
| P1 | PO Box, Rural Route, or Highway Contract box number missing | Manual review queue; box number required |
| RR | DPV confirmed at a Commercial Mail Receiving Agency (CMRA) with private mailbox (PMB) information | Proceed |
| R1 | DPV confirmed at a CMRA, but PMB information is missing | Flag; request the PMB number |
| H# | Unit number confirmed (H3 = exact, H6 = only highrise default) | Proceed / flag |
| F1 | Military address (APO, FPO, DPO) | Route to military-mail path |
| G1 | General delivery address | Flag; not a standard residential/commercial delivery |
| U1 | Unique ZIP code (campus, firm, USPS facility) | Proceed; typically confirmed |
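Vendors commonly return the footnotes as one concatenated string (for example AABB). A minimal parser, assuming that format and tolerating stray separators, splits it into two-character codes and then picks the most severe action. The action labels and their priority order are hypothetical and cover only part of the table above.

```python
import re
from typing import Optional

# Hypothetical action labels, most severe first.
_ACTION_PRIORITY = [
    "fallback_or_review",
    "manual_review",
    "secondary_enrichment",
    "verify_secondary",
    "request_pmb",
    "military_path",
]
# Partial mapping of footnote codes to the actions above.
FOOTNOTE_ACTIONS = {
    "A1": "fallback_or_review",
    "M1": "manual_review",
    "M3": "manual_review",
    "P1": "manual_review",
    "N1": "secondary_enrichment",
    "CC": "verify_secondary",
    "R1": "request_pmb",
    "F1": "military_path",
}


def parse_footnotes(raw: Optional[str]) -> list[str]:
    """Split 'AABB' (or 'AA, BB') into ['AA', 'BB']."""
    if not raw:
        return []
    compact = re.sub(r"[^A-Za-z0-9#]", "", raw).upper()
    if len(compact) % 2:
        raise ValueError(f"odd-length footnote string: {raw!r}")
    return [compact[i : i + 2] for i in range(0, len(compact), 2)]


def footnote_action(raw: Optional[str]) -> str:
    """Return the most severe action implied by the footnotes, or 'proceed'."""
    actions = {FOOTNOTE_ACTIONS[c] for c in parse_footnotes(raw) if c in FOOTNOTE_ACTIONS}
    for action in _ACTION_PRIORITY:
        if action in actions:
            return action
    return "proceed"
```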
Edge Cases
Secondary unit ambiguity
CASS requires explicit, recognized unit designators (APT, STE, UNIT, FL, RM, BLDG). Inputs like #123, 123B, or Apt. 4 (with punctuation) fail to match despite referring to valid delivery points. Normalize secondary components to {DESIGNATOR} {VALUE} format before submission.
```python
import re

# "#" or a standalone "No"/"No." followed by a unit number at the end of the line.
# The \b keeps words that merely end in "no" (e.g. "Juno 5") from matching.
_SECONDARY_RE = re.compile(
    r"(?P<pre>.*?)\s*(?:#|\bNo\.?)\s*(?P<num>\d+[A-Za-z]?)\s*$",
    re.IGNORECASE,
)


def normalize_secondary(line: str) -> str:
    """Convert '#123' or 'No. 4B' patterns to 'APT {num}' for CASS compatibility."""
    m = _SECONDARY_RE.match(line.strip())
    if m:
        return f"{m.group('pre').strip()} APT {m.group('num').upper()}".strip()
    return line
```
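The # / No. helper does not cover designators written with punctuation (Apt. 4) or spelled out in full (Suite 200). A minimal sketch for those cases, assuming the designator and its value sit at the end of the line and using a small subset of the Publication 28 secondary-unit abbreviations; the value must start with a digit or be a single letter, so street names such as "Unit Rd" are left alone.

```python
import re

# Subset of Publication 28 secondary-unit designators mapped to their standard forms.
_DESIGNATORS = {
    "APARTMENT": "APT", "APT": "APT", "SUITE": "STE", "STE": "STE",
    "UNIT": "UNIT", "FLOOR": "FL", "FL": "FL", "ROOM": "RM", "RM": "RM",
    "BUILDING": "BLDG", "BLDG": "BLDG",
}
# Longest alternatives first, so "FLOOR" is tried before "FL".
_DESIGNATOR_RE = re.compile(
    r"\b(?P<des>" + "|".join(sorted(_DESIGNATORS, key=len, reverse=True)) + r")\.?\s*"
    r"(?P<val>[0-9][A-Z0-9-]*|[A-Z])\s*$",
    re.IGNORECASE,
)


def normalize_designator(line: str) -> str:
    """Rewrite a trailing 'Apt. 4' / 'Suite 200' style unit as '{DESIGNATOR} {VALUE}'."""
    text = line.strip()
    m = _DESIGNATOR_RE.search(text)
    if not m:
        return line
    designator = _DESIGNATORS[m.group("des").upper()]
    return f"{text[: m.start()].rstrip()} {designator} {m.group('val').upper()}".strip()
```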
PO Box and Rural Route formatting
CASS processes these delivery types differently from street addresses. Submissions must strictly follow Publication 28 canonical forms. Refer to Handling PO Boxes and Rural Routes for extraction patterns covering colloquial variants like P.O. Box, Post Office Box, and Rt. 2 Bx 15.
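For a sense of what that extraction involves, here is a minimal sketch that covers only the variants named above plus a few obvious spellings. It assumes the whole line is the box address and leaves anything else, including Highway Contract routes, untouched for the fuller patterns in that guide.

```python
import re

_PO_BOX_RE = re.compile(
    r"^\s*(?:P\.?\s*O\.?|POST\s+OFFICE)\s*BOX\s*#?\s*(?P<box>\w+)\s*$",
    re.IGNORECASE,
)
_RURAL_ROUTE_RE = re.compile(
    r"^\s*(?:RURAL\s+ROUTE|ROUTE|RTE\.?|RT\.?|R\.?\s*R\.?)\s*(?P<route>\d+)\s*,?\s*"
    r"(?:BOX|BX)\.?\s*#?\s*(?P<box>\w+)\s*$",
    re.IGNORECASE,
)


def canonicalize_box_line(line: str) -> str:
    """Return 'PO BOX <n>' or 'RR <n> BOX <n>' when the line matches; otherwise the input."""
    m = _PO_BOX_RE.match(line)
    if m:
        return f"PO BOX {m.group('box').upper()}"
    m = _RURAL_ROUTE_RE.match(line)
    if m:
        return f"RR {m.group('route')} BOX {m.group('box').upper()}"
    return line
```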
State / ZIP mismatch
If a record’s state abbreviation does not correspond to the ZIP code prefix ranges, the CASS engine either rejects the record or silently overrides the state. Always log standardized_state alongside original_state and route any mismatch to an audit queue — these often reveal upstream data-entry errors or multi-state ZIP codes near state borders.
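A minimal sketch of that audit check. It assumes both states are two-letter codes (a source that exports full names such as "California" needs mapping first, or every row will be flagged) and that you maintain the cross-border ZIP allowlist yourself; StateAudit and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class StateAudit:
    input_hash: str
    original_state: str
    standardized_state: str
    zip5: str
    known_cross_border: bool  # True: likely a border ZIP rather than a data-entry error


def audit_state(
    input_hash: str,
    original_state: str,
    standardized_state: str,
    zip5: str,
    cross_border_zips: FrozenSet[str] = frozenset(),
) -> Optional[StateAudit]:
    """Return an audit record when CASS changed the state, else None."""
    original = original_state.strip().upper()
    standardized = standardized_state.strip().upper()
    if original == standardized:
        return None
    return StateAudit(input_hash, original, standardized, zip5, zip5 in cross_border_zips)
```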
Unicode and encoding drift
Legacy CRM or ERP exports frequently use Windows-1252 or ISO-8859-1. Characters like Ñ, smart quotes, or accented vowels corrupt silently if not transcoded at ingest. Apply chardet detection at the file-open boundary, decode with the detected codec, and work in UTF-8 from that point on, before any string operations. If the addresses contain non-ASCII characters that survived transcoding, also apply Unicode and character normalization (NFKC) before CASS submission to collapse ligatures and compatibility forms.
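A sketch of the post-transcoding normalization step using the standard library's unicodedata module. NFKC folds compatibility forms (the ﬁ ligature becomes fi, full-width digits become ASCII digits) and composes decomposed accents, but it does not change typographic quotes, so the quote mapping here is an extra assumption about what your vendor prefers.

```python
import unicodedata

# NFKC leaves curly quotes alone; mapping them to ASCII is an assumed preference.
_QUOTE_MAP = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def to_cass_text(value: str) -> str:
    """Apply NFKC normalization, then replace typographic quotes with ASCII ones."""
    return unicodedata.normalize("NFKC", value).translate(_QUOTE_MAP)
```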
Batch size and silent truncation
Payload caps vary by vendor and some are low: Smarty's US Street API, for example, accepts at most 100 addresses per POST request. Exceeding the cap silently truncates the result set in some implementations; others return HTTP 413. Set chunk_size from your vendor's documented limit rather than assuming one value is safe everywhere, and assert len(results) == len(chunk) after each API call to detect truncation immediately.
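Because some vendors cap requests by bytes and others by record count, a chunker can enforce both. This is a minimal sketch, assuming the payload is a JSON array of address dicts: the byte estimate is conservative (two bytes of separator per record plus the brackets) but ignores any wrapper object your vendor's request format adds, so leave headroom below the real limit. The default limits are placeholders; set them from your vendor's documentation.

```python
import json
from collections.abc import Iterable, Iterator, Sized


def iter_chunks(
    records: Iterable[dict], max_records: int = 100, max_bytes: int = 512_000
) -> Iterator[list[dict]]:
    """Yield batches that stay under both a record cap and an estimated byte cap."""
    batch: list[dict] = []
    batch_bytes = 2  # the surrounding "[]" of the JSON array
    for rec in records:
        rec_bytes = len(json.dumps(rec).encode("utf-8")) + 2  # + ", " separator
        if rec_bytes + 2 > max_bytes:
            raise ValueError("a single record exceeds max_bytes")
        if batch and (len(batch) >= max_records or batch_bytes + rec_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 2
        batch.append(rec)
        batch_bytes += rec_bytes
    if batch:
        yield batch


def check_result_count(chunk: Sized, results: Sized) -> None:
    """Fail loudly instead of letting zip() silently drop truncated results."""
    if len(results) != len(chunk):
        raise RuntimeError(f"vendor returned {len(results)} results for {len(chunk)} inputs")
```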
Performance and Vectorization
| Approach | Throughput (records/sec) | Notes |
|---|---|---|
| Synchronous requests loop | ~20–80 | Baseline; unsuitable for volumes above 10k |
| httpx async with asyncio.gather | ~800–2,000 | Saturates most vendor rate limits; add a semaphore |
| httpx async + semaphore (50 concurrent) | ~400–600 | Respects typical vendor quotas; preferred default |
| Parallel processes (multiprocessing) | ~2,000–5,000 | Only worthwhile above 500k records/run |
Practical recommendations:
- Use asyncio.Semaphore(50) to cap concurrent requests and avoid 429 Too Many Requests responses.
- Prefer polars for pre/post-processing: polars lazy evaluation and Arrow-backed columns process 1M-row address frames in under 10 seconds on commodity hardware, versus 45–90 seconds with pandas.
- Cache CASS results by input hash in Redis (TTL 30 days). Re-runs on incrementally updated CRM exports typically see 60–80% cache hit rates, cutting API costs proportionally. See API Quota Tracking and Cost Management for budget guardrails around per-call vendor costs.
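The semaphore recommendation looks like this in asyncio. The sketch is generic: worker stands in for a per-chunk coroutine (for example, a thin wrapper around _post_batch from the implementation above), and the limit of 50 simply mirrors the figure in the table. asyncio.gather returns results in input order, so chunks can still be paired with their inputs afterwards.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    chunks: Sequence[T], worker: Callable[[T], Awaitable[R]], limit: int = 50
) -> list[R]:
    """Run worker over every chunk with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(chunk: T) -> R:
        async with semaphore:
            return await worker(chunk)

    # gather preserves input order, so results line up with chunks.
    return list(await asyncio.gather(*(run(c) for c in chunks)))
```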
Certification Testing and Maintenance
Annual recertification cycle
The USPS releases updated test datasets each year containing newly constructed streets, retired delivery points, and edge cases added from real-world failure reports. To maintain certification:
- Download the official test suite from the USPS PostalPro portal.
- Run your engine against the full dataset in a staging environment that mirrors production configuration exactly.
- Achieve ≥ 98% accuracy on DPV matching and ZIP+4 assignment.
- Submit results via the vendor portal or direct USPS submission system before the certification deadline.
Monthly database updates
Address data decays rapidly — the USPS estimates 14–18% of addresses change annually. Automate monthly reference-table updates:
- Schedule updates during low-traffic windows (02:00–04:00 UTC).
- Use database transactions to swap reference tables atomically; partial-state queries during an update produce transient DPV failures that are difficult to distinguish from structural errors.
- Tag versions (v2026.05, v2026.06) to enable rollback if a vendor release introduces regressions.
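If your reference tables live in a database with transactional DDL, the atomic swap can be two renames inside one transaction. The sketch below uses SQLite for illustration (an assumption) on a connection opened with isolation_level=None, so the explicit BEGIN/COMMIT are the only transaction boundaries. Table names are hypothetical and interpolated directly, so pass only trusted identifiers. The retired table keeps the version tag, which gives you the rollback target.

```python
import sqlite3


def swap_reference_table(
    conn: sqlite3.Connection, live: str, staged: str, retired_tag: str
) -> None:
    """Promote `staged` to `live` atomically; keep the old table as '{live}_{retired_tag}'."""
    retired = f"{live}_{retired_tag}"
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(f'ALTER TABLE "{live}" RENAME TO "{retired}"')
        conn.execute(f'ALTER TABLE "{staged}" RENAME TO "{live}"')
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # undo the first rename if the second one fails
        raise
```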
Continuous monitoring
Deploy dashboards tracking:
- cass_api_latency_p99: alert at > 2 s
- dpv_match_rate: alert if it drops below 95%; common causes are stale vendor data, upstream schema drift, or a secondary-unit normalization regression
- error_rate_by_code: broken down by S, M, and API failures
- batch_throughput_records_per_second: baseline this at deploy and alert on sustained 20% drops
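A minimal sketch of how the match-rate and error-rate metrics could be computed per batch before export to a dashboard. It assumes dpv_match_rate counts Y and D as matches (the codes this page treats as deliverable) and that None marks an API failure or schema mismatch, as in the implementation above; adjust both to your own metric definitions.

```python
from collections import Counter
from typing import Iterable, Optional


def batch_metrics(dpv_codes: Iterable[Optional[str]]) -> dict[str, float]:
    """Compute per-batch rates; None stands for an API failure or schema mismatch."""
    counts = Counter("API_FAIL" if code is None else code for code in dpv_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        "dpv_match_rate": (counts["Y"] + counts["D"]) / total,
        "error_rate_S": counts["S"] / total,
        "error_rate_M": counts["M"] / total,
        "error_rate_api": counts["API_FAIL"] / total,
    }


def match_rate_alert(metrics: dict[str, float], threshold: float = 0.95) -> bool:
    """True when the batch's match rate falls below the alert threshold."""
    return metrics.get("dpv_match_rate", 1.0) < threshold
```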
Correlate latency spikes with vendor status pages to distinguish internal bottlenecks from external outages before escalating.
Troubleshooting
DPV match rate drops suddenly
Root cause: Monthly vendor database update introduced a regression, or upstream data schema changed silently (new source system exporting state as full name rather than two-letter code).
Fix: Roll back to the previous vendor version tag. Run the recertification test suite against both versions to confirm the regression. If the upstream schema changed, update the pre-normalization layer and re-process the affected date range.
API returns HTTP 413 errors
Root cause: Batch payload exceeds vendor’s undocumented size limit (some vendors count bytes, not records).
Fix: Reduce chunk_size to 200 and add a payload-size guard: assert len(json.dumps(payload)) < 512_000. Dynamic chunking based on average record size is more robust than a fixed record count.
zip4 field is None on confirmed deliverable records
Root cause: The address matched at the building level (DPV code D) but a unique ZIP+4 cannot be assigned without a confirmed unit. Common for new construction or recently subdivided parcels.
Fix: This is expected behavior. Store the record with zip4=None and flag it for periodic re-validation — once the USPS adds the unit to its reference tables (typically within one to two monthly cycles), a re-submission will return a full ZIP+4.
Silent state override in output
Root cause: The CASS engine corrected a state–ZIP mismatch by trusting the ZIP code over the submitted state field.
Fix: Always log original_state alongside standardized_state. Route any mismatch to a data-quality queue. Multi-state ZIP codes (ZIP codes near state borders assigned to post offices in the neighboring state) are a common cause; maintain a lookup table of known cross-border ZIPs to reduce false-positive audit flags.
tenacity retries exhausted on large batch jobs
Root cause: Sustained 429 Too Many Requests responses indicate the concurrency cap needs to be lowered, or the vendor’s daily quota has been reached.
Fix: Reduce the asyncio.Semaphore limit to 20 and add a daily quota guard using the approach in API Quota Tracking and Cost Management. If the quota is genuinely exhausted, checkpoint the current offset and resume the next day; checking each record's input_hash against stored results before calling the API prevents double-billing previously processed records.
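Checkpointing by input hash rather than only by offset makes the resume step independent of input order. A minimal sketch, assuming a local JSON file is an acceptable checkpoint store (a Redis set would serve the same role) and that the caller supplies a function returning each record's hash:

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List, Set, TypeVar

T = TypeVar("T")


def load_checkpoint(path: Path) -> Set[str]:
    """Return the set of input hashes already processed (empty on first run)."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_checkpoint(path: Path, done: Set[str]) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written checkpoint."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(sorted(done)), encoding="utf-8")
    tmp.replace(path)


def pending(records: Iterable[T], done: Set[str], key: Callable[[T], str]) -> List[T]:
    """Keep only records whose hash has not been processed (and billed) yet."""
    return [r for r in records if key(r) not in done]
```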
FAQ
Do I need to obtain CASS certification directly from USPS or can I use a vendor?
Most production teams use a CASS-certified vendor (Smarty, Melissa, Loqate, etc.) rather than pursuing direct USPS certification. Direct certification requires annual testing submissions and is typically reserved for large mailers or software vendors distributing certified engines. Using a certified vendor API is compliant for downstream pipelines.
What DPV code means an address is fully deliverable?
A DPV code of Y (confirmed deliverable to the exact unit) is the gold standard. D (default — building confirmed, unit not verified) is considered conditionally deliverable. S (secondary missing) and M (missing primary number) require human review or fallback routing before treating the record as deliverable.
How often does the USPS address database change?
USPS issues monthly updates to its address reference data (AMS and DPV files), while NCOALink change-of-address data is refreshed weekly. New subdivisions, renamed streets, and retired delivery points appear within one to two update cycles. Vendors propagate these within days of the USPS release date. Pipelines that skip monthly updates can see DPV match rates drop by 1–3 percentage points per quarter as the reference data ages.
Can CASS validation handle PO Boxes and Rural Routes?
Yes, but only if they are formatted per USPS Publication 28 before submission. PO Box inputs must use the canonical form PO BOX <number>, and Rural Routes must follow the RR <n> BOX <n> convention. Colloquial variants like P.O. Box or Route 2 Box 15 trigger M or S DPV codes despite referring to valid delivery points.
What happens to non-US addresses sent to a CASS engine?
A CASS engine has no reference data for non-US addresses and will return a non-deliverable DPV code or an error. You must branch before the CASS call: detect country, route domestic US records to CASS, and send international records to a separate normalization path. Failing to branch contaminates DPV match-rate metrics with structurally unresolvable failures.
Related
- Step-by-Step Guide to CASS Address Validation — exact API call sequences, request payloads, and error-handling routines for each major CASS vendor endpoint
- Regex Patterns for US Address Parsing — pre-normalization patterns for street numbers, directional prefixes, and unit designators that feed the CASS sanitization stage
- Handling PO Boxes and Rural Routes — extraction patterns and canonical formatting for the address types CASS handles separately from street addresses
- International Address Format Standardization — the routing branch for non-US records that must be separated from the CASS path before validation
- API Quota Tracking and Cost Management — budget guardrails and Redis-backed counters for managing per-call CASS vendor costs across high-volume batch jobs