## Cleaned Address Table
| Original | Street | City | State | ZIP | Country | Issues |
|---|---|---|---|---|---|---|
| 123 main st. apt 4, new york NY 10001 | 123 Main Street Apt 4 | New York | NY | 10001 | USA | Country inferred from state |
| 456 OAK AVENUE,Los angeles,ca 90210 | 456 Oak Avenue | Los Angeles | CA | 90210 | USA | Country inferred from state |
| 789 elm rd chicago illinois | 789 Elm Road | Chicago | IL | | | Missing ZIP; state spelled out (converted to IL); no delimiters |
---
## Reusable Excel Formulas
**1. Trim & Proper-case the street:**
```excel
=PROPER(TRIM(SUBSTITUTE(SUBSTITUTE(A2,"."," "),",",", ")))
```
Cleans punctuation, collapses spaces, standardizes capitalization.
**2. Expand street abbreviations:**
```excel
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2," St "," Street ")," Ave "," Avenue ")," Rd "," Road ")
```
Swaps common abbreviations for full words (add Blvd, Dr, Ln as needed).
**3. Split comma-delimited address into parts (Excel 365):**
```excel
=TEXTSPLIT(A2,",")
```
Returns Street, City, State+ZIP as separate spill cells.
**4. Extract 5-digit ZIP from any cell:**
```excel
=IFERROR(TEXT(LOOKUP(99^9,--MID(A2,ROW($1:$40),5)),"00000"),"")
```
Pulls the first valid 5-digit ZIP; blank if none found.
**5. Convert full state name to 2-letter code:**
```excel
=IFERROR(VLOOKUP(PROPER(TRIM(D2)),StateLookup,2,FALSE),UPPER(D2))
```
Uses a named range `StateLookup` (col1=full name, col2=code); falls back to uppercase input.
**6. Flag rows with missing fields:**
```excel
=TEXTJOIN("; ",TRUE,IF(B2="","Missing Street",""),IF(C2="","Missing City",""),IF(E2="","Missing ZIP",""))
```
Concatenates issue flags for QA review before CRM import.
Parse and Clean Messy Address Columns in Excel
Tested prompts for clean address data in excel compared across 5 leading AI models.
You have a spreadsheet with addresses crammed into one column, or split awkwardly across five columns, and you need them clean before a mail merge, a CRM import, or a geocoding job. Excel formulas can handle predictable patterns, but real address data is rarely predictable. You get inconsistent abbreviations (St, Street, STR), missing ZIP codes, apartment numbers glued to street names, country names in some rows but not others, and the occasional phone number dropped in by accident.
This page shows how to use an AI prompt to parse and standardize messy address columns into structured fields (street, unit, city, state, postal code, country). You paste the prompt, feed it your column, and get back a clean table you can drop straight into Excel. Below you will find the tested prompt, outputs from four models compared side by side, plus guidance on when this beats formulas or Power Query, and where it falls short.
When to use this
Use an AI prompt when addresses are inconsistent enough that no single formula or delimiter works. If you have mixed formats, free-text entries, or multi-country data, AI handles the judgment calls (is '100 N Main' a street or a building name?) that regex cannot. It is ideal for one-off cleanups or batches under a few thousand rows.
- Cleaning a CRM export before importing into HubSpot, Salesforce, or Mailchimp
- Preparing a mailing list where some rows have full addresses and others have just 'NYC office'
- Standardizing vendor or customer address fields collected from web forms with no validation
- Splitting a single 'Address' column with 5,000 rows into street, city, state, ZIP for a pivot report
- Normalizing international addresses that mix US, UK, and EU formats
When this format breaks down
- Datasets over 50,000 rows where token costs and latency exceed the value of cleaning. Use Power Query or a dedicated address API like Smarty or Google Geocoding.
- When you need verified deliverability (real USPS-validated addresses). AI will confidently reformat a nonexistent address. Use an address verification service.
- When the data is already well structured and just needs casing or trimming. A simple PROPER(), TRIM(), or Flash Fill does it faster.
- Regulated data (HIPAA, patient records) where you cannot send raw addresses to a third-party model without a BAA in place.
The prompt we tested
You are an expert data cleaning assistant specializing in Excel address normalization. Parse and clean the messy address data provided below by splitting it into standardized components (Street, City, State, ZIP, Country), fixing capitalization, expanding or standardizing abbreviations (e.g., 'St' → 'Street', 'CA' stays as state code), removing extra whitespace and stray punctuation, and flagging any rows with missing or ambiguous fields. Output a pipe-delimited table with columns: Original | Street | City | State | ZIP | Country | Issues. Then below the table, provide 3-5 reusable Excel formulas with brief one-line explanations. Keep the total response under 300 words and do not invent missing data—leave blank and note it in the Issues column. Messy address data: Here's a sample from my contacts spreadsheet column A: '123 main st. apt 4, new york NY 10001', '456 OAK AVENUE,Los angeles,ca 90210', '789 elm rd chicago illinois'. I need to split these into separate columns and standardize the formatting before importing into our CRM. Return the cleaned output as a pipe-delimited table ready to paste into Excel, followed by the exact Excel formulas (TEXTSPLIT, TRIM, PROPER, SUBSTITUTE, etc.) the user can apply to replicate this cleaning on the full column.
Model comparison
| Model | Quality | Cost | Latency | Tokens |
|---|---|---|---|---|
| Claude Opus 4.7 | 9 | $0.0713 | 13.7s | 470 / 857 |
| Claude Haiku 4.5 | 8 | $0.0030 | 5.3s | 320 / 541 |
| GPT-5.4 | Not run yet | |||
| Gemini 2.5 Pro | Not run yet | |||
| Grok 4.1 Fast Reasoning | 6 | $0.0003 | 12.8s | 431 / 350 |
Outputs
# Cleaned Address Data
| Original | Street | City | State | ZIP | Country | Issues |
|---|---|---|---|---|---|---|
| 123 main st. apt 4, new york NY 10001 | 123 Main Street Apt 4 | New York | NY | 10001 | USA | None |
| 456 OAK AVENUE,Los angeles,ca 90210 | 456 Oak Avenue | Los Angeles | CA | 90210 | USA | None |
| 789 elm rd chicago illinois | 789 Elm Road | Chicago | IL | | USA | Missing ZIP; State inferred from city |
---
## Reusable Excel Formulas
**Formula 1: Extract & Standardize Street Address**
```
=PROPER(TRIM(LEFT(A1,FIND(",",A1)-1)))
```
Extracts text before first comma, removes extra spaces, and applies title case.
**Formula 2: Extract City (between 1st and 2nd comma)**
```
=PROPER(TRIM(MID(A1,FIND(",",A1)+1,FIND(",",A1,FIND(",",A1)+1)-FIND(",",A1)-1)))
```
Isolates city between first and second comma, trims, and capitalizes properly.
**Formula 3: Extract State Code**
```
=UPPER(TRIM(RIGHT(LEFT(A1,FIND(",",A1,FIND(",",A1)+1)-1),2)))
```
Extracts last 2 characters before final comma as state abbreviation in uppercase.
**Formula 4: Extract ZIP Code**
```
=IFERROR(TRIM(RIGHT(A1,5)),"")
```
Pulls rightmost 5 characters as ZIP; returns blank if unavailable.
**Formula 5: Standardize Abbreviations (St/Rd/Ave)**
```
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"st.","Street"),"rd","Road"),"ave","Avenue")
```
Replaces common street abbreviations with full words before other processing.
---
**Note:** These formulas assume consistent comma-delimited formatting. Rows without ZIPs or ambiguous state/city pairings are flagged for manual review before CRM import.
```
Original|Street|City|State|ZIP|Country|Issues
123 main st. apt 4, new york NY 10001|123 Main Street Apt 4|New York|NY|10001||
456 OAK AVENUE,Los angeles,ca 90210|456 Oak Avenue|Los Angeles|CA|90210||
789 elm rd chicago illinois|789 Elm Road|Chicago|IL|| |Missing ZIP
```
Excel formulas (assume original in A2:A4; drag down):
1. Clean base: `=TRIM(PROPER(SUBSTITUTE(SUBSTITUTE(A2,"st.","Street"),"rd","Road")))`
Proper cases & expands basic abbrevs.
2. Street (exc. city/state/ZIP): `=TRIM(LEFT(TEXTSPLIT(TRIM(PROPER(A2)),",")(1),FIND(",",SUBSTITUTE(A2," ",REPT(" ",50),LEN(A2)-LEN(SUBSTITUTE(A2," ",""))))-1))`
Splits at comma, takes left part.
3. ZIP: `=TRIM(IFERROR(INDEX(TEXTSPLIT(A2," "),1,COUNTA(TEXTSPLIT(A2," ")))," "))`
Extracts last space-separated token if numeric-like.
4. State (last 2 pre-ZIP): `=UPPER(TRIM(MID(TEXTSPLIT(A2,",")(2),FIND(" ",TEXTSPLIT(A2,",")(2))+1,2)))`
Grabs 2-letter code post-city.
5. Issues flag: `=IF(OR(ISBLANK(D2),LEN(D2)<5),"Missing ZIP","")`
Flags blank/short ZIPs.
What makes these work
-
01Specify the output columns
Tell the model exactly which fields you want: Street, Unit, City, State, ZIP, Country. If you leave it open, different rows get different schemas and the paste into Excel breaks. Lock the column order in the prompt.
-
02Give it a format example
One or two input-output pairs in the prompt anchor the abbreviation style (St vs Street, CA vs California). Without an example the model picks its own convention and you spend time normalizing the normalization.
-
03Handle missing data explicitly
Instruct the model to leave cells blank rather than guess. Otherwise it will hallucinate ZIP codes or assume 'US' for a UK address. Add: 'If a field is not present in the source, leave it empty.'
-
04Batch in chunks of 50-100 rows
Large pastes cause models to drop rows or truncate. Feed the data in blocks, paste results into Excel, and repeat. Keep a row count check so you catch dropped records.
More example scenarios
123 main st apt 4b, brooklyn ny 11201 456 Oak Avenue Suite 200 Chicago IL 60601 789 elm rd, austin tx 1000 Market St. #501, San Francisco, California 94103 USA
| Street | Unit | City | State | ZIP | Country | |---|---|---|---|---|---| | 123 Main St | Apt 4B | Brooklyn | NY | 11201 | US | | 456 Oak Ave | Suite 200 | Chicago | IL | 60601 | US | | 789 Elm Rd | | Austin | TX | | US | | 1000 Market St | #501 | San Francisco | CA | 94103 | US |
Acme Ltd, 10 Downing Street, London SW1A 2AA, United Kingdom muller gmbh // hauptstrasse 15 // 10115 berlin germany Sakura Corp, 1-2-3 Shibuya, Tokyo 150-0002, JP
| Company | Street | City | Postal Code | Country | |---|---|---|---|---| | Acme Ltd | 10 Downing Street | London | SW1A 2AA | United Kingdom | | Muller GmbH | Hauptstrasse 15 | Berlin | 10115 | Germany | | Sakura Corp | 1-2-3 Shibuya | Tokyo | 150-0002 | Japan |
John Smith - 22 pine street apt2, boston, ma 02108 - phone 617-555-0100 sarah lee | PO BOX 445 seattle wa | sarah@email.com Dr. R. Patel, 5th floor 88 King's Cross Rd London WC1X 9DE
| Name | Street | Unit | City | State/Region | Postal Code | |---|---|---|---|---|---| | John Smith | 22 Pine St | Apt 2 | Boston | MA | 02108 | | Sarah Lee | PO Box 445 | | Seattle | WA | | | Dr. R. Patel | 88 King's Cross Rd | 5th Floor | London | | WC1X 9DE |
1420 Ocean Blvd Unit 7, Miami FL 33139 1420 ocean blvd #7 miami fl 33139 1420 OCEAN BOULEVARD APARTMENT 7 MIAMI FLORIDA 33139
All three rows normalize to the same canonical record: | Street | Unit | City | State | ZIP | |---|---|---|---|---| | 1420 Ocean Blvd | Unit 7 | Miami | FL | 33139 | This makes deduplication trivial with a COUNTIFS on the combined fields.
ship to: warehouse 3, 4500 industrial pkwy bldg c dock 12, fort worth tx 76106 consignee- smithco, 99 cedar ln ste 400, atlanta ga 30303
| Recipient | Street | Building/Unit | City | State | ZIP | |---|---|---|---|---|---| | Warehouse 3 | 4500 Industrial Pkwy | Bldg C Dock 12 | Fort Worth | TX | 76106 | | Smithco | 99 Cedar Ln | Ste 400 | Atlanta | GA | 30303 |
Common mistakes to avoid
-
Trusting the ZIP code output
AI will invent a plausible ZIP when the source is missing one. Always diff the output against the input for any field the model 'completed'. Flag rows where the output has a value the input did not.
-
Mixing countries in one pass
Asking for a single US-style schema on international data mangles postal codes and collapses regions into the State column. Split by country first, or include a Country column and let the model use region-appropriate fields.
-
Skipping the TRIM and CLEAN step
Non-breaking spaces and invisible characters from copy-paste confuse the model and the downstream VLOOKUP. Run TRIM(CLEAN()) on the column before sending, or ask the model to strip them explicitly.
-
No dedupe check after cleaning
Cleaning exposes duplicates that looked different before (e.g., 'St' vs 'Street'). If you skip the dedupe pass after normalization, you import the same record twice into your CRM.
-
Pasting as plain text into Excel
Markdown tables paste as a single column. Use Data > Text to Columns with pipe delimiter, or ask the model for tab-separated output so Excel splits it automatically.
Related queries
Frequently asked questions
Can I clean address data in Excel without using AI?
Yes, for consistent data. Text to Columns, Flash Fill, and Power Query handle predictable patterns. Formulas like LEFT, MID, FIND, and TEXTSPLIT work when delimiters are reliable. AI is only worth it when the data is inconsistent enough that no rule covers every row.
How do I split a full address into street, city, state, and ZIP in Excel?
If the format is consistent (comma-separated, US style), use TEXTSPLIT(A1, {',',' '}) or Power Query's Split Column. For inconsistent data, paste the column into the AI prompt on this page and request the four fields as output columns.
Will AI validate that the addresses actually exist?
No. AI formats and parses, but it does not verify deliverability. For USPS-validated or geocoded addresses use Smarty, Google Maps Geocoding API, or Melissa. Run AI cleanup first, then validation, to save API calls on unparseable junk.
How many rows can I clean at once?
Most models handle 50 to 200 rows per prompt reliably. Beyond that you risk truncation or dropped rows. For files over 1,000 rows, loop through batches or use a script with the API rather than pasting into a chat interface.
What about apartment numbers that get stuck to the street name?
The prompt extracts units into a separate column even when there is no delimiter (e.g., '123 Main St Apt 4' or '123 Main St #4'). Specify 'Unit' as an output column and include one example with a glued unit so the model knows the pattern.
Is it safe to send customer addresses to an AI model?
It depends on your provider and your compliance requirements. Check whether the vendor trains on your inputs (OpenAI, Anthropic, and Google offer no-training enterprise tiers). For HIPAA or GDPR-sensitive data, use a provider with a signed DPA or BAA, or run a local model.