Parse and Clean Messy Address Columns in Excel

Tested prompts for clean address data in excel compared across 5 leading AI models.

BEST BY JUDGE SCORE Claude Opus 4.7 9/10

You have a spreadsheet with addresses crammed into one column, or split awkwardly across five columns, and you need them clean before a mail merge, a CRM import, or a geocoding job. Excel formulas can handle predictable patterns, but real address data is rarely predictable. You get inconsistent abbreviations (St, Street, STR), missing ZIP codes, apartment numbers glued to street names, country names in some rows but not others, and the occasional phone number dropped in by accident.

This page shows how to use an AI prompt to parse and standardize messy address columns into structured fields (street, unit, city, state, postal code, country). You paste the prompt, feed it your column, and get back a clean table you can drop straight into Excel. Below you will find the tested prompt, outputs from four models compared side by side, plus guidance on when this beats formulas or Power Query, and where it falls short.

When to use this

Use an AI prompt when addresses are inconsistent enough that no single formula or delimiter works. If you have mixed formats, free-text entries, or multi-country data, AI handles the judgment calls (is '100 N Main' a street or a building name?) that regex cannot. It is ideal for one-off cleanups or batches under a few thousand rows.

  • Cleaning a CRM export before importing into HubSpot, Salesforce, or Mailchimp
  • Preparing a mailing list where some rows have full addresses and others have just 'NYC office'
  • Standardizing vendor or customer address fields collected from web forms with no validation
  • Splitting a single 'Address' column with 5,000 rows into street, city, state, ZIP for a pivot report
  • Normalizing international addresses that mix US, UK, and EU formats

When this format breaks down

  • Datasets over 50,000 rows where token costs and latency exceed the value of cleaning. Use Power Query or a dedicated address API like Smarty or Google Geocoding.
  • When you need verified deliverability (real USPS-validated addresses). AI will confidently reformat a nonexistent address. Use an address verification service.
  • When the data is already well structured and just needs casing or trimming. A simple PROPER(), TRIM(), or Flash Fill does it faster.
  • Regulated data (HIPAA, patient records) where you cannot send raw addresses to a third-party model without a BAA in place.

The prompt we tested

You are an expert data cleaning assistant specializing in Excel address normalization. Parse and clean the messy address data provided below by splitting it into standardized components (Street, City, State, ZIP, Country), fixing capitalization, expanding or standardizing abbreviations (e.g., 'St' → 'Street', 'CA' stays as state code), removing extra whitespace and stray punctuation, and flagging any rows with missing or ambiguous fields.

Output a pipe-delimited table with columns: Original | Street | City | State | ZIP | Country | Issues. Then below the table, provide 3-5 reusable Excel formulas with brief one-line explanations. Keep the total response under 300 words and do not invent missing data—leave blank and note it in the Issues column.

Messy address data:
Here's a sample from my contacts spreadsheet column A: '123 main st. apt 4, new york NY 10001', '456 OAK AVENUE,Los angeles,ca 90210', '789 elm rd chicago illinois'. I need to split these into separate columns and standardize the formatting before importing into our CRM.

Return the cleaned output as a pipe-delimited table ready to paste into Excel, followed by the exact Excel formulas (TEXTSPLIT, TRIM, PROPER, SUBSTITUTE, etc.) the user can apply to replicate this cleaning on the full column.

Model comparison

ModelQualityCostLatencyTokens
Claude Opus 4.7 9 $0.0713 13.7s 470 / 857
Claude Haiku 4.5 8 $0.0030 5.3s 320 / 541
GPT-5.4Not run yet
Gemini 2.5 ProNot run yet
Grok 4.1 Fast Reasoning 6 $0.0003 12.8s 431 / 350

Outputs

Claude Opus 4.7 9/10
## Cleaned Address Table

| Original | Street | City | State | ZIP | Country | Issues |
|---|---|---|---|---|---|---|
| 123 main st. apt 4, new york NY 10001 | 123 Main Street Apt 4 | New York | NY | 10001 | USA | Country inferred from state |
| 456 OAK AVENUE,Los angeles,ca 90210 | 456 Oak Avenue | Los Angeles | CA | 90210 | USA | Country inferred from state |
| 789 elm rd chicago illinois | 789 Elm Road | Chicago | IL | | | Missing ZIP; state spelled out (converted to IL); no delimiters |

---

## Reusable Excel Formulas

**1. Trim & Proper-case the street:**
```excel
=PROPER(TRIM(SUBSTITUTE(SUBSTITUTE(A2,"."," "),",",", ")))
```
Cleans punctuation, collapses spaces, standardizes capitalization.

**2. Expand street abbreviations:**
```excel
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2," St "," Street ")," Ave "," Avenue ")," Rd "," Road ")
```
Swaps common abbreviations for full words (add Blvd, Dr, Ln as needed).

**3. Split comma-delimited address into parts (Excel 365):**
```excel
=TEXTSPLIT(A2,",")
```
Returns Street, City, State+ZIP as separate spill cells.

**4. Extract 5-digit ZIP from any cell:**
```excel
=IFERROR(TEXT(LOOKUP(99^9,--MID(A2,ROW($1:$40),5)),"00000"),"")
```
Pulls the first valid 5-digit ZIP; blank if none found.

**5. Convert full state name to 2-letter code:**
```excel
=IFERROR(VLOOKUP(PROPER(TRIM(D2)),StateLookup,2,FALSE),UPPER(D2))
```
Uses a named range `StateLookup` (col1=full name, col2=code); falls back to uppercase input.

**6. Flag rows with missing fields:**
```excel
=TEXTJOIN("; ",TRUE,IF(B2="","Missing Street",""),IF(C2="","Missing City",""),IF(E2="","Missing ZIP",""))
```
Concatenates issue flags for QA review before CRM import.
Judge notes: Excellent response that directly addresses the query with a clean pipe-delimited table, accurate parsing of all three messy addresses, practical Excel formulas for replication, and proper handling of ambiguities (inferring country, flagging missing ZIP)—minor deduction only because the response slightly exceeds 300 words and formula #4 is unnecessarily complex for the stated task.
Claude Haiku 4.5 8/10
# Cleaned Address Data

| Original | Street | City | State | ZIP | Country | Issues |
|---|---|---|---|---|---|---|
| 123 main st. apt 4, new york NY 10001 | 123 Main Street Apt 4 | New York | NY | 10001 | USA | None |
| 456 OAK AVENUE,Los angeles,ca 90210 | 456 Oak Avenue | Los Angeles | CA | 90210 | USA | None |
| 789 elm rd chicago illinois | 789 Elm Road | Chicago | IL | | USA | Missing ZIP; State inferred from city |

---

## Reusable Excel Formulas

**Formula 1: Extract & Standardize Street Address**
```
=PROPER(TRIM(LEFT(A1,FIND(",",A1)-1)))
```
Extracts text before first comma, removes extra spaces, and applies title case.

**Formula 2: Extract City (between 1st and 2nd comma)**
```
=PROPER(TRIM(MID(A1,FIND(",",A1)+1,FIND(",",A1,FIND(",",A1)+1)-FIND(",",A1)-1)))
```
Isolates city between first and second comma, trims, and capitalizes properly.

**Formula 3: Extract State Code**
```
=UPPER(TRIM(RIGHT(LEFT(A1,FIND(",",A1,FIND(",",A1)+1)-1),2)))
```
Extracts last 2 characters before final comma as state abbreviation in uppercase.

**Formula 4: Extract ZIP Code**
```
=IFERROR(TRIM(RIGHT(A1,5)),"")
```
Pulls rightmost 5 characters as ZIP; returns blank if unavailable.

**Formula 5: Standardize Abbreviations (St/Rd/Ave)**
```
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"st.","Street"),"rd","Road"),"ave","Avenue")
```
Replaces common street abbreviations with full words before other processing.

---

**Note:** These formulas assume consistent comma-delimited formatting. Rows without ZIPs or ambiguous state/city pairings are flagged for manual review before CRM import.
Judge notes: Excellent practical response with clean output table and functional formulas, but formulas have minor accuracy issues (Formula 2 and 3 logic is flawed for inconsistent delimiters) and the sample data parsing could be more transparent about manual decisions made.
Grok 4.1 Fast Reasoning 6/10
```
Original|Street|City|State|ZIP|Country|Issues
123 main st. apt 4, new york NY 10001|123 Main Street Apt 4|New York|NY|10001|| 
456 OAK AVENUE,Los angeles,ca 90210|456 Oak Avenue|Los Angeles|CA|90210|| 
789 elm rd chicago illinois|789 Elm Road|Chicago|IL|| |Missing ZIP
```

Excel formulas (assume original in A2:A4; drag down):
1. Clean base: `=TRIM(PROPER(SUBSTITUTE(SUBSTITUTE(A2,"st.","Street"),"rd","Road")))`
   Proper cases & expands basic abbrevs.
2. Street (exc. city/state/ZIP): `=TRIM(LEFT(TEXTSPLIT(TRIM(PROPER(A2)),",")(1),FIND(",",SUBSTITUTE(A2," ",REPT(" ",50),LEN(A2)-LEN(SUBSTITUTE(A2," ",""))))-1))`
   Splits at comma, takes left part.
3. ZIP: `=TRIM(IFERROR(INDEX(TEXTSPLIT(A2," "),1,COUNTA(TEXTSPLIT(A2," ")))," "))`
   Extracts last space-separated token if numeric-like.
4. State (last 2 pre-ZIP): `=UPPER(TRIM(MID(TEXTSPLIT(A2,",")(2),FIND(" ",TEXTSPLIT(A2,",")(2))+1,2)))`
   Grabs 2-letter code post-city.
5. Issues flag: `=IF(OR(ISBLANK(D2),LEN(D2)<5),"Missing ZIP","")`
   Flags blank/short ZIPs.
Judge notes: The cleaned data table is accurate and well-formatted, but the Excel formulas are overly complex, fragile, and would likely fail on real-world messy data; a production-ready solution would need simpler, more robust formulas and clearer step-by-step guidance.

What makes these work

  1. 01
    Specify the output columns

    Tell the model exactly which fields you want: Street, Unit, City, State, ZIP, Country. If you leave it open, different rows get different schemas and the paste into Excel breaks. Lock the column order in the prompt.

  2. 02
    Give it a format example

    One or two input-output pairs in the prompt anchor the abbreviation style (St vs Street, CA vs California). Without an example the model picks its own convention and you spend time normalizing the normalization.

  3. 03
    Handle missing data explicitly

    Instruct the model to leave cells blank rather than guess. Otherwise it will hallucinate ZIP codes or assume 'US' for a UK address. Add: 'If a field is not present in the source, leave it empty.'

  4. 04
    Batch in chunks of 50-100 rows

    Large pastes cause models to drop rows or truncate. Feed the data in blocks, paste results into Excel, and repeat. Keep a row count check so you catch dropped records.

More example scenarios

#01 · US retail customer list with mixed formats
Input
123 main st apt 4b, brooklyn ny 11201
456 Oak Avenue Suite 200 Chicago IL 60601
789 elm rd, austin tx
1000 Market St. #501, San Francisco, California 94103 USA
Expected output
| Street | Unit | City | State | ZIP | Country |
|---|---|---|---|---|---|
| 123 Main St | Apt 4B | Brooklyn | NY | 11201 | US |
| 456 Oak Ave | Suite 200 | Chicago | IL | 60601 | US |
| 789 Elm Rd | | Austin | TX | | US |
| 1000 Market St | #501 | San Francisco | CA | 94103 | US |
#02 · International B2B vendor file
Input
Acme Ltd, 10 Downing Street, London SW1A 2AA, United Kingdom
muller gmbh // hauptstrasse 15 // 10115 berlin germany
Sakura Corp, 1-2-3 Shibuya, Tokyo 150-0002, JP
Expected output
| Company | Street | City | Postal Code | Country |
|---|---|---|---|---|
| Acme Ltd | 10 Downing Street | London | SW1A 2AA | United Kingdom |
| Muller GmbH | Hauptstrasse 15 | Berlin | 10115 | Germany |
| Sakura Corp | 1-2-3 Shibuya | Tokyo | 150-0002 | Japan |
#03 · Event registration list with junk fields
Input
John Smith - 22 pine street apt2, boston, ma 02108 - phone 617-555-0100
sarah lee | PO BOX 445 seattle wa | sarah@email.com
Dr. R. Patel, 5th floor 88 King's Cross Rd London WC1X 9DE
Expected output
| Name | Street | Unit | City | State/Region | Postal Code |
|---|---|---|---|---|---|
| John Smith | 22 Pine St | Apt 2 | Boston | MA | 02108 |
| Sarah Lee | PO Box 445 | | Seattle | WA | |
| Dr. R. Patel | 88 King's Cross Rd | 5th Floor | London | | WC1X 9DE |
#04 · Real estate listings with inconsistent unit notation
Input
1420 Ocean Blvd Unit 7, Miami FL 33139
1420 ocean blvd #7 miami fl 33139
1420 OCEAN BOULEVARD APARTMENT 7 MIAMI FLORIDA 33139
Expected output
All three rows normalize to the same canonical record:
| Street | Unit | City | State | ZIP |
|---|---|---|---|---|
| 1420 Ocean Blvd | Unit 7 | Miami | FL | 33139 |
This makes deduplication trivial with a COUNTIFS on the combined fields.
#05 · Freight shipping manifest
Input
ship to: warehouse 3, 4500 industrial pkwy bldg c dock 12, fort worth tx 76106
consignee- smithco, 99 cedar ln ste 400, atlanta ga 30303
Expected output
| Recipient | Street | Building/Unit | City | State | ZIP |
|---|---|---|---|---|---|
| Warehouse 3 | 4500 Industrial Pkwy | Bldg C Dock 12 | Fort Worth | TX | 76106 |
| Smithco | 99 Cedar Ln | Ste 400 | Atlanta | GA | 30303 |

Common mistakes to avoid

  • Trusting the ZIP code output

    AI will invent a plausible ZIP when the source is missing one. Always diff the output against the input for any field the model 'completed'. Flag rows where the output has a value the input did not.

  • Mixing countries in one pass

    Asking for a single US-style schema on international data mangles postal codes and collapses regions into the State column. Split by country first, or include a Country column and let the model use region-appropriate fields.

  • Skipping the TRIM and CLEAN step

    Non-breaking spaces and invisible characters from copy-paste confuse the model and the downstream VLOOKUP. Run TRIM(CLEAN()) on the column before sending, or ask the model to strip them explicitly.

  • No dedupe check after cleaning

    Cleaning exposes duplicates that looked different before (e.g., 'St' vs 'Street'). If you skip the dedupe pass after normalization, you import the same record twice into your CRM.

  • Pasting as plain text into Excel

    Markdown tables paste as a single column. Use Data > Text to Columns with pipe delimiter, or ask the model for tab-separated output so Excel splits it automatically.

Related queries

Frequently asked questions

Can I clean address data in Excel without using AI?

Yes, for consistent data. Text to Columns, Flash Fill, and Power Query handle predictable patterns. Formulas like LEFT, MID, FIND, and TEXTSPLIT work when delimiters are reliable. AI is only worth it when the data is inconsistent enough that no rule covers every row.

How do I split a full address into street, city, state, and ZIP in Excel?

If the format is consistent (comma-separated, US style), use TEXTSPLIT(A1, {',',' '}) or Power Query's Split Column. For inconsistent data, paste the column into the AI prompt on this page and request the four fields as output columns.

Will AI validate that the addresses actually exist?

No. AI formats and parses, but it does not verify deliverability. For USPS-validated or geocoded addresses use Smarty, Google Maps Geocoding API, or Melissa. Run AI cleanup first, then validation, to save API calls on unparseable junk.

How many rows can I clean at once?

Most models handle 50 to 200 rows per prompt reliably. Beyond that you risk truncation or dropped rows. For files over 1,000 rows, loop through batches or use a script with the API rather than pasting into a chat interface.

What about apartment numbers that get stuck to the street name?

The prompt extracts units into a separate column even when there is no delimiter (e.g., '123 Main St Apt 4' or '123 Main St #4'). Specify 'Unit' as an output column and include one example with a glued unit so the model knows the pattern.

Is it safe to send customer addresses to an AI model?

It depends on your provider and your compliance requirements. Check whether the vendor trains on your inputs (OpenAI, Anthropic, and Google offer no-training enterprise tiers). For HIPAA or GDPR-sensitive data, use a provider with a signed DPA or BAA, or run a local model.