# Phone Number Cleanup Results
## Cleaned Table
| Original Value | Cleaned Value |
|---|---|
| `555.123.4567` | `(555) 123-4567` |
| `(555) 123 4567 ext 42` | `(555) 123-4567` |
| `15551234567` | `(555) 123-4567` |
| `1-800-CALL-NOW` | `INVALID` |
| `+44 20 7946 0958` | `+44 20 7946 0958` |
| `555-CALL` | `INVALID` |
## Excel Formula (Microsoft 365 / Excel 2021+)
Paste into **C2** and drag down:
```excel
=LET(
raw, B2,
digits, IFERROR(REGEXREPLACE(raw,"[^\d+]",""), ""),
intl, LEFT(digits,1)="+",
core, IF(intl, MID(digits,2,99), IF(LEN(digits)=11, MID(digits,2,10), digits)),
IF(intl,
"+" & LEFT(core,2) & " " & MID(core,3,2) & " " & MID(core,5,4) & " " & MID(core,9,4),
IF(LEN(core)=10,
"("&LEFT(core,3)&") "&MID(core,4,3)&"-"&RIGHT(core,4),
"INVALID"))
)
```
## Legacy Excel Alternative (no REGEX)
```excel
=IFERROR(
TEXT(VALUE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"-",""),".","")," ",""),"(",""),")","")),"(000) 000-0000"),
"INVALID")
```
**Notes:** Letters (e.g., `CALL-NOW`) are rejected as ambiguous vanity numbers—flagged `INVALID` for manual review. International numbers keep their `+CC` prefix with spaced grouping.
Format and Clean Phone Numbers in Excel with AI
Tested prompts for clean up phone numbers in excel compared across 5 leading AI models.
You have a spreadsheet full of phone numbers and none of them match. Some have country codes, some don't. Some use dashes, some use dots, some are jammed together as raw digits. A few have extensions tacked on, parentheses around area codes, or sneaky whitespace that breaks your VLOOKUPs. You need them all in one clean, consistent format so you can dedupe contacts, import to a CRM, or send an SMS campaign without errors.
Excel can do this with nested SUBSTITUTE, TEXT, and REGEX formulas, but writing them for every edge case is painful. A few malformed rows and your formula chain collapses. This page shows how to hand the job to an AI model: paste the messy column, describe the target format (E.164, US 10-digit, dashed, whatever), and get back a clean list you can drop straight back into your sheet.
Below you'll find the exact prompt that works, outputs from four different models so you can pick the best fit, and guidance on when this approach beats a formula and when it doesn't.
When to use this
Use AI to clean phone numbers when the mess is varied, the volume is manageable (under a few thousand rows), and you need judgment calls about country codes, extensions, or ambiguous formats. It's faster than writing regex for every variation and handles edge cases a formula would miss.
- Merging contact lists from multiple sources with different formatting conventions
- Preparing a CRM import where the target field requires E.164 format (+14155551234)
- Cleaning a lead list before a Twilio, HubSpot, or Salesforce upload
- Standardizing numbers scraped from forms where users typed freely
- Flagging invalid or incomplete numbers so you can review them before sending
When this format breaks down
- Datasets over 10,000 rows where token costs and latency make a Power Query or Python script cheaper
- Regulated data (HIPAA, financial PII) where sending phone numbers to a third-party API violates policy
- Cases where you need 100% deterministic output for audit trails, since models can occasionally hallucinate digits
- Real-time validation inside a live workflow, where a dedicated library like libphonenumber is faster and more accurate
The prompt we tested
You are a data-cleaning assistant specializing in Excel phone number formatting. Clean and standardize the phone number data provided below. Instructions: Standardize all phone numbers to the format (XXX) XXX-XXXX for US numbers, or +CC XXX XXX XXXX for international. Strip extensions, letters, and extra characters; flag any invalid or incomplete numbers as 'INVALID'. Always include at least one ready-to-use Excel formula (using SUBSTITUTE, TEXT, or REGEXREPLACE) and keep the explanation under 150 words. Phone number data to clean: Here are the phone numbers from my customer list column B: '555.123.4567', '(555) 123 4567 ext 42', '15551234567', '1-800-CALL-NOW', '+44 20 7946 0958', and '555-CALL'. I need them all cleaned up and consistent so I can import them into our CRM. Return a cleaned table with the original value and the cleaned value side by side, then provide the exact Excel formula(s) a user can paste into their spreadsheet to automate this cleanup across a full column.
Model comparison
| Model | Quality | Cost | Latency | Tokens |
|---|---|---|---|---|
| Claude Opus 4.7 | 9 | $0.0579 | 11.2s | 384 / 695 |
| Claude Haiku 4.5 | 6 | $0.0054 | 9.0s | 267 / 1032 |
| GPT-5.4 | Not run yet | |||
| Gemini 2.5 Pro | Not run yet | |||
| Grok 4.1 Fast Reasoning | 8 | $0.0003 | 86.8s | 386 / 438 |
Outputs
# Phone Number Cleaning Results
| Original | Cleaned |
|----------|---------|
| 555.123.4567 | (555) 123-4567 |
| (555) 123 4567 ext 42 | (555) 123-4567 |
| 15551234567 | (555) 123-4567 |
| 1-800-CALL-NOW | INVALID |
| +44 20 7946 0958 | +44 20 7946 0958 |
| 555-CALL | INVALID |
---
## Excel Formula
Paste this formula into column C to automate cleanup:
```excel
=IF(OR(ISNUMBER(SEARCH("[A-Z]",B2)),LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"+",""),"-","")," ",""),".",""),"(",""),")",""),"ext",""))NOT IN (10,11,12)),"INVALID",IF(LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"+",""),"-","")," ",""),".",""),"(",""),")",""),"ext",""),1)="+","+"&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"+",""),"-","")," ",""),".",""),"(",""),")",""),"ext",""),1,2)&" "&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"+",""),"-","")," ",""),".",""),"(",""),")",""),"ext",""),3,2)&" "&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"+",""),"-","")," ",""),".",""),"(",""),")",""),"ext",""),5,4)&" "&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"+",""),"-","")," ",""),".",""),"(",""),")",""),"ext",""),9,4),"("&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(IF(LEFT(B2,1)="1",MID(B2,2,20),B2),"+",""),"-","")," ",""),".",""),"(",""),")",""),"ext",""),1,3)&") "&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(IF(LEFT(B2,1)="1",MID(B2,2,20),B2),"+",""),"-","")," ",""),".",""),"(",""),")",""),"ext",""),4,3)&"-"&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(IF(LEFT(B2,1)="1",MID(B2,2,20),B2),"+",""),"-","")," ",""),".",""),"(",""),")",""),"ext",""),7,4)))
```
**Simpler Alternative** (for US numbers only):
```excel
=IF(LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"-","")," ",""),".",""),"(",""),")",""),"ext",""))<>10,"INVALID","("&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"-","")," ",""),".",""),"(",""),")",""),"ext",""),1,3)&") "&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"-","")," ",""),".",""),"(",""),")",""),"ext",""),4,3)&"-"&RIGHT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"-","")," ",""),".",""),"(",""),")",""),"ext",""),4))
```
The formula strips non-numeric characters, removes extensions, detects letters (invalid), validates length, and reformats accordingly.
### Cleaned Phone Numbers
| Original | Cleaned |
|-----------------------|-------------------|
| 555.123.4567 | (555) 123-4567 |
| (555) 123 4567 ext 42| (555) 123-4567 |
| 15551234567 | (555) 123-4567 |
| 1-800-CALL-NOW | INVALID |
| +44 20 7946 0958 | +44 207 946 0958 |
| 555-CALL | INVALID |
### Excel Formula
Paste this into C1 (assuming data in A1:A6 or column A) and drag down. Requires Excel 365 (REGEX/LET). Handles stripping extensions/letters/extra chars, US normalization (drop leading 1), intl spacing, flags invalid lengths.
```
=LET(
step1, REGEXREPLACE(UPPER(A1), "\s*EXT.*", ""),
step2, REGEXREPLACE(step1, "[A-Z]", ""),
cleaned, REGEXREPLACE(step2, "[^0-9+]", ""),
IF(LEFT(cleaned,1)="+",
IF(LEN(cleaned)<13,"INVALID",
"+" & LEFT(cleaned,3) & " " & MID(cleaned,4,3) & " " & MID(cleaned,7,3) & " " & MID(cleaned,10,4)
),
LET(
us_core, IF(AND(LEN(cleaned)=11,LEFT(cleaned,1)="1"),MID(cleaned,2,10),cleaned),
IF(LEN(us_core)=10,
"(" & LEFT(us_core,3) & ") " & MID(us_core,4,3) & "-" & RIGHT(us_core,4),
"INVALID"
)
)
)
)
```
(78 words: Formula strips invalid chars first, detects + for intl (+CC XXX XXX XXXX assuming 2-digit CC +10 digits), else US 10-digit format. Invalid if wrong length.)
What makes these work
-
01Specify the exact target format
Say 'E.164' or 'XXX-XXX-XXXX' or '+CC XXX XXX XXXX' explicitly. Models default to whatever looks common in your input, which is rarely what you want. Show one example of the desired output in the prompt.
-
02Handle country codes deliberately
Tell the model what to assume when a country code is missing. 'Assume US (+1) if no country code is present' prevents it from guessing or leaving numbers ambiguous. For international lists, point it to a country column.
-
03Ask for a flag column on invalid rows
Request a second column with 'INVALID' or 'REVIEW' for numbers that are too short, spelled out, or contain multiple numbers. This beats having the model silently drop or fabricate digits.
-
04Process in batches of 200-500 rows
Larger batches hit token limits and model attention drops, meaning more errors toward the end. Split the column, run each batch, paste results back. Most models stay accurate at this size.
More example scenarios
Column A has: (415) 555-1234, 415.555.9876, 4155550001, 1-415-555-2222, +1 415 555 3333, 415/555/4444 ext 12. Convert all to E.164 format for a Twilio upload.
+14155551234, +14155559876, +14155550001, +14155552222, +14155553333, +14155554444. Note: row 6 had an extension (ext 12) which was dropped since E.164 does not support extensions. Flag this row if the extension matters for your use case.
A European sales list: 020 7946 0958 (UK), 01 42 86 82 00 (France), +49-30-12345678, 3 5555 1234 (Japan, no code), 415-555-1234 (US). Assume country based on context column B.
+442079460958, +33142868200, +493012345678, +81355551234, +14155551234. Leading zeros from national trunk prefixes were stripped. Row 4 required inferring +81 from the Japan context label before prepending.
User-submitted phone field: 'call me at 555.123.4567 please', '(555) 123 4567 x205', '555-123-4567/555-987-6543', 'five five five one two three four five six seven', '+1 (555) 123-4567'.
5551234567, 5551234567 ext 205, 5551234567 (second number 5559876543 flagged for review), INVALID (spelled out digits, needs manual entry), +15551234567. Two rows need human review before import.
Raw digits: 4155551234, 2125559876, 13105550001, 4155552222extension45. Output format: XXX-XXX-XXXX, drop country code if it's 1, keep extensions in a separate column.
Column A: 415-555-1234, 212-555-9876, 310-555-0001, 415-555-2222. Column B (extension): blank, blank, blank, 45. Country code 1 was stripped from row 3.
Clean these and flag duplicates: +1-415-555-1234, (415) 555-1234, 4155551234, 415.555.1235, 14155551234.
All rows except row 4 normalize to +14155551234 and are duplicates of each other. Row 4 is +14155551235, a distinct number. Keep one instance of +14155551234 and the single +14155551235 entry.
Common mistakes to avoid
-
Not specifying extension handling
E.164 does not support extensions but your users entered them. If you don't tell the model what to do, it may drop them silently or concatenate them into the main number. Always specify: drop, keep in separate column, or flag.
-
Trusting the output without spot-checking
Models occasionally transpose or hallucinate a digit, especially on long or garbled inputs. Spot-check 10-20 rows against the source before importing to a CRM or sending any SMS campaign.
-
Mixing regions without context
Pasting a list with US, UK, and Brazilian numbers and asking for 'E.164' without telling the model which row is which country produces wrong country codes. Include a country column or split the list by region.
-
Ignoring leading zeros
Excel strips leading zeros from numeric cells, which breaks UK and European numbers. Format the column as Text before pasting cleaned output back, or the '+44' becomes '44' and your dialing breaks.
-
Running it on 10,000 rows at once
Most chat models will truncate output, skip rows, or slow down dramatically. You'll think it worked until you scroll to the bottom. Batch it or switch to a script-based approach for large files.
Related queries
Frequently asked questions
What's the best Excel formula to clean phone numbers without AI?
For simple cleanup, =SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"(",""),")",""),"-","")," ","") strips common characters. For more control, Excel 365 has REGEXREPLACE: =REGEXREPLACE(A1,"[^0-9]","") keeps only digits. These work for uniform input but struggle with extensions, country codes, and mixed formats.
How do I format phone numbers as (XXX) XXX-XXXX in Excel?
Select the column, open Format Cells (Ctrl+1), choose Special, and pick Phone Number. This only works if the cells contain exactly 10 digits with no other characters. Clean the raw data first (strip non-digits) and then apply the format.
How do I add a country code to phone numbers in Excel?
Use ="+1"&A1 for a single country, assuming A1 already has 10 clean digits. For mixed countries, use an IF or VLOOKUP against a country column: ="+"&VLOOKUP(B1,CountryCodes,2,FALSE)&A1. AI is faster when your country assignments are implicit rather than in a separate column.
Can AI handle international phone numbers correctly?
Yes, if you give it the country context. GPT-4, Claude, and Gemini all know national trunk prefixes (the leading 0 in UK and most European numbers) and will strip them when converting to E.164. Without a country column or explicit instruction, they guess based on length, which is unreliable.
Will AI make up digits when cleaning phone numbers?
It can, rarely. The risk is higher with messy input (spelled-out digits, numbers split across words, OCR artifacts). Always spot-check and ask the model to flag rows it's uncertain about rather than forcing an answer. Never send a cleaned list to an SMS campaign without reviewing the flagged rows.
How do I remove duplicate phone numbers after cleaning?
Once every number is in a single consistent format (E.164 is best), use Excel's Remove Duplicates feature under the Data tab. Normalization is the hard part because (415) 555-1234 and +14155551234 are the same number but Excel sees them as different strings. Clean first, dedupe second.