# Excel Customer List Cleaning Guide
## 1. Data Quality Issues Detected
- **Name column**: Inconsistent capitalization (ALL CAPS, lowercase, mixed)
- **Email column**: Leading/trailing spaces, possible mixed case
- **Phone column**: Inconsistent formats — `(555) 123-4567`, `555-123-4567`, `5551234567`
- **Duplicate rows**: Full or partial duplicates across records
- **Potential whitespace** in all text fields
---
## 2. Step-by-Step Cleaning Actions
### Step 1: Create Helper Columns (E–H)
Insert new headers: `Name_Clean`, `Email_Clean`, `Phone_Clean`, `City_Clean`.
### Step 2: Clean Names (Proper Case)
In **E2**:
```excel
=PROPER(TRIM(CLEAN(A2)))
```
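The Python sketch below imitates the three Excel functions as Microsoft documents them, so you can see what `PROPER(TRIM(CLEAN(...)))` does to awkward names before filling it down:

- CLEAN drops ASCII codes 0-31.
- TRIM removes leading and trailing spaces and collapses inner runs of ordinary spaces.
- PROPER uppercases any letter that follows a non-letter and lowercases the rest.

The helper names are hypothetical, and the emulation is approximate (Python's `isalpha()` is Unicode-aware).

```python
def excel_clean(text: str) -> str:
    """Drop ASCII control characters 0-31, as Excel CLEAN does."""
    return "".join(ch for ch in text if ord(ch) >= 32)


def excel_trim(text: str) -> str:
    """Strip outer spaces and collapse inner runs of CHAR(32) to a single space."""
    return " ".join(part for part in text.split(" ") if part)


def excel_proper(text: str) -> str:
    """Uppercase any letter that follows a non-letter; lowercase all other letters."""
    out = []
    prev_is_letter = False
    for ch in text:
        if ch.isalpha():
            out.append(ch.lower() if prev_is_letter else ch.upper())
            prev_is_letter = True
        else:
            out.append(ch)
            prev_is_letter = False
    return "".join(out)


def clean_name(raw: str) -> str:
    """Python stand-in for =PROPER(TRIM(CLEAN(A2)))."""
    return excel_proper(excel_trim(excel_clean(raw)))


for raw in ["  JOHN   SMITH\n", "mary o'brien", "Bob\u00a0LEE "]:
    print(repr(clean_name(raw)))
```

The third sample shows that a non-breaking space (CHAR(160)) survives all three functions. The PROPER rule also explains results like `76BudGet` → `76Budget` or `McDONALD` → `Mcdonald`.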
### Step 3: Clean Emails (Lowercase + Trim)
In **F2**:
```excel
=LOWER(TRIM(CLEAN(B2)))
```
### Step 4: Standardize Phone Numbers
Strip all non-digits, then reformat to `(XXX) XXX-XXXX`.
In **G2**:
```excel
=IF(LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"-",""),"(",""),")","")," ",""),".",""))=10,
"("&LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"-",""),"(",""),")","")," ",""),".",""),3)&") "&
MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"-",""),"(",""),")","")," ",""),".",""),4,3)&"-"&
RIGHT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"-",""),"(",""),")","")," ",""),".",""),4),
"INVALID")
```
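The nested SUBSTITUTEs remove only five characters (`-`, `(`, `)`, space and `.`) and then check that exactly 10 characters remain. They never check that those characters are digits. A Python mirror of that logic makes it easy to try odd inputs before filling down (the function name is a hypothetical helper):

```python
def clean_phone(raw) -> str:
    """Mirror of the Step 4 formula: strip - ( ) space . then format 10 characters."""
    s = str(raw)  # a numeric cell is treated as its digit string, like SUBSTITUTE does
    for ch in "-() .":
        s = s.replace(ch, "")
    if len(s) != 10:
        return "INVALID"
    # Same slices as LEFT(s,3), MID(s,4,3) and RIGHT(s,4)
    return "(" + s[:3] + ") " + s[3:6] + "-" + s[6:]


for raw in ["5551234567", "(555) 987-6543", "555.444.3333", "+1 555 123 4567", "555-ABC-1234"]:
    print(raw, "->", clean_phone(raw))
```

The last two samples show the limits. A leading `+1` pushes the length past 10, and letters slip through untouched.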
### Step 5: Clean City
In **H2**:
```excel
=PROPER(TRIM(CLEAN(D2)))
```
### Step 6: Fill Down & Convert to Values
Select E2:H2 → double-click fill handle → copy → **Paste Special > Values**.
### Step 7: Remove Duplicates
Select columns E–H → **Data → Remove Duplicates**. If Excel asks whether to expand the selection, expand it so whole rows are removed. Then tick only `Email_Clean` (primary key).
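Data → Remove Duplicates keeps the first row for each key and deletes later repeats. The sketch below reproduces that keep-first rule in Python with `Email_Clean` as the key. The row dictionaries and function name are illustrative assumptions, not part of the workbook.

```python
def remove_duplicates(rows, key="Email_Clean"):
    """Keep the first row for each key value and drop later repeats."""
    seen = set()
    kept = []
    for row in rows:
        value = row[key]  # emails are already lowercased in Step 3
        if value not in seen:
            seen.add(value)
            kept.append(row)
    return kept


rows = [
    {"Name_Clean": "John Smith", "Email_Clean": "john@x.com"},
    {"Name_Clean": "Jon Smith", "Email_Clean": "john@x.com"},
    {"Name_Clean": "Mary Jones", "Email_Clean": "mary@y.com"},
]
for row in remove_duplicates(rows):
    print(row["Name_Clean"], row["Email_Clean"])
```

Because the first occurrence wins, sort the table beforehand if it matters which duplicate survives (for example, most recently updated record on top).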
---
## 3. Before / After Sample
| Before (Name) | Before (Email) | Before (Phone) | After (Name) | After (Email) | After (Phone) |
|---|---|---|---|---|---|
| JOHN SMITH | ` John@X.com ` | 5551234567 | John Smith | john@x.com | (555) 123-4567 |
| mary jones | `MARY@y.com` | (555) 987-6543 | Mary Jones | mary@y.com | (555) 987-6543 |
| Bob LEE | `bob@z.com ` | 555.444.3333 | Bob Lee | bob@z.com | (555) 444-3333 |
---
## 4. Reusable All-in-One Solution
### Option A: Single Array Formula (Excel 365)
In **E2** (spills to E:H):
```excel
=LET(
    names, A2:A2000, emails, B2:B2000, phones, C2:C2000, cities, D2:D2000,
    digits, LAMBDA(x, TEXTJOIN("", TRUE, IFERROR(--MID(x, SEQUENCE(LEN(x)), 1), ""))),
    ph, BYROW(phones, digits),
    HSTACK(
        PROPER(TRIM(names)),
        LOWER(TRIM(emails)),
        IF(LEN(ph) = 10,
            "(" & LEFT(ph, 3) & ") " & MID(ph, 4, 3) & "-" & RIGHT(ph, 4),
            "INVALID"),
        PROPER(TRIM(cities))
    )
)
```
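The `digits` LAMBDA in Option A keeps every character that `--` can turn into a number (in practice, the digits 0-9) and drops everything else. That makes it stricter than the Step 4 formula:

- `555/123/4567` becomes `(555) 123-4567` here but `INVALID` in Step 4.
- `555-ABC-1234` becomes `INVALID` here.

A minimal Python sketch of the Option A rule, with hypothetical helper names:

```python
def digits_only(text) -> str:
    """Keep ASCII digits only, like TEXTJOIN("",TRUE,IFERROR(--MID(x,SEQUENCE(LEN(x)),1),""))."""
    return "".join(ch for ch in str(text) if ch in "0123456789")


def format_phone(raw) -> str:
    """Format as (XXX) XXX-XXXX when exactly 10 digits remain, else INVALID."""
    d = digits_only(raw)
    if len(d) != 10:
        return "INVALID"
    return f"({d[:3]}) {d[3:6]}-{d[6:]}"


for raw in ["555/123/4567", "555-ABC-1234", "+1 (555) 123-4567"]:
    print(raw, "->", format_phone(raw))
```

Neither rule strips a country code, so `+1 (555) 123-4567` still ends up `INVALID` (11 digits).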
### Option B: VBA Macro (any Excel version)
```vba
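Sub CleanCustomerList()
    ' Cleans Name (A), Email (B), Phone (C) and City (D) in place, then removes duplicate rows.
    Dim lastRow As Long, i As Long, ch As Long
    Dim raw As String, d As String
    lastRow = Cells(Rows.Count, 1).End(xlUp).Row
    For i = 2 To lastRow
        ' WorksheetFunction.Trim also collapses inner double spaces; VBA's Trim does not
        Cells(i, 1).Value = Application.WorksheetFunction.Proper(Application.WorksheetFunction.Trim(Cells(i, 1).Value))
        Cells(i, 2).Value = LCase(Application.WorksheetFunction.Trim(Cells(i, 2).Value))
        Cells(i, 4).Value = Application.WorksheetFunction.Proper(Application.WorksheetFunction.Trim(Cells(i, 4).Value))
        ' Keep only the digits of the phone value (works for text or numeric cells)
        raw = CStr(Cells(i, 3).Value)
        d = ""
        For ch = 1 To Len(raw)
            If Mid(raw, ch, 1) Like "#" Then d = d & Mid(raw, ch, 1)
        Next ch
        ' Reformat 10-digit numbers; anything else is left as-is for manual review
        If Len(d) = 10 Then Cells(i, 3).Value = "(" & Left(d, 3) & ") " & Mid(d, 4, 3) & "-" & Right(d, 4)
    Next i
    ' Remove rows that are identical across all four cleaned columns
    ActiveSheet.Range("A1").CurrentRegion.RemoveDuplicates Columns:=Array(1, 2, 3, 4), Header:=xlYes
End Sub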
```
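Run via **Alt+F11 → Insert → Module → paste → F5**. Try it on a copy of the workbook first, because changes made by a macro can't be undone with Ctrl+Z.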
How to Clean Excel Spreadsheets Using ChatGPT Prompts
Tested prompts for using ChatGPT to clean Excel data, compared across 5 leading AI models.
If you've landed here, you probably have a messy Excel file open right now. Inconsistent date formats, names in mixed case, duplicate rows with tiny differences, trailing spaces you can't even see. You've heard ChatGPT can help, but you're not sure whether to paste the data in, ask for a formula, or get a macro. This page shows you all three approaches and when each one wins.
ChatGPT is useful for Excel cleanup in two distinct modes: as a formula and macro generator when your data is too large or sensitive to share, and as a direct data cleaner when you can paste a few hundred rows into the chat. The right mode depends on row count, sensitivity, and whether you need a repeatable process.
Below you'll find a tested prompt, four model outputs side by side, real scenarios (deduping a CRM export, standardizing a product catalog, fixing survey responses), and the mistakes that cause wrong results. Use the comparison table to pick the model, and the prompts here to do the actual work.
When to use this
ChatGPT works best on Excel cleanup when the rules are describable in plain English but tedious to implement by hand. Think standardization, deduplication with fuzzy matching, splitting or combining fields, and generating reusable formulas. It shines in two cases: when you can paste a few hundred rows or fewer, or when you need a formula you can fill down across the whole sheet yourself.
- Standardizing inconsistent formats (dates, phone numbers, country names, capitalization)
- Writing complex formulas like nested IFs, TEXTSPLIT, or regex-based extraction
- Deduplicating rows where matches are fuzzy (typos, extra spaces, different casing)
- Splitting unstructured text columns into clean structured fields
- Generating a VBA macro or Office Script to repeat the cleanup on future files
When this format breaks down
- Datasets of more than a few hundred rows pasted directly into chat. Ask for a formula or script instead.
- Files containing PII, financial records, or confidential business data that can't leave your systems. Use a formula approach so the data stays in Excel.
- Cleanup that requires verifying against a source of truth (e.g., real customer addresses). ChatGPT will produce plausible guesses, and some of them will be wrong.
- Precise numerical transformations on large ranges. ChatGPT makes arithmetic errors; use Excel formulas it writes for you instead of having it compute values.
The prompt we tested
You are a data cleaning assistant specialized in Excel spreadsheets. Analyze the messy Excel data or data description provided and give clear, step-by-step cleaning instructions along with the exact Excel formulas, functions, or Power Query steps needed to clean it. Instructions: Format the response with clear numbered sections and headings. Include exact Excel formulas in code blocks (e.g., =TRIM(A2)) and keep explanations concise and actionable. Limit the response to under 500 words and prioritize the most impactful cleaning steps first. Messy Excel data or description: I have a customer list in Excel with about 2000 rows. The names have inconsistent capitalization (some ALL CAPS, some lowercase), phone numbers are formatted differently (some with dashes, some with parentheses, some just digits), and there are duplicate entries and trailing spaces in the email column. Column headers are: Name, Email, Phone, City. Provide: (1) a list of data quality issues detected, (2) step-by-step cleaning actions with exact Excel formulas or menu paths, (3) a cleaned sample output showing the before/after, and (4) a reusable formula or macro the user can apply to the full dataset.
Model comparison
| Model | Quality | Cost | Latency | Tokens (in / out) |
|---|---|---|---|---|
| Claude Opus 4.7 | 9 | $0.1521 | 25.4s | 411 / 1946 |
| Claude Haiku 4.5 | 9 | $0.0053 | 9.8s | 281 / 1002 |
| GPT-5.4 | Not run yet | n/a | n/a | n/a |
| Gemini 2.5 Pro | 7 | $0.0015 | 18.4s | 251 / 242 |
| Grok 4.1 Fast Reasoning | 8 | $0.0004 | 12.1s | 393 / 660 |
Outputs
# Data Cleaning Guide: Customer List
## 1. Data Quality Issues Detected
- **Names**: Inconsistent capitalization (ALL CAPS, lowercase, mixed case)
- **Phone Numbers**: Multiple formats (dashes, parentheses, no formatting)
- **Email**: Trailing/leading spaces
- **Duplicates**: Repeated customer entries
- **General**: Potential blank cells or hidden characters
---
## 2. Step-by-Step Cleaning Actions
### Step 1: Remove Trailing/Leading Spaces from Email
**Action**: Create a helper column with cleaned emails.
```excel
=TRIM(B2)
```
Apply to column E (Email_Clean), then copy and paste values back to column B.
---
### Step 2: Standardize Name Capitalization
**Action**: Convert names to proper case (first letter of each word capitalized).
```excel
=PROPER(A2)
```
Apply to column F (Name_Clean), then paste values back to column A.
---
### Step 3: Standardize Phone Numbers
**Action**: Remove all formatting characters, keep only digits.
```excel
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"-",""),"(",""),")","")," ","")
```
This removes dashes, parentheses, and spaces. Apply to column G (Phone_Clean).
**Optional**: Format cleaned phone numbers consistently using the format `XXX-XXX-XXXX`:
```excel
=TEXT(VALUE(G2),"000-000-0000")
```
---
### Step 4: Remove Duplicate Entries
**Action**: Use Excel's built-in deduplication tool.
1. Select all data (including headers)
2. Go to **Data** → **Remove Duplicates**
3. Select all columns and click OK
4. Excel will remove exact duplicate rows
---
### Step 5: Clean Up Helper Columns
**Action**: Delete columns E, F, G (helper columns) after confirming cleaned data is correct.
---
## 3. Before/After Sample Output
| **Before** | **After** |
|---|---|
| Name: `JOHN SMITH` | Name: `John Smith` |
| Email: `john@example.com ` | Email: `john@example.com` |
| Phone: `(555) 123-4567` | Phone: `555-123-4567` |
| **Duplicate row detected** | **Removed** |
---
## 4. Reusable Formula/Macro for Full Dataset
### Option A: Single Master Formula (Recommended for 2000 rows)
In a new sheet, create a combined formula in one column:
```excel
=PROPER(TRIM(A2)) & "|" & TRIM(B2) & "|" & SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"-",""),"(",""),")","")," ","")
```
This concatenates cleaned Name | Email | Phone in one cell for verification.
### Option B: Quick Macro for Automation
1. Press **Alt + F11** to open VBA Editor
2. Insert a new module and paste:
```vba
Sub CleanCustomerData()
    Dim ws As Worksheet
    Set ws = ActiveSheet
    Dim lastRow As Long
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    ' Clean Names (Column A): trim spaces, then proper case
    ws.Range("A2:A" & lastRow).Value = ws.Evaluate("=PROPER(TRIM(" & ws.Range("A2:A" & lastRow).Address & "))")
    ' Clean Emails (Column B)
    ws.Range("B2:B" & lastRow).Value = ws.Evaluate("=TRIM(" & ws.Range("B2:B" & lastRow).Address & ")")
    MsgBox "Cleaning complete!"
End Sub
```
3. Run macro via **Alt + F8**
---
## 5. Final Validation Steps
1. Spot-check 10–20 random rows for accuracy
2. Use **Data** → **Text to Columns** if phone numbers still need formatting
3. Sort by Name and Email to visually identify remaining duplicates
4. Save file with backup before applying changes to original
Here is a data cleaning plan for your customer list.
### 1. Data Quality Issues Detected
* **Inconsistent Text Case:** The `Name` column has a mix of uppercase, lowercase, and potentially mixed-case entries.
* **Inconsistent Formatting:** The `Phone` column uses multiple formats (dashes, parentheses, spaces).
* **Leading/Trailing Spaces:** The `Email` column contains extra spaces that affect data integrity and lookups.
* **Duplicate Records:** The dataset contains entire rows that are duplicates.
### 2. Step-by-Step Cleaning Plan
This plan uses helper columns to build the clean data without destroying the original. Assume your data is in columns A:D.
**Step 1: Clean Email and Name Columns**
Insert two new columns (E and F).
* In cell E2, clean the emails by removing extra spaces using the TRIM function.
* In cell F2, standardize the name capitalization to "Proper Case" using the PROPER function.
**Formulas:**
* Clean Email (in cell E2):
```excel
=TRIM
```
### 1. Data Quality Issues Detected
- Inconsistent name capitalization (e.g., "JOHN DOE" vs. "john doe").
- Varied phone formats (e.g., "(123) 456-7890" vs. "1234567890").
- Trailing/leading spaces in emails (e.g., "user@ex.com ").
- Duplicate rows across Name/Email/Phone.
- Potential spaces in other columns.
### 2. Step-by-Step Cleaning Actions
1. **Trim all text columns**: Use `TRIM` in helper columns (steps 2 to 4). Data > Text to Columns converts only one column at a time, so it is not a practical way to trim the whole table.
2. **Standardize names**: In helper col E (Name cleaned): `=PROPER(TRIM(A2))`. Copy down, paste values.
3. **Standardize phones** (to digits only): In F (Phone cleaned):
```
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(C2),"(",""),")",""),"-","")," ",""),"+","")
```
Copy down, paste values.
4. **Trim emails**: In G (Email cleaned): `=TRIM(B2)`. Copy down, paste values.
5. **Remove duplicates**: Select all data > Data > Remove Duplicates (check all columns).
6. **Power Query for full clean** (reusable): Data > From Table/Range > Power Query Editor:
- Add Column > Custom: `Text.Proper(Text.Trim([Name]))` → Name Clean.
- Phone: `Text.Select(Text.From([Phone]), {"0".."9"})` → Phone Clean (keeps digits only, even if the column was typed as a number).
- Email: `Text.Trim([Email])`.
- Remove originals > Close & Load.
### 3. Before/After Sample
| Name (Before) | Email (Before) | Phone (Before) | Name (After) | Email (After) | Phone (After) |
|---------------|----------------|----------------|--------------|---------------|---------------|
| JOHN DOE | user@ex.com | (123)456-7890 | John Doe | user@ex.com | 1234567890 |
| john doe | user@ex.com | 1234567890 | John Doe | user@ex.com | 1234567890 |
| JANE SMITH | jane@ex.com | 123-456-7890 | Jane Smith | jane@ex.com | 1234567890 |
### 4. Reusable Formula for Full Dataset
Paste in new sheet Row 2 (data Row 2+):
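```excel
Name:  =PROPER(TRIM(A2))
Email: =TRIM(B2)
Phone: =TEXTJOIN("",TRUE,IF(ISNUMBER(--MID(C2,ROW(INDIRECT("1:"&LEN(C2))),1)),MID(C2,ROW(INDIRECT("1:"&LEN(C2))),1),""))
```
TEXTJOIN requires Excel 2019 or Microsoft 365. In Excel 2019, confirm the Phone formula with Ctrl+Shift+Enter; Microsoft 365 evaluates it as an array automatically.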
Copy down to 2000 rows > Paste Special Values > Remove Dups. (187 words)
What makes these work
1. **Paste a sample, not the whole file.** Give ChatGPT 10-20 representative rows, including your edge cases (the weird ones that break normal rules). Ask it to return a formula or script you apply to the full file. This avoids token limits and keeps most of your data out of the chat.
2. **State the output format explicitly.** Say 'return as a pipe-delimited table' or 'return only the formula, no explanation'. Without this, ChatGPT adds commentary that breaks copy-paste workflows. Specify column order too if it matters.
3. **Show before and after for one row.** The fastest way to communicate a cleanup rule is a worked example. Write 'Input: john SMITH → Output: John Smith' before listing your data. This disambiguates rules that English descriptions leave fuzzy.
4. **Ask for the formula, then verify on edge cases.** When ChatGPT gives you a TEXTSPLIT or regex formula, test it on your three weirdest rows before applying it to 50,000 rows. Ask 'what inputs would break this formula?' as a follow-up to surface problems before they hit production.
More example scenarios
Prompt: Clean this contact list. Standardize names to Title Case, normalize phone numbers to +1-XXX-XXX-XXXX format, and lowercase all emails. Flag any rows with missing phone or email.

| Name | Phone | Email |
|---|---|---|
| john SMITH | 5551234567 | JOHN@ACME.com |
| Mary o'brien | (555) 987-6543 | mary@acme |
| bob lee | | bob@acme.com |

Output:

| Name | Phone | Email | Flag |
|---|---|---|---|
| John Smith | +1-555-123-4567 | john@acme.com | OK |
| Mary O'Brien | +1-555-987-6543 | (invalid email) | Missing valid email |
| Bob Lee | (missing) | bob@acme.com | Missing phone |

Notes: 'mary@acme' is missing a TLD; recommend verifying. Apostrophe preserved in O'Brien.
Prompt: Find duplicates in this SKU list. Treat rows as duplicates if product names match after removing punctuation, extra spaces, and case differences.

| SKU | Product Name |
|---|---|
| A100 | Blue Widget - Large |
| A101 | blue widget large |
| A102 | Blue Widget, Large |
| B200 | Red Gadget |
| B201 | Red Gadget (2024) |

Output:
Duplicate group 1: A100, A101, A102 (normalized: 'blue widget large'). Keep A100, mark others as duplicates.
Duplicate group 2: none. B200 and B201 differ by year qualifier and should be treated as distinct.
Suggested formula for column D: `=LOWER(TRIM(SUBSTITUTE(SUBSTITUTE(B2,"-"," "),","," ")))`
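The suggested key formula is easy to check outside Excel. This Python sketch applies the same rule to the sample SKUs (turn `-` and `,` into spaces, TRIM, lowercase) and groups them by the resulting key. The helper name is hypothetical.

```python
from collections import defaultdict


def name_key(name: str) -> str:
    """Mirror of =LOWER(TRIM(SUBSTITUTE(SUBSTITUTE(B2,"-"," "),","," ")))."""
    s = name.replace("-", " ").replace(",", " ")
    s = " ".join(part for part in s.split(" ") if part)  # Excel-style TRIM
    return s.lower()


products = [
    ("A100", "Blue Widget - Large"),
    ("A101", "blue widget large"),
    ("A102", "Blue Widget, Large"),
    ("B200", "Red Gadget"),
    ("B201", "Red Gadget (2024)"),
]

groups = defaultdict(list)
for sku, name in products:
    groups[name_key(name)].append(sku)

# Print only keys shared by more than one SKU
for key, skus in groups.items():
    if len(skus) > 1:
        print(key, "->", skus)
```

The formula only handles `-` and `,`. A name containing other punctuation, such as a period or parentheses, would still produce a different key, even though the prompt asked to ignore punctuation.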
Prompt: Split this Address column into Street, City, State, ZIP. Write an Excel formula I can drag down, not a one-time result. Sample:
- 123 Main St, Springfield, IL 62701
- 45 Oak Ave Apt 2, Chicago, IL 60614

Output: Use these formulas (assuming address in A2; TEXTBEFORE and TEXTAFTER require Excel 365):
- Street: `=TRIM(TEXTBEFORE(A2,",",1))`
- City: `=TRIM(TEXTBEFORE(TEXTAFTER(A2,",",1),","))`
- State: `=TRIM(LEFT(TEXTAFTER(A2,",",2),3))`
- ZIP: `=TRIM(RIGHT(A2,5))`

These work for US addresses in the 'street, city, ST ZIP' pattern. Keep the ZIP column formatted as text to preserve leading zeros.
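Here is a Python sketch of the same split, useful for testing addresses that might not fit the pattern. It uses `str.split` and `str.strip()` in place of TEXTBEFORE/TEXTAFTER and TRIM, which behaves the same on these samples. The function name is hypothetical.

```python
def split_address(addr: str) -> dict:
    """Approximate mirror of the TEXTBEFORE/TEXTAFTER formulas for 'street, city, ST ZIP'."""
    street, rest = addr.split(",", 1)      # TEXTBEFORE(A2,",",1) / TEXTAFTER(A2,",",1)
    city = rest.split(",", 1)[0]           # TEXTBEFORE(TEXTAFTER(A2,",",1),",")
    after_second = addr.split(",", 2)[2]   # TEXTAFTER(A2,",",2)
    return {
        "Street": street.strip(),
        "City": city.strip(),
        "State": after_second[:3].strip(),  # LEFT(...,3), then TRIM
        "ZIP": addr[-5:].strip(),           # RIGHT(A2,5), kept as text
    }


for a in [
    "123 Main St, Springfield, IL 62701",
    "45 Oak Ave Apt 2, Chicago, IL 60614",
    "1 Elm St, Boston, MA 02108-1234",
]:
    print(split_address(a))
```

The last case shows a limit of `RIGHT(A2,5)`: a ZIP+4 code such as `02108-1234` yields `-1234`. An address without two commas raises an error here, much as TEXTAFTER returns `#N/A` in Excel.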
Prompt: Categorize these free-text 'reason for cancellation' responses into: Price, Product Quality, Support, Other. Keep the original text.
- Too expensive for what it is
- kept crashing on my mac
- No one answered my ticket for a week
- Found a cheaper option
- idk, just not using it

Output:

| Response | Category |
|---|---|
| Too expensive for what it is | Price |
| kept crashing on my mac | Product Quality |
| No one answered my ticket for a week | Support |
| Found a cheaper option | Price |
| idk, just not using it | Other |
Write a VBA macro I can save in my Personal Macro Workbook. It should: trim all text cells, convert column A to Title Case, and remove fully blank rows in the active sheet.
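Sub CleanSheet()
    ' Trims text cells, converts column A to Title Case and deletes fully blank rows.
    Dim ws As Worksheet, c As Range, lastRow As Long, i As Long
    Set ws = ActiveSheet
    ' UsedRange may not start at row 1, so add its first row number
    lastRow = ws.UsedRange.Row + ws.UsedRange.Rows.Count - 1
    For Each c In ws.UsedRange
        ' Skip formulas so they are not replaced by static values
        If Not c.HasFormula And VarType(c.Value) = vbString Then
            If c.Value <> Trim(c.Value) Then c.Value = Trim(c.Value)
        End If
    Next c
    For Each c In ws.Range("A2:A" & lastRow)
        If Not c.HasFormula And VarType(c.Value) = vbString Then c.Value = Application.WorksheetFunction.Proper(c.Value)
    Next c
    ' Delete from the bottom up so row numbers stay valid
    For i = lastRow To 2 Step -1
        If Application.WorksheetFunction.CountA(ws.Rows(i)) = 0 Then ws.Rows(i).Delete
    Next i
End Sub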
Common mistakes to avoid
- **Pasting 10,000 rows into chat.** ChatGPT may silently truncate or summarize large inputs and return results for only the first chunk. You think you cleaned the whole file; you cleaned 200 rows. Use formulas or scripts for anything above a few hundred rows.
- **Trusting the math without checking.** ChatGPT is a language model, not a calculator. When asked to sum, average, or recompute values directly, it can produce confident wrong answers. Have it write the formula and let Excel do the math.
- **Vague rules like 'clean this up'.** Without specific rules, ChatGPT invents its own standards (which date format? which case?). Results look clean but aren't consistent with your other data. Always spell out the target format.
- **Skipping the reusability question.** A one-time paste-and-clean is fine for today. Next month, when a new export arrives, you're doing it all over again. Ask for a macro, Office Script, or formula set the first time so you have a repeatable pipeline.
- **Ignoring hidden characters.** Non-breaking spaces, zero-width characters, and smart quotes cause matches to fail silently. Tell ChatGPT explicitly 'also strip non-breaking spaces (CHAR(160)) and normalize smart quotes to straight quotes', or your dedupe will miss rows that look identical.
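Here is a minimal Python sketch of what "strip hidden characters" means in practice: it maps a few common look-alike characters to plain equivalents. The mapping is an assumption to extend for your own exports, not an exhaustive list. In Excel, the non-breaking-space part corresponds to `=SUBSTITUTE(A2,CHAR(160)," ")`.

```python
# Hypothetical mapping: extend it with whatever your exports contain.
REPLACEMENTS = {
    "\u00a0": " ",   # non-breaking space, CHAR(160) in Excel
    "\u200b": "",    # zero-width space
    "\u200c": "",    # zero-width non-joiner
    "\u200d": "",    # zero-width joiner
    "\ufeff": "",    # byte-order mark / zero-width no-break space
    "\u2018": "'",   # left single smart quote
    "\u2019": "'",   # right single smart quote
    "\u201c": '"',   # left double smart quote
    "\u201d": '"',   # right double smart quote
}
TABLE = str.maketrans(REPLACEMENTS)


def normalize_hidden(text: str) -> str:
    """Replace look-alike characters so visually identical values compare equal."""
    return text.translate(TABLE)


a = "O\u2019Brien\u00a0Ltd\u200b"
b = "O'Brien Ltd"
print(a == b, normalize_hidden(a) == b)
```

Run the same normalization on both sides of a comparison (or on the key column before deduplicating) so rows that only looked identical actually match.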
Frequently asked questions
Can ChatGPT read my Excel file directly?
Yes, if you use ChatGPT with Advanced Data Analysis (Plus/Team/Enterprise), you can upload .xlsx files and it will run Python on them. On the free tier, file uploads are limited or unavailable depending on current plan terms. Without uploads, paste the data as text or ask for a formula.
Is it safe to paste company data into ChatGPT?
Not if the data is confidential, regulated, or contains PII. ChatGPT may use inputs for training on consumer plans. Use a Team/Enterprise plan with training disabled, or have ChatGPT write a formula/macro so the data never leaves Excel.
How many rows can ChatGPT clean at once?
Practically, a few hundred rows pasted as text before context limits or truncation become issues. For larger files, upload via Advanced Data Analysis or have ChatGPT generate a formula or VBA macro you run locally.
What's better for Excel cleanup: ChatGPT or Power Query?
Power Query is better for repeatable, auditable transforms on large files and for scheduled refreshes. ChatGPT is faster for one-off cleanup, fuzzy matching, and writing the Power Query M code or formulas you'd otherwise have to research. Combine them: use ChatGPT to write the M code, run it in Power Query.
Which ChatGPT model is best for Excel tasks?
For formula writing and VBA, GPT-4 class models (GPT-4o, GPT-4.1) are noticeably more accurate than smaller models. For large file analysis, use a model with Advanced Data Analysis enabled. The comparison table above scores several models on the same prompt.
Can ChatGPT write Office Scripts instead of VBA?
Yes. Just ask: 'Write this as an Office Script in TypeScript for Excel on the web.' Office Scripts work in Excel Online and can be triggered from Power Automate, which makes them better than VBA for cloud workflows.