How to Clean Excel Spreadsheets Using ChatGPT Prompts

Tested prompts for using ChatGPT to clean Excel data, compared across 5 leading AI models.

Best by judge score: Claude Haiku 4.5 (9/10)

If you've landed here, you probably have a messy Excel file open right now. Inconsistent date formats, names in mixed case, duplicate rows with tiny differences, trailing spaces you can't even see. You've heard ChatGPT can help, but you're not sure whether to paste the data in, ask for a formula, or get a macro. This page shows you all three approaches and when each one wins.

ChatGPT is useful for Excel cleanup in two distinct modes: as a formula and macro generator when your data is too large or sensitive to share, and as a direct data cleaner when you can paste a few hundred rows into the chat. The right mode depends on row count, sensitivity, and whether you need a repeatable process.

Below you'll find a tested prompt, four model outputs side by side, real scenarios (deduping a CRM export, standardizing a product catalog, fixing survey responses), and the mistakes that cause wrong results. Use the comparison table to pick the model, and the prompts here to do the actual work.

When to use this

ChatGPT works best on Excel cleanup when the rules are describable in plain English but tedious to implement by hand. Think standardization, deduplication with fuzzy matching, splitting or combining fields, and generating reusable formulas. It shines when your dataset is under a few thousand rows or when you need a formula you can apply to millions of rows yourself.

  • Standardizing inconsistent formats (dates, phone numbers, country names, capitalization)
  • Writing complex formulas like nested IFs, TEXTSPLIT, or regex-based extraction
  • Deduplicating rows where matches are fuzzy (typos, extra spaces, different casing)
  • Splitting unstructured text columns into clean structured fields
  • Generating a VBA macro or Office Script to repeat the cleanup on future files

When this format breaks down

  • Datasets over ~5,000 rows pasted directly into chat. Ask for a formula or script instead.
  • Files containing PII, financial records, or confidential business data that can't leave your systems. Use a formula approach so the data stays in Excel.
  • Cleanup that requires verifying against a source of truth (e.g., real customer addresses). ChatGPT will guess plausibly and be wrong.
  • Precise numerical transformations on large ranges. ChatGPT makes arithmetic errors; use Excel formulas it writes for you instead of having it compute values.

The prompt we tested

You are a data cleaning assistant specialized in Excel spreadsheets. Analyze the messy Excel data or data description provided and give clear, step-by-step cleaning instructions along with the exact Excel formulas, functions, or Power Query steps needed to clean it.

Instructions: Format the response with clear numbered sections and headings. Include exact Excel formulas in code blocks (e.g., =TRIM(A2)) and keep explanations concise and actionable. Limit the response to under 500 words and prioritize the most impactful cleaning steps first.

Messy Excel data or description:
I have a customer list in Excel with about 2000 rows. The names have inconsistent capitalization (some ALL CAPS, some lowercase), phone numbers are formatted differently (some with dashes, some with parentheses, some just digits), and there are duplicate entries and trailing spaces in the email column. Column headers are: Name, Email, Phone, City.

Provide: (1) a list of data quality issues detected, (2) step-by-step cleaning actions with exact Excel formulas or menu paths, (3) a cleaned sample output showing the before/after, and (4) a reusable formula or macro the user can apply to the full dataset.

Model comparison

| Model | Quality | Cost | Latency | Tokens |
|---|---|---|---|---|
| Claude Opus 4.7 | 9 | $0.1521 | 25.4s | 411 / 1946 |
| Claude Haiku 4.5 | 9 | $0.0053 | 9.8s | 281 / 1002 |
| GPT-5.4 | Not run yet | | | |
| Gemini 2.5 Pro | 7 | $0.0015 | 18.4s | 251 / 242 |
| Grok 4.1 Fast Reasoning | 8 | $0.0004 | 12.1s | 393 / 660 |

Outputs

Claude Opus 4.7 9/10
# Excel Customer List Cleaning Guide

## 1. Data Quality Issues Detected

- **Name column**: Inconsistent capitalization (ALL CAPS, lowercase, mixed)
- **Email column**: Leading/trailing spaces, possible mixed case
- **Phone column**: Inconsistent formats — `(555) 123-4567`, `555-123-4567`, `5551234567`
- **Duplicate rows**: Full or partial duplicates across records
- **Potential whitespace** in all text fields

---

## 2. Step-by-Step Cleaning Actions

### Step 1: Create Helper Columns (E–H)
Insert new headers: `Name_Clean`, `Email_Clean`, `Phone_Clean`, `City_Clean`.

### Step 2: Clean Names (Proper Case)
In **E2**:
```excel
=PROPER(TRIM(CLEAN(A2)))
```

### Step 3: Clean Emails (Lowercase + Trim)
In **F2**:
```excel
=LOWER(TRIM(CLEAN(B2)))
```

### Step 4: Standardize Phone Numbers
Strip all non-digits, then reformat to `(XXX) XXX-XXXX`.
In **G2**:
```excel
=IF(LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"-",""),"(",""),")","")," ",""),".",""))=10,
  "("&LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"-",""),"(",""),")","")," ",""),".",""),3)&") "&
  MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"-",""),"(",""),")","")," ",""),".",""),4,3)&"-"&
  RIGHT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"-",""),"(",""),")","")," ",""),".",""),4),
  "INVALID")
```

### Step 5: Clean City
In **H2**:
```excel
=PROPER(TRIM(CLEAN(D2)))
```

### Step 6: Fill Down & Convert to Values
Select E2:H2 → double-click fill handle → copy → **Paste Special > Values**.

### Step 7: Remove Duplicates
Select columns E–H → **Data → Remove Duplicates** → check Email_Clean (primary key).

---

## 3. Before / After Sample

| Before (Name) | Before (Email) | Before (Phone) | After (Name) | After (Email) | After (Phone) |
|---|---|---|---|---|---|
| JOHN SMITH | `  John@X.com ` | 5551234567 | John Smith | john@x.com | (555) 123-4567 |
| mary jones | `MARY@y.com` | (555) 987-6543 | Mary Jones | mary@y.com | (555) 987-6543 |
| Bob LEE | `bob@z.com ` | 555.444.3333 | Bob Lee | bob@z.com | (555) 444-3333 |

---

## 4. Reusable All-in-One Solution

### Option A: Single Array Formula (Excel 365)
In **E2** (spills to E:H):
```excel
=LET(
  n,A2:A2000, e,B2:B2000, p,C2:C2000, c,D2:D2000,
  digits,LAMBDA(x,TEXTJOIN("",TRUE,IFERROR(--MID(x,SEQUENCE(LEN(x)),1),""))),
  HSTACK(
    PROPER(TRIM(n)),
    LOWER(TRIM(e)),
    IF(LEN(BYROW(p,digits))=10,
       "("&LEFT(BYROW(p,digits),3)&") "&MID(BYROW(p,digits),4,3)&"-"&RIGHT(BYROW(p,digits),4),
       "INVALID"),
    PROPER(TRIM(c))
  )
)
```

### Option B: VBA Macro (any Excel version)
```vba
Sub CleanCustomerList()
    Dim r As Range, lastRow As Long, d As String, i As Long
    lastRow = Cells(Rows.Count, 1).End(xlUp).Row
    For i = 2 To lastRow
        Cells(i,1) = Application.Proper(Trim(Cells(i,1)))
        Cells(i,2) = LCase(Trim(Cells(i,2)))
        Cells(i,4) = Application.Proper(Trim(Cells(i,4)))
        d = ""
        For Each r In Array(Cells(i,3))
            Dim ch As Integer
            For ch = 1 To Len(r.Value)
                If Mid(r.Value,ch,1) Like "#" Then d = d & Mid(r.Value,ch,1)
            Next ch
        Next r
        If Len(d) = 10 Then Cells(i,3) = "(" & Left(d,3) & ") " & Mid(d,4,3) & "-" & Right(d,4)
    Next i
    ActiveSheet.Range("A1").CurrentRegion.RemoveDuplicates Columns:=Array(1,2,3,4), Header:=xlYes
End Sub
```

Run via **Alt+F11 → Insert Module → Paste → F5**.

Judge notes: Excellently addresses the query with comprehensive, actionable guidance including exact formulas, step-by-step instructions, before/after examples, and multiple reusable solutions (array formula and VBA macro), with only minor room for improvement in formula complexity documentation.

Claude Haiku 4.5 9/10
# Data Cleaning Guide: Customer List

## 1. Data Quality Issues Detected

- **Names**: Inconsistent capitalization (ALL CAPS, lowercase, mixed case)
- **Phone Numbers**: Multiple formats (dashes, parentheses, no formatting)
- **Email**: Trailing/leading spaces
- **Duplicates**: Repeated customer entries
- **General**: Potential blank cells or hidden characters

---

## 2. Step-by-Step Cleaning Actions

### Step 1: Remove Trailing/Leading Spaces from Email
**Action**: Create a helper column with cleaned emails.

```excel
=TRIM(D2)
```

Apply to column E (Email_Clean), then copy and paste values back to column D.

---

### Step 2: Standardize Name Capitalization
**Action**: Convert names to proper case (first letter of each word capitalized).

```excel
=PROPER(A2)
```

Apply to column F (Name_Clean), then paste values back to column A.

---

### Step 3: Standardize Phone Numbers
**Action**: Remove all formatting characters, keep only digits.

```excel
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"-",""),"(",""),")","")
```

This removes dashes and parentheses. Apply to column G (Phone_Clean).

**Optional**: Format cleaned phone numbers consistently using the format `XXX-XXX-XXXX`:

```excel
=TEXT(VALUE(G2),"[<=9999999]###-####;(###) ###-####")
```

---

### Step 4: Remove Duplicate Entries
**Action**: Use Excel's built-in deduplication tool.

1. Select all data (including headers)
2. Go to **Data** → **Remove Duplicates**
3. Select all columns and click OK
4. Excel will remove exact duplicate rows

---

### Step 5: Clean Up Helper Columns
**Action**: Delete columns E, F, G (helper columns) after confirming cleaned data is correct.

---

## 3. Before/After Sample Output

| **Before** | **After** |
|---|---|
| Name: `JOHN SMITH` | Name: `John Smith` |
| Email: `john@example.com ` | Email: `john@example.com` |
| Phone: `(555) 123-4567` | Phone: `555-123-4567` |
| **Duplicate row detected** | **Removed** |

---

## 4. Reusable Formula/Macro for Full Dataset

### Option A: Single Master Formula (Recommended for 2000 rows)

In a new sheet, create a combined formula in one column:

```excel
=PROPER(TRIM(A2)) & "|" & TRIM(D2) & "|" & SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"-",""),"(",""),")","")
```

This concatenates cleaned Name | Email | Phone in one cell for verification.

### Option B: Quick Macro for Automation

1. Press **Alt + F11** to open VBA Editor
2. Insert a new module and paste:

```vba
Sub CleanCustomerData()
    Dim ws As Worksheet
    Set ws = ActiveSheet
    Dim lastRow As Long
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    
    ' Clean Names (Column A)
    ws.Range("A2:A" & lastRow).Value = Evaluate("=PROPER(" & ws.Range("A2:A" & lastRow).Address & ")")
    
    ' Clean Emails (Column D)
    ws.Range("D2:D" & lastRow).Value = Evaluate("=TRIM(" & ws.Range("D2:D" & lastRow).Address & ")")
    
    MsgBox "Cleaning complete!"
End Sub
```

3. Run macro via **Alt + F8**

---

## 5. Final Validation Steps

1. Spot-check 10–20 random rows for accuracy
2. Use **Data** → **Text to Columns** if phone numbers still need formatting
3. Sort by Name and Email to visually identify remaining duplicates
4. Save file with backup before applying changes to original

Judge notes: The output excellently satisfies the query intent with comprehensive step-by-step cleaning instructions, exact formulas, before/after examples, and reusable solutions; the only minor limitation is the macro code being incomplete (missing phone standardization logic) and the concatenation formula in Option A being less practical than presented.

Gemini 2.5 Pro 7/10
Here is a data cleaning plan for your customer list.

### 1. Data Quality Issues Detected

*   **Inconsistent Text Case:** The `Name` column has a mix of uppercase, lowercase, and potentially mixed-case entries.
*   **Inconsistent Formatting:** The `Phone` column uses multiple formats (dashes, parentheses, spaces).
*   **Leading/Trailing Spaces:** The `Email` column contains extra spaces that affect data integrity and lookups.
*   **Duplicate Records:** The dataset contains entire rows that are duplicates.

### 2. Step-by-Step Cleaning Plan

This plan uses helper columns to build the clean data without destroying the original. Assume your data is in columns A:D.

**Step 1: Clean Email and Name Columns**

Insert two new columns (E and F).

*   In cell E2, clean the emails by removing extra spaces using the TRIM function.
*   In cell F2, standardize the name capitalization to "Proper Case" using the PROPER function.

**Formulas:**
*   Clean Email (in cell E2):
    ```excel
    =TRIM
    ```

Judge notes: Response directly addresses the query with relevant formulas and a structured approach, but is incomplete (cuts off mid-formula) and lacks the promised before/after sample output and reusable macro that were explicitly requested.

Grok 4.1 Fast Reasoning 8/10
### 1. Data Quality Issues Detected
- Inconsistent name capitalization (e.g., "JOHN DOE" vs. "john doe").
- Varied phone formats (e.g., "(123) 456-7890" vs. "1234567890").
- Trailing/leading spaces in emails (e.g., "user@ex.com ").
- Duplicate rows across Name/Email/Phone.
- Potential spaces in other columns.

### 2. Step-by-Step Cleaning Actions
1. **Trim all text columns**: Select Name:Email:Phone:City > Data > Text to Columns > Finish (trims). Or use helper columns.
2. **Standardize names**: In helper col E (Name cleaned): `=PROPER(TRIM(Name2))`. Copy down, paste values.
3. **Standardize phones** (to digits only): In F (Phone cleaned):  
   ```
   =SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(Phone2),"(",""),")",""),"-","")," ",""),"+","")
   ```
   Copy down, paste values.
4. **Trim emails**: In G (Email cleaned): `=TRIM(Email2)`. Copy down, paste values.
5. **Remove duplicates**: Select all data > Data > Remove Duplicates (check all columns).
6. **Power Query for full clean** (reusable): Data > From Table/Range > Power Query Editor:  
   - Add Column > Custom: `Text.Proper(Text.Trim([Name]))` → Name Clean.  
   - Phone: `Text.Select(Text.Replace(Text.Replace([Phone],"-"," "), " ",""), {"0".."9"})` → Phone Clean.  
   - Email: `Text.Trim([Email])`.  
   - Remove originals > Close & Load.

### 3. Before/After Sample
| Name (Before) | Email (Before) | Phone (Before) | Name (After) | Email (After) | Phone (After) |
|---------------|----------------|----------------|--------------|---------------|---------------|
| JOHN DOE     | user@ex.com   | (123)456-7890 | John Doe    | user@ex.com  | 1234567890   |
| john doe     | user@ex.com   | 1234567890    | John Doe    | user@ex.com  | 1234567890   |
| JANE SMITH   | jane@ex.com  | 123-456-7890  | Jane Smith  | jane@ex.com  | 1234567890   |

### 4. Reusable Formula for Full Dataset
Paste in new sheet Row 2 (data Row 2+):  
```
Name: =PROPER(TRIM(A2))
Email: =TRIM(B2)
Phone: =TEXTJOIN("",TRUE,IF(ISNUMBER(--MID(C2,ROW(INDIRECT("1:"&LEN(C2))),1)),MID(C2,ROW(INDIRECT("1:"&LEN(C2))),1),""))  //Array-enter if pre-365
```
Copy down to 2000 rows > Paste Special Values > Remove Dups. (187 words)

Judge notes: Excellently addresses the query with clear structure, practical formulas, and reusable solutions, though the Power Query syntax has minor inaccuracies and the phone extraction formula could be slightly more accessible for non-advanced users.

What makes these work

  1. Paste a sample, not the whole file

    Give ChatGPT 10-20 representative rows including your edge cases (the weird ones that break normal rules). Ask it to return a formula or script you apply to the full file. This avoids token limits and keeps sensitive data out of the chat.

  2. State the output format explicitly

    Say 'return as a pipe-delimited table' or 'return only the formula, no explanation'. Without this, ChatGPT adds commentary that breaks copy-paste workflows. Specify column order too if it matters.

  3. Show before and after for one row

    The fastest way to communicate a cleanup rule is a worked example. Write 'Input: john SMITH → Output: John Smith' before listing your data. This disambiguates rules that English descriptions leave fuzzy.

  4. Ask for the formula, then verify on edge cases

    When ChatGPT gives you a TEXTSPLIT or regex formula, test it on your weirdest three rows before applying to 50,000. Ask 'what inputs would break this formula?' as a follow-up to surface problems before they hit production.
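
The "verify on edge cases" habit can be sketched outside Excel as a tiny test table. The Python below is only an illustration: `normalize_phone` is a hypothetical stand-in for whatever rule you asked ChatGPT to implement, and the expected values are assumptions you would replace with your own target format.

```python
import re

def normalize_phone(raw: str) -> str:
    """Hypothetical rule: keep digits only; format 10-digit numbers as XXX-XXX-XXXX."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return "INVALID"
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

# Hand-picked edge cases; each expected value encodes the rule stated in the prompt.
EDGE_CASES = [
    ("5551234567", "555-123-4567"),
    ("(555) 987-6543", "555-987-6543"),
    ("555.444.3333", "555-444-3333"),
    ("", "INVALID"),                 # blank cell
    ("555-1234", "INVALID"),         # too short
    ("+1 555 123 4567", "INVALID"),  # 11 digits: this rule does not handle country codes
]

def failing_cases(rule, cases):
    """Return (input, expected, actual) for every case where the rule disagrees."""
    return [(raw, want, rule(raw)) for raw, want in cases if rule(raw) != want]

failures = failing_cases(normalize_phone, EDGE_CASES)
print(failures)  # an empty list means every edge case behaved as expected
```

The same idea works directly in a worksheet: put the weird inputs in one column, the expected outputs in a second, the formula in a third, and compare the last two.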

More example scenarios

#01 · Standardizing a messy CRM contact export
Input
Clean this contact list. Standardize names to Title Case, normalize phone numbers to +1-XXX-XXX-XXXX format, and lowercase all emails. Flag any rows with missing phone or email.

Name, Phone, Email
john SMITH, 5551234567, JOHN@ACME.com
Mary o'brien, (555) 987-6543, mary@acme
bob lee, , bob@acme.com
Expected output
Name, Phone, Email, Flag
John Smith, +1-555-123-4567, john@acme.com, OK
Mary O'Brien, +1-555-987-6543, (invalid email), Missing valid email
Bob Lee, (missing), bob@acme.com, Missing phone

Notes: 'mary@acme' is missing a TLD; recommend verifying. Apostrophe preserved in O'Brien.
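
For comparison, the same rules can be written as a short script, which is roughly what you would ask for if the export were too large to paste. This is a sketch under assumptions: the function name `clean_contact`, the loose email pattern, and the choice to treat any phone that is not 10 digits as missing are illustrative, not part of the tested prompt.

```python
import re

# Deliberately loose check: something@domain.tld with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def clean_contact(name: str, phone: str, email: str) -> tuple[str, str, str, str]:
    """Return (name, phone, email, flag) following the rules in the prompt above."""
    flags = []

    # Title Case; str.title() also capitalizes the letter after an apostrophe.
    clean_name = name.strip().title()

    # Keep digits only, drop a leading US country code, then format.
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) == 10:
        clean_phone = f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    else:
        # Assumption: anything that is not 10 digits is treated as missing.
        clean_phone = "(missing)"
        flags.append("Missing phone")

    clean_email = email.strip().lower()
    if not EMAIL_RE.match(clean_email):
        clean_email = "(invalid email)"
        flags.append("Missing valid email")

    return clean_name, clean_phone, clean_email, "; ".join(flags) or "OK"

rows = [
    ("john SMITH", "5551234567", "JOHN@ACME.com"),
    ("Mary o'brien", "(555) 987-6543", "mary@acme"),
    ("bob lee", "", "bob@acme.com"),
]
for row in rows:
    print(", ".join(clean_contact(*row)))
```

One caveat: `str.title()` capitalizes after every apostrophe, so a value such as "it's" would become "It'S". Names with particles or suffixes need their own rule.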
#02 · Deduplicating a product catalog with near-matches
Input
Find duplicates in this SKU list. Treat rows as duplicates if product names match after removing punctuation, extra spaces, and case differences.

SKU, Product Name
A100, Blue Widget - Large
A101, blue widget large
A102, Blue Widget, Large
B200, Red Gadget
B201, Red Gadget (2024)
Expected output
Duplicate group 1: A100, A101, A102 (normalized: 'blue widget large'). Keep A100, mark others as duplicates.

Duplicate group 2: none. B200 and B201 differ by year qualifier and should be treated as distinct.

Suggested formula for column D: =LOWER(TRIM(SUBSTITUTE(SUBSTITUTE(B2,"-"," "),","," ")))
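
The normalize-then-group idea behind this output can be sketched in a few lines of Python. Note one difference from the suggested formula: the sketch strips all punctuation, not just hyphens and commas. The names `normalize` and `duplicate_groups` are illustrative.

```python
import re
from collections import defaultdict

def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse runs of whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(no_punct.split())

def duplicate_groups(rows):
    """Map each normalized name to its SKUs, keeping only names shared by 2+ rows."""
    groups = defaultdict(list)
    for sku, name in rows:
        groups[normalize(name)].append(sku)
    return {key: skus for key, skus in groups.items() if len(skus) > 1}

catalog = [
    ("A100", "Blue Widget - Large"),
    ("A101", "blue widget large"),
    ("A102", "Blue Widget, Large"),
    ("B200", "Red Gadget"),
    ("B201", "Red Gadget (2024)"),
]
print(duplicate_groups(catalog))
```

B200 and B201 stay separate because "(2024)" survives normalization as the token "2024", which matches the expected output above.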
#03 · Splitting a full-address column
Input
Split this Address column into Street, City, State, ZIP. Write an Excel formula I can drag down, not a one-time result. Sample:

123 Main St, Springfield, IL 62701
45 Oak Ave Apt 2, Chicago, IL 60614
Expected output
Use these formulas (assuming address in A2):

Street: =TRIM(TEXTBEFORE(A2,",",1))
City: =TRIM(TEXTBEFORE(TEXTAFTER(A2,",",1),","))
State: =TRIM(LEFT(TEXTAFTER(A2,",",2),3))
ZIP: =TRIM(RIGHT(A2,5))

These work for US addresses in the 'street, city, ST ZIP' pattern with 5-digit ZIP codes. Format the ZIP column as text to preserve leading zeros.
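
A quick way to see which addresses fit that pattern before trusting the formulas is to express it as a regular expression. The sketch below assumes the same 'street, city, ST ZIP' layout with a 5-digit ZIP; `split_address` is a hypothetical helper, and rows that do not fit return `None` so they can be reviewed by hand.

```python
import re

# street , city , two-letter state, whitespace, 5-digit ZIP
ADDRESS_RE = re.compile(
    r"^\s*(?P<street>[^,]+?)\s*,\s*(?P<city>[^,]+?)\s*,\s*"
    r"(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5})\s*$"
)

def split_address(address: str):
    """Return (street, city, state, zip) or None if the address does not fit the pattern."""
    m = ADDRESS_RE.match(address)
    if m is None:
        return None
    # ZIP stays a string so leading zeros are preserved.
    return m.group("street"), m.group("city"), m.group("state").upper(), m.group("zip")

for sample in [
    "123 Main St, Springfield, IL 62701",
    "45 Oak Ave Apt 2, Chicago, IL 60614",
    "PO Box 12 Springfield IL",  # no commas: does not fit, returns None
]:
    print(split_address(sample))
```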
#04 · Cleaning open-ended survey responses
Input
Categorize these free-text 'reason for cancellation' responses into: Price, Product Quality, Support, Other. Keep the original text.

Too expensive for what it is
kept crashing on my mac
No one answered my ticket for a week
Found a cheaper option
idk, just not using it
Expected output
Response, Category
Too expensive for what it is, Price
kept crashing on my mac, Product Quality
No one answered my ticket for a week, Support
Found a cheaper option, Price
"idk, just not using it", Other
#05 · Generating a VBA macro for repeated cleanup
Input
Write a VBA macro I can save in my Personal Macro Workbook. It should: trim all text cells, convert column A to Title Case, and remove fully blank rows in the active sheet.
Expected output
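Sub CleanSheet()
  Dim c As Range, lastRow As Long, i As Long
  ' Last used row, correct even when the used range does not start at row 1
  With ActiveSheet.UsedRange
    lastRow = .Row + .Rows.Count - 1
  End With
  If lastRow < 2 Then Exit Sub ' nothing below the header row
  ' Trim leading/trailing spaces in every text cell
  For Each c In ActiveSheet.UsedRange
    If VarType(c.Value) = vbString Then c.Value = Trim(c.Value)
  Next c
  ' Convert column A to Title Case, skipping the header
  For Each c In ActiveSheet.Range("A2:A" & lastRow)
    If VarType(c.Value) = vbString Then c.Value = Application.WorksheetFunction.Proper(c.Value)
  Next c
  ' Delete fully blank rows bottom-up so deletions do not shift rows still to be checked
  For i = lastRow To 2 Step -1
    If Application.WorksheetFunction.CountA(ActiveSheet.Rows(i)) = 0 Then ActiveSheet.Rows(i).Delete
  Next i
End Sub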

Common mistakes to avoid

  • Pasting 10,000 rows into chat

    ChatGPT will silently truncate or summarize large inputs and return results for only the first chunk. You think you cleaned the whole file; you cleaned 200 rows. Use formulas or scripts for anything above a few hundred rows.

  • Trusting the math without checking

    ChatGPT is a language model, not a calculator. When asked to sum, average, or recompute values directly, it will produce confident wrong answers. Have it write the formula and let Excel do the math.

  • Vague rules like 'clean this up'

    Without specific rules, ChatGPT invents its own standards (which date format? which case?). Results look clean but aren't consistent with your other data. Always spell out the target format.

  • Skipping the reusability question

    A one-time paste-and-clean is fine for today. Next month when a new export arrives, you're doing it all over again. Ask for a macro, Office Script, or formula set the first time so you have a repeatable pipeline.

  • Ignoring hidden characters

    Non-breaking spaces, zero-width characters, and smart quotes cause matches to fail silently. Tell ChatGPT explicitly 'also strip non-breaking spaces (CHAR(160)) and normalize smart quotes to straight quotes' or your dedupe will miss rows that look identical.
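
To make the hidden-character problem concrete, here is a minimal Python sketch of the scrub step. The exact set of code points is an assumption (the ones named above plus a few common relatives); extend it to match what actually appears in your export.

```python
import re

# Zero-width characters to delete outright (mapping a code point to None removes it).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))

# Look-alike characters to replace with plain ASCII equivalents.
REPLACEMENTS = str.maketrans({
    "\u00a0": " ",                 # non-breaking space, CHAR(160) in Excel
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
})

def scrub(text: str) -> str:
    """Remove zero-width characters, normalize quotes and spaces, then trim."""
    text = text.translate(ZERO_WIDTH)
    text = text.translate(REPLACEMENTS)
    return re.sub(r"\s+", " ", text).strip()

a = "Acme\u00a0Corp\u200b "  # looks like "Acme Corp" on screen
b = "Acme Corp"
print(a == b, scrub(a) == scrub(b))
```

In Excel itself, the non-breaking-space part of this is commonly handled with `SUBSTITUTE(A2,CHAR(160)," ")` wrapped in `TRIM`.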

Frequently asked questions

Can ChatGPT read my Excel file directly?

Yes, if you use ChatGPT with Advanced Data Analysis (Plus/Team/Enterprise), you can upload .xlsx files and it will run Python on them. On the free tier, file upload has been limited or unavailable; if you can't upload, paste the data as text or ask for a formula.

Is it safe to paste company data into ChatGPT?

Not if the data is confidential, regulated, or contains PII. ChatGPT may use inputs for training on consumer plans. Use a Team/Enterprise plan with training disabled, or have ChatGPT write a formula/macro so the data never leaves Excel.

How many rows can ChatGPT clean at once?

Practically, a few hundred rows pasted as text before context limits or truncation become issues. For larger files, upload via Advanced Data Analysis or have ChatGPT generate a formula or VBA macro you run locally.

What's better for Excel cleanup: ChatGPT or Power Query?

Power Query is better for repeatable, auditable transforms on large files and for scheduled refreshes. ChatGPT is faster for one-off cleanup, fuzzy matching, and writing the Power Query M code or formulas you'd otherwise have to research. Combine them: use ChatGPT to write the M code, run it in Power Query.

Which ChatGPT model is best for Excel tasks?

For formula writing and VBA, GPT-4 class models (GPT-4o, GPT-4.1) are noticeably more accurate than smaller models. For large file analysis, use a model with Advanced Data Analysis enabled. The comparison table above shows each model's output on the same prompt.

Can ChatGPT write Office Scripts instead of VBA?

Yes. Just ask: 'Write this as an Office Script in TypeScript for Excel on the web.' Office Scripts work in Excel Online and can be triggered from Power Automate, which makes them better than VBA for cloud workflows.