Use Microsoft Copilot to Clean Excel Data Automatically

Tested prompts for Excel Copilot data cleanup, compared across 5 leading AI models (4 scored so far; GPT-5.4 has not been run yet).

Best by judge score: Claude Opus 4.7 (9/10)

If you're searching for 'excel copilot data cleanup', you probably have a spreadsheet with inconsistent formatting, duplicate rows, mixed date formats, trailing spaces, or mismatched casing, and you want Microsoft Copilot in Excel to fix it without writing formulas or Power Query steps yourself. The good news: Copilot can identify dirty data patterns and then suggest formulas, trigger built-in cleanup actions, or generate Power Query M code for repeatable cleanup.

The catch is that Copilot's quality depends heavily on how you describe the mess. 'Clean this data' gets you vague suggestions. 'Standardize the Phone column to (XXX) XXX-XXXX format, remove rows where Email is blank, and trim whitespace from all text columns' gets you working output. This page shows tested prompts and compares how four models handle the same cleanup task, so you can see which one actually ships usable results for your sheet.

Below you'll find when this workflow fits, when you should reach for Power Query or a Python script instead, realistic cleanup examples across different data types, and the mistakes that waste the most time.

When to use this

Use Copilot in Excel for data cleanup when your dataset sits in a formatted Excel table, the problems are describable in plain English, and you want results in minutes rather than building a repeatable pipeline. It shines for one-off cleanups, exploratory tidying before analysis, and generating formulas you can reuse.

  • Standardizing inconsistent text entries (casing, whitespace, abbreviations) across a column
  • Finding and flagging duplicates based on composite or fuzzy criteria, such as name plus normalized email
  • Splitting or merging columns when the pattern is describable (full name into first/last, address into parts)
  • Converting mixed date or number formats into a single standard
  • Generating formulas or conditional formatting rules to catch future dirty entries

When this format breaks down

  • Datasets that approach or exceed Excel's 1,048,576-row worksheet limit, or anything Copilot refuses to process, where Power Query or Python pandas is faster and more reliable
  • Cleanup that needs to run on a schedule or against new files weekly, where a Power Query or Office Script is more maintainable
  • Highly regulated data (PHI, financial records) where sending content to cloud AI violates your org's policy
  • Cleanup that requires external lookup, like validating addresses against USPS or matching company names to a CRM

The prompt we tested

You are an expert Microsoft Excel Copilot assistant specializing in data cleanup. A user will describe their messy Excel dataset and what they need fixed. Your job is to produce a clear, step-by-step Copilot workflow they can follow inside Excel to clean the data.

Follow these instructions strictly:
Respond with a numbered step-by-step guide (5-10 steps) that includes the exact Copilot prompts to type into the Excel Copilot pane (in quotes), plus any manual fallback actions if Copilot cannot perform the step. Cover issues like duplicates, inconsistent formatting, missing values, split/merged columns, and trimming whitespace when relevant. Keep each step under 2 sentences and end with a short 'Verification checklist' of 3 bullet points.

User's data situation:
I have an Excel sheet with 4,000 rows of customer contacts exported from our CRM. The Name column mixes 'Last, First' and 'First Last' formats, phone numbers have random dashes, spaces, and parentheses, and there are obvious duplicate emails with different capitalization. I need it cleaned up for a mail merge.

Produce the Copilot data cleanup guide now.

Model comparison

Model                     Quality   Cost      Latency   Tokens
Claude Opus 4.7           9         $0.0832   16.9s     407 / 1028
Claude Haiku 4.5          8         $0.0046   9.6s      281 / 864
GPT-5.4                   Not run yet
Gemini 2.5 Pro            8         $0.0027   17.7s     240 / 487
Grok 4.1 Fast Reasoning   8         $0.0003   9.0s      383 / 446

Outputs

Claude Opus 4.7 9/10
# Excel Copilot Cleanup Guide: CRM Contact List

1. **Back up your data first.** Duplicate the sheet (right-click tab → Move or Copy → check "Create a copy") so Copilot edits happen on a working copy, not your original export.

2. **Convert the range to a Table.** Select your data and press `Ctrl+T`; Copilot works far more reliably on structured tables than raw ranges.

3. **Standardize the Name column into First and Last.** Open the Copilot pane and prompt: *"Split the Name column into two new columns called First Name and Last Name. If a value contains a comma, treat it as 'Last, First'; otherwise treat it as 'First Last'. Trim any extra spaces."* Manual fallback: insert two columns and use `=TRIM(IFERROR(MID(A2,FIND(", ",A2)+2,99),LEFT(A2,FIND(" ",A2)-1)))` for First and a mirrored formula for Last.

4. **Normalize phone numbers to a clean format.** Prompt Copilot: *"In the Phone column, remove all parentheses, dashes, spaces, and dots, then reformat every value as (XXX) XXX-XXXX. Leave blanks as blanks."* Manual fallback: use a helper column with `=TEXT(VALUE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"(",""),")",""),"-","")," ","")),"(000) 000-0000")`.

5. **Lowercase all emails and trim whitespace.** Prompt: *"Convert every value in the Email column to lowercase and trim leading or trailing spaces."* Manual fallback: helper column with `=LOWER(TRIM(C2))`, then paste-values back over the original.

6. **Remove duplicate emails.** Prompt: *"Remove duplicate rows based on the Email column, keeping the first occurrence."* Manual fallback: Data tab → Remove Duplicates → check only the Email column.

7. **Flag missing critical fields for mail merge.** Prompt: *"Add a column called Issues that lists any of the following problems per row: missing first name, missing last name, missing or invalid email, phone not 10 digits."* Review and fix flagged rows manually.

8. **Sort and finalize for mail merge.** Prompt: *"Sort the table alphabetically by Last Name, then First Name, and hide any helper columns."* Save the file as a new `.xlsx` specifically for the merge.

---

**Verification checklist**
- ✅ Spot-check 10 random rows: names split correctly regardless of original format, and phone numbers all match `(XXX) XXX-XXXX`.
- ✅ Run `=COUNTA(Email)` vs `=SUMPRODUCT(1/COUNTIF(Email,Email))` — the two numbers should now be equal (no duplicates).
- ✅ Confirm the Issues column is empty (or resolved) for every row before starting the mail merge.
Judge notes: Excellent response that directly fulfills the query intent with a comprehensive, actionable step-by-step Copilot workflow including exact prompts, manual fallbacks, and a practical verification checklist tailored to the user's specific data problems.
Claude Haiku 4.5 8/10
# Excel Copilot Data Cleanup Guide: Customer Contacts

## Step-by-Step Workflow

**Step 1: Assess and Organize Your Data**
First, select your entire dataset (Ctrl+A) and apply AutoFilter (Data > Filter) to see column headers clearly. Type this prompt into the Copilot pane: *"Show me a summary of the data quality issues in my customer contacts sheet—count duplicates, inconsistent formats, and missing values."*

**Step 2: Standardize Name Format**
Create two helper columns next to your Name column labeled "First" and "Last." Type: *"Split the Name column into First and Last name, handling both 'Last, First' and 'First Last' formats. Put results in columns B and C."* If Copilot struggles, manually use Data > Text to Columns on a subset first to show the pattern.

**Step 3: Merge Names Back to Consistent Format**
In a new column, type: *"Create a formula that combines the First and Last name columns into 'First Last' format for all 4,000 rows, then convert the formulas to values."* Copy the new column and paste as values over your original Name column, then delete the helper columns.

**Step 4: Standardize Phone Numbers**
Select your Phone column and prompt: *"Remove all dashes, spaces, and parentheses from phone numbers and format them consistently as (XXX) XXX-XXXX for a US phone list."* If Copilot uses formulas, convert results to values before deleting the original column.

**Step 5: Remove Duplicate Emails with Case-Insensitive Matching**
Highlight your Email column and type: *"Identify and flag duplicate emails regardless of uppercase/lowercase differences, keeping only the first occurrence of each unique email address."* Manually review flagged rows before deletion, then use Data > Remove Duplicates if needed for final cleanup.

**Step 6: Trim Whitespace from All Text Columns**
Select all text columns (Name, Email, Phone, Address) and prompt: *"Remove leading and trailing spaces from all cells in these columns across all 4,000 rows."* Verify the action completed by clicking a few cells to confirm no extra spaces remain.

**Step 7: Handle Missing Values**
Type: *"Show me which rows have blank Email or Phone fields and highlight them for review."* Manually decide whether to delete incomplete records or request missing data from your CRM; mark any retained blanks with "N/A" for consistency.

**Step 8: Convert Email to Lowercase**
Select the Email column and prompt: *"Convert all email addresses to lowercase to ensure consistency for mail merge."* Convert formulas to values once complete.

**Step 9: Final Deduplication and Sort**
Type: *"Remove any remaining duplicate rows based on Email address, and sort the final dataset alphabetically by Last Name."* Review the row count before/after to confirm duplicates were removed.

**Step 10: Verification and Export**
Prompt: *"Verify data quality: confirm no blank emails exist, all phone numbers follow (XXX) XXX-XXXX format, and all names are in 'First Last' format."* Save as .xlsx and test a sample mail merge with 10 rows to confirm formatting works.

---

## Verification Checklist

- ✓ **Name Format**: All names display as "First Last" with consistent capitalization and no commas
- ✓ **Phone Numbers**: All phones match (XXX) XXX-XXXX format with no dashes, spaces, or parentheses irregularities
- ✓ **Email Duplicates Removed**: No duplicate emails exist (case-insensitive), and all emails are lowercase with no leading/trailing spaces
Judge notes: Excellent step-by-step guide with specific Copilot prompts and fallback actions that directly addresses all mentioned data issues; minor deduction only because some prompts assume Copilot capabilities that may vary by Excel version and the guide could benefit from explicit warning about testing on sample data first.
Gemini 2.5 Pro 8/10
Of course. Here is a step-by-step Copilot workflow to clean your customer contact data for a mail merge.

***

1.  **Select your data and trim whitespace.**
    First, click on any cell within your data range and press Ctrl+A to select it, then click "Format as Table" on the Home tab. Next, ask Copilot: `"Trim whitespace from all columns in the table."` This removes leading or trailing spaces that can cause issues with duplicates and formatting.

2.  **Standardize and deduplicate emails.**
    Ask Copilot to normalize the email addresses to prevent case-sensitive duplicates. Prompt: `"Create a new column named 'Clean_Email' that converts the 'Email' column to all lowercase, then remove duplicate rows based on this new column, keeping the first instance."`

3.  **Split the mixed-format Name column.**
    Copilot can recognize different name patterns and separate them correctly. Prompt: `"Split the 'Name' column into two new columns, 'First_Name' and 'Last_Name'."` If Copilot struggles, try giving it an example: `"Split the 'Name' column like this: 'Smith, John' becomes 'John' in 'First_Name' and 'Smith' in 'Last_Name'."`

4.  **Standardize all phone numbers.**
    This step will strip out all formatting, leaving only digits for a consistent phone list. Ask Copilot: `"Create a new 'Clean_Phone' column by removing all non-numeric characters from the 'Phone' column."`

5.  **Identify any missing contact information.**
    Find empty cells that could cause errors in your mail merge before you proceed. Prompt: `"Highlight all blank cells in the table in yellow."` For a manual fallback, use the Home tab > Find & Select > Go To Special > Blanks.

6.  **Finalize your clean dataset.**
    To finish, hide the old, messy columns, leaving only the clean data for your mail merge. Prompt: `"Hide the original 'Name', 'Email', and 'Phone' columns."`

***

### Verification Checklist:

*   The `First_Name` and `Last_Name` columns are correctly separated for all rows.
*   The `Clean_Phone` column
Judge notes: The output delivers a well-structured, practical step-by-step Copilot workflow with specific prompts and fallback actions that directly addresses the user's data cleanup needs; it's only slightly hindered by the incomplete verification checklist and lacks explicit confirmation that the approach handles the "Last, First" vs "First Last" name format ambiguity with full reliability.
Grok 4.1 Fast Reasoning 8/10
1. **Trim whitespace across all columns**: In the Copilot pane, type: "Trim leading and trailing spaces from all text columns in my table." If Copilot doesn't apply to all, manually select data > Data tab > Text to Columns > Finish (as a delimiter reset), then use Find & Replace for spaces.

2. **Standardize emails to lowercase**: In Copilot, type: "Create a new column converting the Email column to lowercase using LOWER function, then copy values back to original." Verify by sorting the Email column for consistency.

3. **Remove duplicate emails**: In Copilot, type: "Remove duplicate rows based on the Email column, keeping the first occurrence." Manually fallback: Data tab > Remove Duplicates > select Email column only.

4. **Clean phone numbers**: In Copilot, type: "Standardize the Phone column by removing dashes, spaces, parentheses, and formatting as (XXX) XXX-XXXX using SUBSTITUTE and TEXT functions in a new column." Paste values back and delete original if needed.

5. **Split Name column into First and Last**: In Copilot, type: "Split the Name column into First Name and Last Name: if comma present, split after comma and reverse; otherwise split on last space. Create new columns." Manually fallback: Add columns, use formulas like =TRIM(LEFT(Name,FIND(",",Name)-1)) for Last if comma.

6. **Fill missing values (if any)**: In Copilot, type: "Identify rows with missing values in key columns (Name, Email, Phone) and suggest filling with 'Unknown' or averages where appropriate." Manually: Use Go To Special > Blanks, then type formula like ="Unknown".

7. **Final deduplication and sort**: In Copilot, type: "Remove any remaining duplicates based on Email and Phone, then sort by Last Name ascending." Convert table to range if needed for mail merge.

**Verification checklist**:
- All emails are lowercase with no duplicates (check unique count).
- Phones match (XXX) XXX-XXXX format (spot-check 10 rows).
- Names split correctly into First/Last with no 'Last, First' leftovers (sort and scan).
Judge notes: Excellent step-by-step guide that directly addresses the user's data cleanup needs with specific Copilot prompts and practical fallbacks; minor weakness is that some prompts assume Copilot capabilities that may vary by Excel version, and the guide could explicitly mention testing on a sample first.
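
The cleanup rules that the four guides converge on (comma-aware name splitting, phone reformatting, case-insensitive email dedupe) can also be sketched outside Excel, which is handy for cross-checking whatever Copilot produces. The following is a minimal Python sketch, not any model's output: the function names are hypothetical, rows are assumed to be dicts with an `Email` key (for example from a CSV export), and treating a leading `1` on an 11-digit number as a US country code is an assumption.

```python
import re


def split_name(raw: str) -> tuple[str, str]:
    """Return (first, last) from either 'Last, First' or 'First Last'."""
    text = " ".join(raw.split())  # collapse repeated whitespace and trim
    if "," in text:
        last, first = (part.strip() for part in text.split(",", 1))
    else:
        # No comma: split on the last space, so 'Mary Ann Lee' -> ('Mary Ann', 'Lee').
        first, _, last = text.rpartition(" ")
        if not first:  # single-word name: treat it as the first name
            first, last = last, ""
    return first, last


def normalize_phone(raw: str) -> str:
    """Reformat a 10-digit US number as (XXX) XXX-XXXX; leave anything else alone."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # assumption: a leading 1 is the US country code
    if len(digits) != 10:
        return raw.strip()  # unexpected length: keep the value for manual review
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def dedupe_by_email(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Lowercase and trim Email, keeping the first row for each address."""
    seen: set[str] = set()
    kept = []
    for row in rows:
        key = row["Email"].strip().lower()
        if key and key in seen:
            continue  # later duplicate of an address already kept
        seen.add(key)
        kept.append({**row, "Email": key})
    return kept
```

Rows with a blank email are all kept rather than collapsed into one, so they still show up in a missing-values review.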

What makes these work

  1. Format your range as a table first

    Copilot works dramatically better on Excel tables (Ctrl+T) than on raw ranges because it can reference columns by name. Convert before prompting, and give columns clear headers like 'CustomerEmail' instead of 'Column B'.

  2. Describe the mess, not just the goal

    Say what's wrong ('some entries have trailing spaces, some use abbreviations like Corp. vs Corporation') before saying what you want. Copilot's formula choices depend on knowing the specific failure modes, not just the target state.

  3. Ask for a preview column, not in-place edits

    Have Copilot generate a new column with the cleaned value so you can compare side by side. Once you verify the output matches expectations across edge cases, copy-paste-values over the original. This prevents silent data loss.

  4. Request Power Query for anything recurring

    If you'll clean this same export next month, explicitly ask 'give me the Power Query M code for this' rather than a formula. You get a refreshable pipeline instead of a one-shot fix.
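
The preview-column advice above amounts to a row-by-row comparison of the original and cleaned values. As a rough illustration, here is a Python sketch of that comparison, assuming both columns have been exported as equal-length lists of strings; `audit_preview` is a hypothetical helper, not an Excel or Copilot feature.

```python
def audit_preview(original: list[str], cleaned: list[str]) -> dict[str, list[int]]:
    """Compare an original column with its cleaned preview, row by row.

    Returns zero-based row indexes grouped as 'unchanged', 'changed', or
    'lost' (the original had a value but the cleaned cell is blank).
    """
    if len(original) != len(cleaned):
        raise ValueError("columns must have the same number of rows")
    report: dict[str, list[int]] = {"unchanged": [], "changed": [], "lost": []}
    for index, (before, after) in enumerate(zip(original, cleaned)):
        if before.strip() and not after.strip():
            report["lost"].append(index)  # a value disappeared: inspect this row
        elif before == after:
            report["unchanged"].append(index)
        else:
            report["changed"].append(index)
    return report
```

Anything in the `lost` group is a candidate for the silent data loss described above and is worth inspecting before pasting values over the source column.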

More example scenarios

#01 · CRM export with inconsistent company names
Input
A 4,000-row Excel table from HubSpot with a Company column containing entries like 'acme corp', 'ACME Corporation', 'Acme Corp.', 'acme  corp' (double space). I want to normalize these so all Acme variants become 'Acme Corp' and all other companies use Title Case with trailing punctuation removed.
Expected output
Copilot suggests adding a column with =PROPER(TRIM(SUBSTITUTE([@Company],".",""))) for general normalization, then a nested IF or SWITCH to map known variants: =IF(ISNUMBER(SEARCH("acme",[@Company])),"Acme Corp",PROPER(TRIM(SUBSTITUTE([@Company],".","")))). It also offers a Power Query alternative using Text.Trim, Text.Proper, and a Replace Values step.
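
For comparison, the same normalization can be sketched in Python. This is an illustrative sketch under stated assumptions: `KNOWN_VARIANTS` is a hypothetical mapping you would maintain yourself, matching is a simple substring test (like SEARCH in the formula), and only trailing punctuation is stripped.

```python
# Hypothetical mapping: lowercase substring -> canonical company name.
KNOWN_VARIANTS = {"acme": "Acme Corp"}


def normalize_company(raw: str) -> str:
    """Map known variants to a canonical name; otherwise Title Case the value."""
    text = " ".join(raw.split()).rstrip(".,;")  # collapse spaces, drop trailing punctuation
    lowered = text.lower()
    for needle, canonical in KNOWN_VARIANTS.items():
        if needle in lowered:
            return canonical
    return text.title()
```

A substring match is deliberately loose, so a name like 'Acmeville Bakery' would also map to 'Acme Corp'; tighten the rule if your data contains such names.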
#02 · Survey responses with mixed date formats
Input
Column B has 2,300 submission dates in formats like '3/5/2024', '2024-03-05', 'March 5, 2024', and '5-Mar-24'. Convert all to YYYY-MM-DD so I can sort chronologically.
Expected output
Copilot recommends a helper column with =TEXT(DATEVALUE([@SubmittedAt]),"yyyy-mm-dd") and flags rows where DATEVALUE fails so you can fix them manually. For the problem rows, it suggests an IFERROR wrapper and a separate 'NeedsReview' column marking which entries to inspect.
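
The same "parse what you can, flag the rest" approach can be sketched in Python by trying a fixed list of formats in order. The format list below is an assumption based on the four examples in this scenario; it treats slash dates as month-first (US style) and relies on English month names, so adjust it for your source locale.

```python
from datetime import datetime
from typing import Optional

# Assumed source formats, tried in order. Slash dates are read month-first.
FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d-%b-%y")


def to_iso(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next one
    return None  # caller should mark the row as needing review
```

Rows where `to_iso` returns `None` play the role of the 'NeedsReview' column.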
#03 · Deduplicating customer list on fuzzy match
Input
3,100 rows with Name, Email, and Phone. Same customer appears multiple times with slight variations: 'John Smith' vs 'john smith' vs 'J. Smith', same email with different capitalization. Flag duplicates based on normalized email, keep the row with the most complete data.
Expected output
Copilot adds a NormalizedEmail column with =LOWER(TRIM([@Email])), then a CompletenessScore column counting non-blank fields, then uses COUNTIF on NormalizedEmail to flag duplicates and RANK on score within each duplicate group. It suggests filtering where Rank>1 and deleting, or using Remove Duplicates on NormalizedEmail after sorting by score descending.
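
The "keep the most complete row per normalized email" rule can be sketched in a few lines of Python, which may help when checking Copilot's ranking logic on a sample. The helper names are hypothetical and rows are assumed to be dicts of strings with an `Email` key; ties keep the earlier row.

```python
def completeness(row: dict[str, str]) -> int:
    """Count the non-blank fields in a row."""
    return sum(1 for value in row.values() if value and value.strip())


def keep_most_complete(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """For each normalized email keep the row with the most filled-in fields."""
    best: dict[str, dict[str, str]] = {}  # normalized email -> best row so far
    no_email: list[dict[str, str]] = []   # rows without an email are never merged
    for row in rows:
        key = row.get("Email", "").strip().lower()
        if not key:
            no_email.append(row)
        elif key not in best or completeness(row) > completeness(best[key]):
            best[key] = row  # strictly better only, so ties keep the first row
    return list(best.values()) + no_email
```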
#04 · Financial export with currency strings
Input
Column D contains values like '$1,234.56', '1234.56 USD', '€1.234,56', and '-$500.00'. I need a clean numeric column so I can sum and pivot.
Expected output
Copilot generates =IFERROR(VALUE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE([@Amount],"$",""),",","")," USD","")),0) for USD-formatted values, and flags the Euro-formatted rows (which use comma as decimal) for separate handling with a conditional formula that swaps comma and period. It also suggests adding a Currency column to preserve the original unit.
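
A Python sketch of the same idea returns both the numeric amount and the currency, and returns `None` instead of 0 for values it cannot parse so they are not silently summed as zero. It assumes that any value marked with '€' or 'EUR' uses a period for thousands and a comma for decimals, and that everything else is USD-style; `parse_amount` is a hypothetical helper.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple


def parse_amount(raw: str) -> Tuple[Optional[Decimal], str]:
    """Return (amount, currency); amount is None when the text cannot be parsed."""
    text = raw.strip()
    # Assumption: euro-marked values use European separators, the rest are USD-style.
    currency = "EUR" if ("€" in text or "EUR" in text.upper()) else "USD"
    number = re.sub(r"[^\d.,-]", "", text)  # keep digits, separators, and minus sign
    if currency == "EUR":
        number = number.replace(".", "").replace(",", ".")  # 1.234,56 -> 1234.56
    else:
        number = number.replace(",", "")  # 1,234.56 -> 1234.56
    try:
        return Decimal(number), currency
    except InvalidOperation:
        return None, currency  # flag for manual review rather than defaulting to 0
```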
#05 · Product catalog with malformed SKUs
Input
SKU column has 12,000 entries that should all be format 'ABC-12345' but many are 'abc12345', 'ABC 12345', 'ABC_12345', or have leading zeros missing. Standardize to uppercase letters, dash, 5-digit number with leading zeros.
Expected output
Copilot suggests a pattern-based cleanup using nested SUBSTITUTE calls plus UPPER, then TEXT to pad the numeric portion: =UPPER(LEFT([@SKU],3))&"-"&TEXT(VALUE(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE([@SKU],"-",""),"_","")," ",""),4,10)),"00000"). For edge cases it recommends flagging rows where the letter prefix isn't exactly 3 characters.
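
Outside Excel, the same SKU rule is a natural fit for an actual regular expression. The sketch below is one possible reading of the requirement (exactly three letters, an optional space, dash, or underscore, then one to five digits); the pattern and the `normalize_sku` name are assumptions, and anything that does not match is returned as `None` for review.

```python
import re
from typing import Optional

# Three letters, optional separators (space, underscore, dash), then 1-5 digits.
SKU_PATTERN = re.compile(r"^\s*([A-Za-z]{3})[\s_-]*(\d{1,5})\s*$")


def normalize_sku(raw: str) -> Optional[str]:
    """Return the SKU as 'ABC-01234', or None if it does not fit the pattern."""
    match = SKU_PATTERN.match(raw)
    if match is None:
        return None  # wrong prefix length or stray characters: flag for review
    prefix, number = match.groups()
    return f"{prefix.upper()}-{int(number):05d}"  # zero-pad to five digits
```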

Common mistakes to avoid

  • Prompting on unformatted ranges

    Copilot often refuses or gives generic advice when data isn't in a table. You lose 10 minutes trying to debug prompts when the real fix is one Ctrl+T away.

  • Trusting the first output without spot-checks

    Copilot formulas frequently work on the first 50 rows and silently fail on edge cases (blank cells, non-English characters, negative numbers). Always sort by the cleaned column ascending and descending and scan the extremes.

  • Asking for too many cleanups in one prompt

    'Clean dates, fix names, dedupe emails, standardize phones' produces mediocre output on all four. Split into separate prompts and chain the results, verifying each step before moving on.

  • Ignoring the locale problem

    European vs US number and date formats look similar but parse differently. If you tell Copilot 'these are dates' without specifying the source locale, you'll swap months and days on ambiguous entries like 03/05/2024 and not notice for weeks.

  • Overwriting the source column

    Telling Copilot to 'replace the values in column C' means you lose the original. If the cleanup is wrong, you can't recover without re-importing. Always output to a new column first.
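
To make the locale problem concrete, the sketch below flags slash-separated dates whose first two parts could each be a month, which are the entries most likely to be swapped silently. It is a minimal Python illustration with a hypothetical `is_ambiguous` helper and only covers the `N/N/YYYY` shape.

```python
import re

# Matches dates like 03/05/2024 or 3/5/2024 (two short numbers, then a 4-digit year).
SLASH_DATE = re.compile(r"^\s*(\d{1,2})/(\d{1,2})/(\d{4})\s*$")


def is_ambiguous(raw: str) -> bool:
    """True when the date reads differently as month/day and as day/month."""
    match = SLASH_DATE.match(raw)
    if match is None:
        return False
    first, second = int(match.group(1)), int(match.group(2))
    # Both parts are valid months and they differ, so the two readings disagree.
    return 1 <= first <= 12 and 1 <= second <= 12 and first != second
```

If a column contains slash dates where the first part is greater than 12, the source is almost certainly day-first, which is a useful hint to include in the prompt.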

Frequently asked questions

Does Excel Copilot actually clean data or just suggest formulas?

Both, depending on your prompt. For simple operations like 'remove duplicates' or 'trim whitespace' it can execute directly. For anything requiring logic it generates formulas, Power Query steps, or conditional formatting rules that you apply. It doesn't reach into arbitrary cells and overwrite them without showing you what it's doing.

Do I need a Microsoft 365 Copilot license to clean Excel data this way?

Yes, the in-app Copilot pane in Excel requires a Microsoft 365 Copilot add-on license (around $30/user/month as of 2024). The free Copilot web chat can help you write formulas if you paste sample data in, but it can't act on your workbook directly.

Can Copilot handle cleanup on 100,000+ row spreadsheets?

It can suggest formulas and Power Query steps for large tables, but direct actions may time out or be refused on very large ranges. For anything above a few hundred thousand rows, have Copilot generate Power Query M code and run the transformation through the Power Query Editor instead.

How do I clean data with Copilot without exposing it to Microsoft's cloud?

You can't, fully. Copilot in Excel sends your prompt and relevant data context to Microsoft's cloud endpoints. If you're handling regulated data, either use Power Query with manual transformations, write VBA or Office Scripts, or process the file locally with Python and openpyxl.

What's better for recurring data cleanup, Copilot or Power Query?

Power Query for recurring jobs, every time. Copilot is faster for one-off cleanup and exploring what's wrong with a dataset, but Power Query gives you a refreshable, auditable pipeline. The best workflow is using Copilot to draft the M code, then saving it as a Power Query step.

Why does Copilot sometimes say it can't process my table?

Usually one of three reasons: the range isn't formatted as a table, the sheet has merged cells or blank header rows, or the data exceeds its processing limit for that operation. Convert to a table, unmerge cells, ensure every column has a header, and try again.