How to Remove Duplicate Rows in Excel Using AI Tools

Tested prompts for removing duplicate rows in Excel using AI, compared across five leading models.

Best by judge score: Claude Opus 4.7 (9/10)

You have an Excel file with duplicate rows and you want AI to handle the cleanup instead of wrestling with Remove Duplicates dialogs, COUNTIF formulas, or Power Query steps. Maybe the duplicates aren't exact matches. Maybe 'john@acme.com' and 'John@Acme.com ' should be treated as the same contact, or 'Acme Inc.' and 'Acme, Inc' are the same vendor. Standard Excel tools can't reason about that. AI can.

This page shows you how to remove duplicates in Excel using AI tools like ChatGPT, Claude, Gemini, and Copilot. You paste your data (or a sample), describe what counts as a duplicate, and the model returns a deduplicated version or a formula, macro, or Power Query script you can run yourself. It works for fuzzy matches, inconsistent formatting, and conditional dedup rules like 'keep the most recent row per customer'.

Below you'll find the exact prompt we tested, the outputs from four models side by side, plus guidance on when this approach is the right call and when you should stick with Excel's built-in features.

When to use this

Use AI for duplicate removal when the definition of 'duplicate' requires judgment, when data is messy across columns, or when you want a reusable formula or script rather than a one-time manual click. It also shines when your file mixes exact and near-duplicates, or when you need to keep specific rows based on a rule Excel's native dedup can't express.

  • Fuzzy duplicates: 'Jon Smith' vs 'John Smith', 'acme.com' vs 'ACME.COM'
  • Conditional dedup: keep the most recent transaction per customer ID
  • Multi-column logic: same email but different phone, pick the row with more complete data
  • Generating a reusable Power Query or VBA script you can run on future files
  • Cleaning a CRM export where duplicates hide behind inconsistent capitalization and whitespace
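
Most of these cases reduce to the same first move: build a normalized key, then compare on it. Here is a minimal pandas sketch of that idea (column names and sample rows are invented for illustration, not taken from a real export):

```python
import pandas as pd

# Toy CRM export: the first two rows are the same contact hiding
# behind capitalization and a trailing space
df = pd.DataFrame({
    "Name":  ["John Smith", "John Smith", "Ann Lee"],
    "Email": ["john@acme.com", "John@Acme.com ", "ann@acme.com"],
})

# Normalized key: trim whitespace, then lowercase
df["email_key"] = df["Email"].str.strip().str.lower()

# Exact-match dedup on the key now catches the fuzzy pair
deduped = df.drop_duplicates(subset="email_key").drop(columns="email_key")
print(len(deduped))  # → 2
```

This trim-and-lowercase key is the same trick the formula-based approaches use with a LOWER(TRIM(...)) helper column.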

When this format breaks down

  • Files with sensitive data (PII, financial records, HIPAA) that can't be pasted into a public AI tool
  • Very large datasets over ~10,000 rows where pasting into a chat window is impractical; use Copilot in Excel or a generated script instead
  • Simple exact-match dedup on one column, where Data > Remove Duplicates takes 3 seconds
  • When you need guaranteed deterministic output for audit purposes and can't verify every AI-produced row

The prompt we tested

You are an Excel expert helping a user remove duplicate rows from their spreadsheet using AI-assisted methods. Based on the user's situation below, provide clear step-by-step instructions covering: (1) how to use ChatGPT or Copilot to identify and remove duplicates, (2) the exact prompts to use, and (3) how to apply the result back in Excel. Also briefly mention when built-in Excel features (Remove Duplicates, UNIQUE function) may be faster.

Format the response with numbered steps and short code or prompt blocks where relevant. Keep it under 400 words, use plain language, and tailor the advice to the user's specific columns and duplicate definition.

User's situation:
I have a customer list in Excel with about 8,000 rows and columns for Name, Email, Phone, and Signup Date. Some customers appear multiple times with slightly different capitalization in their email (e.g., John@Gmail.com vs john@gmail.com). I want to keep only one row per unique email and prefer the most recent signup date.

Model comparison

| Model | Quality | Cost | Latency | Tokens |
| --- | --- | --- | --- | --- |
| Claude Opus 4.7 | 9 | $0.0784 | 16.7s | 348 / 976 |
| Claude Haiku 4.5 | 8 | $0.0035 | 8.8s | 240 / 644 |
| GPT-5.4 | Not run yet | | | |
| Gemini 2.5 Pro | 6 | $0.0019 | 20.6s | 225 / 323 |
| Grok 4.1 Fast Reasoning | 7 | $0.0002 | 8.7s | 363 / 339 |

Outputs

Claude Opus 4.7 9/10
# Removing Duplicates from Your Customer List

Since your duplicates involve **case differences in email** and you want to **keep the most recent signup date**, Excel's built-in Remove Duplicates won't work directly (it's case-insensitive but can't pick "newest"). Here's how to do it with AI assistance.

## Option A: Use Copilot in Excel (fastest if you have it)

1. Open your file and select your data range.
2. Open the Copilot pane and paste this prompt:

```
Deduplicate this table by Email (case-insensitive). 
For each unique email, keep only the row with the 
most recent Signup Date. Return the cleaned table.
```

3. Click **Keep** to insert the result into a new sheet.

## Option B: Use ChatGPT to generate a formula

1. In a new column E, normalize the email:
   ```
   =LOWER(TRIM(B2))
   ```
   Fill down through all 8,000 rows.

2. Sort the data by **Signup Date** (column D) **descending** so the newest row for each email is on top.

3. In column F, flag the first occurrence (which is now the newest):
   ```
   =IF(COUNTIF($E$2:E2,E2)=1,"Keep","Remove")
   ```
   Fill down.

4. Filter column F for "Remove" and delete those rows. Then delete helper columns E and F.

## Option C: Ask ChatGPT for a single UNIQUE-based formula

Paste this prompt into ChatGPT:

```
I have an Excel table A1:D8001 with headers Name, Email, 
Phone, Signup Date. Give me one dynamic array formula 
that returns unique rows by lowercased Email, keeping 
the row with the latest Signup Date for each email.
```

A working formula (Microsoft 365) looks like:

```
=LET(e,LOWER(B2:B8001),u,UNIQUE(e),
 idx,MAP(u,LAMBDA(x,XMATCH(MAX(IF(e=x,D2:D8001)),
 IF(e=x,D2:D8001)))),
 CHOOSE({1,2,3,4},INDEX(A:A,idx+1),INDEX(B:B,idx+1),
 INDEX(C:C,idx+1),INDEX(D:D,idx+1)))
```

Place it in an empty cell and it will spill the cleaned list.

## When built-in features are faster

- **Remove Duplicates** (Data tab): Great if case doesn't matter AND you don't care which row survives. Sort by Signup Date descending first, then use it on the Email column only — Excel keeps the first occurrence.
- **UNIQUE function**: Best for a quick list of distinct emails, but doesn't bring along the other columns with "newest" logic.

For your case, **sort-then-Remove-Duplicates** (on Email only) is actually the quickest manual method.

Judge notes: Excellently addresses the specific query with practical AI-assisted methods, clear step-by-step instructions, multiple viable options tailored to the user's exact situation (case-insensitive duplicates + keeping newest date), and appropriately mentions when built-in features suffice—only minor room for improvement would be showing actual ChatGPT output examples or addressing potential formula errors.

Claude Haiku 4.5 8/10
# Removing Duplicates from Your Customer List

## Quick Option: Use Excel's Built-In Features First
For your case, **try Excel's UNIQUE function first** (Excel 365):
```
=UNIQUE(A:D, 3, FALSE)
```
This removes duplicates based on column 3 (Email) and is case-insensitive. However, it won't automatically keep the most recent signup date—you'd need to sort first.

**Better built-in option:** Use Data → Remove Duplicates after sorting by Email (descending) and Signup Date (newest first). This keeps the first occurrence, which would be your most recent record.

---

## Using AI (ChatGPT/Copilot) for Custom Logic

If you want AI to handle the "keep most recent" requirement with mixed capitalization, follow these steps:

### Step 1: Prepare Your Data
- Copy your 8,000 rows (Name, Email, Phone, Signup Date) into a plain text file or paste directly into ChatGPT
- Include 20-30 sample rows as an example

### Step 2: Use This Prompt

**Prompt for ChatGPT:**
```
I have a customer list with columns: Name, Email, Phone, Signup Date.
Some emails have different capitalization (john@gmail.com vs John@Gmail.com).
I need to remove duplicates keeping only ONE row per unique email 
(case-insensitive), preferring the row with the most recent Signup Date.

Here's my sample data:
[PASTE 20-30 ROWS]

Please provide:
1. A Python script to deduplicate this
2. Instructions for applying it in Excel
```

### Step 3: Apply the Result
- Copy the Python script ChatGPT provides
- Run it via an online Python editor (replit.com) or your local Python
- Upload your full CSV file
- Download the cleaned output
- Paste back into Excel

---

## Recommended Approach for Your Situation

1. **Export** your data as CSV
2. **Use the Python method** (via ChatGPT's script) — it's fastest for 8,000 rows with conditional logic
3. **Re-import** the cleaned file into Excel

**Estimated time:** 5 minutes with AI vs. 20+ minutes manually sorting and removing duplicates.

---

## Why Not Just Use Excel?
- Remove Duplicates is case-sensitive (won't catch `john@gmail.com` vs `John@Gmail.com`)
- UNIQUE function is simpler but doesn't intelligently choose the most recent record
- AI handles complex deduplication logic better than Excel formulas for large datasets

**Stick with Excel if:** Your data is under 500 rows or you don't need the "most recent" logic.

Judge notes: The response directly addresses the query with practical step-by-step instructions for both built-in Excel features and AI-assisted methods, includes specific prompts and code examples, and accurately tailors advice to the user's duplicate-definition needs (case-insensitive email matching with date preference), though it could have included more explicit Excel formula alternatives or Copilot-specific guidance for slightly broader coverage.

Gemini 2.5 Pro 6/10
Of course. Here is a step-by-step guide to removing duplicates from your customer list using an AI assistant, tailored to your specific rules.

### Using AI to Remove Duplicates and Keep the Newest Record

This AI-assisted method is ideal for your situation because it can easily handle both the case-insensitive email matching and the logic of keeping the most recent signup date in a single step.

1.  **Prepare Your Data**
    Select and copy your entire customer list from Excel, including the header row (`Name`, `Email`, `Phone`, `Signup Date`).

2.  **Use the AI Prompt**
    Paste your copied data into ChatGPT or a similar AI tool and use the following prompt. This prompt is specific, telling the AI exactly how to define a duplicate and which record to keep.

    > I have pasted a table of customer data with the columns: Name, Email, Phone, and Signup Date.
    >
    > Please process this data to remove duplicate rows based on the 'Email' column. When comparing emails, ignore capitalization (e.g., 'John@Gmail.com' is the same as 'john@gmail.com').
    >
    > For any group of duplicate emails, please keep only the single row with the most recent 'Signup Date'.
    >
    > Provide the final, cleaned data as a tab-separated table that I can paste back into Excel.

3.  **Apply the Result in Excel**
    The AI will process your request and generate a clean, de

Judge notes: The response directly addresses the query with a practical AI-assisted approach and tailored prompt, but it's incomplete (cuts off mid-sentence), lacks mention of built-in Excel alternatives as promised, and doesn't discuss limitations or verification steps for handling 8,000 rows through AI.

Grok 4.1 Fast Reasoning 7/10
### Quick Note on Built-in Excel Features
For faster results without custom logic, use **Power Query** (Data > Get Data > From Table): Lowercase emails, group by email, aggregate max Signup Date/First Name/Phone. **Remove Duplicates** or **UNIQUE** are case-sensitive and ignore date prefs, so less ideal here.

### AI-Assisted Steps (Using ChatGPT or Excel Copilot)
1. **Copy your data**: Select all rows (e.g., A1:D8001 with headers: Name, Email, Phone, Signup Date). Copy (Ctrl+C). For large files, use ChatGPT Plus or Copilot for better handling.

2. **Paste into AI & run this exact prompt**:
   ```
   Here is my Excel customer data (columns: Name, Email, Phone, Signup Date). Clean it to:
   - Keep 1 row per unique email (case-insensitive, e.g., John@Gmail.com = john@gmail.com).
   - For duplicates, prefer the row with the most recent Signup Date.
   - Output as tab-separated values (TSV) or CSV with headers for easy Excel paste.
   [Paste your data here]
   ```
   AI will process and return a cleaned table (e.g., ~unique emails rows).

3. **Copy AI output**: Select the full cleaned table/CSV from AI response.

4. **Paste back to Excel**: In a new sheet (Ctrl+V). Use **Data > Text to Columns** if needed (Delimited > Tab/Comma). Save as new file.

This handles 8k rows reliably; test on a sample first. Results in seconds! (248 words)

Judge notes: The response directly addresses the user's specific scenario with practical AI-assisted steps and relevant prompts, but lacks detail on Copilot integration, doesn't explain potential AI limitations with 8k rows, and could better clarify the trade-offs between the mentioned approaches.

What makes these work

  1. Define 'duplicate' explicitly

    Don't say 'remove duplicates'. Say 'treat rows as duplicates if Email matches after lowercasing and trimming whitespace'. The AI can't read your mind, and vague prompts produce wrong dedup rules. List the columns that matter and any normalization steps.

  2. Specify which row to keep

    When duplicates exist, which one survives? Most recent date, most complete, first occurrence, highest value? State this directly. 'Keep the row with the latest OrderDate, break ties by highest Amount' is specific enough to get a correct answer.

  3. Ask for a script, not just cleaned data

    If your file has more than a few hundred rows, ask for a Power Query M script, VBA macro, or Python pandas code. You run it yourself on the full file. This avoids token limits and gives you a repeatable tool for next month's export.

  4. Provide a 10-row sample with edge cases

    Include rows that show the tricky patterns: different capitalizations, trailing spaces, the tied-but-different case. The AI calibrates its logic to what it sees; a sample with no messy rows produces logic that only handles clean data.
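
To make point 3 concrete: if you asked for 'Python pandas code' for the tested scenario (one row per case-insensitive email, newest signup wins), a reasonable script might look like the sketch below. The file names in the usage comment are placeholders, not real paths:

```python
import pandas as pd

def dedupe_customers(df: pd.DataFrame) -> pd.DataFrame:
    """One row per email (case-insensitive), preferring the newest Signup Date."""
    out = df.copy()
    out["_key"] = out["Email"].str.strip().str.lower()
    out["Signup Date"] = pd.to_datetime(out["Signup Date"])
    # Newest first, so drop_duplicates' default keep="first" keeps it
    out = out.sort_values("Signup Date", ascending=False)
    return out.drop_duplicates(subset="_key").drop(columns="_key").sort_index()

# Run it locally on the full file (paths are placeholders):
# clean = dedupe_customers(pd.read_excel("customers.xlsx"))
# clean.to_excel("customers_clean.xlsx", index=False)
```

Because the script runs locally on the real file, nothing gets pasted into a chat window, no token limits apply, and it's reusable on next month's export.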

More example scenarios

#01 · Deduping a CRM contact export with inconsistent formatting
Input
I have 500 rows with columns: Name, Email, Company, Phone. Many contacts appear twice with slight variations (e.g. 'john@acme.com' vs 'John@Acme.com', 'Acme Inc' vs 'Acme, Inc.'). Treat emails as the same if they match after lowercasing and trimming. Keep the row with the most filled-in fields. Return cleaned CSV.
Expected output
The AI returns a CSV with ~380 unique rows. Each kept row has the most complete data for that normalized email. It also lists 12 ambiguous cases (same email, conflicting phone numbers) flagged for manual review, and provides the normalization rules it applied so you can reproduce the logic in Power Query.
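
'Keep the row with the most filled-in fields' is the one rule in this scenario with no clean built-in equivalent; in pandas it is a completeness count plus a sort. A sketch with invented sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Name":    ["John", "John Q. Public", "Ann"],
    "Email":   ["john@acme.com", "John@Acme.com", "ann@acme.com"],
    "Company": [None, "Acme Inc", "Beta LLC"],
    "Phone":   [None, "555-0100", None],
})

df["_key"] = df["Email"].str.strip().str.lower()
df["_filled"] = df.notna().sum(axis=1)  # completeness score per row

# Most complete row first, then one survivor per normalized email
kept = (df.sort_values("_filled", ascending=False)
          .drop_duplicates(subset="_key")
          .drop(columns=["_key", "_filled"])
          .sort_index())
print(list(kept["Name"]))  # → ['John Q. Public', 'Ann']
```
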
#02 · Keeping latest transaction per customer
Input
Sheet with columns: CustomerID, OrderDate, Amount, Status. Some CustomerIDs appear 5+ times. I want only the most recent OrderDate per CustomerID. Give me a Power Query M script I can paste into Excel.
Expected output
The AI returns an M script that groups by CustomerID, sorts descending by OrderDate, takes the first row of each group, and expands back to full columns. Includes step-by-step instructions: Data > Get Data > From Table/Range, open Advanced Editor, paste, click Close & Load.
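
If you'd rather run this outside Excel entirely, the same group-and-take-newest logic is a few lines of pandas. This is offered as an alternative sketch with made-up sample data, not the M script the scenario asks for:

```python
import pandas as pd

df = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2", "C1"],
    "OrderDate":  pd.to_datetime(["2024-01-05", "2024-03-01",
                                  "2024-02-10", "2024-02-20"]),
    "Amount":     [50, 80, 120, 95],
    "Status":     ["paid", "paid", "open", "paid"],
})

# Newest order first, then keep the first row seen per customer
latest = (df.sort_values("OrderDate", ascending=False)
            .drop_duplicates(subset="CustomerID")
            .sort_index())
print(list(latest["Amount"]))  # → [80, 120]
```
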
#03 · Fuzzy vendor name matching in AP data
Input
Accounts payable list of 2,000 vendors. Names include 'AT&T', 'AT and T', 'AT&T Inc.', 'at&t corp'. Identify duplicates with fuzzy matching at 90% similarity and suggest a canonical name for each cluster.
Expected output
The AI returns clusters like: Cluster 1 canonical 'AT&T Inc.' containing 4 variants; Cluster 2 canonical 'Verizon Communications' containing 3 variants. It provides a helper column formula using a similarity approach and a VBA function for Levenshtein distance you can drop into the workbook.
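
For a feel of what a similarity threshold does, here is a stdlib-only Python sketch using difflib's SequenceMatcher. Two hedges: difflib's ratio is not Levenshtein similarity, so a 0.9 threshold here will not behave like a 90% Levenshtein threshold, and the greedy grouping below is a simplification of real clustering:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Case-insensitive similarity in [0, 1]
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cluster(names, threshold=0.9):
    """Greedy clustering: each name joins the first cluster whose
    representative (first member) it matches at >= threshold."""
    clusters = []
    for name in names:
        for group in clusters:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters

vendors = ["AT&T Inc.", "AT&T Inc",
           "Verizon Communications", "Verizon Communication"]
print(len(cluster(vendors)))  # → 2
```

As the 'trusting fuzzy matches' warning in the mistakes section says, always review the clusters before merging anything.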
#04 · Deduping survey responses with partial duplicates
Input
Survey export with 1,200 rows. Some respondents submitted twice. Same email means duplicate, but keep the response with more answered questions (count non-empty cells across columns F to AZ). Return the row numbers to delete.
Expected output
The AI returns a list of 47 row numbers to delete, grouped by email, showing the completion count for each kept vs deleted row. It also provides an Excel formula using COUNTA combined with MAXIFS so you can verify the logic before deleting anything.

Common mistakes to avoid

  • Pasting the whole file blindly

    Large pastes get truncated and the AI silently processes only part of your data. Either sample and ask for a script, or use Copilot in Excel which sees the full sheet. Always spot-check the row count of the output.

  • Not checking what got deleted

    AI dedup is not deterministic. Always ask for a 'rows removed' log or diff, not just the cleaned file. Otherwise you won't catch cases where the AI merged rows it shouldn't have.

  • Ignoring whitespace and case

    'john@acme.com' and 'john@acme.com ' look identical but Excel treats them as different. If you don't tell the AI to trim and lowercase, it may preserve those 'duplicates' or remove them inconsistently.

  • Trusting fuzzy matches without review

    90% similarity will group 'Smith LLC' with 'Smithson LLC'. Always review fuzzy clusters before accepting the merge. Ask the AI to output the cluster with member counts, not a pre-merged file.

  • Using public AI on regulated data

    Pasting customer PII, PHI, or financial records into a consumer chatbot may violate your company's policies or GDPR. Use Microsoft 365 Copilot, a private deployment, or have the AI write a script you run locally.

Frequently asked questions

Can Microsoft Copilot remove duplicates in Excel?

Yes. In Excel with a Microsoft 365 Copilot license, open the Copilot pane and type 'remove duplicate rows based on the Email column, keeping the most recent entry'. Copilot generates the transformation and applies it to a new sheet. It works directly on your open workbook, so you don't paste data anywhere.

How do I remove fuzzy duplicates in Excel?

Excel has no native fuzzy match, but Power Query's Merge feature supports fuzzy matching with a similarity threshold. Alternatively, ask ChatGPT or Claude to generate a VBA function using Levenshtein distance or to cluster your list using a similarity score, then review the clusters manually.

Is it safe to paste Excel data into ChatGPT?

Only if the data contains no PII, confidential business info, or regulated data. For sensitive files, either anonymize the data first, use an enterprise AI tool with a data processing agreement, or ask the AI to write a script you run locally on the real file.

What's the best AI for cleaning Excel data?

For in-app cleanup, Microsoft 365 Copilot is the most direct because it sees your sheet. For complex logic or generating scripts, Claude and ChatGPT produce higher-quality Power Query M and VBA. Gemini works well if your file is already in Google Sheets format.

How many rows can AI handle for deduplication?

Pasted directly into a chat, expect reliable results up to around 500-2,000 rows depending on column width. Beyond that, ask the AI to generate a Power Query, VBA, or Python script and run it yourself. Copilot in Excel handles full workbooks since it doesn't require pasting.

Can AI dedupe across multiple sheets or files?

Yes, but it's easier to consolidate first. Use Power Query to append sheets into one table, then apply AI-generated dedup logic. Or ask the AI to write a script that loads multiple sheets, normalizes the key columns, and outputs a single unique list.