# Excel Cleanup Plan: Sales CSV (8,000 Records)

## Step 1: Back up your file

Save a copy as `SalesData_RAW.xlsx` before editing. Work on a duplicate sheet so you can revert if needed.

## Step 2: Split column A by semicolons (Text to Columns)

1. Select column A → **Data → Text to Columns**
2. Choose **Delimited** → Next
3. Check **Semicolon** (uncheck others) → Next
4. For the date column, click it in the preview and set **Column data format → Date: YMD** (or leave as Text if dates are mixed; see Step 5)
5. Click **Finish**

*Alternative (recommended for repeat imports):* **Data → Get Data → From Text/CSV**, set delimiter to Semicolon in Power Query, then Close & Load.

## Step 3: Strip leading/trailing spaces and hidden characters

In a helper column next to each text field (e.g., names):

```
=TRIM(CLEAN(B2))
```

Drag down, then **Copy → Paste Special → Values** over the original column and delete the helper.

*Power Query equivalent:* Right-click column → **Transform → Trim** and **Transform → Clean**.

## Step 4: Standardize date formats

Because dates are mixed (`3/5/24` and `2024-03-05`), Excel likely stores some as text. In a helper column:

```
=IFERROR(DATEVALUE(C2), C2)
```

Then:

1. Select the helper column → format cells as **Date (YYYY-MM-DD)**
2. Paste values back over the original column
3. Spot-check ambiguous dates like `3/5/24`: confirm whether your source is US (M/D/Y) or EU (D/M/Y) before trusting DATEVALUE.

*Power Query method (more reliable):* Select column → **Transform → Data Type → Using Locale → Date → English (United States)** or matching locale.

## Step 5: Remove duplicate order IDs

1. Select the full data range (include headers)
2. **Data → Remove Duplicates**
3. Uncheck all columns, then check **only Order ID**
4. Click OK; Excel reports how many duplicates were removed

*If you need to review before deleting:* Apply **Conditional Formatting → Highlight Cell Rules → Duplicate Values** on the Order ID column first.

## Step 6: Convert to an Excel Table

Select data → **Ctrl + T** → confirm headers. Tables auto-expand, enable filters, and make future formulas cleaner.

## Step 7: Final validation checks

- Sort/filter each column to spot blanks or outliers
- Verify row count matches expectations (~8,000 minus duplicates)
- Check totals with `SUM` or a quick PivotTable to confirm numeric columns parsed correctly

---

### 💡 Tip to prevent this next time

**Use Power Query instead of opening CSVs directly.** Go to **Data → Get Data → From Text/CSV**, and Excel will let you set the delimiter, trim whitespace, fix date locales, and remove duplicates *before* loading. Save the query, and next month's import becomes a one-click **Refresh**.
Fix Messy CSV Data After Importing to Excel
Tested prompts for cleaning messy CSV data in Excel, compared across 5 leading AI models.
You opened a CSV in Excel and it's a mess. Dates showing as numbers, leading zeros stripped from ZIP codes, names split across wrong columns, trailing whitespace everywhere, merged cells where fields had commas inside quotes, and half your rows shoved into column A. You need to clean it without writing VBA or learning Power Query from scratch.
This page shows how to use an AI prompt to generate the exact cleaning steps, formulas, or Power Query script you need for your specific file. Paste a sample of your messy rows, describe the target structure, and the model returns a step-by-step fix you can apply in Excel right now. The tested prompt and four model outputs below let you compare which LLM handles your type of mess best.
Use this when your CSV has inconsistent formatting, encoding glitches, or structural problems that Excel's built-in Text to Columns and Find & Replace can't solve in one pass.
When to use this
Use an AI-assisted approach when your CSV has multiple overlapping problems that would take 30+ minutes to fix manually, or when you need a repeatable process for files that arrive weekly. It's also ideal when you know the end state you want but don't know the Excel function or Power Query step to get there.
- Dates stored as text or serial numbers mixed with real dates in the same column
- Delimiter collisions where commas inside quoted fields broke the import
- Mixed encodings showing garbled characters like Ã© instead of é
- Inconsistent casing, extra spaces, or stray line breaks inside cells
- Recurring weekly exports from a system you can't fix at the source
When this format breaks down
- Files over 1,048,576 rows, which exceed Excel's worksheet row limit and need a database or Python instead
- Data with sensitive PII you can't paste into a third-party LLM
- Simple fixes like trimming whitespace or removing duplicates, which are faster with built-in Excel features
- Structural corruption where the CSV is missing delimiters entirely and cannot be parsed without the source
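Before deciding whether a file is too large for Excel, it can help to count its records outside Excel. The following is a minimal sketch using only the Python standard library; the function names and the in-memory sample are hypothetical, and it assumes a comma-delimited file whose quoted fields may contain line breaks.

```python
import csv
import io

# Worksheet row limit for modern .xlsx workbooks.
EXCEL_MAX_ROWS = 1_048_576


def count_csv_records(handle, delimiter=","):
    """Count parsed records, header included.

    A quoted field that contains a line break still counts as one record,
    which a plain line count would get wrong.
    """
    return sum(1 for _ in csv.reader(handle, delimiter=delimiter))


def fits_in_excel(record_count):
    """True when every record can land on a single worksheet."""
    return record_count <= EXCEL_MAX_ROWS


# Hypothetical in-memory sample; for a real file, pass
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO('id,note\n1,"line one\nline two"\n2,plain\n')
records = count_csv_records(sample)
print(records, fits_in_excel(records))
```

If the count exceeds the limit, the cleanup belongs in a database or a script rather than a worksheet.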
The prompt we tested
You are an Excel data-cleaning assistant. A user has imported a messy CSV file into Excel and needs step-by-step guidance to clean it. Analyze the issues they describe and provide a clear, prioritized cleanup plan using built-in Excel features (Text to Columns, Power Query, TRIM, CLEAN, Find & Replace, Flash Fill, Remove Duplicates, formulas, etc.). Follow these rules: Return a numbered step-by-step guide (5-10 steps) with exact Excel menu paths, formulas, or Power Query actions for each issue. Group related fixes together, use short explanations, and end with a brief tip to prevent the same issues on future imports. User's situation: I imported a CSV of 8,000 sales records into Excel and everything landed in column A separated by semicolons. There are also leading/trailing spaces in names, inconsistent date formats (some 3/5/24, some 2024-03-05), and duplicate order IDs. How do I clean this up?
Model comparison
| Model | Quality | Cost | Latency | Tokens |
|---|---|---|---|---|
| Claude Opus 4.7 | 9 | $0.0869 | 19.5s | 343 / 1090 |
| Claude Haiku 4.5 | 9 | $0.0056 | 11.6s | 233 / 1080 |
| GPT-5.4 | Not run yet | | | |
| Gemini 2.5 Pro | 8 | $0.0020 | 19.5s | 220 / 339 |
| Grok 4.1 Fast Reasoning | 9 | $0.0003 | 12.5s | 358 / 493 |
Outputs
# Excel Data-Cleaning Plan for Your Sales Records

## Step-by-Step Cleanup Guide

### **Step 1: Separate Semicolon-Delimited Data into Columns**

1. Select column A (click the column header)
2. Go to **Data > Text to Columns**
3. Choose **Delimited** → **Next**
4. Check **Semicolon** as the delimiter → **Next**
5. Verify column formats (set date columns to "Text" for now) → **Finish**

*This splits your 8,000 records across separate columns.*

---

### **Step 2: Trim Leading/Trailing Spaces from Names**

1. Insert a helper column next to your names column (e.g., column B if names are in A)
2. Enter the formula: `=TRIM(A2)`
3. Press **Enter**, then double-click the fill handle (small square at cell bottom-right) to auto-fill down all 8,000 rows
4. Copy the trimmed column → Paste Special > **Values only** back into the original column
5. Delete the helper column

*Repeat for any other text columns with spacing issues.*

---

### **Step 3: Standardize Date Formats**

1. Select your date column
2. Insert a helper column and use this formula:
   ```
   =TEXT(DATEVALUE(A2),"YYYY-MM-DD")
   ```
   *(Adjust `A2` to your date column reference)*
3. Auto-fill down with the fill handle
4. Copy → Paste Special > **Values only** back to the original column
5. Delete the helper column
6. Right-click the date column → **Format Cells > Date** → choose a single consistent format (e.g., YYYY-MM-DD)

*This converts both 3/5/24 and 2024-03-05 formats to a standard format.*

---

### **Step 4: Remove Duplicate Order IDs**

1. Select your entire data range (or click any cell in the data)
2. Go to **Data > Remove Duplicates**
3. Check only the **Order ID** column
4. Click **OK**

*Excel will highlight and remove rows where Order IDs are identical. Review the deletion summary.*

**Alternative (if you need to keep one copy and flag duplicates):**

- Insert a helper column with formula: `=COUNTIF($A$2:$A2,A2)>1`
- Filter for `TRUE` values to review before deleting manually

---

### **Step 5: Clean Any Remaining Special Characters (Optional)**

If you notice stray quotes, line breaks, or odd characters:

1. Select affected columns
2. Go to **Home > Find & Replace** (Ctrl+H)
3. Find: `"` (or problematic character) → Replace with: (leave blank) → **Replace All**
4. Repeat for other characters (e.g., line breaks: find `^j`, replace with space)

---

### **Step 6: Verify Data Quality**

1. Scroll through a sample of rows to spot-check formatting
2. Use **Data > AutoFilter** to check for blanks or obvious errors:
   - Click column headers → filter to see unique values
3. Check Order ID column for remaining duplicates using **Remove Duplicates** again if needed

---

### **Step 7: Save Your Cleaned File**

1. Go to **File > Save As**
2. Choose **Excel Workbook (.xlsx)** format (not CSV) to preserve formatting
3. Name it something like `Sales_Records_Cleaned.xlsx`

---

## **Prevention Tip for Future Imports**

When importing CSVs in the future:

- **Use Power Query** (Data > Get Data > From File > From CSV) instead of direct CSV import; it auto-detects delimiters and handles formatting better
- Or: Set column widths and formats *before* importing to guide the import wizard
- Always import to a temporary sheet, clean, then move to your main workbook

---

## **Quick Summary**

| Issue | Solution | Steps |
|-------|----------|-------|
| Semicolon-separated data | Text to Columns | 1 |
| Spaces in names | TRIM formula | 2 |
| Mixed date formats | TEXT/DATEVALUE + Format Cells | 3 |
| Duplicate Order IDs | Remove Duplicates | 4 |
| Special characters | Find & Replace | 5 |

**Estimated time:** 15–20 minutes for 8,000 rows using these built-in tools.
Of course. Here is a step-by-step guide to clean your sales data using Power Query, which is the most efficient and repeatable way to handle all the issues you've described.
### **Your 6-Step Data Cleanup Plan**
This plan uses the Power Query Editor, which is built into Excel. It's the best tool because it records your steps, allowing you to refresh the data with one click if the source file is ever updated.
---
#### **Step 1: Load the Messy Data into Power Query**
First, we need to load your data into the Power Query Editor correctly. This is where you can tell Excel that your data is separated by semicolons.
* **Action:**
1. Select the single cell containing your messy data (e.g., A1).
2. Go to the **Data** tab on the ribbon.
3. In the "Get & Transform Data" group, click **From Table/Range**.
4. Excel will open the Power Query Editor with your single column of data.
#### **Step 2: Split the Single Column into Multiple Columns**
This will fix the core problem of all your data being stuck in Column A.
* **Action:**
1. In the Power Query Editor, select the column header.
2. Go to the **Home** tab.
3. Click **Split Column** > **By Delimiter**.
4. A new window will pop up. Power Query is usually smart enough to detect the semicolon, but if not,
1. **Prepare data as a table**: Select all data in column A (Ctrl+A if it's the only data). Go to **Insert > Table** (or Ctrl+T). Ensure "My table has headers" is checked if row 1 has headers; click OK. This enables Power Query.
2. **Load into Power Query**: With the table selected, go to **Data > From Table/Range**. Power Query Editor opens with your data in one column.
3. **Split columns by semicolon**: In Power Query, select column A. Go to **Home > Split Column > By Delimiter**. Choose **Semicolon** as delimiter, select **Split at Each Occurrence** and **Columns**, then OK. This creates separate columns (e.g., OrderID, Name, Date, etc.).
4. **Trim spaces from name column**: Click the header of the name column (e.g., Column2). Go to **Transform > Format > Trim**. Repeat for any other text columns with spaces. (Use **Transform > Replace Values** with " " (space) to nothing if extra internal spaces.)
5. **Standardize date column**: Click the date column header (e.g., Column3). Go to **Home > Data Type > Date** (or **Using Locale** > select English(US) for MM/DD/YY or appropriate for YYYY-MM-DD). Power Query auto-parses most formats; preview and adjust locale if needed.
6. **Remove duplicate order IDs**: Click the Order ID column header (e.g., Column1). Go to **Home > Remove Rows > Remove Duplicates**. This keeps first unique instance per Order ID across all columns.
7. **Verify and finalize**: Scroll/review rows (use **Home > Keep Rows > Keep Top Rows** for sample). Rename columns via double-click headers. Go to **Home > Close & Load** to output cleaned table to a new sheet.
8. **Optional: Clean non-printable chars**: If needed post-load, select affected columns, go to **Data > Text to Columns > Delimited > Next > uncheck all > Finish** (triggers CLEAN equivalent), or use formula `=CLEAN(TRIM(A1))` in new column and copy down.
**Future import tip**: When opening CSV, use **Data > From Text/CSV**, select semicolon delimiter upfront, and load directly to Power Query for auto-cleaning—avoids single-column mess.
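For readers who want to sanity-check what the Excel steps above should produce, here is a rough Python sketch of the same four fixes: split on semicolons, trim names, normalize dates, and keep the first row per order ID. The column names and sample rows are hypothetical, and the date parsing assumes US month-first order for values like `3/5/24`.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the shape the user described.
RAW = (
    "order_id;customer;order_date\n"
    "1001;  Ann Lee ;3/5/24\n"
    "1002;Bob Ray;2024-03-05\n"
    "1001;Ann Lee;2024-03-05\n"
)

# Assumption: slash dates are month/day/year (US). Swap to "%d/%m/%y" for D/M/Y sources.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%y")


def parse_date(text):
    """Try each known format in turn and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def clean(raw_text):
    """Split, trim, normalize dates, and keep the first row per order ID."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=";")
    seen, rows = set(), []
    for row in reader:
        order_id = row["order_id"].strip()
        if order_id in seen:
            continue  # later duplicates are dropped, like Remove Duplicates
        seen.add(order_id)
        rows.append({
            "order_id": order_id,
            # split/join trims the ends and collapses runs of spaces, like TRIM
            "customer": " ".join(row["customer"].split()),
            "order_date": parse_date(row["order_date"]).isoformat(),
        })
    return rows


cleaned = clean(RAW)
for row in cleaned:
    print(row)
```

Comparing the row count here with the row count in the cleaned workbook is a quick way to confirm that the duplicate removal behaved as expected.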
What makes these work
- 01. Show the model 5 real rows
  Paste actual messy rows from your file, not a paraphrase. The model catches pattern variations like trailing spaces or mixed date formats only when it sees them. Include a header row and 3-5 data rows covering the worst cases.
- 02. State the target output explicitly
  Say 'I want column B as MM/DD/YYYY dates, column D as 5-digit text ZIPs, column F trimmed and proper case.' Ambiguous goals produce generic advice. A concrete target schema produces formulas you can paste.
- 03. Ask for Power Query for recurring files
  If the same messy export arrives weekly, ask specifically for Power Query M code or click-path steps. You set it up once and refresh. Formulas work for one-off cleanups but break when column positions shift.
- 04. Specify your Excel version
  TEXTSPLIT and TEXTBEFORE require Excel 365 (or Excel 2024), while LET also works in Excel 2021. If you're on Excel 2019 or earlier, tell the model so it returns compatible formulas using LEFT, MID, FIND, and helper columns instead.
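Picking the worst rows by eye is tedious in a large file. As a rough sketch, the hypothetical helper below returns the header plus the first few rows that contain padded cells, which makes a reasonable sample to paste into the prompt. It only detects leading or trailing whitespace; other kinds of mess would need their own checks.

```python
import csv
import io


def messy_sample(handle, delimiter=",", limit=5):
    """Return the header plus up to `limit` rows that have padded cells."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    messy = [
        row for row in reader
        if any(cell != cell.strip() for cell in row)
    ]
    return [header] + messy[:limit]


# Hypothetical data; for a real file pass open(path, newline="", encoding="utf-8").
data = io.StringIO("id,name\n1,Ann\n2, Bob \n3,Cy\n")
sample_rows = messy_sample(data)
print(sample_rows)
```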
More example scenarios
I have a CSV column 'OrderDate' with values like '2024-03-15', '3/15/2024', '15-Mar-24', and '45366'. I need them all as proper Excel dates in MM/DD/YYYY format. About 8,000 rows.
Add a helper column with this formula: =IFERROR(DATEVALUE(A2), IF(ISNUMBER(A2), A2, TEXT(A2,"mm/dd/yyyy"))). Format the column as Date MM/DD/YYYY. For the serial number 45366, it resolves to 3/15/2024. Copy the helper column, paste as values over the original, then delete the helper.
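To cross-check the helper column, the sketch below normalizes the same four input shapes in Python. It assumes slash dates are month-first, that all-digit values are Excel serial numbers from the 1900 date system, and that month abbreviations are English; the function name is hypothetical.

```python
from datetime import date, datetime, timedelta

# Day zero that makes 1900-system serials line up for dates after February 1900.
EXCEL_EPOCH = date(1899, 12, 30)

# Assumption: slash dates are month/day/year.
TEXT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%y")


def to_date(value):
    """Convert a text date or an Excel serial number (as text) to a date."""
    text = value.strip()
    if text.isdigit():
        return EXCEL_EPOCH + timedelta(days=int(text))
    for fmt in TEXT_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


samples = ["2024-03-15", "3/15/2024", "15-Mar-24", "45366"]
normalized = [to_date(s).strftime("%m/%d/%Y") for s in samples]
print(normalized)
```

All four inputs should land on the same calendar day, which is a useful check that the serial number and the text dates agree.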
CSV has a ZIP column where '02101' became '2101' and '00501' became '501' after Excel opened the file. 12,000 rows. I need all ZIPs back to 5 digits.
Use Data > From Text/CSV to re-import and set the ZIP column type to Text before loading. If you've already saved, add column: =TEXT(B2,"00000"). This pads 2101 back to 02101. For 9-digit ZIPs use "00000-0000". Paste as values over the original column.
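The padding that `TEXT(B2,"00000")` performs can be expressed as a short Python sketch, useful if you want to repair the column before it reaches Excel. The function name is hypothetical, and the sketch assumes 5-digit US ZIP codes only.

```python
def pad_zip(value, width=5):
    """Left-pad a ZIP code with zeros; accepts numbers or text."""
    return str(value).strip().zfill(width)


print(pad_zip(2101))     # number that lost its leading zero
print(pad_zip("501"))    # text that lost two leading zeros
print(pad_zip("02101"))  # already correct, left unchanged
```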
Product names show 'Café Latté' as 'CafÃ© LattÃ©' and 'Naïve' as 'NaÃ¯ve'. The CSV is UTF-8 but Excel opened it as Windows-1252. 3,500 rows.
Don't fix with formulas. Close the file without saving. Open Excel, go to Data > From Text/CSV, select the file, and in the preview dialog set File Origin to '65001: Unicode (UTF-8)'. The preview will show correct characters. Click Load. If you already saved, reopen the original CSV in Notepad and re-save as UTF-8 with BOM, then reimport.
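To see why reimporting with the right encoding works, the sketch below reproduces the garbling in Python and reverses it. It assumes the text was UTF-8 misread as Windows-1252 exactly once; the `repair` helper is hypothetical and returns its input unchanged when that assumption does not hold.

```python
original = "Café Latté"

# Reading UTF-8 bytes as Windows-1252 turns each accented letter into two characters.
garbled = original.encode("utf-8").decode("cp1252")


def repair(text):
    """Undo one round of UTF-8-read-as-cp1252 garbling, if that is what happened."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of garbling; leave it alone


print(garbled)
print(repair(garbled))
```

Reimporting the original file as UTF-8 remains the cleaner fix, since it avoids guessing which cells were affected.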
Free-text survey column has responses like 'Great service, fast delivery' that split into two columns, and multi-line responses that broke rows. Roughly 2,000 responses, now scrambled across 4,700 rows.
Use Power Query. Data > From Text/CSV > Transform Data. Set the delimiter to Comma and, in the Source step settings, set Line breaks to 'Ignore quoted line breaks' so newlines inside quoted fields stay within one cell. If the rows were already split and saved, you'll need the original file. Close & Load back to a sheet with clean rows.
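The behavior you want from the import is that a quoted field stays one field even when it contains commas or line breaks. A small Python sketch with hypothetical survey data shows what a quote-aware parser returns:

```python
import csv
import io

# Hypothetical survey export: one response contains a comma, one a line break.
RAW = (
    "id,response\n"
    '1,"Great service, fast delivery"\n'
    '2,"Line one\nLine two"\n'
)

# newline="" leaves line endings untouched so the csv module can handle them.
rows = list(csv.reader(io.StringIO(RAW, newline="")))

for row in rows:
    print(row)
```

Three records come back from four physical lines, which is the row count a correct import should match.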
Every Monday I get a CSV where 'Category' has values like ' electronics', 'ELECTRONICS ', 'Electronics', and 'electronics '. Need them standardized to Proper Case with no padding. 15,000 rows weekly.
Record this once in Power Query: Load the CSV, right-click the Category column, Transform > Trim, then Transform > Capitalize Each Word. Save the query. Next week, replace the source file and click Refresh All. Formula alternative: =PROPER(TRIM(C2)). Power Query is better for the recurring case.
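The `PROPER(TRIM(...))` combination has a close Python counterpart, sketched below with the four variants from the scenario. The function name is hypothetical.

```python
def standardize_category(value):
    """Trim padding and capitalize each word, similar to PROPER(TRIM(value))."""
    return value.strip().title()


variants = [" electronics", "ELECTRONICS ", "Electronics", "electronics "]
standardized = {standardize_category(v) for v in variants}
print(standardized)
```

A set with a single member confirms that all four spellings collapse to one category.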
Common mistakes to avoid
- Saving before fixing imports
  Once you save a CSV in Excel, leading zeros, long numbers converted to scientific notation, and date conversions are baked in. Always fix the import via Data > From Text/CSV on the original file before any save.
- Using Find and Replace on dates
  Replacing '/' with '-' on a date column converts real dates to text strings that look right but won't sort or filter correctly. Use TEXT() or reformat the cell instead of text substitution.
- Trusting autodetected column types
  Power Query's From Text/CSV import guesses column types from the first 200 rows by default. If row 500 has a ZIP with a leading zero or a product code that looks numeric, it gets corrupted silently. Force column types to Text during import for any ID-like field.
- Cleaning in place without a backup
  One bad Find & Replace or a misfired formula overwrites the original data. Always duplicate the sheet or save a copy of the raw CSV before applying transformations, especially on files you can't re-download.
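- Ignoring invisible characters
  Non-breaking spaces (CHAR(160)), zero-width characters, and BOM markers survive TRIM() and cause VLOOKUP failures. Use =TRIM(CLEAN(SUBSTITUTE(A2,CHAR(160)," "))) to handle non-breaking spaces and control characters. CLEAN() does not remove zero-width characters or BOMs, so substitute those separately, for example with UNICHAR(8203) and UNICHAR(65279). Inspect the first character with =UNICODE(MID(A2,1,1)), which reports the real code point where CODE() cannot.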
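The invisible-character problem is easier to reason about when you can see the code points. Here is a minimal Python sketch that lists them and strips the usual offenders. The `scrub` and `code_points` names are hypothetical, and the replacement table covers only the non-breaking space, zero-width space, and BOM.

```python
# Characters that survive TRIM(): map each to what should replace it.
INVISIBLE = {
    "\u00a0": " ",  # non-breaking space becomes a normal space
    "\u200b": "",   # zero-width space
    "\ufeff": "",   # byte order mark
}


def code_points(text):
    """Show each character's code point so hidden ones become visible."""
    return [f"U+{ord(ch):04X}" for ch in text]


def scrub(text):
    """Replace known invisible characters, drop control characters, then trim."""
    for hidden, replacement in INVISIBLE.items():
        text = text.replace(hidden, replacement)
    text = "".join(ch for ch in text if ord(ch) >= 32)  # like CLEAN()
    return text.strip()


dirty = "\ufeffAcme\u00a0Corp\u200b \t"
print(code_points(dirty))
print(repr(scrub(dirty)))
```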
Frequently asked questions
Why does Excel change my numbers to scientific notation when I open a CSV?
Under the General format, Excel displays any number with 12 or more digits in scientific notation (e.g., 1.23457E+14), and it keeps only 15 significant digits of any number. To prevent this, don't double-click the CSV. Use Data > From Text/CSV, then set the affected column's type to Text in the preview dialog before loading. Once the CSV is re-saved from Excel, the file holds the displayed value, so the trailing digits are lost and cannot be recovered from that file.
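The digit loss is easy to underestimate, so here is a small Python sketch of what happens to a long ID once Excel treats it as a number. The helper name is hypothetical, and it models Excel's 15-significant-digit limit by replacing later digits with zeros; it does not model the scientific-notation display itself.

```python
def excel_numeric_digits(digits, significant=15):
    """Approximate a long integer after Excel stores it as a number.

    Assumption: digits past the 15th are replaced with zeros.
    """
    if len(digits) <= significant:
        return digits
    return digits[:significant] + "0" * (len(digits) - significant)


account_id = "1234567890123456789"  # hypothetical 19-digit identifier
print(excel_numeric_digits(account_id))
print(excel_numeric_digits("123456789012"))  # 12 digits: value intact
```

Loading the column as Text keeps the identifier as a string, so none of its digits are ever interpreted as a number.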
How do I fix a CSV where all data is in column A?
The file uses a delimiter Excel didn't detect (often semicolon or tab). Select column A, go to Data > Text to Columns, choose Delimited, and pick the correct separator. For recurring files, use Data > From Text/CSV which lets you preview and set the delimiter before loading.
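If you are unsure which separator a file uses, counting candidates in the header line is often enough. The sketch below is a deliberately simple heuristic with a hypothetical function name; it ignores quoting, so a header with quoted commas could mislead it.

```python
def guess_delimiter(first_line, candidates=(";", ",", "\t", "|")):
    """Return the candidate that appears most often in the line, or None."""
    counts = {sep: first_line.count(sep) for sep in candidates}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


print(repr(guess_delimiter("order_id;customer;order_date")))
print(repr(guess_delimiter("order_id\tcustomer")))
print(repr(guess_delimiter("single_column")))
```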
What's the fastest way to remove duplicate rows from a messy CSV?
Select your data range, go to Data > Remove Duplicates, and check the columns that define a duplicate. For fuzzy duplicates (trailing spaces, case differences), clean first with TRIM and LOWER, then run Remove Duplicates. Power Query's Remove Duplicates step is better for recurring imports.
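The clean-then-deduplicate order matters because the comparison runs on the cleaned key, not the raw text. A rough Python sketch of that idea follows, with hypothetical field names and rows; it keeps the first occurrence and leaves the stored values untouched.

```python
def dedupe(rows, key_fields):
    """Keep the first row for each key, comparing trimmed, lowercased values."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(" ".join(row[f].split()).lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical rows: the second differs from the first only by case and padding.
rows = [
    {"email": "ann@example.com", "city": "Oslo"},
    {"email": " Ann@Example.com ", "city": "Oslo"},
    {"email": "bob@example.com", "city": "Oslo"},
]
unique_rows = dedupe(rows, ["email"])
print(len(unique_rows))
```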
Can I clean a CSV without opening it in Excel first?
Yes, and often you should. Use Data > Get Data > From Text/CSV, which loads the file through Power Query without corrupting dates, ZIPs, or long numbers. You clean in the Power Query editor and only load the final result into a sheet.
How do I split a full name column into first and last name?
In Excel 365, use =TEXTBEFORE(A2," ") and =TEXTAFTER(A2," "). In older versions, first name is =LEFT(A2,FIND(" ",A2)-1) and last name is =MID(A2,FIND(" ",A2)+1,100). For names with middle initials or suffixes, use Text to Columns with space delimiter and review the output.
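The first-space split that these formulas perform can be sketched in Python as follows. The function name is hypothetical, and like the formulas it treats everything after the first space as the last name, so middle names and suffixes still need review.

```python
def split_name(full_name):
    """Split at the first space: (first name, everything after it)."""
    first, _, rest = full_name.strip().partition(" ")
    return first, rest.strip()


print(split_name("Ada Lovelace"))
print(split_name("Mary Ann Smith"))  # middle name ends up in the second part
print(split_name("Cher"))            # no space: second part is empty
```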
Why do my cleaned values still fail VLOOKUP?
Usually invisible characters or a type mismatch. Check for trailing spaces with =LEN(A2) versus =LEN(TRIM(A2)), and for non-breaking spaces with =CODE(RIGHT(A2,1)). If one side is a text-formatted number and the other is a real number, wrap the lookup value with VALUE() or multiply by 1.