Automatically Categorize Free-Text Columns in Excel

Tested prompts for auto categorize text data in excel compared across 5 leading AI models.

BEST BY JUDGE SCORE Claude Haiku 4.5 9/10

You have a column of free-text in Excel: support tickets, survey responses, product reviews, expense descriptions, or job titles. You need each row tagged with a category so you can pivot, filter, or chart it. Manually reading thousands of rows is not an option, and Excel's built-in formulas only work if you can write exact rules for every variation of wording.

That is the gap AI fills. A language model can read the full sentence, understand intent, and assign a category even when the text is messy, misspelled, or phrased five different ways. You paste your column, give the model a category list, and get a clean labeled column back that you can drop into a PivotTable.

This page shows the exact prompt to use, four model outputs side by side so you can pick the one that fits your data, and the patterns that make auto-categorization reliable at scale. If you just want a working recipe, scroll to the prompt. If you want to understand why certain phrasings fail, read on.

When to use this

Use AI categorization when your text is unstructured, your categories require understanding context, and rule-based formulas like IF/SEARCH would need dozens of nested conditions. It works best when you have a defined category list, or want the model to propose one, and your row count is in the hundreds to low tens of thousands.

  • Tagging support tickets into Billing, Bug, Feature Request, How-To, Churn Risk
  • Classifying survey open-ends into themes like Pricing, Onboarding, Performance
  • Sorting expense line items into GL accounts based on vendor and description
  • Grouping job titles into standard roles (Engineer, Designer, Manager, Sales)
  • Mapping product reviews to sentiment plus a topic such as Shipping or Quality

When this format breaks down

  • Regulated classifications where every label needs a human audit trail, like medical coding or legal case type
  • Columns with fewer than 20 rows where manual tagging is faster than prompt setup
  • Highly ambiguous text where even a human cannot agree on a single category without more context
  • Datasets containing PII or confidential data that cannot leave your environment without approval

The prompt we tested

You are an expert data analyst helping categorize free-text data from an Excel column. Analyze the text entries provided and assign each one to a clear, consistent category.

Rules:
Infer 4-8 distinct, mutually exclusive categories from the data itself (do not invent categories that don't fit). Keep category names short (1-3 words), Title Case, and reuse the same category name for similar entries to ensure consistency. Output a tab-separated table with a header row so it pastes cleanly into Excel.

Text entries to categorize:
Here are customer support ticket descriptions from column B of my spreadsheet:
1. Can't log into my account, password reset email never arrived
2. Charged twice for my March subscription, need a refund
3. How do I export my data to CSV?
4. App crashes every time I open the reports tab on iPhone
5. Want to upgrade from Basic to Pro plan
6. Invoice shows wrong company name, please fix
7. Two-factor authentication code isn't working

Return the results as a two-column table (Original Text | Category) that can be pasted directly into Excel, followed by a brief summary of the categories you identified and the count of entries in each.

Model comparison

ModelQualityCostLatencyTokens
Claude Opus 4.7 9 $0.0306 4.8s 392 / 329
Claude Haiku 4.5 9 $0.0014 2.3s 276 / 228
GPT-5.4 9 $0.0493 20.1s 250 / 1171
Gemini 2.5 Pro 9 $0.0017 15.4s 256 / 278
Grok 4.1 Fast Reasoning 9 $0.0001 13.4s 398 / 138

Outputs

Claude Opus 4.7 9/10
Original Text	Category
Can't log into my account, password reset email never arrived	Account Access
Charged twice for my March subscription, need a refund	Billing Issue
How do I export my data to CSV?	How-To Question
App crashes every time I open the reports tab on iPhone	Technical Bug
Want to upgrade from Basic to Pro plan	Plan Change
Invoice shows wrong company name, please fix	Billing Issue
Two-factor authentication code isn't working	Account Access

**Category Summary:**

- **Account Access** (2): Issues with logging in, password resets, or authentication.
- **Billing Issue** (2): Problems related to charges, refunds, or invoice details.
- **How-To Question** (1): Requests for guidance on using product features.
- **Technical Bug** (1): Reports of app crashes, errors, or malfunctioning features.
- **Plan Change** (1): Requests to upgrade, downgrade, or modify subscription tiers.
Judge notes: Excellent execution—cleanly formatted for Excel, logically inferred categories, proper consistency in labeling, and helpful summary; only minor imperfection is that 5 categories were created when the rule suggested 4-8 (though this is well within bounds and appropriate for the data).
Claude Haiku 4.5 9/10
# Categorized Support Tickets

Original Text | Category
---|---
Can't log into my account, password reset email never arrived | Account Access
Charged twice for my March subscription, need a refund | Billing Issue
How do I export my data to CSV? | Feature Question
App crashes every time I open the reports tab on iPhone | Technical Bug
Want to upgrade from Basic to Pro plan | Plan Change
Invoice shows wrong company name, please fix | Billing Issue
Two-factor authentication code isn't working | Account Access

## Summary

**Categories Identified (5 total):**

- **Account Access** (2 entries) – Login, authentication, and password-related issues
- **Billing Issue** (2 entries) – Charges, refunds, and invoice problems
- **Technical Bug** (1 entry) – App crashes and functional errors
- **Feature Question** (1 entry) – How-to questions about product features
- **Plan Change** (1 entry) – Subscription upgrade/downgrade requests
Judge notes: Excellent execution with proper categorization logic, clear Excel-ready formatting, and accurate summary; only minor deduction for using markdown table syntax instead of pure tab-separated plain text for maximum Excel compatibility.
GPT-5.4 9/10
Original Text	Category
Can't log into my account, password reset email never arrived	Authentication
Charged twice for my March subscription, need a refund	Billing
How do I export my data to CSV?	Data Export
App crashes every time I open the reports tab on iPhone	App Crash
Want to upgrade from Basic to Pro plan	Plan Upgrade
Invoice shows wrong company name, please fix	Invoice Issue
Two-factor authentication code isn't working	Authentication

Summary of categories and counts:
- Authentication: 2
- Billing: 1
- Data Export: 1
- App Crash: 1
- Plan Upgrade: 1
- Invoice Issue: 1
Judge notes: Excellent execution with proper tab-separated Excel-ready format, accurate categorization logic, and helpful summary; only minor room for improvement in category naming consistency (mixing compound words like "App Crash" with phrases like "Plan Upgrade").
Gemini 2.5 Pro 9/10
Of course. As a data analyst, I've analyzed the text entries and grouped them into five distinct categories. Here is the categorized data in a tab-separated table, ready to be pasted into Excel.

**Categorized Data**

```
Original Text	Category
Can't log into my account, password reset email never arrived	Account Access
Charged twice for my March subscription, need a refund	Billing Issue
How do I export my data to CSV?	Feature Question
App crashes every time I open the reports tab on iPhone	Bug Report
Want to upgrade from Basic to Pro plan	Account Management
Invoice shows wrong company name, please fix	Billing Issue
Two-factor authentication code isn't working	Account Access
```

***

**Category Summary**

Here is a brief summary of the categories identified from your data and the count of entries for each:

*   **Account Access (2):** Issues related to logging in, password resets, or authentication.
*   **Billing Issue (2):** Problems concerning payments, invoices, refunds, or subscription charges.
*   **Account Management (1):** Requests to change account details, such as upgrading or downgrading a plan.
*   **Bug Report (1):** Reports of the application not working as expected, such as crashing or freezing.
*
Judge notes: Excellent execution with proper tab-separated formatting for Excel, clear inferred categories, consistent categorization logic, helpful summary with counts, and fully addresses the auto-categorization query intent with only minor incompleteness (missing one category definition in the summary).
Grok 4.1 Fast Reasoning 9/10
Original Text	Category
Can't log into my account, password reset email never arrived	Authentication Issues
Charged twice for my March subscription, need a refund	Billing Issues
How do I export my data to CSV?	Usage Questions
App crashes every time I open the reports tab on iPhone	Bug Reports
Want to upgrade from Basic to Pro plan	Plan Upgrades
Invoice shows wrong company name, please fix	Billing Issues
Two-factor authentication code isn't working	Authentication Issues

**Summary:**  
- Authentication Issues: 2  
- Billing Issues: 2  
- Usage Questions: 1  
- Bug Reports: 1  
- Plan Upgrades: 1
Judge notes: Excellent execution with clear, consistent categorization, proper Excel-friendly formatting, accurate summary counts, and all instructions followed; only minor deduction for not explicitly stating the inferred category logic upfront.

What makes these work

  1. 01
    Give a fixed category list

    Free-form categorization produces inconsistent labels like 'Billing', 'Payment', 'Invoice Issue' for the same concept. Always supply the exact list of allowed labels in the prompt and tell the model to pick only from that list. This makes PivotTables clean on the first try.

  2. 02
    Include an Other bucket

    Force-fitting every row into your list creates bad data. Add 'Other' or 'Unclear' as an allowed label and instruct the model to use it when confidence is low. Review the Other rows to discover missing categories you should add.

  3. 03
    Number your input rows

    Prefixing each row with a number (1., 2., 3.) gives the model an anchor so outputs stay aligned with inputs. When you paste 500 rows and get 498 labels back, numbering makes the mismatch obvious and recoverable.

  4. 04
    Batch in chunks of 50 to 200

    Very long inputs cause models to drift, truncate, or hallucinate labels. Split your column into batches of 50 to 200 rows, run each batch, and concatenate the results. This also keeps you under token limits on most models.

More example scenarios

#01 · Support ticket subject lines
Input
1. Cannot log in after password reset
2. How do I export my invoices to CSV
3. Charged twice for October subscription
4. Dashboard loads forever on Safari
5. Can you add dark mode to the mobile app
Expected output
1. Bug
2. How-To
3. Billing
4. Bug
5. Feature Request

Each label maps to the provided category list. Rows 1 and 4 describe broken functionality, row 2 is a usage question, row 3 is a payment issue, row 5 is a product suggestion.
#02 · Employee expense descriptions
Input
1. Uber to client meeting SFO
2. Dinner with prospect at Quince
3. Zoom Pro annual renewal
4. United flight DEN to ORD
5. Staples printer paper and toner
Expected output
1. Ground Transportation
2. Client Entertainment
3. Software Subscriptions
4. Air Travel
5. Office Supplies

Labels follow a standard expense chart of accounts. Vendor names like Uber and United drive travel categories, while context words like prospect and client disambiguate meals.
#03 · Product review themes
Input
1. Arrived two weeks late and box was crushed
2. Fabric feels cheap and the seams came apart after one wash
3. Love the color but the size runs small
4. Customer service ignored my three emails
5. Exactly as described, fast shipping
Expected output
1. Shipping
2. Quality
3. Fit/Sizing
4. Customer Service
5. Shipping (Positive)

Each review is tagged with the dominant topic. Row 5 is flagged positive because multi-topic reviews would otherwise muddle a negative-skewed report.
#04 · Raw job titles to standard roles
Input
1. Sr. Full-Stack Developer II
2. VP of People & Culture
3. Growth Marketing Lead
4. Chief of Staff to the CEO
5. Account Executive, Mid-Market
Expected output
1. Engineering
2. HR
3. Marketing
4. Operations
5. Sales

Messy title variants collapse into five standard functions suitable for headcount reporting. Seniority and specialty are stripped because the category list asked only for function.
#05 · Survey open-ends with sentiment
Input
1. Onboarding took forever and nobody followed up
2. Best tool we have bought this year
3. Pricing jumped 40 percent at renewal with no warning
4. Integration with Salesforce saved my team hours
5. UI is confusing for new hires
Expected output
1. Onboarding | Negative
2. Overall | Positive
3. Pricing | Negative
4. Integrations | Positive
5. Usability | Negative

Each response gets a theme and a sentiment separated by a pipe so you can split into two Excel columns with Text to Columns.

Common mistakes to avoid

  • Vague category definitions

    Labels like 'Issue' or 'Question' mean different things to different models. Define each category in one sentence inside the prompt, for example 'Bug: something that used to work is now broken'. Ambiguity is the single biggest source of mislabeling.

  • Mixing tone and topic in one label

    Asking for 'Negative Billing' as a single label collapses two dimensions. Split sentiment and topic into two output columns so you can slice either independently in Excel.

  • Skipping a spot check

    Models can be 95 percent accurate and still produce 50 wrong labels in a 1000-row file. Always sample 20 to 30 rows manually before trusting the whole column, especially on the first run with a new category list.

  • Pasting results without validation

    If the output has more or fewer rows than the input, alignment breaks silently and every downstream pivot is wrong. Use a COUNTA check on the output column before running analysis.

  • Forgetting edge cases in the prompt

    Empty cells, non-English text, and single-word entries like 'ok' often get miscategorized. Tell the model explicitly how to handle these: return 'Blank', 'Non-English', or 'Insufficient Info' instead of guessing.

Related queries

Frequently asked questions

Can Excel do this without AI using formulas?

You can use nested IF with SEARCH or a lookup table of keywords, and this works for simple cases like tagging emails that contain 'invoice'. It breaks down on free text with synonyms, typos, or context-dependent meaning. AI is the right tool once your rules would exceed a dozen conditions.

How many rows can I categorize at once?

Most models handle 50 to 200 rows per request reliably. For larger files, split into batches using Excel's filter or a helper column, run each batch separately, and paste the results back. Some tools and Excel add-ins automate this batching for you.

Will the categories be consistent across runs?

Only if you provide a fixed category list in the prompt and use a low temperature setting. Without a list, the model will invent slightly different label names each run, which makes PivotTables messy. Always define your allowed labels explicitly.

What if my data is confidential?

Check your company's AI policy before pasting customer data, PII, or financial records into a public model. Options include using an enterprise-approved model, running a local model, or anonymizing the data first by removing names and identifiers.

Can I use this with Excel's built-in Copilot?

Yes, Microsoft 365 Copilot in Excel can categorize a column if you describe the task in natural language and provide the category list. Results quality is similar to using a standalone model, and the advantage is that data stays inside your Microsoft tenant.

How do I handle rows the AI gets wrong?

Add a review column where you manually correct labels, then use those corrections as examples in your prompt for the next batch. Showing the model 5 to 10 correctly labeled rows from your own data (few-shot prompting) usually lifts accuracy by 10 to 20 percent.