Automatically Categorize Free-Text Columns in Excel

Tested prompts for auto categorize text data in excel compared across 5 leading AI models.

BEST BY JUDGE SCORE Claude Haiku 4.5 9/10

You have a column of free-text in Excel: support tickets, survey responses, product reviews, expense descriptions, or job titles. You need each row tagged with a category so you can pivot, filter, or chart it. Manually reading thousands of rows is not an option, and Excel's built-in formulas only work if you can write exact rules for every variation of wording.

That is the gap AI fills. A language model can read the full sentence, understand intent, and assign a category even when the text is messy, misspelled, or phrased five different ways. You paste your column, give the model a category list, and get a clean labeled column back that you can drop into a PivotTable.

This page shows the exact prompt to use, four model outputs side by side so you can pick the one that fits your data, and the patterns that make auto-categorization reliable at scale. If you just want a working recipe, scroll to the prompt. If you want to understand why certain phrasings fail, read on.

When to use this

Use AI categorization when your text is unstructured, your categories require understanding context, and rule-based formulas like IF/SEARCH would need dozens of nested conditions. It works best when you have a defined category list, or want the model to propose one, and your row count is in the hundreds to low tens of thousands.

Tagging support tickets into Billing, Bug, Feature Request, How-To, Churn Risk
Classifying survey open-ends into themes like Pricing, Onboarding, Performance
Sorting expense line items into GL accounts based on vendor and description
Grouping job titles into standard roles (Engineer, Designer, Manager, Sales)
Mapping product reviews to sentiment plus a topic such as Shipping or Quality

When this format breaks down

Regulated classifications where every label needs a human audit trail, like medical coding or legal case type
Columns with fewer than 20 rows where manual tagging is faster than prompt setup
Highly ambiguous text where even a human cannot agree on a single category without more context
Datasets containing PII or confidential data that cannot leave your environment without approval

The prompt we tested

You are an expert data analyst helping categorize free-text data from an Excel column. Analyze the text entries provided and assign each one to a clear, consistent category.

Rules:
Infer 4-8 distinct, mutually exclusive categories from the data itself (do not invent categories that don't fit). Keep category names short (1-3 words), Title Case, and reuse the same category name for similar entries to ensure consistency. Output a tab-separated table with a header row so it pastes cleanly into Excel.

Text entries to categorize:
Here are customer support ticket descriptions from column B of my spreadsheet:
1. Can't log into my account, password reset email never arrived
2. Charged twice for my March subscription, need a refund
3. How do I export my data to CSV?
4. App crashes every time I open the reports tab on iPhone
5. Want to upgrade from Basic to Pro plan
6. Invoice shows wrong company name, please fix
7. Two-factor authentication code isn't working

Return the results as a two-column table (Original Text | Category) that can be pasted directly into Excel, followed by a brief summary of the categories you identified and the count of entries in each.

Model comparison

Model	Quality	Cost	Latency	Tokens
Claude Opus 4.7	9	$0.0306	4.8s	392 / 329
Claude Haiku 4.5	9	$0.0014	2.3s	276 / 228
GPT-5.4	9	$0.0493	20.1s	250 / 1171
Gemini 2.5 Pro	9	$0.0017	15.4s	256 / 278
Grok 4.1 Fast Reasoning	9	$0.0001	13.4s	398 / 138

Outputs

Claude Opus 4.7 9/10

Original Text	Category
Can't log into my account, password reset email never arrived	Account Access
Charged twice for my March subscription, need a refund	Billing Issue
How do I export my data to CSV?	How-To Question
App crashes every time I open the reports tab on iPhone	Technical Bug
Want to upgrade from Basic to Pro plan	Plan Change
Invoice shows wrong company name, please fix	Billing Issue
Two-factor authentication code isn't working	Account Access

**Category Summary:**

- **Account Access** (2): Issues with logging in, password resets, or authentication.
- **Billing Issue** (2): Problems related to charges, refunds, or invoice details.
- **How-To Question** (1): Requests for guidance on using product features.
- **Technical Bug** (1): Reports of app crashes, errors, or malfunctioning features.
- **Plan Change** (1): Requests to upgrade, downgrade, or modify subscription tiers.

Claude Haiku 4.5 9/10

# Categorized Support Tickets

Original Text | Category
---|---
Can't log into my account, password reset email never arrived | Account Access
Charged twice for my March subscription, need a refund | Billing Issue
How do I export my data to CSV? | Feature Question
App crashes every time I open the reports tab on iPhone | Technical Bug
Want to upgrade from Basic to Pro plan | Plan Change
Invoice shows wrong company name, please fix | Billing Issue
Two-factor authentication code isn't working | Account Access

## Summary

**Categories Identified (5 total):**

- **Account Access** (2 entries) – Login, authentication, and password-related issues
- **Billing Issue** (2 entries) – Charges, refunds, and invoice problems
- **Technical Bug** (1 entry) – App crashes and functional errors
- **Feature Question** (1 entry) – How-to questions about product features
- **Plan Change** (1 entry) – Subscription upgrade/downgrade requests

GPT-5.4 9/10

Original Text	Category
Can't log into my account, password reset email never arrived	Authentication
Charged twice for my March subscription, need a refund	Billing
How do I export my data to CSV?	Data Export
App crashes every time I open the reports tab on iPhone	App Crash
Want to upgrade from Basic to Pro plan	Plan Upgrade
Invoice shows wrong company name, please fix	Invoice Issue
Two-factor authentication code isn't working	Authentication

Summary of categories and counts:
- Authentication: 2
- Billing: 1
- Data Export: 1
- App Crash: 1
- Plan Upgrade: 1
- Invoice Issue: 1

Gemini 2.5 Pro 9/10

Of course. As a data analyst, I've analyzed the text entries and grouped them into five distinct categories. Here is the categorized data in a tab-separated table, ready to be pasted into Excel.

**Categorized Data**

```
Original Text	Category
Can't log into my account, password reset email never arrived	Account Access
Charged twice for my March subscription, need a refund	Billing Issue
How do I export my data to CSV?	Feature Question
App crashes every time I open the reports tab on iPhone	Bug Report
Want to upgrade from Basic to Pro plan	Account Management
Invoice shows wrong company name, please fix	Billing Issue
Two-factor authentication code isn't working	Account Access
```

***

**Category Summary**

Here is a brief summary of the categories identified from your data and the count of entries for each:

*   **Account Access (2):** Issues related to logging in, password resets, or authentication.
*   **Billing Issue (2):** Problems concerning payments, invoices, refunds, or subscription charges.
*   **Account Management (1):** Requests to change account details, such as upgrading or downgrading a plan.
*   **Bug Report (1):** Reports of the application not working as expected, such as crashing or freezing.
*

Grok 4.1 Fast Reasoning 9/10

Original Text	Category
Can't log into my account, password reset email never arrived	Authentication Issues
Charged twice for my March subscription, need a refund	Billing Issues
How do I export my data to CSV?	Usage Questions
App crashes every time I open the reports tab on iPhone	Bug Reports
Want to upgrade from Basic to Pro plan	Plan Upgrades
Invoice shows wrong company name, please fix	Billing Issues
Two-factor authentication code isn't working	Authentication Issues

**Summary:**  
- Authentication Issues: 2  
- Billing Issues: 2  
- Usage Questions: 1  
- Bug Reports: 1  
- Plan Upgrades: 1

What makes these work

01

Give a fixed category list
Free-form categorization produces inconsistent labels like 'Billing', 'Payment', 'Invoice Issue' for the same concept. Always supply the exact list of allowed labels in the prompt and tell the model to pick only from that list. This makes PivotTables clean on the first try.
02

Include an Other bucket
Force-fitting every row into your list creates bad data. Add 'Other' or 'Unclear' as an allowed label and instruct the model to use it when confidence is low. Review the Other rows to discover missing categories you should add.
03

Number your input rows
Prefixing each row with a number (1., 2., 3.) gives the model an anchor so outputs stay aligned with inputs. When you paste 500 rows and get 498 labels back, numbering makes the mismatch obvious and recoverable.
04

Batch in chunks of 50 to 200
Very long inputs cause models to drift, truncate, or hallucinate labels. Split your column into batches of 50 to 200 rows, run each batch, and concatenate the results. This also keeps you under token limits on most models.

More example scenarios

#01 · Support ticket subject lines

Input

1. Cannot log in after password reset
2. How do I export my invoices to CSV
3. Charged twice for October subscription
4. Dashboard loads forever on Safari
5. Can you add dark mode to the mobile app

Expected output

1. Bug
2. How-To
3. Billing
4. Bug
5. Feature Request

Each label maps to the provided category list. Rows 1 and 4 describe broken functionality, row 2 is a usage question, row 3 is a payment issue, row 5 is a product suggestion.

#02 · Employee expense descriptions

Input

1. Uber to client meeting SFO
2. Dinner with prospect at Quince
3. Zoom Pro annual renewal
4. United flight DEN to ORD
5. Staples printer paper and toner

Expected output

1. Ground Transportation
2. Client Entertainment
3. Software Subscriptions
4. Air Travel
5. Office Supplies

Labels follow a standard expense chart of accounts. Vendor names like Uber and United drive travel categories, while context words like prospect and client disambiguate meals.

#03 · Product review themes

Input

1. Arrived two weeks late and box was crushed
2. Fabric feels cheap and the seams came apart after one wash
3. Love the color but the size runs small
4. Customer service ignored my three emails
5. Exactly as described, fast shipping

Expected output

1. Shipping
2. Quality
3. Fit/Sizing
4. Customer Service
5. Shipping (Positive)

Each review is tagged with the dominant topic. Row 5 is flagged positive because multi-topic reviews would otherwise muddle a negative-skewed report.

#04 · Raw job titles to standard roles

Input

1. Sr. Full-Stack Developer II
2. VP of People & Culture
3. Growth Marketing Lead
4. Chief of Staff to the CEO
5. Account Executive, Mid-Market

Expected output

1. Engineering
2. HR
3. Marketing
4. Operations
5. Sales

Messy title variants collapse into five standard functions suitable for headcount reporting. Seniority and specialty are stripped because the category list asked only for function.

#05 · Survey open-ends with sentiment

Input

1. Onboarding took forever and nobody followed up
2. Best tool we have bought this year
3. Pricing jumped 40 percent at renewal with no warning
4. Integration with Salesforce saved my team hours
5. UI is confusing for new hires

Expected output

1. Onboarding | Negative
2. Overall | Positive
3. Pricing | Negative
4. Integrations | Positive
5. Usability | Negative

Each response gets a theme and a sentiment separated by a pipe so you can split into two Excel columns with Text to Columns.

Common mistakes to avoid

Vague category definitions
Labels like 'Issue' or 'Question' mean different things to different models. Define each category in one sentence inside the prompt, for example 'Bug: something that used to work is now broken'. Ambiguity is the single biggest source of mislabeling.
Mixing tone and topic in one label
Asking for 'Negative Billing' as a single label collapses two dimensions. Split sentiment and topic into two output columns so you can slice either independently in Excel.
Skipping a spot check
Models can be 95 percent accurate and still produce 50 wrong labels in a 1000-row file. Always sample 20 to 30 rows manually before trusting the whole column, especially on the first run with a new category list.
Pasting results without validation
If the output has more or fewer rows than the input, alignment breaks silently and every downstream pivot is wrong. Use a COUNTA check on the output column before running analysis.
Forgetting edge cases in the prompt
Empty cells, non-English text, and single-word entries like 'ok' often get miscategorized. Tell the model explicitly how to handle these: return 'Blank', 'Non-English', or 'Insufficient Info' instead of guessing.

Related queries

Frequently asked questions

Can Excel do this without AI using formulas?

You can use nested IF with SEARCH or a lookup table of keywords, and this works for simple cases like tagging emails that contain 'invoice'. It breaks down on free text with synonyms, typos, or context-dependent meaning. AI is the right tool once your rules would exceed a dozen conditions.

How many rows can I categorize at once?

Most models handle 50 to 200 rows per request reliably. For larger files, split into batches using Excel's filter or a helper column, run each batch separately, and paste the results back. Some tools and Excel add-ins automate this batching for you.

Will the categories be consistent across runs?

Only if you provide a fixed category list in the prompt and use a low temperature setting. Without a list, the model will invent slightly different label names each run, which makes PivotTables messy. Always define your allowed labels explicitly.

What if my data is confidential?

Check your company's AI policy before pasting customer data, PII, or financial records into a public model. Options include using an enterprise-approved model, running a local model, or anonymizing the data first by removing names and identifiers.

Can I use this with Excel's built-in Copilot?

Yes, Microsoft 365 Copilot in Excel can categorize a column if you describe the task in natural language and provide the category list. Results quality is similar to using a standalone model, and the advantage is that data stays inside your Microsoft tenant.

How do I handle rows the AI gets wrong?

Add a review column where you manually correct labels, then use those corrections as examples in your prompt for the next batch. Showing the model 5 to 10 correctly labeled rows from your own data (few-shot prompting) usually lifts accuracy by 10 to 20 percent.