Pull Email Addresses Out of Plain Text Files

Tested prompts for extracting email addresses from a .txt file, compared across 5 leading AI models.

Best by judge score: Claude Opus 4.7 (9/10)

You have a plain text file full of messy content and somewhere buried in it are email addresses you need. Maybe it is a scraped webpage, a CRM export, a log file, or a pile of pasted correspondence. Whatever the source, you need just the emails pulled out cleanly so you can import them into a tool, hand them off to a teammate, or cross-reference them against another list.

The traditional approach is a regex pattern run through a script. That works, but you need to write or find the right expression, run it in a terminal or code editor, and handle edge cases yourself. AI models remove most of that friction. You paste your text, describe what you want, and get a clean list back in seconds without writing any code.

This page shows you exactly how to prompt an AI to extract email addresses from a text file, compares how different models handle the same input, and covers the edge cases where you need to adjust your approach. Whether your file has a hundred emails or a handful scattered across paragraphs, the method here gets you to a usable list fast.

When to use this

This approach works best when you have unstructured or semi-structured plain text and need email addresses pulled out quickly without writing code. It is ideal for one-off jobs, small-to-medium files, and situations where the text is messy enough that a simple find-and-replace would miss entries.

Extracting emails from a scraped or exported .txt file from a website, forum, or directory
Pulling contact addresses out of pasted email threads or meeting notes saved as plain text
Cleaning up a CRM or newsletter export where emails are mixed in with names, phone numbers, and addresses
Recovering emails from a log file generated by an app or server
Quickly deduplicating and listing emails from a manually compiled text document before importing into Mailchimp, HubSpot, or a spreadsheet

When this format breaks down

Your text file is very large (over 50,000 words or several megabytes). Most AI chat interfaces have context window limits and will truncate the input, causing you to miss emails silently.
You need to process hundreds of files automatically on a schedule. At that scale, a Python script with a regex pattern runs faster, costs nothing per call, and handles batch jobs without manual copy-paste.
Your file contains sensitive personal data governed by GDPR, HIPAA, or similar regulations. Pasting that content into a third-party AI service may violate your data handling obligations.
You need guaranteed completeness with a verifiable audit trail. AI models can occasionally miss a malformed or unusual email format, and there is no built-in way to confirm every address was captured without manually checking.

The prompt we tested

You are an email extraction assistant. Scan the following text and extract every valid email address it contains.

Rules:
Return only a deduplicated list of valid email addresses, one per line, in the order they first appear. Normalize all addresses to lowercase, strip surrounding punctuation or whitespace, and ignore obfuscated forms (like 'name [at] domain dot com') unless they can be unambiguously reconstructed. If no email addresses are found, output exactly: No email addresses found.

Text to scan:
Please contact our support team at Help@Acme.Co or escalations-team@acme.co for urgent issues. You can also reach the founder, Jane Doe, directly at jane.doe+press@acme.co — and yes, help@acme.co works too for general questions.
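
For comparison, the rules in this prompt can be approximated locally. The following is a minimal Python sketch, not the method any of the tested models use. It assumes the simplified regex shown stands in for "valid email address" (it is not a full RFC 5322 validator), the function name extract_emails is purely illustrative, and the obfuscated-form rule is not attempted.

```python
import re

# Assumption: this simplified pattern approximates "valid email address".
# It already excludes surrounding commas, brackets, and trailing periods.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return lowercase, deduplicated addresses in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        # dict keys keep insertion order, so the first occurrence wins
        seen.setdefault(match.lower(), None)
    return list(seen)


sample = "Contact Help@Acme.Co or escalations-team@acme.co; help@acme.co works too."
print(extract_emails(sample))
# ['help@acme.co', 'escalations-team@acme.co']
```

Running a local check like this next to the model output gives a quick way to spot a missing or extra address.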

Model comparison

Model	Quality (/10)	Cost	Latency	Tokens (in / out)
Claude Opus 4.7	9	$0.0065	1.2s	260 / 34
Claude Haiku 4.5	8	$0.0003	0.68s	187 / 30
GPT-5.4	7	$0.0008	1.1s	166 / 28
Gemini 2.5 Pro	9	$0.0003	10.7s	166 / 27
Grok 4.1 Fast Reasoning	7	$0.0001	2.8s	315 / 25

Outputs

Claude Opus 4.7 9/10

help@acme.co
escalations-team@acme.co
jane.doe+press@acme.co

Claude Haiku 4.5 8/10

help@acme.co
escalations-team@acme.co
jane.doe+press@acme.co

GPT-5.4 7/10

help@acme.co
escalations-team@acme.co
jane.doe+press@acme.co

Gemini 2.5 Pro 9/10

help@acme.co
escalations-team@acme.co
jane.doe+press@acme.co

Grok 4.1 Fast Reasoning 7/10

help@acme.co
escalations-team@acme.co
jane.doe+press@acme.co

What makes these work

01 · Ask for one email per line
Always specify the output format in your prompt. 'Extract all email addresses and return one per line with no other text' prevents the model from adding explanations, bullet points, or numbering that you would need to clean up afterward. A clean line-by-line list pastes directly into Excel, Google Sheets, or any import tool without extra steps.

02 · Request deduplication upfront
If your source text might repeat the same address (common in email threads or exported logs), add 'remove duplicates' to your prompt. Models generally honor this and return a unique list, which saves you a manual deduplication step in your spreadsheet later.

03 · Split large files into chunks
If your text file is too long to paste in one go, split it into sections and run the prompt once per section, then combine the outputs. Tell the model each time 'extract all emails from this section' and deduplicate the combined list at the end. This avoids silent truncation, where the model stops reading before it reaches the end of your input.
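
One way to do the splitting is sketched below in Python. The function name split_into_chunks and the 8000-character default are illustrative assumptions, not limits documented by any model; pick a size that fits the tool you are pasting into. Chunks break only at line ends so that an address is never cut in half.

```python
def split_into_chunks(text: str, max_chars: int = 8000) -> list[str]:
    """Split text into chunks of roughly max_chars, breaking only at line ends.

    A single line longer than max_chars is kept whole as its own chunk.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the limit
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```

Joining the chunks back together reproduces the original text exactly, which makes it easy to confirm nothing was dropped before you start pasting.
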
04 · Flag unusual formats explicitly
If your text contains emails written in obfuscated formats like 'user at domain dot com' to avoid spam scrapers, add a note in your prompt: 'Also extract emails written in plain English format such as user at domain dot com.' Without this instruction, most models will only catch standard-format addresses that contain the @ symbol.
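
If you later move to a script, obfuscated forms need their own handling. The Python sketch below is a rough illustration that assumes only three spellings for each separator ('[at]', '(at)', the bare word 'at', and likewise for 'dot'); the names and patterns are hypothetical. The bare-word forms can produce false positives in ordinary prose, so treat the results as candidates to review.

```python
import re

# Assumption: these are the only obfuscation spellings handled.
AT = r"\s*(?:\[at\]|\(at\)|\bat\b)\s*"
DOT = r"\s*(?:\[dot\]|\(dot\)|\bdot\b)\s*"
LABEL = r"[A-Za-z0-9-]+"
OBFUSCATED_RE = re.compile(
    rf"([A-Za-z0-9._%+-]+){AT}({LABEL}(?:{DOT}{LABEL})+)", re.IGNORECASE
)


def deobfuscate(text: str) -> list[str]:
    """Rebuild addresses written like 'user at domain dot com'."""
    results = []
    for local, domain in OBFUSCATED_RE.findall(text):
        # Replace every spelled-out 'dot' inside the domain with a real period
        domain = re.sub(DOT, ".", domain, flags=re.IGNORECASE)
        results.append(f"{local}@{domain}".lower())
    return results


print(deobfuscate("Write to jane [at] example [dot] com or bob at mail dot example dot org."))
# ['jane@example.com', 'bob@mail.example.org']
```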

More example scenarios

#01 · Event attendee list in plain text

Input

Here is our exported attendee file from the conference registration system:

John Marsh - john.marsh@techventures.io - Table 4
Sarah O'Brien -sobrien@gmail.com - Table 7
Registration contact: events@marriott-downtown.com
Press inquiries: press@techventures.io
Dr. Alan Chu alan.chu@stanfordmed.edu VIP guest

Expected output

john.marsh@techventures.io
sobrien@gmail.com
events@marriott-downtown.com
press@techventures.io
alan.chu@stanfordmed.edu

#02 · Scraped vendor directory with surrounding noise

Input

Acme Supplies | Founded 2004 | Chicago, IL | Contact: sales@acmesupplies.com | (312) 555-0198
BrightPath Logistics | www.brightpathlogistics.com | info@brightpathlogistics.com | Specializes in cold chain
Northern Fab Co. - reach us at northernfab_orders@outlook.com or call 800-555-3344
Global Parts Inc | no public email listed | visit globalparts.com/contact

Expected output

sales@acmesupplies.com
info@brightpathlogistics.com
northernfab_orders@outlook.com

#03 · Pasted email thread from a recruiting pipeline

Input

From: Lisa Tran <lisa.tran@hiringco.com> sent Monday. She CC'd the recruiter mark_jones@hiringco.com and the candidate replied from devcandidate2024@proton.me. HR follow-up came from hr-noreply@hiringco.com. Please add all parties to the tracker.

Expected output

lisa.tran@hiringco.com
mark_jones@hiringco.com
devcandidate2024@proton.me
hr-noreply@hiringco.com

#04 · Server log file with mixed system output

Input

2024-03-12 08:14:22 INFO User login attempt: user=admin@internaltools.net status=success
2024-03-12 08:15:01 WARN Failed delivery to bounce@maildomain.org error=550
2024-03-12 08:16:45 INFO Password reset requested for j.patel@internaltools.net
2024-03-12 08:17:03 ERROR SMTP relay rejected sender spammer_address@suspicious.ru

Expected output

admin@internaltools.net
bounce@maildomain.org
j.patel@internaltools.net
spammer_address@suspicious.ru

#05 · Nonprofit donor outreach list in unformatted text

Input

Spring campaign contacts pulled from old records. Margaret Holloway mholloway@donorbase.org gave in 2022. Twin brothers at the Henderson Foundation contact is giving@hendersonfoundation.org. Anonymous donor prefers contact via anon_donor_ref447@securemail.com. Board liaison copied on all: boardliaison@nonprofithq.net.

Expected output

mholloway@donorbase.org
giving@hendersonfoundation.org
anon_donor_ref447@securemail.com
boardliaison@nonprofithq.net

Common mistakes to avoid

Pasting without specifying output format
When you just say 'find the emails in this text' without specifying format, models often return prose like 'I found the following email addresses:' followed by a numbered list. That output requires cleanup before it is usable. Always specify format in the prompt to save time.
Assuming 100% completeness on large inputs
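AI models working near their context limit may stop processing before the end of your file. You will get a partial list with no warning that anything was missed. For files longer than a few thousand words, chunk them manually and sanity-check the result, for example by comparing the number of @ signs in the input with the number of addresses returned, keeping in mind that deduplication lowers the output count.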
Ignoring domain-only patterns
Some text contains domain-only fragments like '@companyname.com' with no local part (the part before the @), or other partial addresses that are not valid emails. A model may include these in the output. Scan your results quickly for addresses that look malformed before importing them anywhere.
Not removing duplicates before import
If you skip asking for deduplication and your source had repeated addresses, you may import duplicate contacts into your CRM or email tool. This causes double-sending, skewed analytics, and potential unsubscribe complaints. Either ask the AI to deduplicate or run a quick dedup in your spreadsheet before import.
Using AI for regulated or confidential data
Pasting files that contain patient records, legal correspondence, or financial data into a public AI interface may breach your organization's data handling policies or applicable law. For sensitive content, use a local model or a properly vetted enterprise AI deployment with a data processing agreement in place.
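
The malformed-address check mentioned above can be partly automated. The following Python sketch assumes a deliberately strict, simplified pattern and a hypothetical helper name, split_valid_and_suspect; anything it puts in the suspect list deserves a manual look rather than automatic deletion.

```python
import re

# Assumption: a strict, simplified shape check, not a full RFC 5322 validator.
FULL_ADDRESS_RE = re.compile(r"[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,}")


def split_valid_and_suspect(lines):
    """Separate well-formed addresses from entries that need a manual look."""
    valid, suspect = [], []
    for line in lines:
        candidate = line.strip().lower()
        if not candidate:
            continue  # skip blank lines in the pasted output
        if FULL_ADDRESS_RE.fullmatch(candidate) and ".." not in candidate:
            valid.append(candidate)
        else:
            suspect.append(candidate)
    return valid, suspect
```

Feed it the model output split into lines, for example split_valid_and_suspect(output.splitlines()), and review the second list before importing.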

Frequently asked questions

Can I extract email addresses from a txt file without coding?

Yes. Pasting your text directly into an AI chat tool like ChatGPT, Claude, or Gemini and prompting it to extract emails requires no code at all. For recurring or large-scale jobs, a short Python script using the re module (about five lines) is also accessible even for beginners and runs entirely on your own machine.

What is the best regex pattern to extract emails from a text file?

A widely used pattern is [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}. It catches most standard email formats. Run it with Python's re.findall() against your file contents. Be aware it will miss obfuscated formats like 'user at domain dot com' and may catch some malformed strings that look like emails but are not.
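
A short demonstration of that pattern, including both limitations, might look like this in Python. The sample string is made up for illustration.

```python
import re

EMAIL_PATTERN = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

sample = (
    "Write to sales@acmesupplies.com or info@brightpathlogistics.com. "
    "Obfuscated: user at domain dot com. Malformed: a@b..com"
)

matches = re.findall(EMAIL_PATTERN, sample)
print(matches)
# ['sales@acmesupplies.com', 'info@brightpathlogistics.com', 'a@b..com']
```

The obfuscated address is skipped because it has no @ sign, while the malformed 'a@b..com' slips through because the pattern allows consecutive periods in the domain.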

How do I extract emails from a txt file using Python?

Import re, open your file with open('yourfile.txt', 'r'), read the contents into a variable such as content, then run re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', content) to get a list of matches. Print them or write them to a new file. The whole script is about five lines and needs nothing beyond Python's built-in re module, so there is nothing to install.
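
Those steps could be wrapped in a small function like the sketch below. The function name extract_from_file, the UTF-8 encoding, and the errors="replace" setting are assumptions; adjust them to match how your file was saved.

```python
import re

EMAIL_PATTERN = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"


def extract_from_file(path: str) -> list[str]:
    """Read a text file and return every regex match, duplicates included."""
    # errors="replace" keeps the script running if the file has odd bytes
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        content = fh.read()
    return re.findall(EMAIL_PATTERN, content)
```

Calling extract_from_file('yourfile.txt') returns the list, which you can print or write out with one address per line.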

Will AI miss any email addresses in my text file?

It can, particularly if the file is very long and exceeds the model's context window, or if emails are written in non-standard formats. For critical extractions, cross-check by also running a regex tool or script. Treat AI output as a fast first pass rather than a guaranteed complete result.
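
One possible cross-check is to compare the AI's list with what a regex finds in the same source text. The Python sketch below assumes the AI output is one address per line; the function name cross_check is hypothetical.

```python
import re

EMAIL_PATTERN = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"


def cross_check(source_text: str, ai_output: str) -> tuple[set[str], set[str]]:
    """Return (addresses the AI missed, addresses only the AI reported)."""
    regex_found = {m.lower() for m in re.findall(EMAIL_PATTERN, source_text)}
    ai_found = {
        line.strip().lower() for line in ai_output.splitlines() if line.strip()
    }
    return regex_found - ai_found, ai_found - regex_found
```

An empty first set suggests the AI caught everything the regex did. Entries in the second set are either obfuscated addresses the regex cannot see or output that does not appear in the source, so check them by hand.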

How do I remove duplicate emails after extracting them?

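If you are working in a spreadsheet, paste the list into a column and use the Remove Duplicates function (Data menu in Excel or Google Sheets). If you used Python, wrap your findall result in set() and convert back to a list; set() does not keep the original order, so use list(dict.fromkeys(matches)) when order matters. Lowercase the addresses first so that Help@Acme.Co and help@acme.co count as the same entry. You can also ask the AI to deduplicate in the same prompt by adding 'return unique addresses only'.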
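
A minimal Python sketch of order-preserving deduplication follows. The helper name is illustrative, and lowercasing before comparison is an assumption that matches the normalization rule in the tested prompt.

```python
def dedupe_preserving_order(addresses):
    """Lowercase, drop blanks, and keep only the first occurrence of each address."""
    normalized = (a.strip().lower() for a in addresses)
    # dict.fromkeys keeps insertion order, unlike set()
    return list(dict.fromkeys(a for a in normalized if a))


print(dedupe_preserving_order(["Help@Acme.Co", "escalations-team@acme.co", "help@acme.co"]))
# ['help@acme.co', 'escalations-team@acme.co']
```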

Can I extract emails from multiple txt files at once?

AI chat tools handle one paste at a time, so batch processing multiple files requires either combining them first or running the prompt separately for each file. A Python script that loops through a folder of .txt files and applies the regex to each one is much more practical for bulk jobs and takes about ten lines of code.
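
Such a loop might look like the Python sketch below. The function name extract_from_folder and the UTF-8 encoding are assumptions, and only files ending in .txt directly inside the folder are read.

```python
import re
from pathlib import Path

EMAIL_PATTERN = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"


def extract_from_folder(folder: str) -> dict[str, list[str]]:
    """Map each .txt filename in the folder to the addresses found in it."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        content = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = re.findall(EMAIL_PATTERN, content)
    return results
```

Keeping results per file, rather than one merged list, makes it easier to trace an address back to its source.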
