Pull Email Addresses Out of Plain Text Files
Tested prompts for extracting email addresses from a .txt file, compared across 5 leading AI models.
You have a plain text file full of messy content and somewhere buried in it are email addresses you need. Maybe it is a scraped webpage, a CRM export, a log file, or a pile of pasted correspondence. Whatever the source, you need just the emails pulled out cleanly so you can import them into a tool, hand them off to a teammate, or cross-reference them against another list.
The traditional approach is a regex pattern run through a script. It works, but you need to know how to write or find the right expression, run it in a terminal or code editor, and handle edge cases. An AI model removes most of that friction: you paste your text, describe what you want, and get a clean list back in seconds without writing any code.
This page shows you exactly how to prompt an AI to extract email addresses from a text file, compares how different models handle the same input, and covers the edge cases where you need to adjust your approach. Whether your file has a hundred emails or a handful scattered across paragraphs, the method here gets you to a usable list fast.
When to use this
This approach works best when you have unstructured or semi-structured plain text and need email addresses pulled out quickly without writing code. It is ideal for one-off jobs, small-to-medium files, and situations where the text is messy enough that a simple find-and-replace would miss entries.
- Extracting emails from a scraped or exported .txt file from a website, forum, or directory
- Pulling contact addresses out of pasted email threads or meeting notes saved as plain text
- Cleaning up a CRM or newsletter export where emails are mixed in with names, phone numbers, and addresses
- Recovering emails from a log file generated by an app or server
- Quickly deduplicating and listing emails from a manually compiled text document before importing into Mailchimp, HubSpot, or a spreadsheet
When this format breaks down
- Your text file is very large (over 50,000 words or several megabytes). Most AI chat interfaces have context window limits and may truncate the input, so emails near the end can be missed without any warning.
- You need to process hundreds of files automatically on a schedule. At that scale, a Python script with a regex pattern runs faster, costs nothing per call, and handles batch jobs without manual copy-paste.
- Your file contains sensitive personal data governed by GDPR, HIPAA, or similar regulations. Pasting that content into a third-party AI service may violate your data handling obligations.
- You need guaranteed completeness with a verifiable audit trail. AI models can occasionally miss a malformed or unusual email format, and there is no built-in way to confirm every address was captured without manually checking.
The prompt we tested
You are an email extraction assistant. Scan the following text and extract every valid email address it contains. Rules: Return only a deduplicated list of valid email addresses, one per line, in the order they first appear. Normalize all addresses to lowercase, strip surrounding punctuation or whitespace, and ignore obfuscated forms (like 'name [at] domain dot com') unless they can be unambiguously reconstructed. If no email addresses are found, output exactly: No email addresses found. Text to scan: Please contact our support team at Help@Acme.Co or escalations-team@acme.co for urgent issues. You can also reach the founder, Jane Doe, directly at jane.doe+press@acme.co — and yes, help@acme.co works too for general questions.
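For comparison, here is a minimal Python sketch that applies the same rules with a regex instead of a model. It assumes the general-purpose pattern discussed in the FAQ further down, and the names extract_emails and NOT_FOUND are illustrative. It does not attempt the prompt's optional reconstruction of obfuscated forms.

```python
import re

# General-purpose pattern: local part, "@", domain ending in a dot plus 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
NOT_FOUND = "No email addresses found."


def extract_emails(text: str) -> str:
    """Lowercase, deduplicate, and keep first-seen order, one address per line."""
    unique: dict[str, None] = {}  # dict keys preserve insertion order
    for match in EMAIL_RE.findall(text):
        # findall never includes surrounding whitespace, commas, or brackets,
        # which covers the "strip surrounding punctuation" rule for typical text.
        unique.setdefault(match.lower(), None)
    return "\n".join(unique) if unique else NOT_FOUND


# The sample text from the prompt, with the dash punctuation simplified.
sample = (
    "Please contact our support team at Help@Acme.Co or escalations-team@acme.co "
    "for urgent issues. You can also reach the founder, Jane Doe, directly at "
    "jane.doe+press@acme.co, and yes, help@acme.co works too for general questions."
)
print(extract_emails(sample))
```

This prints help@acme.co, escalations-team@acme.co, and jane.doe+press@acme.co on separate lines, the same three addresses the models returned.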
Model comparison
| Model | Quality | Cost | Latency | Tokens (input / output) |
|---|---|---|---|---|
| Claude Opus 4.7 | 9 | $0.0065 | 1.2s | 260 / 34 |
| Claude Haiku 4.5 | 8 | $0.0003 | 0.679s | 187 / 30 |
| GPT-5.4 | 7 | $0.0008 | 1.1s | 166 / 28 |
| Gemini 2.5 Pro | 9 | $0.0003 | 10.7s | 166 / 27 |
| Grok 4.1 Fast Reasoning | 7 | $0.0001 | 2.8s | 315 / 25 |
Outputs
help@acme.co escalations-team@acme.co jane.doe+press@acme.co
help@acme.co escalations-team@acme.co jane.doe+press@acme.co
help@acme.co escalations-team@acme.co jane.doe+press@acme.co
help@acme.co escalations-team@acme.co jane.doe+press@acme.co
What makes these work
01. Ask for one email per line
Always specify the output format in your prompt. 'Extract all email addresses and return one per line with no other text' prevents the model from adding explanations, bullet points, or numbering that you would need to clean up afterward. A clean line-by-line list pastes directly into Excel, Google Sheets, or any import tool without extra steps.
02. Request deduplication upfront
If your source text might repeat the same address (common in email threads or exported logs), add 'remove duplicates' to your prompt. Models generally honor this and return a unique list, which saves you a manual deduplication step in your spreadsheet later.
03. Split large files into chunks
If your text file is too long to paste in one go, split it into sections and run the prompt multiple times, then combine the outputs. Tell the model each time 'extract all emails from this section' and deduplicate the final combined list at the end. This prevents silent truncation where the model stops reading before reaching the end of your input.
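A minimal Python sketch of the splitting step, assuming you want to break only at line ends so no address is cut in half. The function name chunk_lines and the 20,000-character default are illustrative assumptions; size chunks to what your chat tool accepts comfortably.

```python
def chunk_lines(text: str, max_chars: int = 20_000) -> list[str]:
    """Split text into pieces of at most max_chars, breaking only between lines.

    A single line longer than max_chars becomes its own (oversized) chunk
    rather than being cut mid-line, since a mid-line cut could split an address.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        # Close the current chunk if adding this line would exceed the limit.
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```

Each chunk can go into a separate prompt. Because the lines keep their line endings, joining the chunks back together reproduces the original text exactly, so nothing falls between pieces.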
04. Flag unusual formats explicitly
If your text contains emails written in obfuscated formats like 'user at domain dot com' to avoid spam scrapers, add a note in your prompt: 'Also extract emails written in plain English format such as user at domain dot com.' Without this instruction, most models will only catch standard formatted addresses with the @ symbol.
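If you would rather rebuild obfuscated addresses yourself, or check what a model reconstructed, a regex sketch in Python follows. The separator spellings it accepts (' at ', '[at]', '(at)', ' dot ', '[dot]', '(dot)') are assumptions to extend for your data, and the function name deobfuscate is hypothetical. Plain-English forms are inherently ambiguous: a phrase such as 'look at example dot com' would be rebuilt as an address, so review the results.

```python
import re

# Assumed obfuscation spellings for "@" and "."; extend for your own data.
AT_SEP = r"(?:\s*[\[(]at[\])]\s*|\s+at\s+)"
DOT_SEP = r"(?:\s*[\[(]dot[\])]\s*|\s+dot\s+|\.)"
LABEL = r"[A-Za-z0-9-]+"

# Captures (local part, domain written with obfuscated or real dots).
OBFUSCATED_RE = re.compile(
    rf"([A-Za-z0-9._%+-]+){AT_SEP}({LABEL}(?:{DOT_SEP}{LABEL})+)", re.IGNORECASE
)
DOT_RE = re.compile(DOT_SEP, re.IGNORECASE)
# Final sanity check: the rebuilt address must look like a normal email.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def deobfuscate(text: str) -> list[str]:
    """Rebuild addresses like 'user at domain dot com' or 'user [at] domain [dot] com'."""
    found = []
    for local, domain in OBFUSCATED_RE.findall(text):
        candidate = f"{local}@{DOT_RE.sub('.', domain)}".lower()
        # Rejects things like "meet at 5 dot 30", whose "TLD" is not letters.
        if EMAIL_RE.fullmatch(candidate):
            found.append(candidate)
    return found
```

Standard addresses that already contain "@" are left alone by this function; run it alongside a normal regex pass, not instead of one.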
More example scenarios
Here is our exported attendee file from the conference registration system: John Marsh - john.marsh@techventures.io - Table 4 Sarah O'Brien -sobrien@gmail.com - Table 7 Registration contact: events@marriott-downtown.com Press inquiries: press@techventures.io Dr. Alan Chu alan.chu@stanfordmed.edu VIP guest
john.marsh@techventures.io sobrien@gmail.com events@marriott-downtown.com press@techventures.io alan.chu@stanfordmed.edu
Acme Supplies | Founded 2004 | Chicago, IL | Contact: sales@acmesupplies.com | (312) 555-0198 BrightPath Logistics | www.brightpathlogistics.com | info@brightpathlogistics.com | Specializes in cold chain Northern Fab Co. - reach us at northernfab_orders@outlook.com or call 800-555-3344 Global Parts Inc | no public email listed | visit globalparts.com/contact
sales@acmesupplies.com info@brightpathlogistics.com northernfab_orders@outlook.com
From: Lisa Tran <lisa.tran@hiringco.com> sent Monday. She CC'd the recruiter mark_jones@hiringco.com and the candidate replied from devcandidate2024@proton.me. HR follow-up came from hr-noreply@hiringco.com. Please add all parties to the tracker.
lisa.tran@hiringco.com mark_jones@hiringco.com devcandidate2024@proton.me hr-noreply@hiringco.com
2024-03-12 08:14:22 INFO User login attempt: user=admin@internaltools.net status=success 2024-03-12 08:15:01 WARN Failed delivery to bounce@maildomain.org error=550 2024-03-12 08:16:45 INFO Password reset requested for j.patel@internaltools.net 2024-03-12 08:17:03 ERROR SMTP relay rejected sender spammer_address@suspicious.ru
admin@internaltools.net bounce@maildomain.org j.patel@internaltools.net spammer_address@suspicious.ru
Spring campaign contacts pulled from old records. Margaret Holloway mholloway@donorbase.org gave in 2022. Twin brothers at the Henderson Foundation contact is giving@hendersonfoundation.org. Anonymous donor prefers contact via anon_donor_ref447@securemail.com. Board liaison copied on all: boardliaison@nonprofithq.net.
mholloway@donorbase.org giving@hendersonfoundation.org anon_donor_ref447@securemail.com boardliaison@nonprofithq.net
Common mistakes to avoid
Pasting without specifying output format
When you just say 'find the emails in this text' without specifying format, models often return prose like 'I found the following email addresses:' followed by a numbered list. That output requires cleanup before it is usable. Always specify format in the prompt to save time.
Assuming 100% completeness on large inputs
AI models working near their context limit may stop processing before the end of your file. You will get a partial list with no warning that anything was missed. For files longer than a few thousand words, chunk them manually and check that the number of addresses returned for each chunk is in line with the number of @ signs in that chunk.
Ignoring domain-only patterns
Some text contains placeholders like '@companyname.com' with no local part, or partial addresses that are not valid emails. A model may include these in its output. Scan your results for malformed addresses before importing them anywhere.
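A quick Python sketch for that scan, assuming a conservative shape check rather than full RFC 5322 validation. The function name split_valid is hypothetical; anything it flags goes to a review list instead of being silently dropped.

```python
import re

# Conservative shape check (an assumption, not full RFC 5322 validation):
# non-empty local part, one "@", dotted domain, letters-only top-level domain.
VALID_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def split_valid(addresses: list[str]) -> tuple[list[str], list[str]]:
    """Separate plausible addresses from ones to review by hand before import."""
    valid, suspect = [], []
    for raw in addresses:
        # Drop whitespace and wrapper characters left over from "Name <addr>," forms.
        address = raw.strip().strip("<>,;")
        bad_dots = address.startswith(".") or ".." in address or ".@" in address
        if VALID_RE.fullmatch(address) and not bad_dots:
            valid.append(address)
        else:
            suspect.append(raw)
    return valid, suspect
```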
Not removing duplicates before import
If you skip asking for deduplication and your source had repeated addresses, you may import duplicate contacts into your CRM or email tool. This causes double-sending, skewed analytics, and potential unsubscribe complaints. Either ask the AI to deduplicate or run a quick dedup in your spreadsheet before import.
Using AI for regulated or confidential data
Pasting files that contain patient records, legal correspondence, or financial data into a public AI interface may breach your organization's data handling policies or applicable law. For sensitive content, use a local model or a properly vetted enterprise AI deployment with a data processing agreement in place.
Frequently asked questions
Can I extract email addresses from a txt file without coding?
Yes. Pasting your text directly into an AI chat tool like ChatGPT, Claude, or Gemini and prompting it to extract emails requires no code at all. For recurring or large-scale jobs, a short Python script using the built-in re module is approachable even for beginners and runs entirely on your own machine.
What is the best regex pattern to extract emails from a text file?
A widely used pattern is [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}. It catches most standard email formats. Run it with Python's re.findall() against your file contents. Be aware it will miss obfuscated formats like 'user at domain dot com' and may catch some malformed strings that look like emails but are not.
How do I extract emails from a txt file using Python?
Open your file with open('yourfile.txt', 'r'), read the contents, then run re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', content) to get a list of matches. Print or write them to a new file. The whole script is about five lines and requires no installed libraries beyond Python's built-in re module.
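Those steps, written out as a small Python sketch. The function name emails_in_file is illustrative, and reading with UTF-8 and errors="replace" is an assumption that keeps messy exports from raising a decode error.

```python
import re
from pathlib import Path

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def emails_in_file(path: str) -> list[str]:
    """Return every email-like match in a text file, in the order found."""
    # errors="replace" substitutes undecodable bytes instead of raising.
    content = Path(path).read_text(encoding="utf-8", errors="replace")
    return EMAIL_RE.findall(content)
```

Usage: matches = emails_in_file("yourfile.txt"), then Path("emails.txt").write_text("\n".join(matches), encoding="utf-8") writes one address per line ("emails.txt" is a placeholder name). The list may still contain duplicates.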
Will AI miss any email addresses in my text file?
It can, particularly if the file is very long and exceeds the model's context window, or if emails are written in non-standard formats. For critical extractions, cross-check by also running a regex tool or script. Treat AI output as a fast first pass rather than a guaranteed complete result.
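One way to do that cross-check, sketched in Python under the assumption that the AI output is one address per line: compare it against a regex pass over the same text and look at the differences in both directions. The function name cross_check is hypothetical.

```python
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def cross_check(source_text: str, ai_output: str) -> dict[str, set[str]]:
    """Compare an AI-extracted list (one address per line) with a regex pass."""
    regex_found = {m.lower() for m in EMAIL_RE.findall(source_text)}
    ai_found = {line.strip().lower() for line in ai_output.splitlines() if line.strip()}
    return {
        # Found by the regex but absent from the AI list: worth a manual look.
        "missed_by_ai": regex_found - ai_found,
        # Only in the AI list: possibly reconstructed obfuscated forms, or errors.
        "only_in_ai": ai_found - regex_found,
    }
```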
How do I remove duplicate emails after extracting them?
If you are working in a spreadsheet, paste the list into a column and use the Remove Duplicates function (Data menu in Excel or Google Sheets). If you used Python, wrap your findall result in set() and convert back to a list; note that a set does not keep the original order and treats 'Help@Acme.Co' and 'help@acme.co' as different addresses, so lowercase them first. You can also ask the AI to deduplicate in the same prompt by adding 'return unique addresses only'.
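If order matters (for example, to match the tested prompt's first-appearance rule), dict.fromkeys() is an order-preserving alternative to set(). This minimal Python sketch also lowercases first so case variants collapse, matching the prompt's normalization; the function name dedupe is illustrative.

```python
def dedupe(addresses: list[str]) -> list[str]:
    """Remove duplicates case-insensitively, keeping first-seen order."""
    # dict.fromkeys keeps insertion order (Python 3.7+), unlike set().
    return list(dict.fromkeys(a.strip().lower() for a in addresses if a.strip()))
```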
Can I extract emails from multiple txt files at once?
AI chat tools handle one paste at a time, so batch processing multiple files requires either combining them first or running the prompt separately for each file. A Python script that loops through a folder of .txt files and applies the regex to each one is much more practical for bulk jobs and takes about ten lines of code.
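A minimal Python sketch of that folder loop, assuming UTF-8 text files with a .txt extension in a single folder (subfolders are not searched). The function name emails_in_folder is illustrative; it reuses the regex from the answers above and deduplicates case-insensitively across all files.

```python
import re
from pathlib import Path

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def emails_in_folder(folder: str) -> list[str]:
    """Collect unique addresses from every .txt file in a folder, in first-seen order."""
    seen: dict[str, None] = {}
    # sorted() makes the file order, and therefore the output order, repeatable.
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for match in EMAIL_RE.findall(text):
            seen.setdefault(match.lower(), None)
    return list(seen)
```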
Try it with a real tool
Run this prompt in one of these tools. Affiliate links help keep Gridlyx free.