Generate Python Regex from Plain English Descriptions

Tested prompts for an AI Python regex generator, compared across 5 leading AI models.

Best by judge score: Claude Opus 4.7 (8/10)

If you are searching for an AI Python regex generator, you are probably staring at a pattern-matching problem and dreading the syntax. Python regex is powerful but the syntax is dense: lookaheads, named groups, character classes, greedy versus lazy quantifiers. Writing a correct expression from scratch takes time even for experienced developers, and debugging a broken one takes longer. AI models can translate a plain-English description directly into a working Python regex, complete with re module usage, flags, and inline comments explaining each part.

This page shows you exactly how to prompt an AI to generate Python regex patterns reliably. The prompts and model outputs here are tested against real use cases: extracting emails from logs, validating phone numbers, parsing dates, scraping structured data from HTML, and more. You will see what good output looks like, where AI-generated regex fails, and how to get patterns that are close to production-ready on the first try.

The approach works best when you describe your pattern in specific terms: what you want to match, what you want to exclude, and what the surrounding context looks like. Vague descriptions produce vague patterns. The examples and tips below show you how to write descriptions that generate regex you can actually ship.

When to use this

Use an AI Python regex generator when you know what you want to match but not how to express it in regex syntax. This is the right tool when you are prototyping quickly, working outside your primary domain, or need a pattern with explanation so you can maintain it later without re-learning the syntax from scratch.

Extracting structured fields like emails, phone numbers, or dates from unstructured log files or text dumps
Validating user input formats in a Django or Flask form and needing a re.match or re.fullmatch pattern fast
Parsing semi-structured data from scraped HTML or CSV columns where the format is consistent but complex
Replacing or cleaning strings in a data pipeline, such as stripping HTML tags or normalizing whitespace
Writing test cases for an existing regex and needing matching and non-matching examples generated alongside the pattern
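
For the form-validation case, the choice between re.match and re.fullmatch matters more than it looks. A minimal sketch, using a hypothetical five-digit ZIP pattern that is not part of the tested prompt, of how `$` tolerates a trailing newline under match() while fullmatch() does not:

```python
import re

# Hypothetical pattern for illustration: a five-digit ZIP code.
ZIP_RE = re.compile(r'^\d{5}$')

raw = "12345\n"  # e.g. a value read from a file or a textarea

# match() anchors at the start only; `$` also matches just before a trailing newline.
via_match = bool(ZIP_RE.match(raw))

# fullmatch() requires the pattern to consume the entire string, newline included.
via_fullmatch = bool(ZIP_RE.fullmatch(raw))

print(via_match, via_fullmatch)  # True False
```

If the generated code uses match() with `^...$`, stripping the input first or switching to fullmatch() closes this gap.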

When this format breaks down

When the text structure is truly irregular or context-dependent, regex will fail regardless of how good the AI-generated pattern is. Use an NLP parser or a dedicated library instead.
When you are parsing nested or recursive structures like JSON, XML, or HTML documents. Even a correct AI-generated regex will be brittle. Use a proper parser like BeautifulSoup or the json module.
When the pattern needs to comply with a specific regex flavor other than Python re or regex module, such as POSIX or Java, and you have not specified that to the model. AI defaults to Python syntax and the result may not be portable.
When performance is critical and the pattern will run on millions of strings in a tight loop. AI-generated regex aims for correctness, not speed. You may need to benchmark it and manually rewrite any parts prone to catastrophic backtracking.
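
To make the nested-structure point concrete, here is a small sketch comparing a naive regex with the standard library json module. The payload and the `\{.*?\}` pattern are made up for illustration:

```python
import json
import re

# Hypothetical nested payload for illustration.
payload = '{"user": {"name": "Ada", "tags": ["a", "b"]}, "active": true}'

# A naive "object" regex stops at the first closing brace,
# cutting the outer object short.
naive = re.search(r'\{.*?\}', payload).group()

# The json module handles arbitrary nesting.
data = json.loads(payload)
name = data["user"]["name"]

print(naive)  # truncated fragment, not valid JSON
print(name)   # Ada
```

Switching the quantifier to greedy only moves the problem: it then over-matches as soon as two objects appear in the same string.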

The prompt we tested

You are an expert Python developer specializing in regular expressions. Convert the following plain English description into a working Python regex pattern.

Description:
I need a regex that matches valid US phone numbers in formats like (555) 123-4567, 555-123-4567, or 5551234567. It should also optionally allow a +1 country code prefix with or without a space. Please make it case-insensitive where applicable.

Follow these rules when generating your response:
Output a complete Python code block using the `re` module, including the compiled pattern with appropriate flags, a brief comment explaining each part of the regex, and 2-3 example usages with sample inputs showing matches. Keep explanations concise and ensure the pattern is tested against edge cases.

Model comparison

Model	Quality	Cost	Latency	Tokens
Claude Opus 4.7	8	$0.0567	9.5s	243 / 708
Claude Haiku 4.5	7	$0.0056	8.2s	174 / 1094
GPT-5.4	8	$0.0053	5.6s	160 / 330
Grok 4.1 Fast Reasoning	7	$0.0003	29.8s	309 / 419

Outputs

Claude Opus 4.7 8/10

```python
import re

# Regex breakdown:
# ^                     - start of string
# (?:\+1[\s-]?)?        - optional +1 country code, followed by optional space or hyphen
# (?:\(\d{3}\)\s?|\d{3}[-.]?) - area code: either (555) with optional space, or 555 with optional separator
# \d{3}                 - first 3 digits of local number
# [-.]?                 - optional separator (hyphen or dot)
# \d{4}                 - last 4 digits
# $                     - end of string
pattern = re.compile(
    r'^(?:\+1[\s-]?)?(?:\(\d{3}\)\s?|\d{3}[-.]?)\d{3}[-.]?\d{4}$',
    re.IGNORECASE
)

# Example usages
test_numbers = [
    "(555) 123-4567",      # Match: parentheses format
    "555-123-4567",        # Match: hyphen format
    "5551234567",          # Match: plain digits
    "+1 555-123-4567",     # Match: with country code and space
    "+15551234567",        # Match: with country code, no space
    "123-45-6789",         # No match: wrong digit grouping
    "(555)1234567",        # Match: parentheses, no space
    "555 123 4567",        # No match: spaces as separators (not in spec)
]

for number in test_numbers:
    result = "✓ Match" if pattern.match(number) else "✗ No match"
    print(f"{result}: {number}")
```

**Output:**
```
✓ Match: (555) 123-4567
✓ Match: 555-123-4567
✓ Match: 5551234567
✓ Match: +1 555-123-4567
✓ Match: +15551234567
✗ No match: 123-45-6789
✓ Match: (555)1234567
✗ No match: 555 123 4567
```

The pattern handles all three main formats specified plus the optional `+1` prefix. The `re.IGNORECASE` flag is included per your requirement, though it has no practical effect here since the pattern only uses digits and symbols.

Claude Haiku 4.5 7/10

```python
import re

# Regex pattern for US phone numbers with optional +1 country code
# Pattern breakdown:
# ^                 - Start of string
# (?:\+1\s?)?       - Optional +1 country code with optional space (non-capturing group)
# (?:               - Start of non-capturing group for main number formats
#   \(\d{3}\)       - (555) format: parentheses with exactly 3 digits
#   |               - OR
#   \d{3}           - 555 format: exactly 3 digits
# )                 - End of format options
# \s?               - Optional whitespace
# \d{3}             - Middle 3 digits
# [-\s]?            - Optional dash or space separator
# \d{4}             - Last 4 digits
# $                 - End of string

phone_pattern = re.compile(
    r'^(?:\+1\s?)?(?:\(\d{3}\)|\d{3})\s?[-\s]?\d{3}[-\s]?\d{4}$',
    re.IGNORECASE
)

# Example usages with test cases
test_cases = [
    # Valid matches
    ("(555) 123-4567", True),        # Parentheses format with space
    ("555-123-4567", True),           # Dashes format
    ("5551234567", True),             # No separators
    ("+1 (555) 123-4567", True),      # With +1 country code and space
    ("+15551234567", True),           # With +1 country code, no space
    ("+1-555-123-4567", True),        # With +1 and dashes
    ("(555)123-4567", True),          # Parentheses without space
    ("555 123 4567", True),           # Spaces as separators
    
    # Invalid matches
    ("555 123 456", False),           # Too few digits
    ("(555) 123-45678", False),       # Too many digits
    ("(555 123-4567", False),         # Mismatched parentheses
    ("+2 555-123-4567", False),       # Wrong country code
    ("abc-def-ghij", False),          # Letters instead of numbers
    ("", False),                      # Empty string
]

# Test the pattern
print("Phone Number Regex Pattern Tests:")
print("-" * 60)
for phone_number, expected in test_cases:
    match = bool(phone_pattern.match(phone_number))
    status = "✓ PASS" if match == expected else "✗ FAIL"
    print(f"{status} | '{phone_number}' -> {match} (expected: {expected})")
```

**Output:**
```
Phone Number Regex Pattern Tests:
------------------------------------------------------------
✓ PASS | '(555) 123-4567' -> True (expected: True)
✓ PASS | '555-123-4567' -> True (expected: True)
✓ PASS | '5551234567' -> True (expected: True)
✓ PASS | '+1 (555) 123-4567' -> True (expected: True)
✓ PASS | '+15551234567' -> True (expected: True)
✓ PASS | '+1-555-123-4567' -> True (expected: True)
✓ PASS | '(555)123-4567' -> True (expected: True)
✓ PASS | '555 123 4567' -> True (expected: True)
✓ PASS | '555 123 456' -> False (expected: False)
✓ PASS | '(555) 123-45678' -> False (expected: False)
✓ PASS | '(555 123-4567' -> False (expected: False)
✓ PASS | '+2 555-123-4567' -> False (expected: False)
✓ PASS | 'abc-def-ghij' -> False (expected: False)
✓ PASS | '' -> False (expected: False)
```

**Key Features:**
- Supports all three common US phone formats
- Optional `+1` country code with flexible spacing
- Uses non-capturing groups `(?:...)` for efficiency
- Anchors `^` and `$` ensure complete string matching
- Handles various separator combinations (dashes, spaces, parentheses)
- Edge cases validated against malformed and invalid inputs

GPT-5.4 8/10

```python
import re

# Regex for US phone numbers:
# ^                         -> start of string
# (?:\+1\s?)?              -> optional country code: +1, with optional space after it
# (?:                      -> area code group:
#    \(\d{3}\)             ->   either (555)
#    |                     ->   or
#    \d{3}                 ->   555
# )
# [ -]?                    -> optional separator: space or hyphen
# \d{3}                    -> next 3 digits
# [ -]?                    -> optional separator: space or hyphen
# \d{4}                    -> last 4 digits
# $                        -> end of string
pattern = re.compile(
    r'^(?:\+1\s?)?(?:\(\d{3}\)|\d{3})[ -]?\d{3}[ -]?\d{4}$',
    re.IGNORECASE
)

# Example usages
samples = [
    "(555) 123-4567",   # match
    "+1 555-123-4567",  # match
    "5551234567",       # match
    "+15551234567",     # match
    "555) 123-4567",    # no match (bad parentheses)
    "123-45-6789",      # no match
]

for s in samples:
    print(f"{s!r}: {'MATCH' if pattern.fullmatch(s) else 'NO MATCH'}")
```

Grok 4.1 Fast Reasoning 7/10

```python
import re

# Compiled regex pattern for US phone numbers:
# - Optional +1 prefix followed by optional separator (-, ., space)
# - Optional parentheses around 3-digit area code
# - Optional separators (-, ., space) between parts
# - Exactly 10 digits total
# Flags: VERBOSE for readability, IGNORECASE (though no letters)
pattern = re.compile(r'''
    ^                                   # Start of string
    (?:\+1[-.\s]?)?                     # Optional +1 prefix + optional separator
    \(?                                 # Optional opening parenthesis
    \d{3}                               # 3-digit area code
    \)?                                 # Optional closing parenthesis
    [-.\s]?                             # Optional separator
    \d{3}                               # 3-digit exchange
    [-.\s]?                             # Optional separator
    \d{4}                               # 4-digit line number
    $                                   # End of string
''', re.VERBOSE | re.IGNORECASE)

# Example usages (all return matches; tested edge cases: rejects letters, <10/>10 digits, plain "1-..." prefix)
print(pattern.fullmatch('(555) 123-4567') is not None)      # True
print(pattern.fullmatch('555-123-4567') is not None)        # True
print(pattern.fullmatch('+1 5551234567') is not None)       # True
print(pattern.fullmatch('+1555.123.4567') is not None)      # True (allows . separator)
```

**Edge case tests (not shown in output but verified):**
- `pattern.fullmatch('5551234567')` → Match ✓
- `pattern.fullmatch('1-555-123-4567')` → No match (no + before 1)
- `pattern.fullmatch('abc-123-4567')` → No match (letters)
- `pattern.fullmatch('555-123-456')` → No match (too short)
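
The four patterns above look similar but disagree on several inputs. The sketch below copies them verbatim (Grok's is collapsed from its re.VERBOSE form) and runs them against a shared sample set using fullmatch(). The sample list and the `compare` helper are illustrative choices, not part of the tested prompt:

```python
import re

# Patterns copied from the four outputs above.
PATTERNS = {
    "Claude Opus 4.7": re.compile(
        r'^(?:\+1[\s-]?)?(?:\(\d{3}\)\s?|\d{3}[-.]?)\d{3}[-.]?\d{4}$'),
    "Claude Haiku 4.5": re.compile(
        r'^(?:\+1\s?)?(?:\(\d{3}\)|\d{3})\s?[-\s]?\d{3}[-\s]?\d{4}$'),
    "GPT-5.4": re.compile(
        r'^(?:\+1\s?)?(?:\(\d{3}\)|\d{3})[ -]?\d{3}[ -]?\d{4}$'),
    "Grok 4.1 Fast Reasoning": re.compile(
        r'^(?:\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$'),
}

# Inputs chosen to expose the differences between the patterns.
SAMPLES = ["(555) 123-4567", "555 123 4567", "+1-555-123-4567", "(555 123-4567"]

def compare(samples=SAMPLES):
    """Map each sample to {model name: whether its pattern fully matches}."""
    return {
        s: {name: bool(p.fullmatch(s)) for name, p in PATTERNS.items()}
        for s in samples
    }

for sample, verdicts in compare().items():
    print(f"{sample!r}: {verdicts}")
```

Run this way, the Haiku pattern rejects `+1-555-123-4567` even though its own test table lists that input as an expected match, and the Grok pattern accepts the unbalanced `(555 123-4567`. Differences like these are why the generated test output is worth re-running rather than trusting as printed.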

What makes these work

01

Specify what NOT to match
Most AI-generated regex failures come from patterns that are too broad. Tell the model explicitly what should not match: partial matches inside longer strings, emails embedded in URLs, numbers that are part of a larger sequence. Adding a 'do not match X' clause to your prompt pushes the model toward lookaheads, lookbehinds, and word boundaries, and makes it more likely the pattern is right on the first attempt.
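
As a rough illustration of what an exclusion clause buys you, the sketch below contrasts a loose five-digit pattern with one that refuses to match inside a longer digit run. The sample text and both patterns are hypothetical:

```python
import re

text = "Order 1234567890 shipped to ZIP 90210."

# Loose: any five consecutive digits, even inside a longer number.
loose = re.findall(r'\d{5}', text)

# Strict: five digits that are not preceded or followed by another digit.
strict = re.findall(r'(?<!\d)\d{5}(?!\d)', text)

print(loose)   # ['12345', '67890', '90210']
print(strict)  # ['90210']
```
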
02

Request named groups for complex patterns
When your regex needs to extract multiple fields, ask for named capture groups using (?P<name>...) syntax. Named groups make the generated code self-documenting and prevent the positional indexing bugs that come with numbered groups. They also make it easier to extend the pattern later without breaking existing group references.
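
A minimal sketch of the named-group style, assuming a made-up log format of `<LEVEL> <module>: <message>`:

```python
import re

# Hypothetical log format for illustration: "<LEVEL> <module>: <message>"
LOG_RE = re.compile(r'(?P<level>[A-Z]+) (?P<module>[\w.]+): (?P<message>.*)')

m = LOG_RE.fullmatch("ERROR db.pool: connection timed out")

# groupdict() returns every named group keyed by name, so inserting a new
# group into the pattern later does not shift any existing lookups.
fields = m.groupdict()
print(fields["level"], fields["module"])  # ERROR db.pool
```
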
03

Ask for the re.compile call, not just the raw string
Prompting for a compiled pattern rather than a bare string gets you better output. The model will include appropriate flags like re.IGNORECASE or re.MULTILINE when relevant, and the compiled object is reusable across your codebase. Raw strings returned without context require you to guess which flags apply.
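
For example, a compiled pattern carries its flags with it, so every caller gets the same behavior. The pattern and sample lines below are illustrative only:

```python
import re

# Hypothetical example: find log levels regardless of capitalization.
# The flag lives on the compiled object, not on each call site.
LEVEL_RE = re.compile(r'\b(?:error|warning)\b', re.IGNORECASE)

lines = ["ERROR: disk full", "Warning: retrying", "info: started"]
flagged = [line for line in lines if LEVEL_RE.search(line)]

print(flagged)  # ['ERROR: disk full', 'Warning: retrying']
```
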
04

Include two or three concrete examples in your prompt
Listing two to three strings that should match and one or two that should not match is the single highest-leverage thing you can add to a regex prompt. It forces the model to generate a pattern that satisfies the constraints rather than a generic one. This is especially valuable for edge cases like optional fields, variable-length segments, and special characters.
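
The same match and non-match examples you put in the prompt can double as a quick acceptance check on whatever the model returns. A sketch, with a hypothetical `check` helper and a made-up date pattern:

```python
import re

def check(pattern, should_match, should_not_match):
    """Return the inputs the pattern gets wrong; an empty list means all pass."""
    compiled = re.compile(pattern)
    failures = [s for s in should_match if not compiled.fullmatch(s)]
    failures += [s for s in should_not_match if compiled.fullmatch(s)]
    return failures

# Hypothetical candidate pattern and examples for a YYYY-MM-DD date.
candidate = r'\d{4}-\d{2}-\d{2}'
failures = check(
    candidate,
    should_match=["2024-03-15", "1999-12-31"],
    should_not_match=["2024-3-15", "15-03-2024"],
)
print(failures or "all examples pass")
```
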

More example scenarios

#01 · Extract email addresses from customer support logs

Input

Write a Python regex pattern to extract all email addresses from a block of text. Emails can have dots and plus signs in the local part, any common domain extension including multi-part TLDs like .co.uk, and should not match partial emails that are part of a URL. Return a compiled re pattern and show me how to use re.findall with it.

Expected output

import re

EMAIL_PATTERN = re.compile(
    r'(?<![/@])\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b'
)

def extract_emails(text: str) -> list[str]:
    return EMAIL_PATTERN.findall(text)

# The negative lookbehind (?<![/@]) rejects a match that would start right after / or @,
# which filters out many, but not all, URL-embedded addresses.

#02 · Validate US and Canadian phone numbers in a web form

Input

Generate a Python regex for validating North American phone numbers in a web form. Numbers can be entered as (555) 867-5309, 555-867-5309, 555.867.5309, or 5558675309. An optional country code of 1 or +1 is allowed. Return a re.fullmatch compatible pattern and a short validation function.

Expected output

import re

PHONE_PATTERN = re.compile(
    r'^(?:\+?1[\s.\-]?)?(?:\(?\d{3}\)?[\s.\-]?)\d{3}[\s.\-]?\d{4}$'
)

def is_valid_phone(number: str) -> bool:
    return bool(PHONE_PATTERN.fullmatch(number.strip()))

#03 · Parse ISO 8601 timestamps from application error logs

Input

I have application logs where each line starts with a timestamp in the format 2024-03-15T14:32:01.456Z or 2024-03-15T14:32:01+05:30. Write a Python regex using named groups to capture the year, month, day, hour, minute, second, optional milliseconds, and timezone offset. Show the pattern and an example of accessing named groups.

Expected output

import re

TIMESTAMP_RE = re.compile(
    r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
    r'T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})'
    r'(?:\.(?P<ms>\d+))?(?P<tz>Z|[+\-]\d{2}:\d{2})'
)

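# Sample log line for illustration.
line = '2024-03-15T14:32:01.456Z ERROR request failed'

m = TIMESTAMP_RE.search(line)
if m:
    print(m.group('year'), m.group('tz'))  # 2024 Z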

#04 · Strip HTML tags from scraped product descriptions

Input

Write a Python regex to remove all HTML tags from a string of scraped product description text. It should handle self-closing tags like <br/>, tags with attributes like <a href='...'>, and multiline tags. Use re.sub and return a cleaned plain-text string. Note I know a parser is better for production but I need a quick regex for a one-off data cleaning script.

Expected output

import re

HTML_TAG_RE = re.compile(r'<[^>]+>')

def strip_html(text: str) -> str:
    return HTML_TAG_RE.sub('', text).strip()

# [^>] already matches newlines, so multiline tags are caught without re.DOTALL.
# For production use, prefer BeautifulSoup.get_text().

#05 · Extract SKU codes from mixed warehouse inventory text

Input

I need a Python regex to extract SKU codes from warehouse inventory notes. SKUs always follow the format: two uppercase letters, a hyphen, four digits, a hyphen, and one uppercase letter followed by one digit. Example: AB-1234-C5. The text around them is freeform. Use re.finditer and return a list of match objects so I can also get the position of each SKU.

Expected output

import re

SKU_RE = re.compile(r'\b[A-Z]{2}-\d{4}-[A-Z]\d\b')

def find_skus(text: str):
    return list(SKU_RE.finditer(text))

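# Usage (sample note for illustration):
inventory_note = 'Moved AB-1234-C5 to bay 7; XY-0042-Z9 is on backorder.'
for match in find_skus(inventory_note):
    print(match.group(), 'at position', match.start())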

Common mistakes to avoid

Describing format loosely, getting a loose pattern
Saying 'match phone numbers' produces a pattern that matches almost anything with digits. You need to specify the exact format variations you accept. Vague input descriptions are the primary reason AI-generated regex needs multiple iterations to get right.
Not testing against edge cases before shipping
AI-generated regex is correct for the examples you described, not for every string your users will input. Always test the output against empty strings, strings that start or end with the target pattern, strings with Unicode characters, and pathologically long inputs before putting the pattern into production.
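
One edge case that is easy to miss: in Python 3, `\d` on a str pattern matches any Unicode decimal digit, not just 0-9. A small sketch with a hypothetical three-digit pattern, showing how re.ASCII changes the result:

```python
import re

DIGITS = re.compile(r'\d{3}')
ASCII_DIGITS = re.compile(r'\d{3}', re.ASCII)

edge_cases = [
    "",                     # empty string
    "123",                  # ASCII digits
    "\u0661\u0662\u0663",   # Arabic-Indic digits
    "12",                   # too short
]

# For each input: (matches with default flags, matches with re.ASCII)
results = {
    s: (bool(DIGITS.fullmatch(s)), bool(ASCII_DIGITS.fullmatch(s)))
    for s in edge_cases
}
for s, verdict in results.items():
    print(repr(s), verdict)
```

Whether the non-ASCII digits should be accepted depends on the application; the point is to decide deliberately rather than find out in production.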
Ignoring catastrophic backtracking risk
AI models do not automatically optimize for backtracking performance. Nested quantifiers like (a+)+ on long inputs can cause exponential slowdowns. If your pattern will run on large or user-controlled text, paste it into a regex debugger like regex101 and check for backtracking warnings before deploying.
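
A rough way to see the effect locally is to time a nested-quantifier pattern against an input that almost matches. The `time_match` helper is hypothetical, and the input is kept short so the sketch finishes quickly; each extra `a` roughly doubles the work for the risky pattern:

```python
import re
import time

def time_match(pattern, text):
    """Return (match result, elapsed seconds) for a single re.match call."""
    start = time.perf_counter()
    result = re.match(pattern, text)
    return result, time.perf_counter() - start

# A run of 'a' followed by a character that forces the match to fail.
text = "a" * 20 + "b"

risky_result, risky_seconds = time_match(r'(a+)+$', text)  # nested quantifier
safe_result, safe_seconds = time_match(r'a+$', text)       # same strings, no nesting

print(f"risky: {risky_seconds:.4f}s  safe: {safe_seconds:.6f}s")
```

Both calls return None; only the time taken to reach that answer differs.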
Forgetting to escape the pattern for re module usage
AI output sometimes delivers a regex literal without the Python raw string prefix r''. Without it, Python processes backslash escapes before the regex engine sees them: \b becomes a backspace character, so the pattern silently stops matching, and unrecognized escapes like \d trigger an invalid-escape warning in recent Python versions. Always use r'' prefixed strings for Python regex.
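
A short sketch of the silent failure, using a made-up sample string:

```python
import re

text = "a cat sat"

# Without the r prefix, Python turns \b into a backspace character (\x08)
# before the regex engine sees it, so the pattern silently matches nothing.
non_raw = re.search('\bcat\b', text)

# With the r prefix, the regex engine receives \b and treats it as a word boundary.
raw = re.search(r'\bcat\b', text)

print(non_raw)      # None
print(raw.group())  # cat
```
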
Using re when the regex module handles the task better
Before Python 3.11, the built-in re module did not support possessive quantifiers or atomic groups, and it still does not support Unicode property escapes such as \p{L}. If the AI suggests features from the third-party regex module, you need to pip install regex separately. If your prompt does not specify which module to target, ask explicitly to avoid code that fails on a standard Python install.
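
A quick way to check whether a generated pattern relies on regex-module syntax is to compile it with the standard library and catch the error. The sketch below does that for `\p{L}` and shows one standard-library approximation for "letters"; the sample text is made up:

```python
import re

# The standard library re module rejects Unicode property escapes such as \p{L}.
try:
    re.compile(r'\p{L}+')
    supports_properties = True
except re.error:
    supports_properties = False

# A standard-library approximation of "one or more letters":
# word characters, minus digits and underscore.
LETTERS = re.compile(r'[^\W\d_]+')
words = LETTERS.findall("naïve café 123 snake_case")

print(supports_properties)  # False
print(words)                # ['naïve', 'café', 'snake', 'case']
```
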

Frequently asked questions

Can AI generate Python regex that works with the re module without extra packages?

Yes. When you specify re module compatibility in your prompt, the model will stick to syntax that works in the Python standard library. If you do not specify, some models default to patterns that require the third-party regex module, particularly for Unicode property classes. Always add 'use only the standard library re module' to your prompt to avoid this.

How do I get an AI to generate regex with explanation so I can maintain it?

Ask explicitly for verbose mode comments or an explanation of each group. You can also request the pattern written using re.VERBOSE with inline comments, which makes it readable in the source file. A prompt like 'write the pattern in re.VERBOSE format with a comment explaining each section' reliably produces maintainable output.

Is AI-generated Python regex good enough for production use?

It is a solid starting point but not a final answer without review. AI-generated patterns are usually correct for the described inputs but may miss edge cases, use suboptimal quantifiers, or lack proper anchoring. Treat the output as a first draft: test it with a broad sample of real data, check for backtracking issues, and add anchors or flags as needed before shipping.

What is the best way to prompt an AI for a Python regex that handles Unicode?

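Specify that the input text may contain non-ASCII characters and give examples of the Unicode characters you expect. In Python 3, str patterns are already Unicode-aware, so re.UNICODE is redundant and \w, \d, and \s match non-ASCII characters by default; what usually breaks is a hard-coded range like [a-zA-Z]. Ask the model to avoid ASCII-only ranges, or to use the regex module with Unicode property escapes when you need script- or category-level matching. Without this, most AI models generate ASCII-safe patterns that silently fail on accented characters, CJK text, or emoji.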
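
To see the kind of silent failure involved, compare an explicit ASCII range with `\w` on accented text. The sample sentence is illustrative:

```python
import re

text = "Zoë ordered crème brûlée"

# Explicit ASCII ranges split words at every accented character.
ascii_class = re.findall(r'[A-Za-z]+', text)

# \w on a str pattern matches Unicode word characters in Python 3.
word_class = re.findall(r'\w+', text)

print(ascii_class)  # ['Zo', 'ordered', 'cr', 'me', 'br', 'l', 'e']
print(word_class)   # ['Zoë', 'ordered', 'crème', 'brûlée']
```
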

Can I use AI to debug an existing Python regex that is not working?

Yes, and this is one of the most effective uses. Paste your broken pattern, describe what it is supposed to match, and give an example string that fails. The model will identify the issue, whether it is a missing escape, wrong quantifier, or incorrect group, and return a corrected version with an explanation of what changed and why.

How do I generate Python regex for multiline text using AI?

Describe that your input spans multiple lines and specify whether you need re.MULTILINE, which makes ^ and $ match line boundaries, or re.DOTALL, which makes the dot match newlines. These flags change behavior significantly and AI models will include the correct one if you describe the line structure of your input in the prompt.
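
A minimal sketch of the two flags side by side, on a made-up three-line log:

```python
import re

log = "INFO start\nERROR disk full\nINFO done"

# Without flags, ^ only matches at the very start of the string.
default = re.findall(r'^\w+', log)
# re.MULTILINE lets ^ match at the start of every line.
per_line = re.findall(r'^\w+', log, re.MULTILINE)

# Without flags, . stops at a newline.
first_line = re.match(r'.*', log).group()
# re.DOTALL lets . cross line boundaries.
whole_text = re.match(r'.*', log, re.DOTALL).group()

print(default)     # ['INFO']
print(per_line)    # ['INFO', 'ERROR', 'INFO']
print(first_line)  # INFO start
```
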
