Generate Pytest and Unittest Cases with AI

Tested prompts for ai generate unit tests python compared across 5 leading AI models.

BEST BY JUDGE SCORE Claude Opus 4.7 8/10

If you typed 'ai generate unit tests python' into Google, you're probably staring at a function you just wrote and dreading the hour it would take to hand-craft every edge case, mock, and assertion. AI models can draft a full pytest or unittest suite in seconds by reading your function signature, docstring, or even just a plain-English description of what the code does.

The practical workflow is straightforward: paste your Python function into an AI prompt, specify whether you want pytest or unittest, describe any edge cases you know matter, and get back a ready-to-run test file. The models on this page were tested against a real-world Python function so you can see exactly how each one handles fixtures, parametrize decorators, mocking, and assertion style.

This page shows you the prompt that was used, four different model outputs side by side, and a comparison table so you can pick the right output for your codebase. Below that you will find practical guidance: when this approach works, when it does not, common mistakes that produce useless tests, and concrete examples across different types of Python code.

When to use this

AI-generated unit tests work best when you have existing Python code with clear inputs and outputs and need coverage fast. This fits new features before code review, legacy functions with no tests, open-source contributions that require test coverage, and any situation where writing boilerplate assertions is slowing you down more than the logic itself.

  • You wrote a new utility function and need baseline pytest coverage before merging a PR
  • You inherited a legacy Python module with zero tests and need to understand its behavior while adding coverage
  • You are contributing to an open-source project that requires a minimum coverage percentage
  • You want to generate parametrized test cases for a function with many valid input combinations
  • You need to mock external API calls or database queries and want a starting structure rather than writing it from scratch

When this format breaks down

  • Tests for code that has complex, stateful side effects across multiple services: the AI will often miss realistic interaction sequences and produce tests that pass in isolation but mean nothing in practice
  • Security or compliance-critical code paths where missed edge cases carry legal or financial risk: treat AI output as a first draft only and have a human engineer verify every assertion
  • Code that is tightly coupled to proprietary internal frameworks the model has never seen: the generated mocks and imports will reference incorrect class names or method signatures
  • Performance or load tests: AI models generate correctness tests well, but they do not reason about timing, throughput, or resource limits without very specific prompting

The prompt we tested

You are an expert Python test engineer. Generate comprehensive unit tests for the Python code provided below.

Instructions:
Use pytest style by default (unittest only if the source clearly uses it), follow AAA (Arrange-Act-Assert) structure with descriptive test names like test_<function>_<scenario>_<expected>, and use parametrize for multiple input cases and pytest.raises for error paths. Mock external dependencies (I/O, network, time, DB) with unittest.mock, aim for high branch coverage without testing implementation details, and output runnable, PEP8-compliant code in a single fenced Python block.

Python code to test:
def calculate_discount(price: float, percent: float) -> float:
    if price < 0 or not 0 <= percent <= 100:
        raise ValueError('Invalid input')
    return round(price * (1 - percent / 100), 2)

Produce the complete test file, including imports, fixtures or setup, well-named test functions covering happy paths, edge cases, boundary conditions, invalid inputs, and expected exceptions. After the code, include a brief bulleted summary of which behaviors are covered and any assumptions you made.

Model comparison

ModelQualityCostLatencyTokens
Claude Opus 4.7 8 $0.1230 19.0s 394 / 1561
Claude Haiku 4.5 7 $0.0105 10.9s 270 / 2048
GPT-5.4 8 $0.0119 11.2s 242 / 756
Gemini 2.5 Pro 2 $0.0015 18.5s 251 / 238
Grok 4.1 Fast Reasoning 7 $0.0004 52.3s 388 / 601

Outputs

Claude Opus 4.7 8/10
```python
"""Unit tests for calculate_discount."""
import pytest

from solution import calculate_discount


class TestCalculateDiscountHappyPath:
    """Happy path scenarios for valid inputs."""

    @pytest.mark.parametrize(
        "price, percent, expected",
        [
            (100.0, 10.0, 90.0),
            (200.0, 25.0, 150.0),
            (50.0, 50.0, 25.0),
            (99.99, 10.0, 89.99),
            (1000.0, 33.33, 666.7),
        ],
    )
    def test_calculate_discount_valid_inputs_returns_expected(
        self, price, percent, expected
    ):
        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert result == expected

    def test_calculate_discount_rounds_to_two_decimals(self):
        # Arrange
        price = 10.0
        percent = 33.0

        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert result == 6.7
        # Ensure rounding to 2 decimals (not more precision leaked)
        assert round(result, 2) == result


class TestCalculateDiscountBoundaries:
    """Boundary conditions for the valid input ranges."""

    def test_calculate_discount_zero_percent_returns_full_price(self):
        assert calculate_discount(100.0, 0) == 100.0

    def test_calculate_discount_hundred_percent_returns_zero(self):
        assert calculate_discount(100.0, 100) == 0.0

    def test_calculate_discount_zero_price_returns_zero(self):
        assert calculate_discount(0, 50) == 0.0

    def test_calculate_discount_zero_price_zero_percent_returns_zero(self):
        assert calculate_discount(0, 0) == 0.0

    def test_calculate_discount_zero_price_hundred_percent_returns_zero(self):
        assert calculate_discount(0, 100) == 0.0


class TestCalculateDiscountInvalidInputs:
    """Error paths for out-of-range inputs."""

    @pytest.mark.parametrize("price", [-0.01, -1, -100.0, -1e-9])
    def test_calculate_discount_negative_price_raises_value_error(self, price):
        with pytest.raises(ValueError, match="Invalid input"):
            calculate_discount(price, 10)

    @pytest.mark.parametrize("percent", [-0.01, -1, -50.0, 100.01, 101, 1000])
    def test_calculate_discount_percent_out_of_range_raises_value_error(
        self, percent
    ):
        with pytest.raises(ValueError, match="Invalid input"):
            calculate_discount(100.0, percent)

    def test_calculate_discount_negative_price_and_invalid_percent_raises(self):
        with pytest.raises(ValueError, match="Invalid input"):
            calculate_discount(-10.0, 150)


class TestCalculateDiscountReturnType:
    """Return type guarantees."""

    def test_calculate_discount_returns_float(self):
        result = calculate_discount(100.0, 10.0)
        assert isinstance(result, float)

    def test_calculate_discount_accepts_int_inputs(self):
        # Ints are acceptable since they satisfy the numeric comparisons
        result = calculate_discount(100, 25)
        assert result == 75.0
```

### Coverage summary
- **Happy paths**: multiple valid `(price, percent)` combinations via `parametrize`, including rounding behavior.
- **Boundary conditions**: `percent = 0`, `percent = 100`, `price = 0`, and combinations thereof.
- **Invalid inputs / exceptions**: negative prices, negative percents, and percents > 100 all raise `ValueError` with the expected message.
- **Return type**: confirms `float` output and that integer arguments are accepted (duck typing on numeric comparisons).
- **Rounding**: asserts result is rounded to 2 decimal places.

### Assumptions
- The source module is named `solution.py` (import path `from solution import calculate_discount`); adjust the import to match your project layout.
- Integer arguments are considered valid because they satisfy the numeric comparisons in the guard clause — the signature's type hints are treated as non-enforcing (standard Python behavior).
- No external dependencies (I/O, time, network) exist, so no mocking is required.
- `ValueError` message `"Invalid input"` is part of the contract worth asserting via `match=`.
Judge notes: Excellent coverage with clean structure and parametrize usage, but lacks one edge case (very small positive float like 0.001) and the `test_calculate_discount_rounds_to_two_decimals` assertion `6.7` is actually a floating-point hazard that could fail on some implementations, slightly undermining production-readiness.
Claude Haiku 4.5 7/10
```python
import pytest
from unittest.mock import patch
from your_module import calculate_discount  # Replace 'your_module' with actual module name


class TestCalculateDiscount:
    """Test suite for calculate_discount function."""

    # Happy path tests
    def test_calculate_discount_with_valid_inputs_returns_correct_discount(self):
        """Test discount calculation with standard valid inputs."""
        # Arrange
        price = 100.0
        percent = 10.0
        expected = 90.0

        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert result == expected

    def test_calculate_discount_with_fifty_percent_discount(self):
        """Test discount calculation with 50% discount."""
        # Arrange
        price = 200.0
        percent = 50.0
        expected = 100.0

        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert result == expected

    def test_calculate_discount_with_zero_percent_returns_original_price(self):
        """Test that 0% discount returns original price."""
        # Arrange
        price = 75.5
        percent = 0.0
        expected = 75.5

        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert result == expected

    def test_calculate_discount_with_hundred_percent_returns_zero(self):
        """Test that 100% discount returns zero."""
        # Arrange
        price = 100.0
        percent = 100.0
        expected = 0.0

        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert result == expected

    # Edge case tests with parametrization
    @pytest.mark.parametrize("price,percent,expected", [
        (0.0, 0.0, 0.0),
        (0.0, 50.0, 0.0),
        (0.0, 100.0, 0.0),
        (1.0, 1.0, 0.99),
        (0.01, 50.0, 0.01),
        (999999.99, 0.0, 999999.99),
    ])
    def test_calculate_discount_with_edge_case_values(
        self, price, percent, expected
    ):
        """Test discount calculation with edge case values."""
        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert result == expected

    # Rounding tests
    def test_calculate_discount_rounds_to_two_decimal_places(self):
        """Test that result is rounded to 2 decimal places."""
        # Arrange
        price = 10.0
        percent = 33.0
        # 10 * (1 - 33/100) = 10 * 0.67 = 6.7, but should round to 6.7

        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert result == 6.7
        assert len(str(result).split('.')[-1]) <= 2

    def test_calculate_discount_with_floating_point_precision(self):
        """Test rounding with values that cause floating point precision issues."""
        # Arrange
        price = 19.99
        percent = 33.33
        # 19.99 * (1 - 0.3333) = 19.99 * 0.6667 = 13.331...

        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert isinstance(result, float)
        assert result == round(19.99 * (1 - 33.33 / 100), 2)

    # Boundary condition tests
    @pytest.mark.parametrize("price,percent", [
        (1000.0, 0.0),
        (1000.0, 0.01),
        (1000.0, 99.99),
        (1000.0, 100.0),
        (0.01, 0.0),
        (0.01, 100.0),
    ])
    def test_calculate_discount_at_boundaries(self, price, percent):
        """Test discount at valid boundary values."""
        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert isinstance(result, float)
        assert result >= 0.0
        assert result <= price

    # Invalid input tests
    def test_calculate_discount_with_negative_price_raises_valueerror(self):
        """Test that negative price raises ValueError."""
        # Arrange
        price = -10.0
        percent = 10.0

        # Act & Assert
        with pytest.raises(ValueError, match="Invalid input"):
            calculate_discount(price, percent)

    def test_calculate_discount_with_very_negative_price_raises_valueerror(self):
        """Test that very negative price raises ValueError."""
        # Arrange
        price = -999999.99
        percent = 50.0

        # Act & Assert
        with pytest.raises(ValueError, match="Invalid input"):
            calculate_discount(price, percent)

    @pytest.mark.parametrize("price,percent", [
        (100.0, -0.01),
        (100.0, -1.0),
        (100.0, -100.0),
        (100.0, 100.01),
        (100.0, 101.0),
        (100.0, 200.0),
    ])
    def test_calculate_discount_with_out_of_range_percent_raises_valueerror(
        self, price, percent
    ):
        """Test that percent outside 0-100 range raises ValueError."""
        # Act & Assert
        with pytest.raises(ValueError, match="Invalid input"):
            calculate_discount(price, percent)

    def test_calculate_discount_with_negative_price_and_percent_raises_valueerror(self):
        """Test that both negative price and invalid percent raise ValueError."""
        # Arrange
        price = -50.0
        percent = 150.0

        # Act & Assert
        with pytest.raises(ValueError, match="Invalid input"):
            calculate_discount(price, percent)

    # Type handling (implicit - pytest will handle type errors naturally)
    def test_calculate_discount_accepts_integer_inputs(self):
        """Test that function works with integer inputs (implicit conversion)."""
        # Arrange
        price = 100  # int instead of float
        percent = 10  # int instead of float
        expected = 90.0

        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert result == expected

    def test_calculate_discount_with_very_small_positive_values(self):
        """Test with very small positive values."""
        # Arrange
        price = 0.001
        percent = 50.0

        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert result == 0.0  # Rounds to 0.0

    def test_calculate_discount_with_large_values(self):
        """Test with large price values."""
        # Arrange
        price = 1000000.0
        percent = 25.0
        expected = 750000.0

        # Act
        result = calculate_discount(price, percent)

        # Assert
        assert result == expected

    # Return type validation
    def test_calculate_discount_returns_float(self):
        """Test that function always returns a float."""
        # Act
        result = calculate_discount(100.0, 10.0)

        # Assert
        assert isinstance(result, float)

    def test_calculate_discount_result_is_non_negative(self):
        """Test that result is never negative."""
        # Arrange
        test_cases = [
            (100.0, 0.0),
            (100.0, 50.0),
            (100.0, 100.0),
            (0.0, 100.0),
        ]

        # Act & Assert
        for price, percent in test_cases:
            result = calculate_discount(price, percent)
            assert result >= 0.0

    def test_calculate_discount_result_does_not_exceed_original_price(self):
        """Test that discounted price never exceeds original price
Judge notes: The output is comprehensive and well-structured with good AAA patterns and parametrize usage, but the unused `from unittest.mock import patch` import is a production-readiness flaw, the `test_calculate_discount_result_does_not_exceed_original_price` test is cut off mid-sentence, and some tests (like checking `len(str(result).split('.')[-1]) <= 2`) test implementation details rather than behavior.
GPT-5.4 8/10
```python
import pytest

from your_module import calculate_discount


@pytest.mark.parametrize(
    ("price", "percent", "expected"),
    [
        (100.0, 0.0, 100.0),
        (100.0, 10.0, 90.0),
        (100.0, 100.0, 0.0),
        (49.99, 25.0, 37.49),
        (19.995, 50.0, 10.0),
        (0.0, 50.0, 0.0),
        (1234.56, 12.5, 1080.24),
    ],
)
def test_calculate_discount_valid_inputs_returns_rounded_discounted_price(
    price, percent, expected
):
    # Arrange

    # Act
    result = calculate_discount(price, percent)

    # Assert
    assert result == expected


@pytest.mark.parametrize(
    ("price", "percent"),
    [
        (-0.01, 10.0),
        (-100.0, 0.0),
        (100.0, -0.01),
        (100.0, -5.0),
        (100.0, 100.01),
        (100.0, 150.0),
        (-50.0, -10.0),
        (-50.0, 150.0),
    ],
)
def test_calculate_discount_invalid_inputs_raises_value_error(price, percent):
    # Arrange

    # Act / Assert
    with pytest.raises(ValueError, match="Invalid input"):
        calculate_discount(price, percent)


@pytest.mark.parametrize(
    ("price", "percent", "expected"),
    [
        (50.0, 0.0, 50.0),
        (50.0, 100.0, 0.0),
        (0.0, 0.0, 0.0),
        (0.0, 100.0, 0.0),
    ],
)
def test_calculate_discount_boundary_values_returns_expected_result(
    price, percent, expected
):
    # Arrange

    # Act
    result = calculate_discount(price, percent)

    # Assert
    assert result == expected


def test_calculate_discount_rounds_result_to_two_decimal_places():
    # Arrange
    price = 10.0
    percent = 33.333

    # Act
    result = calculate_discount(price, percent)

    # Assert
    assert result == 6.67
    assert result == round(result, 2)


@pytest.mark.parametrize(
    ("price", "percent", "expected"),
    [
        (80, 25, 60.0),
        (200, 12.5, 175.0),
    ],
)
def test_calculate_discount_accepts_numeric_inputs_and_returns_float(
    price, percent, expected
):
    # Arrange

    # Act
    result = calculate_discount(price, percent)

    # Assert
    assert result == expected
    assert isinstance(result, float)
```

- Covers valid discount calculations for typical values, zero discount, full discount, zero price, fractional percentages, and rounding behavior.
- Covers boundary conditions for `percent` at `0` and `100`, and `price` at `0`.
- Covers invalid inputs where `price` is negative or `percent` is outside the inclusive `0..100` range, asserting `ValueError` with the expected message.
- Assumption: replace `from your_module import calculate_discount` with the actual module path where `calculate_discount` is defined.
Judge notes: The tests are comprehensive, well-structured, and production-ready, but the boundary test group redundantly overlaps with the parametrized happy-path group, and the integer-input test adds minimal unique coverage value.
Gemini 2.5 Pro 2/10
Of course. As a Python test engineer, here are the comprehensive unit tests for the provided code, following best practices.

```python
# test_discount_calculator.py

import pytest
from unittest.mock import patch # Though not needed for this simple function, it's good practice to import

# Assume the function is in a file named 'discount_calculator.py'
from discount_calculator import calculate_discount

@pytest.mark.parametrize(
    "price, percent, expected",
    [
        # --- Happy Path Scenarios ---
        (100.0, 10.0, 90.00),   # Basic integer discount
        (99.99, 10.0, 89.99),   # Floating point price
        (120.0, 12.5, 105.00),  # Floating point percentage
        (10, 25, 7.50),         # Integer inputs
        
        # --- Rounding Behavior ---
        (10.55, 
Judge notes: The output is truncated mid-parametrize block and never completes the test file, missing all error path tests, edge cases, boundary conditions, and the post-code summary.
Grok 4.1 Fast Reasoning 7/10
```python
import pytest


def calculate_discount(price: float, percent: float) -> float:
    if price < 0 or not 0 <= percent <= 100:
        raise ValueError('Invalid input')
    return round(price * (1 - percent / 100), 2)


@pytest.mark.parametrize(
    "price, percent, expected",
    [
        (100.0, 10.0, 90.0),      # standard discount
        (100.0, 0.0, 100.0),      # zero discount
        (100.0, 100.0, 0.0),      # full discount
        (0.0, 50.0, 0.0),         # zero price
        (10.5, 20.0, 8.4),        # decimal price and percent
        (1.015, 0.0, 1.02),       # rounding up (half to even)
        (1.025, 0.0, 1.02),       # rounding stays (half to even)
        (50.0, 50.0, 25.0),       # even discount
    ],
)
def test_calculate_discount_valid_inputs_returns_rounded_discount(price, percent, expected):
    # Arrange: inputs provided via parametrization
    # Act
    result = calculate_discount(price, percent)
    # Assert
    assert result == expected


@pytest.mark.parametrize(
    "price, percent",
    [
        (-0.1, 50.0),     # negative price
        (-100.0, 10.0),   # large negative price
        (10.0, -0.1),     # negative percent (price valid)
        (10.0, 101.0),    # percent over 100 (price valid)
        (0.0, 100.1),     # percent slightly over (price boundary)
    ],
)
def test_calculate_discount_invalid_inputs_raises_valueerror(price, percent):
    # Arrange: invalid inputs provided via parametrization
    # Act & Assert
    with pytest.raises(ValueError, match='Invalid input'):
        calculate_discount(price, percent)
```
- **Covered behaviors**: Happy paths (valid price >= 0, percent in [0,100], including boundaries, zero price/full discount, decimal calculations, rounding cases including half-even behavior); error paths (price < 0, percent < 0, percent > 100, covering both sides of the `or` condition for branch coverage); exact ValueError message.
- **Assumptions**: Function inputs are finite floats (no NaN/inf testing as unspecified); float equality `==` safe post-rounding to 2 decimals with chosen inputs; no external dependencies to mock; high branch/statement coverage achieved without implementation-specific tests (e.g., no mocking `round`).
Judge notes: The output is solid and covers the core intent well, but including the source function inside the test file is non-standard (test files import from source modules), and the rounding edge cases (1.015, 1.025) silently rely on float representation quirks without verification, making them potentially brittle and misleading.

What makes these work

  1. 01
    Specify the test framework explicitly

    Always state pytest or unittest in your prompt. Models default to whichever they see most in training data and may mix conventions, producing code that imports pytest but uses self.assertEqual style assertions. Saying 'write pytest tests using only pytest idioms' locks in consistent output.

  2. 02
    Paste the actual function, not a description

    Including the real function body, type hints, and docstring gives the model enough context to infer correct return types, exception names, and argument constraints. A vague description produces generic tests that may not match your actual implementation at all.

  3. 03
    Ask for edge cases and invalid inputs separately

    Models tend to generate happy-path tests first. Explicitly ask for boundary values, empty inputs, type errors, and exception paths in the same prompt. You can phrase it as 'include at least one test for each way this function can fail' to push the model toward complete coverage.

  4. 04
    Request mock structure for external dependencies

    If your function touches a database, HTTP endpoint, or file system, tell the model upfront and ask it to use unittest.mock.patch or pytest-mock. Without this instruction, models often generate tests that make real network calls, which break in CI environments.

More example scenarios

#01 · E-commerce discount calculation function
Input
Generate pytest tests for this Python function: def apply_discount(price: float, discount_pct: float) -> float. It applies a percentage discount to a price, raises ValueError if discount_pct is not between 0 and 100, and rounds the result to 2 decimal places. Include edge cases for boundary values and invalid inputs.
Expected output
AI produces a pytest file with @pytest.mark.parametrize covering normal discounts (10%, 50%), boundary values (0%, 100%), floating-point rounding checks, and two pytest.raises(ValueError) tests for -1 and 101 percent inputs. Assertions use pytest.approx for float comparisons.
#02 · Data pipeline CSV parsing utility
Input
Write unittest cases for a Python function parse_csv_row(row: str, delimiter: str = ',') -> dict that splits a CSV row into a dictionary using the first row as headers. Handle empty strings, mismatched column counts, and custom delimiters. Use unittest.mock where needed.
Expected output
AI generates a TestCase class with setUp providing sample header and data rows, separate test methods for standard parsing, tab-delimited input, empty field handling, and a test that asserts a ValueError is raised when column count does not match header count.
#03 · REST API client with external HTTP calls
Input
Generate pytest tests for a Python function get_user(user_id: int) -> dict that calls requests.get against an internal API and returns parsed JSON. Mock the HTTP layer. Test a successful 200 response, a 404 that should raise UserNotFoundError, and a network timeout that should raise ServiceUnavailableError.
Expected output
AI produces pytest tests using unittest.mock.patch to mock requests.get, sets return_value.status_code and return_value.json() per scenario, and uses pytest.raises to assert correct custom exceptions. The fixture isolates the mock cleanly so each test is independent.
#04 · NLP preprocessing for a machine learning pipeline
Input
Write pytest parametrized tests for tokenize_text(text: str, lowercase: bool = True) -> list that splits text into word tokens, strips punctuation, and optionally lowercases. Cover empty input, punctuation-only input, mixed-case with lowercase=False, and a normal sentence.
Expected output
AI generates a single parametrized test function with four input-output tuples: empty string returns empty list, punctuation-only returns empty list, mixed-case with lowercase=False preserves casing, and a normal sentence returns the expected token list.
#05 · Financial interest rate calculator
Input
Generate pytest tests for calculate_compound_interest(principal: float, rate: float, periods: int) -> float. It must raise ValueError for negative principal or rate, and return 0.0 if periods is 0. Test numerical accuracy to 2 decimal places using known compound interest results.
Expected output
AI produces tests with pytest.approx for known compound interest values (e.g., 1000 at 5% for 3 periods equals 1157.63), parametrized edge cases for zero periods, and two pytest.raises blocks for negative principal and negative rate inputs.

Common mistakes to avoid

  • Accepting tests without running them

    AI-generated tests frequently have import errors, wrong exception names, or method calls that do not match your actual codebase. Always run pytest or python -m unittest immediately after pasting the output. A test file that does not execute provides zero coverage.

  • Trusting assertion correctness blindly

    Models calculate expected values on the fly and sometimes get them wrong, especially for floating-point math or complex logic. Verify the expected values in each assertion manually or against a known-correct reference before committing the test file.

  • Skipping edge cases in your prompt

    If you only describe the happy path, you get happy-path tests. The AI does not know that your function crashes on an empty list unless you tell it. Incomplete prompts produce incomplete coverage that gives false confidence.

  • Using generated tests as a substitute for understanding the code

    AI tests describe behavior, but they do not replace reading and understanding the function under test. Engineers who merge AI-generated tests without reviewing them often miss that the tests are asserting the wrong behavior, effectively locking in bugs.

  • Not specifying mock scope or cleanup

    When asking for mocks, models sometimes apply patches at the wrong scope or forget to stop them, causing test pollution where one test's mock affects the next. Ask explicitly for context managers or autouse fixtures to keep tests isolated.

Related queries

Frequently asked questions

Can AI generate pytest fixtures automatically from my existing code?

Yes, if you paste the function and describe the shared setup data, models will generate conftest.py fixtures with appropriate scope. Be specific about whether the fixture should be function-scoped or session-scoped, because models default to function scope and will not infer database or client setup patterns unless you describe them.

Which is better for AI-generated tests: pytest or unittest?

pytest output from AI models tends to be cleaner and more readable because parametrize and fixture syntax reduces boilerplate. unittest is worth requesting when your codebase already uses it consistently or when you need tests that run in environments without pytest installed. Both frameworks work well with AI generation.

How do I get the AI to generate tests for a whole Python module, not just one function?

Paste the entire module and ask the model to generate one TestCase class or one describe block per public function. For large modules, break this into multiple prompts per class or logical group. Asking for a full module in one prompt often produces truncated output that stops mid-file.

Will AI-generated tests actually improve my code coverage percentage?

They will if the tests run and the assertions are correct. Run coverage.py or pytest-cov after adding generated tests to measure the real impact. Models sometimes generate tests that execute a function but assert only on the return value, missing branch coverage for conditional paths inside the function.

How do I generate tests for async Python functions?

Specify in your prompt that the function is async and ask for tests using pytest-asyncio with the @pytest.mark.asyncio decorator. Without this instruction, models generate synchronous test functions that call the coroutine without awaiting it, causing tests to silently pass without executing any async code.

Can I use AI to generate tests for a Django or Flask application?

Yes, but you need to tell the model which framework you are using and paste relevant model or view code. For Django, ask for tests using Django's TestCase and the test client. For Flask, specify the app fixture pattern. Generic pytest output will not include the framework-specific setup your tests need to run.

Try it with a real tool

Run this prompt in one of these tools. Affiliate links help keep Gridlyx free.