Auto-Generate Python Docstrings with AI Tools

Tested prompts for an AI Python docstring generator, compared across five leading AI models.

Best by judge score: GPT-5.4 (9/10)

If you are writing Python functions and classes without docstrings, you are creating technical debt that slows down every developer who touches the code after you, including yourself six months from now. Manually writing Google-style, NumPy-style, or Sphinx-compatible docstrings for every function is tedious and easy to skip under deadline pressure. An AI Python docstring generator solves that by reading your existing code and producing structured, accurate documentation in seconds.

The tools and prompts on this page let you paste a raw Python function and get back a complete docstring covering parameters, return values, raised exceptions, and usage examples. The comparison table below shows how Claude Opus 4.7, Claude Haiku 4.5, GPT-5.4, Gemini 2.5 Pro, and Grok 4.1 Fast Reasoning handle the same input so you can pick the model that fits your style guide and budget.

This is not about replacing engineering judgment. It is about eliminating the mechanical part of documentation so you can focus on the logic that actually needs explanation. Whether you are retroactively documenting a legacy codebase or building a habit of documenting as you write, the prompt-and-model combinations tested here give you a reliable starting point.

When to use this

AI docstring generation is the right approach when you have working Python code that lacks documentation, when you are onboarding new contributors to an existing codebase, or when your team enforces a specific docstring format like Google, NumPy, or reStructuredText but writing it by hand slows down pull request velocity.

  • Retroactively documenting legacy functions across a large Python codebase
  • Enforcing a consistent docstring format (Google, NumPy, Sphinx) across a team without manual style reviews
  • Generating first-draft docstrings during active development so documentation does not fall behind code
  • Preparing open-source libraries for public release where missing docs block adoption
  • Creating docstrings for auto-generated or boilerplate code where manual writing adds no intellectual value

When this format breaks down

  • When the function contains proprietary business logic that should not be sent to a third-party API, use a locally hosted model instead of a cloud service.
  • When the function is genuinely ambiguous or broken, the AI will document what the code does, not what it should do. Fix the logic first or the docstring will codify the bug.
  • When your codebase requires domain-specific terminology that the model has no exposure to, such as internal acronyms or regulatory identifiers, the generated docstring will use generic language that misleads more than it helps.
  • When the function signature alone does not reveal intent and you have not provided surrounding context, the AI will produce a structurally correct but semantically shallow docstring that fails code review anyway.

The prompt we tested

You are an expert Python developer and technical writer specializing in generating clear, accurate, PEP 257-compliant docstrings. Analyze the Python code provided and generate comprehensive docstrings for every function, class, and method.

Rules to follow:
Use Google-style docstrings with sections for Args, Returns, Raises, and Examples where applicable. Infer parameter types and return types from the code, include a concise one-line summary followed by a blank line and a detailed description, and keep line length under 88 characters.

Python code to document:
```python
def calculate_discount(price, discount_percent, min_price=0):
    discounted = price * (1 - discount_percent / 100)
    if discounted < min_price:
        raise ValueError('Discounted price below minimum')
    return round(discounted, 2)
```

Return the complete code with docstrings inserted in the correct locations. Do not modify the logic of the code.
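
If you run this prompt through an API rather than a chat interface, a minimal sketch looks like the following. The `openai` client usage is real, but the model name, environment variable, and helper names are our own assumptions; adapt them to your provider:

```python
import os

PROMPT_TEMPLATE = """You are an expert Python developer and technical writer specializing in
generating clear, accurate, PEP 257-compliant docstrings. Analyze the Python
code provided and generate comprehensive docstrings for every function,
class, and method.

Rules to follow:
Use Google-style docstrings with sections for Args, Returns, Raises, and
Examples where applicable. Infer parameter types and return types from the
code, include a concise one-line summary followed by a blank line and a
detailed description, and keep line length under 88 characters.

Python code to document:
{code}

Return the complete code with docstrings inserted in the correct locations.
Do not modify the logic of the code."""


def build_docstring_prompt(code: str) -> str:
    """Fill the tested prompt template with the code to document."""
    return PROMPT_TEMPLATE.format(code=code)


def generate_docstrings(code: str) -> str:
    """Send the prompt to a hosted model.

    Requires `pip install openai` and an OPENAI_API_KEY environment
    variable (assumed setup; swap in your own provider's client).
    """
    from openai import OpenAI  # imported lazily so build_docstring_prompt works offline

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name; use whatever your provider offers
        messages=[{"role": "user", "content": build_docstring_prompt(code)}],
    )
    return response.choices[0].message.content
```

Keeping the template in one place means every function you document gets the identical format instruction, which is what keeps output consistent across a codebase.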

Model comparison

| Model | Quality | Cost | Latency | Tokens |
| --- | --- | --- | --- | --- |
| Claude Opus 4.7 | 8 | $0.0411 | 6.5s | 302 / 488 |
| Claude Haiku 4.5 | 8 | $0.0024 | 3.6s | 224 / 439 |
| GPT-5.4 | 9 | $0.0041 | 3.3s | 190 / 241 |
| Gemini 2.5 Pro | 8 | $0.0019 | 8.7s | 205 / 330 |
| Grok 4.1 Fast Reasoning | 8 | $0.0002 | 4.5s | 338 / 185 |

Outputs

Claude Opus 4.7 8/10
Here's the code with comprehensive Google-style docstrings added:

```python
def calculate_discount(price, discount_percent, min_price=0):
    """Calculate the discounted price of an item.

    Applies a percentage-based discount to the given price and validates
    that the resulting price does not fall below a specified minimum
    threshold. The final result is rounded to two decimal places.

    Args:
        price (float): The original price of the item before discount.
            Must be a non-negative numeric value.
        discount_percent (float): The discount percentage to apply,
            expressed as a number between 0 and 100 (e.g., 25 for 25%).
        min_price (float, optional): The minimum allowable price after
            the discount is applied. Defaults to 0.

    Returns:
        float: The discounted price, rounded to two decimal places.

    Raises:
        ValueError: If the calculated discounted price is less than
            ``min_price``.

    Examples:
        >>> calculate_discount(100, 20)
        80.0
        >>> calculate_discount(50, 10, min_price=40)
        45.0
        >>> calculate_discount(100, 90, min_price=20)
        Traceback (most recent call last):
            ...
        ValueError: Discounted price below minimum
    """
    discounted = price * (1 - discount_percent / 100)
    if discounted < min_price:
        raise ValueError('Discounted price below minimum')
    return round(discounted, 2)
```
Judge notes: The output is production-ready and accurate, but the introductory prose ("Here's the code with comprehensive Google-style docstrings added:") is unnecessary noise that would need removal before use.
Claude Haiku 4.5 8/10
```python
def calculate_discount(price, discount_percent, min_price=0):
    """Calculate the discounted price with optional minimum price validation.

    Applies a percentage-based discount to a given price and ensures the
    result meets a minimum price threshold. The final discounted price is
    rounded to two decimal places.

    Args:
        price (float): The original price before discount.
        discount_percent (float): The discount percentage to apply
            (0-100).
        min_price (float, optional): The minimum acceptable discounted
            price. Defaults to 0.

    Returns:
        float: The discounted price rounded to two decimal places.

    Raises:
        ValueError: If the discounted price falls below min_price.

    Examples:
        >>> calculate_discount(100, 10)
        90.0
        >>> calculate_discount(100, 10, min_price=95)
        Traceback (most recent call last):
            ...
        ValueError: Discounted price below minimum
        >>> calculate_discount(99.99, 15, min_price=80)
        84.99
    """
    discounted = price * (1 - discount_percent / 100)
    if discounted < min_price:
        raise ValueError('Discounted price below minimum')
    return round(discounted, 2)
```

**Docstring Explanation:**

- **One-line summary**: Concisely describes the function's primary purpose
- **Detailed description**: Explains the behavior, including discount application and rounding
- **Args section**: Documents all three parameters with inferred types (float) and the optional parameter note
- **Returns section**: Specifies the return type (float) and format (rounded to 2 decimals)
- **Raises section**: Documents the ValueError exception condition
- **Examples section**: Provides three realistic usage examples including the error case
Judge notes: The docstring itself is production-ready and thorough, but the appended "Docstring Explanation" section was not requested and clutters the output, making it not quite flawless.
GPT-5.4 9/10
```python
def calculate_discount(price, discount_percent, min_price=0):
    """Calculate the discounted price and enforce a minimum allowed value.

    Applies a percentage discount to the given price, rounds the result to two
    decimal places, and validates that the discounted price is not below the
    specified minimum price.

    Args:
        price (int | float): The original price before applying the discount.
        discount_percent (int | float): The discount percentage to apply.
        min_price (int | float, optional): The minimum allowed discounted
            price. Defaults to 0.

    Returns:
        float: The discounted price rounded to two decimal places.

    Raises:
        ValueError: If the discounted price is less than `min_price`.

    Examples:
        >>> calculate_discount(100, 20)
        80.0
        >>> calculate_discount(50, 10, min_price=40)
        45.0
    """
    discounted = price * (1 - discount_percent / 100)
    if discounted < min_price:
        raise ValueError('Discounted price below minimum')
    return round(discounted, 2)
```
Judge notes: Nearly flawless. The only nitpick is that the second example passes `min_price=40` as a keyword argument while the first example is purely positional, a mix some style guides would flag as inconsistent.
Gemini 2.5 Pro 8/10
```python
def calculate_discount(price, discount_percent, min_price=0):
    """Calculates the final price after applying a discount.

    This function applies a percentage-based discount to a given price. It
    also ensures that the final price does not fall below a specified minimum
    threshold.

    Args:
        price (float): The original price of the item.
        discount_percent (float): The discount percentage to apply (e.g., 10 for 10%).
        min_price (float, optional): The minimum allowed price after the discount.
            Defaults to 0.

    Returns:
        float: The price after the discount is applied, rounded to two
            decimal places.

    Raises:
        ValueError: If the calculated discounted price is less than `min_price`.

    Examples:
        >>> calculate_discount(100, 20)
        80.0
        >>> calculate_discount(99.99, 15.5)
        84.49
        >>> calculate_discount(50, 50, min_price=30)
        Traceback (most recent call last):
            ...
        ValueError: Discounted price below minimum
    """
    discounted = price * (1 - discount_percent / 100)
    if discounted < min_price:
        raise ValueError('Discounted price below minimum')
    return round(discounted, 2)
```
Judge notes: The output is production-ready and accurate, but the `discount_percent` line exceeds 88 characters, and the examples could include a non-raising case with `min_price` to better illustrate that parameter.
Grok 4.1 Fast Reasoning 8/10
```python
def calculate_discount(price, discount_percent, min_price=0):
    """Calculate discounted price, ensuring it meets minimum threshold.

    Applies a percentage discount to the given price, rounds the result to two
    decimal places, and validates against a minimum price threshold.

    Args:
        price (float): Original price before discount.
        discount_percent (float): Discount percentage (e.g., 20.0 for 20%).
        min_price (float, optional): Minimum allowable price after discount.
            Defaults to 0.

    Returns:
        float: Discounted price rounded to 2 decimal places.

    Raises:
        ValueError: If discounted price is below min_price.
    """
    discounted = price * (1 - discount_percent / 100)
    if discounted < min_price:
        raise ValueError('Discounted price below minimum')
    return round(discounted, 2)
```
Judge notes: Missing an `Examples:` section, which the rules explicitly require "where applicable" and is clearly applicable here for a utility function like this.

What makes these work

  1. Specify the docstring format explicitly

    Google, NumPy, and Sphinx reStructuredText formats have meaningfully different syntax. If you do not name the format in your prompt, the model will guess, and it often guesses wrong for your codebase. Add 'Write a Google-style docstring' or 'Use NumPy format' as the first line of your prompt to get consistent output across all generated docs.
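
To make the difference concrete, here is one hypothetical function documented in each of the three formats; the content is identical, but the section syntax diverges enough that mixing them breaks tooling:

```python
def scale(value, factor):
    """Multiply a value by a factor.  (Google style)

    Args:
        value (float): The number to scale.
        factor (float): The multiplier to apply.

    Returns:
        float: The scaled value.
    """
    return value * factor


def scale_numpy(value, factor):
    """Multiply a value by a factor.  (NumPy style)

    Parameters
    ----------
    value : float
        The number to scale.
    factor : float
        The multiplier to apply.

    Returns
    -------
    float
        The scaled value.
    """
    return value * factor


def scale_sphinx(value, factor):
    """Multiply a value by a factor.  (Sphinx reStructuredText style)

    :param value: The number to scale.
    :type value: float
    :param factor: The multiplier to apply.
    :type factor: float
    :returns: The scaled value.
    :rtype: float
    """
    return value * factor
```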

  2. Include the full function signature and type hints

    Type hints give the model information it needs to generate accurate Args and Returns sections. If your function lacks type annotations, paste the function and also describe the expected types in plain language before asking for the docstring. Partial type information produces partial docstrings that still need manual editing.
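
For the tested function, a human-annotated version you might paste instead could look like this; the `float` hints are our own inference, not something the model supplies:

```python
def calculate_discount(
    price: float, discount_percent: float, min_price: float = 0
) -> float:
    # Annotated version of the tested function: the hints give the model
    # exact types for the Args and Returns sections instead of guesses.
    discounted = price * (1 - discount_percent / 100)
    if discounted < min_price:
        raise ValueError('Discounted price below minimum')
    return round(discounted, 2)
```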

  3. Ask for a usage example when the function is non-obvious

    Docstrings with a short Example section dramatically reduce onboarding time for unfamiliar contributors. Explicitly request 'include a one-line usage example in the docstring' for any function whose arguments interact in a non-obvious way. Models reliably produce correct examples when the function logic is visible in the prompt.
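
Because Google-style Examples sections use doctest syntax, you can mechanically verify generated examples before committing them. A sketch using only the standard library, run against a docstring like the ones produced above:

```python
import doctest


def calculate_discount(price, discount_percent, min_price=0):
    """Calculate the discounted price of an item.

    Examples:
        >>> calculate_discount(100, 20)
        80.0
        >>> calculate_discount(50, 10, min_price=40)
        45.0
    """
    discounted = price * (1 - discount_percent / 100)
    if discounted < min_price:
        raise ValueError('Discounted price below minimum')
    return round(discounted, 2)


# Collect and run every doctest example attached to the function; a failure
# means the generated example does not match the code's real behavior.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
failed = attempted = 0
for case in finder.find(calculate_discount, module=False,
                        globs={"calculate_discount": calculate_discount}):
    outcome = runner.run(case)
    failed += outcome.failed
    attempted += outcome.attempted

print(f"{attempted} examples checked, {failed} failed")
```

Both examples pass here; a non-zero failure count means the model invented outputs the code does not actually produce.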

  4. Batch similar functions in one prompt

    If you are documenting a module, paste three to five related functions into a single prompt with the instruction 'Write a Google-style docstring for each of the following functions.' This keeps style consistent across a logical group and reduces the number of API calls you make. Check each output individually since the model may conflate parameter descriptions across similar signatures.

More example scenarios

#01 · Data pipeline: parsing raw CSV rows
Input
```python
def parse_transaction_row(row: dict, currency: str = 'USD') -> dict:
    amount = float(row['amount'])
    if amount < 0:
        raise ValueError('Amount cannot be negative')
    return {'id': row['transaction_id'], 'amount': amount, 'currency': currency, 'date': row['date']}
```
Expected output
```
Parses a raw transaction row from a CSV import into a normalized dictionary.

Args:
    row (dict): A single row from the CSV reader containing 'transaction_id', 'amount', and 'date' keys.
    currency (str): ISO 4217 currency code to attach to the transaction. Defaults to 'USD'.

Returns:
    dict: Normalized transaction with keys 'id', 'amount', 'currency', and 'date'.

Raises:
    ValueError: If the amount value in the row is negative.
```
#02 · ML engineering: custom Scikit-learn transformer
Input
```python
class LogTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, offset=1.0):
        self.offset = offset
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return np.log(X + self.offset)
```
Expected output
```
Applies a log transformation to input features for use in Scikit-learn pipelines.

Adds a configurable offset before applying the natural logarithm to avoid log(0) on zero-valued features. Compatible with Pipeline and GridSearchCV.

Attributes:
    offset (float): Constant added to all values before log transformation. Defaults to 1.0.

Methods:
    fit(X, y=None): No-op. Returns self for pipeline compatibility.
    transform(X): Applies log(X + offset) element-wise and returns the result.
```
#03 · Web backend: FastAPI route handler
Input
```python
async def get_user_orders(user_id: int, status: Optional[str] = None, db: AsyncSession = Depends(get_db)) -> List[OrderSchema]:
    query = select(Order).where(Order.user_id == user_id)
    if status:
        query = query.where(Order.status == status)
    result = await db.execute(query)
    return result.scalars().all()
```
Expected output
```
Retrieves all orders belonging to a specific user, with optional status filtering.

Args:
    user_id (int): Primary key of the user whose orders are being fetched.
    status (Optional[str]): If provided, filters results to orders matching this status string (e.g., 'pending', 'shipped').
    db (AsyncSession): Injected async database session provided by FastAPI's dependency system.

Returns:
    List[OrderSchema]: A list of order records. Returns an empty list if no matching orders exist.
```
#04 · DevOps utility: retry decorator
Input
```python
def retry(max_attempts=3, delay=1.0, exceptions=(Exception,)):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    if attempt == max_attempts - 1:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator
```
Expected output
```
Decorator factory that retries a function call on specified exceptions.

Args:
    max_attempts (int): Maximum number of times to call the wrapped function before re-raising. Defaults to 3.
    delay (float): Seconds to wait between retry attempts. Defaults to 1.0.
    exceptions (tuple): Exception types that trigger a retry. Defaults to (Exception,).

Returns:
    Callable: A decorator that wraps the target function with retry logic.

Example:
    @retry(max_attempts=5, delay=2.0, exceptions=(ConnectionError,))
    def fetch_data(url): ...
```
#05 · Scientific computing: NumPy array normalization
Input
```python
def normalize_array(arr: np.ndarray, axis: int = 0, epsilon: float = 1e-8) -> np.ndarray:
    mean = np.mean(arr, axis=axis, keepdims=True)
    std = np.std(arr, axis=axis, keepdims=True)
    return (arr - mean) / (std + epsilon)
```
Expected output
```
Normalizes a NumPy array to zero mean and unit variance along a specified axis.

Adds a small epsilon to the standard deviation to prevent division by zero on constant-valued slices.

Args:
    arr (np.ndarray): Input array of any shape.
    axis (int): Axis along which mean and standard deviation are computed. Defaults to 0.
    epsilon (float): Small constant added to std for numerical stability. Defaults to 1e-8.

Returns:
    np.ndarray: Normalized array with the same shape as the input.
```

Common mistakes to avoid

  • Pasting only the function name

    The AI cannot infer parameter meaning from a name like 'process_data' without seeing the body. Without the implementation, you get generic placeholder text like 'Processes the data' that fails any code review. Always paste the complete function body, including internal logic, so the model understands what each parameter actually controls.

  • Skipping review of generated Raises sections

    Models frequently miss implicit exceptions, such as a KeyError from an unguarded dictionary lookup or a TypeError from unsanitized input. An incomplete Raises section is worse than none because it gives callers a false sense of the function's failure modes. Always read the generated docstring against the function body and add any missing exception cases manually.
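
As a concrete illustration, this hypothetical function contains no raise statement at all, yet two exception types can escape it, and generated Raises sections routinely list neither:

```python
def load_price(record: dict) -> float:
    # Implicit KeyError if 'price' is missing: no `raise` statement appears
    # anywhere in the body, so models often omit it from the Raises section.
    raw = record["price"]
    # Implicit ValueError if the string is not a valid number.
    return float(raw)
```

`load_price({})` raises KeyError and `load_price({'price': 'n/a'})` raises ValueError; both belong in the Raises section even though neither appears in the source.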

  • Using cloud APIs for proprietary code without review

    Sending internal business logic to GPT-4o or Claude sends that code to a third-party server. Many enterprise security policies prohibit this. If your function contains sensitive algorithms, pricing logic, or customer data patterns, run a local model like Ollama with Code Llama instead of using a hosted API.

  • Accepting the output without updating after refactoring

    Generated docstrings go stale as fast as hand-written ones. If you rename a parameter or add a return value, the existing docstring becomes misleading. Treat AI-generated docstrings as draft content that must be version-controlled and updated with the code, not as permanent artifacts.

  • Ignoring format consistency across a module

    Running docstring generation in separate sessions without a fixed format instruction often produces a mix of styles within the same file. Sphinx will fail to parse a file where some functions use Google format and others use reStructuredText directives. Set a project-level format rule and paste it at the top of every generation prompt.

Frequently asked questions

What is the best AI tool to generate Python docstrings automatically?

It depends on your workflow. For in-editor generation, GitHub Copilot and Cursor are fastest because they read your full file context without copy-pasting. For one-off or batch generation via prompt, larger frontier models produce the most accurate Args and Returns sections on complex functions. The comparison table on this page shows side-by-side output quality for the same test function across five models.

Can AI generate docstrings in Google style vs NumPy style?

Yes. All major models support both formats when you specify the format explicitly in your prompt. Without that instruction, the output format is unpredictable. Include the phrase 'Write a Google-style docstring' or 'Write a NumPy-style docstring' at the start of your prompt to get consistently formatted output you can drop directly into your code.

Is there a VS Code extension that generates Python docstrings with AI?

Yes. The most widely used options are autoDocstring, which uses template-based generation and works offline, and GitHub Copilot, which uses AI and reads surrounding file context. Cursor (a VS Code fork) also generates docstrings inline when you type a triple-quote below a function signature. For teams standardizing on a specific format, autoDocstring combined with a custom template gives the most control.

How do I generate docstrings for an entire Python file at once?

Paste the full file content into a prompt with the instruction 'Add Google-style docstrings to every public function and class method in this file. Do not change any existing code.' GPT-4o handles files up to roughly 500 lines reliably within a single context window. For larger files, split by module or class and run each section separately, then reassemble.

Does AI-generated docstring output work with Sphinx autodoc?

It works if you generate reStructuredText-format docstrings using the :param:, :type:, :returns:, and :raises: directive syntax. If you prefer Google or NumPy format, enable the sphinx.ext.napoleon extension (bundled with Sphinx), which translates both formats automatically. Specify the target format in your prompt before generating, and verify the output parses without Sphinx warnings using sphinx-build -b html.
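
The corresponding conf.py fragment is small; the extension names and settings below are the real Sphinx ones, and the rest of the configuration is elided:

```python
# conf.py -- minimal fragment for rendering Google/NumPy docstrings
extensions = [
    "sphinx.ext.autodoc",   # pulls docstrings out of your modules
    "sphinx.ext.napoleon",  # translates Google/NumPy style to reST
]

# Napoleon accepts both styles by default; set either to False to
# enforce a single format across the project.
napoleon_google_docstring = True
napoleon_numpy_docstring = True
```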

How accurate are AI-generated Python docstrings?

Structurally, they are highly accurate when the full function body is visible in the prompt. Parameter names and return type descriptions match the code in roughly 90 percent of cases in informal testing. The most common errors are incomplete Raises sections and incorrect descriptions for parameters whose purpose is only clear from broader business context. Always review generated output before committing, especially for functions with side effects.
