Parse Bank Statement PDFs into Transaction Data with AI

Tested prompts for extracting transactions from bank statement PDFs, compared across 5 leading AI models.

Best by judge score: Claude Haiku 4.5 (10/10)

Bank statement PDFs are designed for human reading, not data processing. If you need to get transactions into a spreadsheet, accounting software, or database, you are stuck copying rows by hand or hoping your bank offers a CSV export that actually works. Most do not, or the export is buried behind three menus and a support ticket.

AI models can read a bank statement PDF, identify every transaction row, and return structured data in seconds. You paste the text content of the PDF, give the model clear instructions about the columns you want, and it outputs clean rows you can paste directly into Excel, import into QuickBooks, or feed into a script.

This page shows you exactly how to do that: the prompt, the model outputs, and a comparison of which models handle messy or multi-column bank statement layouts best. Whether you are reconciling one month of business expenses or processing statements from multiple accounts, this approach saves hours of manual work.

When to use this

This approach works best when you have a PDF bank statement you cannot get as a CSV, when the bank's export format is broken or missing fields, or when you need to process statements from multiple banks in a consistent format. The same prompt pattern applies to personal accounts, business checking, credit cards, and foreign bank statements.

  • Reconciling business expenses when your bank does not offer CSV export or the export omits merchant names
  • Processing historical statements from a closed account where re-downloading in another format is not possible
  • Consolidating transactions from multiple banks into one uniform spreadsheet for bookkeeping or tax prep
  • Extracting transactions from a client's bank statement PDF for forensic accounting or loan underwriting
  • Automating expense categorization by first extracting clean rows, then passing them to a categorization step

When this format breaks down

  • Scanned image PDFs where the text is not selectable: AI models process text, not pixels. A photographed or fax-scanned statement requires OCR first. Run the PDF through a tool like Adobe Acrobat, Tesseract, or AWS Textract before attempting extraction.
  • Statements longer than the model's context window: A 12-month PDF with thousands of transactions may exceed token limits. Split by month or page range before sending.
  • When you need a legally defensible audit trail: AI extraction can introduce small errors on ambiguous rows. For forensic accounting or legal evidence, verify every row against the source PDF.
  • Fully automated high-volume pipelines: Pasting text into a chat model does not scale to hundreds of statements per day. For that volume, use a dedicated document parsing API or a structured extraction library with the AI as a fallback.

The prompt we tested

You are a financial data extraction assistant. Your task is to parse the provided bank statement content and extract every transaction into structured data.

Follow these instructions precisely:
Extract every transaction into a CSV table with columns: Date (YYYY-MM-DD), Description, Category (best-guess: Income, Transfer, Food, Utilities, Shopping, Fees, Other), Amount (negative for debits, positive for credits), Balance. Preserve original order, do not invent or skip rows, and flag any ambiguous entries in a final 'Notes' section below the table.

Bank statement content:
Chase Checking Statement — March 1-31, 2024. Opening balance: $4,230.15. 03/02 ACH DEPOSIT PAYROLL ACME CORP +$2,850.00 bal $7,080.15; 03/05 POS WHOLE FOODS #221 -$87.43 bal $6,992.72; 03/06 ONLINE TRANSFER TO SAVINGS -$500.00 bal $6,492.72; 03/12 CON EDISON AUTOPAY -$142.18 bal $6,350.54; 03/28 ATM WITHDRAWAL -$200.00 bal $6,150.54. Closing balance: $6,150.54.

Return the extracted transactions now.
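
For scripted use, the prompt above can be kept as a reusable template. The following is a minimal sketch, not part of the tested setup: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and the empty-text check is an added assumption to catch PDFs with no text layer.

```python
# Hypothetical helper: wraps the tested prompt wording in a template.
PROMPT_TEMPLATE = """You are a financial data extraction assistant. Your task is to parse the provided bank statement content and extract every transaction into structured data.

Follow these instructions precisely:
{instructions}

Bank statement content:
{statement_text}

Return the extracted transactions now."""


def build_prompt(statement_text: str, instructions: str) -> str:
    """Fill the template with the pasted statement text and the column rules."""
    if not statement_text.strip():
        # An empty paste often means the PDF is a scanned image that needs OCR.
        raise ValueError("statement_text is empty; the PDF may have no text layer")
    return PROMPT_TEMPLATE.format(
        instructions=instructions.strip(),
        statement_text=statement_text.strip(),
    )


prompt = build_prompt(
    "03/02 ACH DEPOSIT PAYROLL ACME CORP +$2,850.00 bal $7,080.15",
    "Extract every transaction into a CSV table with columns: Date, Description, Amount, Balance.",
)
```

Keeping the instructions as a separate argument makes it easier to reuse the same wrapper with different column definitions.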

Model comparison

| Model | Quality | Cost | Latency | Tokens |
|---|---|---|---|---|
| Claude Opus 4.7 | 9 | $0.0685 | 10.5s | 445 / 824 |
| Claude Haiku 4.5 | 10 | $0.0016 | 1.9s | 312 / 262 |
| GPT-5.4 | 9 | $0.0031 | 2.9s | 280 / 162 |
| Gemini 2.5 Pro | 10 | $0.0016 | 10.6s | 323 / 233 |
| Grok 4.1 Fast Reasoning | 9 | $0.0002 | 8.7s | 423 / 181 |

Outputs

Claude Opus 4.7 9/10
# Extracted Transactions

| Date | Description | Category | Amount | Balance |
|------------|------------------------------------|-----------|-----------|----------|
| 2024-03-01 | Opening Balance | — | — | 4230.15 |
| 2024-03-02 | ACH DEPOSIT PAYROLL ACME CORP | Income | +2850.00 | 7080.15 |
| 2024-03-05 | POS WHOLE FOODS #221 | Food | -87.43 | 6992.72 |
| 2024-03-06 | ONLINE TRANSFER TO SAVINGS | Transfer | -500.00 | 6492.72 |
| 2024-03-12 | CON EDISON AUTOPAY | Utilities | -142.18 | 6350.54 |
| 2024-03-28 | ATM WITHDRAWAL | Other | -200.00 | 6150.54 |
| 2024-03-31 | Closing Balance | — | — | 6150.54 |

## CSV Format

```csv
Date,Description,Category,Amount,Balance
2024-03-02,ACH DEPOSIT PAYROLL ACME CORP,Income,2850.00,7080.15
2024-03-05,POS WHOLE FOODS #221,Food,-87.43,6992.72
2024-03-06,ONLINE TRANSFER TO SAVINGS,Transfer,-500.00,6492.72
2024-03-12,CON EDISON AUTOPAY,Utilities,-142.18,6350.54
2024-03-28,ATM WITHDRAWAL,Other,-200.00,6150.54
```

## Notes

- **ATM WITHDRAWAL (03/28)**: Categorized as "Other" since the category list does not include a dedicated "Cash" category. It could alternatively be considered a Transfer (cash to self).
- **Opening/Closing balances**: Shown in the table for context but not included as transactions in the CSV output, as they represent balance snapshots rather than ledger entries.
- **Year assumption**: The statement header specifies March 2024; all transaction dates have been normalized to 2024-03-DD accordingly.
- **Reconciliation check**: Opening (4,230.15) + sum of transactions (+2,850.00 − 87.43 − 500.00 − 142.18 − 200.00 = +1,920.39) = 6,150.54, which matches the stated closing balance. ✅

Judge notes: Excellent extraction with proper formatting, accurate categorization, and thoughtful handling of edge cases; only minor deduction for including opening/closing balance rows in the table when the instructions implied they should be transactions only.

Claude Haiku 4.5 10/10
# Extracted Transactions

| Date | Description | Category | Amount | Balance |
|------|---|---|---|---|
| 2024-03-02 | ACH DEPOSIT PAYROLL ACME CORP | Income | +$2,850.00 | $7,080.15 |
| 2024-03-05 | POS WHOLE FOODS #221 | Food | -$87.43 | $6,992.72 |
| 2024-03-06 | ONLINE TRANSFER TO SAVINGS | Transfer | -$500.00 | $6,492.72 |
| 2024-03-12 | CON EDISON AUTOPAY | Utilities | -$142.18 | $6,350.54 |
| 2024-03-28 | ATM WITHDRAWAL | Other | -$200.00 | $6,150.54 |

---

## Notes

No ambiguous entries flagged. All transactions are clearly identifiable with explicit dates, descriptions, and balances. The ATM withdrawal is categorized as "Other" due to unknown use of cash.

Judge notes: Output perfectly matches query intent with accurate extraction, correct formatting, proper categorization, preserved order, and appropriate notes section addressing ambiguity.

GPT-5.4 9/10
Date,Description,Category,Amount,Balance
2024-03-02,ACH DEPOSIT PAYROLL ACME CORP,Income,2850.00,7080.15
2024-03-05,POS WHOLE FOODS #221,Food,-87.43,6992.72
2024-03-06,ONLINE TRANSFER TO SAVINGS,Transfer,-500.00,6492.72
2024-03-12,CON EDISON AUTOPAY,Utilities,-142.18,6350.54
2024-03-28,ATM WITHDRAWAL,Other,-200.00,6150.54

Notes:
- ATM WITHDRAWAL categorized as Other because the end use of cash is not specified.

Judge notes: Excellent extraction with correct CSV formatting, accurate transaction parsing, proper categorization logic, and thoughtful notation of ambiguities; only minor improvement possible would be explicitly confirming opening/closing balance reconciliation.

Gemini 2.5 Pro 10/10
```csv
Date,Description,Category,Amount,Balance
2024-03-02,ACH DEPOSIT PAYROLL ACME CORP,Income,2850.00,7080.15
2024-03-05,POS WHOLE FOODS #221,Food,-87.43,6992.72
2024-03-06,ONLINE TRANSFER TO SAVINGS,Transfer,-500.00,6492.72
2024-03-12,CON EDISON AUTOPAY,Utilities,-142.18,6350.54
2024-03-28,ATM WITHDRAWAL,Other,-200.00,6150.54
```

**Notes:**
*   **ATM WITHDRAWAL (2024-03-28):** Categorized as 'Other' as the final purpose of the cash withdrawal is unknown.

Judge notes: Perfectly extracts all transactions with correct formatting, accurate categorization, proper sign conventions, matching balances, and appropriate notation of ambiguous entries.

Grok 4.1 Fast Reasoning 9/10
```csv
Date,Description,Category,Amount,Balance
2024-03-02,ACH DEPOSIT PAYROLL ACME CORP,Income,2850.00,7080.15
2024-03-05,POS WHOLE FOODS #221,Food,-87.43,6992.72
2024-03-06,ONLINE TRANSFER TO SAVINGS,Transfer,-500.00,6492.72
2024-03-12,CON EDISON AUTOPAY,Utilities,-142.18,6350.54
2024-03-28,ATM WITHDRAWAL,Other,-200.00,6150.54
```

**Notes:** No ambiguous entries. Opening balance ($4,230.15) and closing balance ($6,150.54) noted but not included as transactions. All balances verified as consistent.
Judge notes: Excellently structured output that fully satisfies the query with accurate extraction, proper formatting, correct categorization, and helpful verification notes—only minor room for improvement would be optional inclusion of opening balance as a reference row.
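
The reconciliation check that some of the models mention in their notes can also be run in code instead of being taken on trust. The sketch below replays the running balance over the CSV that several models returned. The `reconcile` function is a hypothetical helper, and it assumes plain numeric amounts (no currency symbols or thousands separators).

```python
import csv
import io
from decimal import Decimal

# CSV as returned by several of the models above.
MODEL_OUTPUT = """Date,Description,Category,Amount,Balance
2024-03-02,ACH DEPOSIT PAYROLL ACME CORP,Income,2850.00,7080.15
2024-03-05,POS WHOLE FOODS #221,Food,-87.43,6992.72
2024-03-06,ONLINE TRANSFER TO SAVINGS,Transfer,-500.00,6492.72
2024-03-12,CON EDISON AUTOPAY,Utilities,-142.18,6350.54
2024-03-28,ATM WITHDRAWAL,Other,-200.00,6150.54
"""


def reconcile(csv_text, opening, closing):
    """Return a list of problems; an empty list means every balance checks out."""
    problems = []
    running = opening
    for row in csv.DictReader(io.StringIO(csv_text)):
        running += Decimal(row["Amount"])
        stated = Decimal(row["Balance"])
        if running != stated:
            problems.append(f"{row['Date']}: expected balance {running}, got {stated}")
            running = stated  # resync so a single bad row is reported only once
    if running != closing:
        problems.append(f"closing balance: expected {closing}, got {running}")
    return problems


problems = reconcile(MODEL_OUTPUT, Decimal("4230.15"), Decimal("6150.54"))
print(problems or "All balances reconcile")
```

`Decimal` is used rather than `float` so that cent-level comparisons are exact.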

What makes these work

  1. Copy text, do not screenshot

    Open the PDF in a browser or PDF reader, select all text, and paste it into your prompt. Pasted text is the more reliable input: a screenshot or image paste forces the model to rely on vision capabilities, which tend to be slower and more error-prone for tabular data with numbers.

  2. Define your columns explicitly

    Tell the model exactly which columns you want and in what order before giving it the statement text. Vague instructions like 'extract the transactions' produce inconsistent column names across runs. Specifying 'Date (MM/DD/YYYY), Description, Debit, Credit, Balance' gives you output you can paste directly into a pre-built spreadsheet template.

  3. Handle missing values with a rule

    Bank statements often show either a debit or a credit per row, leaving the other column blank. Tell the model what to do: 'If Debit or Credit is not applicable for a row, leave the cell empty.' Without this instruction, models sometimes fill blank cells with 0 or a dash, which breaks sum formulas in spreadsheets.

  4. Ask for JSON output for downstream processing

    If you are feeding results into a script or database, ask for JSON instead of CSV. Specify the exact field names you need as keys. JSON handles description strings with commas far better than CSV, which breaks unless every field is quoted correctly. Switch back to CSV only when the final destination is a spreadsheet opened by a human.
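
To illustrate the comma problem from the last tip, here is a small sketch using Python's standard `csv` and `json` modules. The description value with embedded commas is a made-up example, not taken from a real statement.

```python
import csv
import io
import json

# Hypothetical row: the description contains commas.
row = {
    "date": "2024-03-05",
    "description": "POS WHOLE FOODS #221, NEW YORK, NY",
    "amount": "-87.43",
}

# Naive comma joining and splitting yields 5 fields instead of 3.
naive_fields = ",".join(row.values()).split(",")

# The csv module quotes the description, so the row survives a round trip.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_round_trip = list(csv.DictReader(io.StringIO(buffer.getvalue())))

# JSON needs no special handling for commas inside strings.
json_round_trip = json.loads(json.dumps([row]))

print(len(naive_fields), csv_round_trip[0] == row, json_round_trip[0] == row)
```

The practical point is that CSV stays safe only when it is written and read by a proper CSV parser; model output that is split on commas by hand is where rows tend to break.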

More example scenarios

#01 · Small business owner reconciling a Chase checking account
Input
Here is the text from my Chase business checking statement for March 2024. Extract every transaction as a table with these columns: Date, Description, Debit, Credit, Balance. Use MM/DD/YYYY for dates. If a field is blank, leave it empty. TEXT: 03/01 Beginning Balance 4,250.00 / 03/03 AMAZON WEB SERVICES 128.44 / 03/05 ACH DEPOSIT STRIPE PAYOUT 1,840.00 / 03/07 GOOGLE ADS 340.00 / 03/10 COMCAST BUSINESS 189.99
Expected output
Date, Description, Debit, Credit, Balance
03/01/2024, Beginning Balance, , , 4250.00
03/03/2024, AMAZON WEB SERVICES, 128.44, , 
03/05/2024, ACH DEPOSIT STRIPE PAYOUT, , 1840.00, 
03/07/2024, GOOGLE ADS, 340.00, , 
03/10/2024, COMCAST BUSINESS, 189.99, , 
#02 · Freelancer extracting transactions from a UK Barclays statement for tax return
Input
Extract all transactions from this Barclays statement into CSV format with columns: Date (DD/MM/YYYY), Payee, Money Out (GBP), Money In (GBP), Balance (GBP). Statement text: 02 Jan NETFLIX 15.99 / 05 Jan BACS PAYMENT FROM ACME LTD 2500.00 / 09 Jan TESCO STORES 67.43 / 14 Jan HMRC SELF ASSESS 800.00 / 22 Jan PAYPAL TRANSFER IN 430.00
Expected output
Date,Payee,Money Out (GBP),Money In (GBP),Balance (GBP)
02/01/2024,NETFLIX,15.99,,
05/01/2024,BACS PAYMENT FROM ACME LTD,,2500.00,
09/01/2024,TESCO STORES,67.43,,
14/01/2024,HMRC SELF ASSESS,800.00,,
22/01/2024,PAYPAL TRANSFER IN,,430.00,
#03 · Accountant categorizing transactions from a client's credit card statement
Input
Extract these credit card transactions and add a Category column. Assign each to one of: Travel, Software, Meals, Advertising, Office Supplies, Other. Transactions: 04/01 DELTA AIR LINES 389.00 / 04/03 ZOOM VIDEO 15.99 / 04/05 STARBUCKS 12.40 / 04/08 META ADS 250.00 / 04/12 STAPLES 44.99 / 04/15 HILTON HOTELS 210.00
Expected output
Date, Description, Amount, Category
04/01, DELTA AIR LINES, 389.00, Travel
04/03, ZOOM VIDEO, 15.99, Software
04/05, STARBUCKS, 12.40, Meals
04/08, META ADS, 250.00, Advertising
04/12, STAPLES, 44.99, Office Supplies
04/15, HILTON HOTELS, 210.00, Travel
#04 · Loan officer extracting income deposits from an applicant's bank statement
Input
From the bank statement text below, extract only credit transactions (money coming in) with columns: Date, Description, Amount. Flag any recurring deposits that appear monthly. TEXT: 01/02 DIRECT DEP EMPLOYER PAYROLL 3200.00 / 01/05 VENMO PAYMENT 45.00 / 02/02 DIRECT DEP EMPLOYER PAYROLL 3200.00 / 02/18 TAX REFUND IRS 940.00 / 03/02 DIRECT DEP EMPLOYER PAYROLL 3200.00
Expected output
Date, Description, Amount, Recurring
01/02, DIRECT DEP EMPLOYER PAYROLL, 3200.00, Yes - monthly
01/05, VENMO PAYMENT, 45.00, No
02/02, DIRECT DEP EMPLOYER PAYROLL, 3200.00, Yes - monthly
02/18, TAX REFUND IRS, 940.00, No
03/02, DIRECT DEP EMPLOYER PAYROLL, 3200.00, Yes - monthly
#05 · Developer testing an expense pipeline with a multi-currency statement
Input
Extract all transactions from this multi-currency Wise statement. Columns: Date, Description, Currency, Amount, Direction (IN/OUT). TEXT: 2024-03-10 Subscription Netflix USD -15.99 / 2024-03-11 Invoice payment from ClientCo EUR +1200.00 / 2024-03-13 Transfer to GBP account GBP -500.00 / 2024-03-15 Freelance payment USD +850.00
Expected output
Date, Description, Currency, Amount, Direction
2024-03-10, Subscription Netflix, USD, 15.99, OUT
2024-03-11, Invoice payment from ClientCo, EUR, 1200.00, IN
2024-03-13, Transfer to GBP account, GBP, 500.00, OUT
2024-03-15, Freelance payment, USD, 850.00, IN
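
Outputs that use separate Debit and Credit columns, as in scenario #01, often need to be collapsed into one signed amount before import. A minimal sketch, assuming the column names from that scenario; `signed_amount` is a hypothetical helper, and treating a row with both cells empty as a balance-only row is an assumption.

```python
import csv
import io
from decimal import Decimal
from typing import Optional

# Expected output from scenario #01 (trailing spaces omitted).
EXPECTED_OUTPUT = """Date, Description, Debit, Credit, Balance
03/01/2024, Beginning Balance, , , 4250.00
03/03/2024, AMAZON WEB SERVICES, 128.44, ,
03/05/2024, ACH DEPOSIT STRIPE PAYOUT, , 1840.00,
03/07/2024, GOOGLE ADS, 340.00, ,
03/10/2024, COMCAST BUSINESS, 189.99, ,
"""


def signed_amount(debit: str, credit: str) -> Optional[Decimal]:
    """Negative for debits, positive for credits, None for balance-only rows."""
    debit, credit = debit.strip(), credit.strip()
    if debit and credit:
        raise ValueError("row has both a debit and a credit")
    if debit:
        return -Decimal(debit.replace(",", ""))
    if credit:
        return Decimal(credit.replace(",", ""))
    return None


# skipinitialspace drops the space that follows each comma in this output style.
reader = csv.DictReader(io.StringIO(EXPECTED_OUTPUT), skipinitialspace=True)
amounts = [signed_amount(r["Debit"] or "", r["Credit"] or "") for r in reader]
print(amounts)
```

Leaving empty cells empty, as the prompt requests, is what lets the helper tell a missing value apart from a genuine zero.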

Common mistakes to avoid

  • Sending image-based PDFs without OCR

    If your PDF was scanned or faxed, it usually has no text layer. Selecting text selects nothing, and what you paste into the prompt is blank. Always confirm you can highlight and copy individual characters before attempting AI extraction. Run scanned PDFs through OCR first.

  • Ignoring page headers and footers

    When you copy-paste a full PDF, you often include repeated bank headers, account number lines, page numbers, and footer disclaimers. These confuse the model and produce garbage rows. Trim the pasted text to only the transaction table section, or explicitly tell the model to skip rows that do not contain a date and an amount.

  • Not verifying the row count

    AI models occasionally merge two adjacent rows into one or skip a row entirely, especially when description text wraps across lines in the original PDF. After extraction, count the output rows and compare to the original statement. A mismatch means at least one transaction was lost or duplicated.

  • Assuming amounts are always correctly signed

    Some bank statements use parentheses for negative numbers, some use a minus sign, and some use a separate Debit column. If you mix these into a single Amount column without telling the model the sign convention, you can end up with positive numbers for expenses. Specify the exact format you expect for negative values in your prompt.

  • Using this for real-time or ongoing transaction feeds

    Copy-pasting PDFs into a chat interface is a manual step. If you need transactions extracted daily or weekly, this workflow does not scale. Build a pipeline using a document AI API for recurring extraction rather than treating a chat model as a permanent ETL layer.
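
Two of the checks above, trimming header and footer lines and verifying the row count, can be approximated with a pre-filter. This is a rough sketch: the regular expressions assume US-style `MM/DD` dates and amounts with two decimals, the sample text is invented, and a description that wraps onto a line without a date would be dropped, so treat the count as a sanity check rather than ground truth.

```python
import re

# Assumed patterns: MM/DD or MM/DD/YYYY dates, amounts such as -$87.43 or (1,250.00).
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}(?:/\d{2,4})?\b")
AMOUNT_RE = re.compile(r"\(?[-+]?\$?\d{1,3}(?:,\d{3})*\.\d{2}\)?")

# Invented sample of pasted PDF text with header and footer noise.
PASTED_TEXT = """CHASE CHECKING STATEMENT   Page 1 of 3
Account number ending 1234
03/02 ACH DEPOSIT PAYROLL ACME CORP +$2,850.00
03/05 POS WHOLE FOODS #221 -$87.43
Questions? Call the number on the back of your card.
03/06 ONLINE TRANSFER TO SAVINGS -$500.00
"""


def transaction_lines(pasted_text):
    """Keep only lines that contain both a date and an amount."""
    return [
        line.strip()
        for line in pasted_text.splitlines()
        if DATE_RE.search(line) and AMOUNT_RE.search(line)
    ]


def row_count_matches(pasted_text, extracted_rows):
    """Compare the number of candidate source lines with the model's row count."""
    return len(transaction_lines(pasted_text)) == len(extracted_rows)


candidates = transaction_lines(PASTED_TEXT)
print(len(candidates), "candidate transaction lines")
```

The filtered lines can be sent to the model in place of the raw paste, and `row_count_matches` can be run afterwards against the parsed output rows.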

Frequently asked questions

Can AI extract transactions from a scanned bank statement PDF?

Not directly. AI language models process text, so if the PDF is a scanned image with no text layer, the model receives no content to parse. You need to run the PDF through OCR software first. Adobe Acrobat, Google Document AI, AWS Textract, and the open-source Tesseract engine can all add a text layer to scanned PDFs before you pass the result to an AI model.

What is the best format to request the output in: CSV, JSON, or a table?

CSV is best if the final destination is Excel or Google Sheets. JSON is best if you are piping the output into code or a database. Markdown tables work well for quick visual review but are harder to import programmatically. Specify the format explicitly in your prompt; models generally follow an explicit format instruction, but check the first few rows of each run before importing.

Will the AI ever get numbers wrong when extracting transactions?

It can, especially when the source PDF has unusual formatting, wrapped text descriptions, or columns that are not clearly separated. Numbers with commas as thousands separators can occasionally be misread. Always do a spot-check: sum the extracted debit column and compare it to the statement's total debits. Any discrepancy signals a row-level error to investigate.
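
A spot-check like this is easy to script. The sketch below normalizes the amount formats mentioned on this page (currency symbols, thousands separators, leading signs, parentheses for negatives) and sums the debits. `parse_amount` and `total_debits` are hypothetical helpers, and the sample values are the amounts from the Claude Haiku 4.5 output above.

```python
from decimal import Decimal


def parse_amount(raw: str) -> Decimal:
    """Convert strings like '-$87.43', '+$2,850.00' or '(1,250.00)' to Decimal."""
    text = raw.strip()
    negative = text.startswith("(") and text.endswith(")")
    text = text.strip("()")
    if text.startswith("-"):
        negative = True
    text = text.lstrip("+-").replace("$", "").replace(",", "")
    value = Decimal(text)
    return -value if negative else value


def total_debits(raw_amounts):
    """Sum the absolute value of every negative amount."""
    parsed = [parse_amount(a) for a in raw_amounts]
    return sum((-a for a in parsed if a < 0), Decimal("0"))


extracted = ["+$2,850.00", "-$87.43", "-$500.00", "-$142.18", "-$200.00"]
debits = total_debits(extracted)
print(debits)
```

The resulting figure is the one to compare against the total debits (or withdrawals) line printed on the statement, when the bank provides one.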

How do I handle a bank statement that is longer than the AI's context limit?

Split the statement by month or by page range before extraction. How much fits in one request depends on the model's context window and how dense the statement is, so test with a single month first. For very long documents, extract in chunks and combine the resulting CSVs. Repeat the same column definitions in the prompt for every chunk so the output is consistent across all chunks.
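
A chunk-and-merge workflow might look like the sketch below. Both helpers are hypothetical, and splitting on a fixed line count is a simplifying assumption: splitting on page or month boundaries is safer because it avoids cutting a wrapped transaction in half.

```python
def chunk_lines(statement_text, max_lines=40):
    """Split pasted statement text into chunks of at most max_lines non-empty lines."""
    lines = [line for line in statement_text.splitlines() if line.strip()]
    return [
        "\n".join(lines[i:i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]


def merge_csv_chunks(csv_chunks, header):
    """Combine per-chunk CSV outputs, keeping the header row only once."""
    merged = [header]
    for chunk in csv_chunks:
        for line in chunk.strip().splitlines():
            line = line.strip()
            if line and line != header:
                merged.append(line)
    return "\n".join(merged)


header = "Date,Description,Amount"
combined = merge_csv_chunks(
    [
        header + "\n2024-03-02,PAYROLL,2850.00\n",
        header + "\n2024-03-05,WHOLE FOODS,-87.43\n",
    ],
    header,
)
print(combined)
```

Each chunk from `chunk_lines` would be sent with the same prompt and column list, and the model's replies passed to `merge_csv_chunks` in their original order.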

Can I extract transactions from multiple bank accounts in one prompt?

You can, but it is risky. Mixing two statements in one prompt often causes the model to blend rows or lose track of which account a transaction belongs to. It is more reliable to extract each statement separately and add an Account column manually or in a second prompt step.
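
Adding the Account column in code rather than in a second prompt step could be sketched as follows. The account labels and rows are illustrative, and `combine_accounts` is a hypothetical helper that assumes every per-account CSV uses the same columns.

```python
import csv
import io


def combine_accounts(extracted):
    """Merge per-account CSV outputs into one CSV with a leading Account column."""
    out = io.StringIO()
    writer = None
    for account, csv_text in extracted.items():
        reader = csv.DictReader(io.StringIO(csv_text))
        for row in reader:
            if writer is None:
                # The first account's header defines the columns for the merged file.
                writer = csv.DictWriter(
                    out,
                    fieldnames=["Account"] + list(reader.fieldnames),
                    lineterminator="\n",
                )
                writer.writeheader()
            writer.writerow({"Account": account, **row})
    return out.getvalue()


# Illustrative per-account outputs, one extraction run per statement.
combined_accounts = combine_accounts({
    "Chase Business": "Date,Description,Amount\n03/03/2024,AMAZON WEB SERVICES,-128.44\n03/07/2024,GOOGLE ADS,-340.00\n",
    "Barclays": "Date,Description,Amount\n02/01/2024,NETFLIX,-15.99\n",
})
print(combined_accounts)
```

If a later account's output has a column the first one lacks, `csv.DictWriter` raises `ValueError`, which is a useful signal that the prompts were not consistent across statements.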

Is it safe to paste my bank statement text into an AI chat tool?

That depends on the tool's data retention policy. Consumer chat interfaces like ChatGPT may use inputs to improve models by default unless you opt out. For sensitive personal or business financial data, use an API call with data processing agreements in place, or run a local model that does not send data to external servers. Always review the privacy terms of any tool before pasting financial records.
