{"merchant":"Whole Foods Market","date":"2024-03-14","currency":"USD","line_items":[{"description":"Organic Bananas","quantity":2,"unit_price":0.79,"total":1.58},{"description":"Almond Milk 64oz","quantity":1,"unit_price":4.99,"total":4.99},{"description":"Sourdough Bread","quantity":1,"unit_price":5.49,"total":5.49},{"description":"Chicken Breast","quantity":1.2,"unit_price":6.99,"total":8.39}],"subtotal":20.45,"tax":1.23,"tip":null,"grand_total":21.68}
Pull Line Items and Totals from Receipt PDFs Using AI
Tested prompts for extract line items from receipt pdf compared across 5 leading AI models.
If you have a stack of receipt PDFs and need the individual line items, quantities, prices, and totals pulled out into usable data, you are dealing with one of the most common document extraction problems in accounting, expense management, and operations. Manually retyping those numbers is slow, error-prone, and does not scale past a handful of receipts.
AI models can read a receipt PDF description or pasted text and return structured data: item names, quantities, unit prices, subtotals, tax, and grand total. The output can be formatted as JSON, CSV-ready rows, or a simple table depending on what your downstream workflow needs.
This page shows you exactly how to prompt an AI to extract line items from receipt PDFs, compares how four leading models handle the same receipt, and covers the edge cases where the approach works well and where it falls short. Whether you are processing one receipt or building a pipeline for hundreds, start here.
When to use this
This approach fits whenever you need structured data from receipt PDFs and the receipts are text-readable or have been run through OCR. It works for one-off extractions you do manually and for automated pipelines where you feed receipt text into an AI API. It is the right tool when accuracy on individual line items matters more than speed alone.
- Reconciling employee expense reports where you need itemized detail, not just the total
- Feeding restaurant or retail receipts into accounting software that requires line-level data
- Auditing purchase receipts to verify that billed items match a purchase order
- Extracting grocery or supply receipts into a spreadsheet for budget tracking
- Building a receipt-processing microservice that normalizes line items across multiple vendor formats
When this format breaks down
- The PDF is a scanned image with no OCR layer. The AI cannot see pixel data directly; you need an OCR step first to convert the image to text before extraction is possible.
- Receipts are handwritten. OCR accuracy on handwriting is low enough that AI extraction will produce frequent errors in item names and prices that require manual correction anyway.
- You need legally defensible extraction with a documented audit trail. AI outputs can hallucinate or misread ambiguous characters; a certified data extraction service with human review is more appropriate for tax or legal filings.
- The receipt is extremely long and complex, such as a multi-page itemized hospital bill or a contractor invoice with hundreds of SKUs. Token limits and formatting inconsistencies across pages increase error rates significantly without a chunking and validation strategy in place.
The prompt we tested
You are a receipt parsing assistant. Extract structured line item data from the receipt content provided.
Rules:
Output a JSON object with fields: merchant, date (YYYY-MM-DD), currency, line_items (array of {description, quantity, unit_price, total}), subtotal, tax, tip, and grand_total. Use null for any field not found, preserve original item descriptions, and return only the JSON with no commentary or markdown fences.
Receipt content:
Receipt from Whole Foods Market, 03/14/2024. Organic Bananas 2lb @ $0.79/lb = $1.58, Almond Milk 64oz $4.99, Sourdough Bread $5.49, Chicken Breast 1.2lb @ $6.99/lb = $8.39. Subtotal $20.45, Tax $1.23, Total $21.68. Paid with Visa ending 4421.
Return the extracted data now.
Model comparison
| Model | Quality | Cost | Latency | Tokens |
|---|---|---|---|---|
| Claude Opus 4.7 | 9 | $0.0217 | 4.3s | 314 / 227 |
| Claude Haiku 4.5 | 9 | $0.0017 | 1.6s | 229 / 293 |
| GPT-5.4 | 9 | $0.0028 | 2.3s | 201 / 150 |
| Gemini 2.5 Pro | 9 | $0.0017 | 9.0s | 224 / 292 |
| Grok 4.1 Fast Reasoning | 9 | $0.0001 | 13.9s | 350 / 153 |
Outputs
```json
{
"merchant": "Whole Foods Market",
"date": "2024-03-14",
"currency": null,
"line_items": [
{
"description": "Organic Bananas 2lb",
"quantity": 2,
"unit_price": 0.79,
"total": 1.58
},
{
"description": "Almond Milk 64oz",
"quantity": 1,
"unit_price": 4.99,
"total": 4.99
},
{
"description": "Sourdough Bread",
"quantity": 1,
"unit_price": 5.49,
"total": 5.49
},
{
"description": "Chicken Breast 1.2lb",
"quantity": 1.2,
"unit_price": 6.99,
"total": 8.39
}
],
"subtotal": 20.45,
"tax": 1.23,
"tip": null,
"grand_total": 21.68
}
```
{"merchant":"Whole Foods Market","date":"2024-03-14","currency":"USD","line_items":[{"description":"Organic Bananas","quantity":"2 lb","unit_price":0.79,"total":1.58},{"description":"Almond Milk 64oz","quantity":1,"unit_price":4.99,"total":4.99},{"description":"Sourdough Bread","quantity":1,"unit_price":5.49,"total":5.49},{"description":"Chicken Breast","quantity":"1.2 lb","unit_price":6.99,"total":8.39}],"subtotal":20.45,"tax":1.23,"tip":null,"grand_total":21.68}
```json
{
"merchant": "Whole Foods Market",
"date": "2024-03-14",
"currency": "USD",
"line_items": [
{
"description": "Organic Bananas",
"quantity": 2,
"unit_price": 0.79,
"total": 1.58
},
{
"description": "Almond Milk 64oz",
"quantity": 1,
"unit_price": 4.99,
"total": 4.99
},
{
"description": "Sourdough Bread",
"quantity": 1,
"unit_price": 5.49,
"total": 5.49
},
{
"description": "Chicken Breast",
"quantity": 1.2,
"unit_price": 6.99,
"total": 8.39
}
],
"subtotal": 20.45,
"tax": 1.23,
"tip": null,
"grand_total": 21.68
}
```
{"merchant":"Whole Foods Market","date":"2024-03-14","currency":"USD","line_items":[{"description":"Organic Bananas 2lb","quantity":2,"unit_price":0.79,"total":1.58},{"description":"Almond Milk 64oz","quantity":1,"unit_price":4.99,"total":4.99},{"description":"Sourdough Bread","quantity":1,"unit_price":5.49,"total":5.49},{"description":"Chicken Breast 1.2lb","quantity":1.2,"unit_price":6.99,"total":8.39}],"subtotal":20.45,"tax":1.23,"tip":null,"grand_total":21.68}
What makes these work
-
01Specify your exact output format
Tell the model precisely what format you need before it sees any receipt data: JSON with named keys, CSV with specific column headers, or a markdown table. If you leave the format open, different receipts will come back in inconsistent shapes that break any downstream parsing. Include a short schema or example object in your prompt when consistency really matters.
-
02Ask for math validation in the prompt
Receipts frequently contain printing errors, rounding discrepancies, or OCR artifacts that produce numbers that do not add up. Explicitly instruct the model to calculate each line total from quantity times unit price and flag mismatches. This turns the AI into a lightweight auditing step rather than a passive transcription tool.
-
03Pre-clean your OCR text before prompting
OCR output from scanned receipts often has stray characters, broken column alignment, and line breaks in the middle of item names. Spend thirty seconds removing obvious garbage before pasting into the prompt. The cleaner the input text, the more accurate the extracted structure, especially for price columns that OCR commonly mangles.
-
04Use system-level instructions for batch pipelines
If you are processing receipts programmatically through an API, put your extraction schema and rules in the system message so they apply to every receipt without repeating them in every user message. Reserve the user message for the raw receipt text only. This keeps token usage lower and output format more consistent across a large batch.
More example scenarios
Extract all line items from this restaurant receipt as a JSON array. Each object should have: item_name, quantity, unit_price, line_total. Also include a summary object with subtotal, tax, tip, and grand_total. Receipt text: Grilled Salmon 1x $24.00 Caesar Salad 2x $9.50 Sparkling Water 2x $4.00 Creme Brulee 1x $8.00 Subtotal $55.00 Tax $4.95 Tip $11.00 Total $70.95
{"line_items":[{"item_name":"Grilled Salmon","quantity":1,"unit_price":24.00,"line_total":24.00},{"item_name":"Caesar Salad","quantity":2,"unit_price":9.50,"line_total":19.00},{"item_name":"Sparkling Water","quantity":2,"unit_price":4.00,"line_total":8.00},{"item_name":"Creme Brulee","quantity":1,"unit_price":8.00,"line_total":8.00}],"summary":{"subtotal":55.00,"tax":4.95,"tip":11.00,"grand_total":70.95}}
Parse this grocery receipt into a CSV-formatted table with columns: item, qty, price_each, total. Add a final row for the grand total. Receipt: Organic Whole Milk 64oz x2 @ $5.49 Sourdough Bread x1 @ $4.29 Free Range Eggs 12ct x1 @ $6.99 Cheddar Cheese 8oz x3 @ $3.79 Total: $35.81
item,qty,price_each,total Organic Whole Milk 64oz,2,5.49,10.98 Sourdough Bread,1,4.29,4.29 Free Range Eggs 12ct,1,6.99,6.99 Cheddar Cheese 8oz,3,3.79,11.37 GRAND TOTAL,,,35.81
You are an accounts payable assistant. Extract line items from this office supply receipt and flag any item where the line total does not match qty multiplied by unit price. Return a JSON array. Receipt: Printer Paper Case x4 $42.00 ea = $168.00 Blue Pens Box x10 $3.50 ea = $38.00 Stapler x2 $14.99 ea = $29.98 Subtotal $235.98 Tax $18.88 Total $254.86
{"line_items":[{"item":"Printer Paper Case","qty":4,"unit_price":42.00,"stated_total":168.00,"calculated_total":168.00,"flag":false},{"item":"Blue Pens Box","qty":10,"unit_price":3.50,"stated_total":38.00,"calculated_total":35.00,"flag":true,"note":"Stated total $38.00 does not match calculated $35.00"},{"item":"Stapler","qty":2,"unit_price":14.99,"stated_total":29.98,"calculated_total":29.98,"flag":false}],"subtotal":235.98,"tax":18.88,"total":254.86}
Extract the itemized charges from this hotel folio receipt. Group them by category: room_charges, food_beverage, fees_taxes, other. Return as JSON. Folio: Room Rate Night 1 $189.00 Room Rate Night 2 $189.00 Room Service Breakfast $34.50 Mini Bar $12.00 Parking $28.00 Resort Fee $35.00 City Tax $37.80 State Tax $22.68 Total $548.98
{"room_charges":[{"item":"Room Rate Night 1","amount":189.00},{"item":"Room Rate Night 2","amount":189.00}],"food_beverage":[{"item":"Room Service Breakfast","amount":34.50},{"item":"Mini Bar","amount":12.00}],"fees_taxes":[{"item":"City Tax","amount":37.80},{"item":"State Tax","amount":22.68}],"other":[{"item":"Parking","amount":28.00},{"item":"Resort Fee","amount":35.00}],"grand_total":548.98}
Extract line items from this pharmacy receipt. Mark each item as FSA_eligible: true or false. Prescription drugs and medical devices are eligible; cosmetics, vitamins, and convenience items are not. Return JSON. Items: Lisinopril 10mg Rx #4421 $15.00 Bandage Assortment $6.49 Shampoo $8.99 Blood Pressure Monitor $45.00 Vitamin C 500mg $12.99 Total $88.47
{"line_items":[{"item":"Lisinopril 10mg Rx #4421","amount":15.00,"fsa_eligible":true},{"item":"Bandage Assortment","amount":6.49,"fsa_eligible":true},{"item":"Shampoo","amount":8.99,"fsa_eligible":false},{"item":"Blood Pressure Monitor","amount":45.00,"fsa_eligible":true},{"item":"Vitamin C 500mg","amount":12.99,"fsa_eligible":false}],"total":88.47,"fsa_eligible_total":66.49,"non_eligible_total":21.98}
Common mistakes to avoid
-
Sending image PDFs without OCR
Most AI chat interfaces and APIs accept text, not raw PDF bytes or embedded images. If your receipt PDF is a scanned image, the model has no text to read. Run the file through an OCR tool first to extract the text layer, then pass that text to the AI for structured extraction.
-
Vague output instructions
Prompts like 'extract the line items' without specifying format produce inconsistent results across receipts. One response might be a bulleted list, the next a JSON object, the next a prose paragraph. Always define the exact output structure you need, including key names, data types, and how to handle missing fields.
-
Trusting totals without verification
AI models can confidently reproduce an incorrect total that was already wrong on the original receipt, or occasionally transpose digits in prices. If the extracted data feeds into accounting software or a reimbursement claim, always programmatically sum the line totals and compare against the stated grand total before accepting the output.
-
Not handling multi-page receipts as chunks
Feeding a very long receipt as a single prompt can cause the model to truncate items near the token limit or lose track of column alignment across page breaks. Split multi-page receipts at natural page boundaries, extract each page separately, and merge the resulting arrays before doing your final total validation.
-
Ignoring currency and locale differences
Receipts from non-US vendors may use comma as the decimal separator, display VAT separately, or list prices in a foreign currency. If you do not specify how to handle these in your prompt, the model may silently convert or misparse numbers. Include a note about expected currency format and tax label conventions when processing international receipts.
Related queries
Frequently asked questions
Can I extract line items from a scanned receipt PDF?
Yes, but you need an OCR step first. Tools like Adobe Acrobat, AWS Textract, Google Document AI, or open-source options like Tesseract can convert the scanned image into machine-readable text. Once you have that text, you paste it into the AI prompt and extract the structured line items as you would from any digital receipt.
What is the best format to ask the AI to return extracted receipt data in?
JSON is the best default if the data will be processed programmatically, because it is easy to parse in any language and supports nested structures like a line_items array plus a summary object. CSV is better if you are pasting results directly into a spreadsheet. Choose based on your next step, and always specify the exact keys or column names you want.
How accurate is AI at extracting receipt line items?
On clean, text-layer PDFs from modern point-of-sale systems, accuracy is very high, typically above 95 percent for item names and prices. Accuracy drops on low-quality OCR output, handwritten amounts, or receipts with unusual formatting like item codes mixed into the price column. Always build a validation check that sums extracted line totals and compares them to the stated grand total.
Can I automate this for hundreds of receipts at once?
Yes. Use the AI provider's API rather than the chat interface. Write a script that loops through your receipt files, runs OCR if needed, sends the text to the API with your extraction prompt, parses the JSON response, and writes the results to a database or spreadsheet. Most providers offer batch endpoints or high rate limits suitable for bulk processing.
Will the AI extract tax and tip separately from the subtotal?
It will if you ask it to explicitly. Include instructions in your prompt like 'return a summary object with separate fields for subtotal, tax, tip, and grand_total.' Without that instruction, some models will lump those together or omit them. Being specific about every field you need is the single biggest driver of consistent output quality.
Can I use this approach to extract line items from invoice PDFs as well as receipts?
Yes, the same technique applies to invoices, purchase orders, and bills of lading. Invoices often have additional fields like SKU codes, account numbers, and payment terms that you should add to your extraction schema. The core prompt structure is identical: describe the output format, provide the document text, and ask for structured extraction.
Try it with a real tool
Run this prompt in one of these tools. Affiliate links help keep Gridlyx free.