# Top 10 Customers Query Solutions
## 1. Schema Assumptions
- `orders` table: `id` (PK), `customer_id` (FK → customers.id), `total_amount` (NUMERIC/DECIMAL), `created_at` (TIMESTAMP), `status` (VARCHAR)
- `customers` table: `id` (PK), `name` (VARCHAR), `email` (VARCHAR), `country` (VARCHAR)
- Country values stored as exact strings `'Germany'` and `'France'`
- Status value `'completed'` is lowercase
- PostgreSQL connection string format assumed
---
## 2. SQLAlchemy ORM Version
```python
from datetime import datetime, timedelta
from decimal import Decimal
from typing import List, Tuple

from sqlalchemy import (
    Column, Integer, String, Numeric, DateTime, ForeignKey, create_engine, func
)
from sqlalchemy.orm import declarative_base, relationship, sessionmaker, Session

Base = declarative_base()


class Customer(Base):
    __tablename__ = "customers"

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    email = Column(String, nullable=False)
    country = Column(String, nullable=False)

    orders = relationship("Order", back_populates="customer")


class Order(Base):
    __tablename__ = "orders"

    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.id"), nullable=False)
    total_amount = Column(Numeric(12, 2), nullable=False)
    created_at = Column(DateTime, nullable=False)
    status = Column(String, nullable=False)

    customer = relationship("Customer", back_populates="orders")


def get_top_customers_orm(
    db_url: str,
    countries: Tuple[str, ...] = ("Germany", "France"),
    days: int = 90,
    limit: int = 10,
) -> List[Tuple[str, str, Decimal]]:
    engine = create_engine(db_url)
    SessionLocal = sessionmaker(bind=engine, expire_on_commit=False)
    cutoff = datetime.utcnow() - timedelta(days=days)
    total_spent = func.sum(Order.total_amount).label("total_spent")

    with SessionLocal() as session:  # type: Session
        results = (
            session.query(Customer.name, Customer.email, total_spent)
            .join(Order, Order.customer_id == Customer.id)
            .filter(
                Customer.country.in_(countries),
                Order.status == "completed",
                Order.created_at >= cutoff,
            )
            .group_by(Customer.id, Customer.name, Customer.email)
            .order_by(total_spent.desc())
            .limit(limit)
            .all()
        )
    return [(name, email, total) for name, email, total in results]


if __name__ == "__main__":
    rows = get_top_customers_orm("postgresql+psycopg2://user:pass@localhost/mydb")
    for name, email, total in rows:
        print(f"{name} <{email}>: {total}")
```
---
## 3. SQLAlchemy Core Version
```python
from datetime import datetime, timedelta
from decimal import Decimal
from typing import List, Tuple

from sqlalchemy import (
    MetaData, Table, Column, Integer, String, Numeric, DateTime, ForeignKey,
    create_engine, select, func
)

metadata = MetaData()

customers = Table(
    "customers", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String, nullable=False),
    Column("email", String, nullable=False),
    Column("country", String, nullable=False),
)

orders = Table(
    "orders", metadata,
    Column("id", Integer, primary_key=True),
    Column("customer_id", Integer, ForeignKey("customers.id"), nullable=False),
    Column("total_amount", Numeric(12, 2), nullable=False),
    Column("created_at", DateTime, nullable=False),
    Column("status", String, nullable=False),
)


def get_top_customers_core(
    db_url: str,
    countries: Tuple[str, ...] = ("Germany", "France"),
    days: int = 90,
    limit: int = 10,
) -> List[Tuple[str, str, Decimal]]:
    engine = create_engine(db_url)
    cutoff = datetime.utcnow() - timedelta(days=days)
    total_spent = func.sum(orders.c.total_amount).label("total_spent")

    stmt = (
        select(customers.c.name, customers.c.email, total_spent)
        .select_from(customers.join(orders, orders.c.customer_id == customers.c.id))
        .where(
            customers.c.country.in_(countries),
            orders.c.status == "completed",
            orders.c.created_at >= cutoff,
        )
        .group_by(customers.c.id, customers.c.name, customers.c.email)
        .order_by(total_spent.desc())
        .limit(limit)
    )

    with engine.connect() as conn:
        result = conn.execute(stmt)
        return [tuple(row) for row in result.all()]


if __name__ == "__main__":
    for row in get_top_customers_core("postgresql+psycopg2://user:pass@localhost/mydb"):
        print(row)
```
---
## 4. Raw SQL Version
```python
from datetime import datetime, timedelta
from decimal import Decimal
from typing import List, Tuple
from sqlalchemy import create_engine, text
SQL_TOP_CUSTOMERS = """
SELECT
c.name,
c.email,
SUM(o.total_amount) AS total_spent
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
WHERE c.country = ANY(:countries)
AND o.status = :status
AND o.created_at >= :cutoff
```
Generate SQLAlchemy and Raw SQL Queries with AI
Tested prompts for 'ai write sql query in python', compared across 5 leading AI models.
If you typed 'ai write sql query in python' into Google, you are probably staring at a blank file or a half-broken query and want to move faster. Whether you need a raw SQL string to pass through psycopg2, a full SQLAlchemy ORM query with joins and filters, or a parameterized statement safe for production, AI can draft it in seconds once you give it the right context.
The core problem is that SQL dialects differ, SQLAlchemy's ORM syntax is verbose, and getting joins, subqueries, or window functions right from memory wastes time. AI models have seen enough Python database code to generate working drafts for PostgreSQL, MySQL, SQLite, and more, including both the query logic and the surrounding Python boilerplate.
This page shows you which prompt structure gets the best output, how five leading models compare on a realistic SQL generation task, and what mistakes cause AI-generated queries to fail silently or return wrong results. Use the examples below as drop-in templates for your own schema.
When to use this
This approach works best when you have a clear schema and can describe the data transformation you need in plain English. It saves the most time on multi-table joins, aggregations with GROUP BY and HAVING, window functions, and SQLAlchemy ORM queries where the method-chaining syntax is easy to get wrong.
- Translating a business requirement into a SQLAlchemy ORM query against an existing schema you can describe
- Writing parameterized raw SQL for psycopg2 or sqlite3 when you know the table structure but not the exact syntax
- Generating migration-safe queries that need to run on multiple database backends
- Converting a working raw SQL query into its SQLAlchemy equivalent (or the reverse)
- Scaffolding repetitive CRUD queries across many models in a new project
When this format breaks down
- When your schema is undocumented or highly complex: the AI will hallucinate column names and relationships it was not given, producing queries that look plausible but fail only when they run.
- For queries that touch sensitive data and need security review: AI-generated SQL can miss injection vectors or generate overly permissive WHERE clauses that expose unintended rows.
- When performance is the primary concern: AI drafts correct queries, not optimal ones. Indexes, query plans, and execution cost require profiling with real data, not AI output.
- When the query logic depends on business rules stored in code elsewhere: the AI does not have access to your application context and will make assumptions that silently return wrong results.
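One way to catch hallucinated column names before a draft query reaches the database is to compare the names it uses against the live schema. The sketch below is only an illustration: it uses the standard-library `sqlite3` module with an in-memory database, and the `table_columns` and `missing_columns` helpers and the sample `referenced` mapping are hypothetical names, not part of any tested output. On PostgreSQL the same idea would query `information_schema.columns` instead of `pragma_table_info`.

```python
import sqlite3
from typing import Dict, List, Set


def table_columns(conn: sqlite3.Connection, table: str) -> Set[str]:
    """Return the column names SQLite reports for `table` (empty if unknown)."""
    rows = conn.execute("SELECT name FROM pragma_table_info(?)", (table,)).fetchall()
    return {row[0] for row in rows}


def missing_columns(
    conn: sqlite3.Connection, referenced: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Map each table to the referenced columns that do not exist in it."""
    missing: Dict[str, List[str]] = {}
    for table, columns in referenced.items():
        known = table_columns(conn, table)
        unknown = [col for col in columns if col not in known]
        if unknown:
            missing[table] = unknown
    return missing


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, country TEXT)"
)
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "total_amount NUMERIC, created_at TEXT, status TEXT)"
)

# Columns a hypothetical AI draft referenced; `amount` and `region` are invented.
referenced = {"orders": ["customer_id", "amount"], "customers": ["name", "region"]}
report = missing_columns(conn, referenced)
print(report)
conn.close()
```

An empty report does not prove the query is correct; it only rules out one class of runtime failure.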
The prompt we tested
You are an expert Python developer specializing in database interactions using SQLAlchemy and raw SQL. Your task is to generate clean, efficient, and production-ready Python code that executes SQL queries based on the user's requirements.

Output runnable Python code in fenced code blocks with necessary imports and type hints. Always use parameterized queries (never f-strings or string concatenation for user input) and include proper session/connection cleanup. Keep explanations concise; prioritize working code over prose.

User's database query requirement: I have a PostgreSQL database with an 'orders' table (id, customer_id, total_amount, created_at, status) and a 'customers' table (id, name, email, country). I need to find the top 10 customers by total spending in the last 90 days, but only for customers in Germany or France whose orders have status='completed'. Return their name, email, and total spent.

Provide your response in this exact structure:

1. **Schema Assumptions**: Briefly list any table/column assumptions you're making.
2. **SQLAlchemy ORM Version**: Complete Python code using SQLAlchemy ORM (with model definitions if needed).
3. **SQLAlchemy Core Version**: Complete Python code using SQLAlchemy Core expressions.
4. **Raw SQL Version**: Complete Python code using raw SQL with parameterized queries (using sqlite3, psycopg2, or SQLAlchemy's text()).
5. **Explanation**: 2-3 bullet points explaining the query logic and any performance considerations (indexes, N+1 issues, etc.).

Ensure all queries use parameterized inputs to prevent SQL injection, include proper connection/session handling with context managers, and follow PEP 8 style.
Model comparison
| Model | Quality | Cost | Latency | Tokens (in / out) |
|---|---|---|---|---|
| Claude Opus 4.7 | 8 | $0.1622 | 18.5s | 580 / 2047 |
| Claude Haiku 4.5 | 7 | $0.0106 | 9.7s | 399 / 2048 |
| GPT-5.4 | 8 | $0.0273 | 18.8s | 353 / 1764 |
| Gemini 2.5 Pro | 2 | $0.0012 | 16.9s | 351 / 157 |
| Grok 4.1 Fast Reasoning | 8 | $0.0007 | 32.9s | 488 / 1207 |
Outputs
# Top 10 Customers by Spending Query
## 1. Schema Assumptions
- `orders` table: `id` (PK), `customer_id` (FK), `total_amount` (DECIMAL/NUMERIC), `created_at` (TIMESTAMP), `status` (VARCHAR)
- `customers` table: `id` (PK), `name` (VARCHAR), `email` (VARCHAR), `country` (VARCHAR)
- `created_at` is stored in UTC or database timezone
- `status` is a string column with value `'completed'`
- Both tables have standard integer primary keys
## 2. SQLAlchemy ORM Version
```python
from datetime import datetime, timedelta
from typing import List, Tuple

from sqlalchemy import (
    Column, DateTime, ForeignKey, Integer, Numeric, String, create_engine, func
)
from sqlalchemy.orm import Session, declarative_base, relationship
from sqlalchemy.sql import and_

# Model definitions
Base = declarative_base()


class Customer(Base):
    __tablename__ = 'customers'

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    email = Column(String, nullable=False)
    country = Column(String, nullable=False)

    orders = relationship('Order', back_populates='customer')


class Order(Base):
    __tablename__ = 'orders'

    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey('customers.id'), nullable=False)
    total_amount = Column(Numeric, nullable=False)
    created_at = Column(DateTime, nullable=False)
    status = Column(String, nullable=False)

    customer = relationship('Customer', back_populates='orders')


def get_top_customers_orm(db_url: str) -> List[Tuple[str, str, float]]:
    """
    Fetch top 10 customers by spending in last 90 days using SQLAlchemy ORM.
    Returns list of tuples: (name, email, total_spent)
    """
    engine = create_engine(db_url)

    # Calculate date 90 days ago
    ninety_days_ago = datetime.utcnow() - timedelta(days=90)

    with Session(engine) as session:
        results = session.query(
            Customer.name,
            Customer.email,
            func.sum(Order.total_amount).label('total_spent')
        ).join(
            Order, Customer.id == Order.customer_id
        ).filter(
            and_(
                Order.status == 'completed',
                Order.created_at >= ninety_days_ago,
                Customer.country.in_(['Germany', 'France'])
            )
        ).group_by(
            Customer.id,
            Customer.name,
            Customer.email
        ).order_by(
            func.sum(Order.total_amount).desc()
        ).limit(10).all()

    return [(name, email, float(total)) for name, email, total in results]


if __name__ == '__main__':
    # Example usage
    db_url = 'postgresql://user:password@localhost:5432/dbname'
    top_customers = get_top_customers_orm(db_url)
    for name, email, total_spent in top_customers:
        print(f"{name} ({email}): ${total_spent:.2f}")
```
## 3. SQLAlchemy Core Version
```python
from datetime import datetime, timedelta
from typing import List, Tuple

from sqlalchemy import (
    create_engine, Table, Column, Integer, String, Numeric, DateTime,
    ForeignKey, select, func, and_, MetaData
)


def get_top_customers_core(db_url: str) -> List[Tuple[str, str, float]]:
    """
    Fetch top 10 customers by spending in last 90 days using SQLAlchemy Core.
    Returns list of tuples: (name, email, total_spent)
    """
    engine = create_engine(db_url)
    metadata = MetaData()

    # Define tables
    customers = Table(
        'customers',
        metadata,
        Column('id', Integer, primary_key=True),
        Column('name', String),
        Column('email', String),
        Column('country', String),
    )
    orders = Table(
        'orders',
        metadata,
        Column('id', Integer, primary_key=True),
        Column('customer_id', Integer, ForeignKey('customers.id')),
        Column('total_amount', Numeric),
        Column('created_at', DateTime),
        Column('status', String),
    )

    # Calculate date 90 days ago
    ninety_days_ago = datetime.utcnow() - timedelta(days=90)

    # Build query
    stmt = select(
        customers.c.name,
        customers.c.email,
        func.sum(orders.c.total_amount).label('total_spent')
    ).select_from(
        customers.join(orders, customers.c.id == orders.c.customer_id)
    ).where(
        and_(
            orders.c.status == 'completed',
            orders.c.created_at >= ninety_days_ago,
            customers.c.country.in_(['Germany', 'France'])
        )
    ).group_by(
        customers.c.id,
        customers.c.name,
        customers.c.email
    ).order_by(
        func.sum(orders.c.total_amount).desc()
    ).limit(10)

    with engine.connect() as conn:
        results = conn.execute(stmt).fetchall()

    return [(name, email, float(total)) for name, email, total in results]


if __name__ == '__main__':
    # Example usage
    db_url = 'postgresql://user:password@localhost:5432/dbname'
    top_customers = get_top_customers_core(db_url)
    for name, email, total_spent in top_customers:
        print(f"{name} ({email}): ${total_spent:.2f}")
```
## 4. Raw SQL Version
```python
from datetime import datetime, timedelta
from typing import List, Tuple

from sqlalchemy import create_engine, text


def get_top_customers_raw_sql(db_url: str) -> List[Tuple[str, str, float]]:
    """
    Fetch top 10 customers by spending in last 90 days using raw SQL.
    Returns list of tuples: (name, email, total_spent)
    """
    engine = create_engine(db_url)

    # Calculate date 90 days ago
    ninety_days_ago = datetime.utcnow() - timedelta(days=90)

    query = text("""
        SELECT
            c.name,
            c.email,
            SUM(o.total_amount) AS total_spent
        FROM customers c
        INNER JOIN orders o ON c.id = o.customer_id
        WHERE
            o.status = :status
            AND o.created_at >= :cutoff_date
            AND c.country IN (:country1, :country2)
        GROUP BY c.id, c.name, c.email
        ORDER BY total_spent DESC
        LIMIT :limit;
    """)

    with engine.connect() as conn:
        results = conn.execute(
            query,
            {
                'status': 'completed',
                'cutoff_date': ninety_days_ago,
                'country1': 'Germany',
                'country2': 'France',
                'limit': 10
            }
        ).fetchall()

    return [(name, email, float(total)) for name, email, total in results]


if __name__ == '__main__':
    # Example usage
    db_url = 'postgresql://user:password@localhost:5432/dbname'
    top_customers = get_top_customers_raw_sql(db_url)
    for name, email, total_spent in top_customers:
        print(f"{name} ({email}): ${total_spent:.2f}")
```
## 5. Explanation
- **Query Logic**: The query performs an
1. **Schema Assumptions**:
- `orders.id` and `customers.id` are primary keys.
- `orders.customer_id` is a foreign key to `customers.id`.
- `orders.total_amount` is a numeric/decimal column.
- `orders.created_at` is a timestamp/datetime column stored in PostgreSQL.
- `orders.status` contains values like `'completed'`.
- `customers.country` contains country names like `'Germany'` and `'France'`.
2. **SQLAlchemy ORM Version**:
```python
from __future__ import annotations

from datetime import datetime
from decimal import Decimal
from typing import List, TypedDict

from sqlalchemy import DateTime, ForeignKey, Numeric, String, create_engine, func, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column, relationship

DATABASE_URL = "postgresql+psycopg2://username:password@localhost:5432/your_database"


class Base(DeclarativeBase):
    pass


class Customer(Base):
    __tablename__ = "customers"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String, nullable=False)
    email: Mapped[str] = mapped_column(String, nullable=False)
    country: Mapped[str] = mapped_column(String, nullable=False)

    orders: Mapped[list["Order"]] = relationship(back_populates="customer")


class Order(Base):
    __tablename__ = "orders"

    id: Mapped[int] = mapped_column(primary_key=True)
    customer_id: Mapped[int] = mapped_column(ForeignKey("customers.id"), nullable=False)
    total_amount: Mapped[Decimal] = mapped_column(Numeric(12, 2), nullable=False)
    # Mapped[] takes the Python type (datetime); DateTime is the column type.
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
    status: Mapped[str] = mapped_column(String, nullable=False)

    customer: Mapped[Customer] = relationship(back_populates="orders")


class CustomerSpendingResult(TypedDict):
    name: str
    email: str
    total_spent: Decimal


def get_top_customers_orm(database_url: str) -> List[CustomerSpendingResult]:
    engine = create_engine(database_url, future=True)
    countries = ["Germany", "France"]
    status = "completed"
    days_back = 90
    limit = 10

    stmt = (
        select(
            Customer.name,
            Customer.email,
            func.sum(Order.total_amount).label("total_spent"),
        )
        .join(Order, Order.customer_id == Customer.id)
        .where(Customer.country.in_(countries))
        .where(Order.status == status)
        # func.* passes arguments positionally, so days goes in the fourth
        # slot of PostgreSQL's make_interval(years, months, weeks, days, ...).
        .where(Order.created_at >= func.now() - func.make_interval(0, 0, 0, days_back))
        .group_by(Customer.id, Customer.name, Customer.email)
        .order_by(func.sum(Order.total_amount).desc())
        .limit(limit)
    )

    with Session(engine) as session:
        rows = session.execute(stmt).all()

    return [
        {
            "name": row.name,
            "email": row.email,
            "total_spent": row.total_spent,
        }
        for row in rows
    ]


if __name__ == "__main__":
    results = get_top_customers_orm(DATABASE_URL)
    for row in results:
        print(row)
```
3. **SQLAlchemy Core Version**:
```python
from __future__ import annotations

from decimal import Decimal
from typing import List, TypedDict

from sqlalchemy import (
    Column,
    DateTime,
    ForeignKey,
    Integer,
    MetaData,
    Numeric,
    String,
    Table,
    create_engine,
    func,
    select,
)

DATABASE_URL = "postgresql+psycopg2://username:password@localhost:5432/your_database"

metadata = MetaData()

customers = Table(
    "customers",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String, nullable=False),
    Column("email", String, nullable=False),
    Column("country", String, nullable=False),
)

orders = Table(
    "orders",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("customer_id", ForeignKey("customers.id"), nullable=False),
    Column("total_amount", Numeric(12, 2), nullable=False),
    Column("created_at", DateTime(timezone=True), nullable=False),
    Column("status", String, nullable=False),
)


class CustomerSpendingResult(TypedDict):
    name: str
    email: str
    total_spent: Decimal


def get_top_customers_core(database_url: str) -> List[CustomerSpendingResult]:
    engine = create_engine(database_url, future=True)
    countries = ["Germany", "France"]
    status = "completed"
    days_back = 90
    limit = 10

    total_spent = func.sum(orders.c.total_amount).label("total_spent")

    stmt = (
        select(customers.c.name, customers.c.email, total_spent)
        .select_from(
            customers.join(orders, orders.c.customer_id == customers.c.id)
        )
        .where(customers.c.country.in_(countries))
        .where(orders.c.status == status)
        # func.* passes arguments positionally, so days goes in the fourth
        # slot of PostgreSQL's make_interval(years, months, weeks, days, ...).
        .where(orders.c.created_at >= func.now() - func.make_interval(0, 0, 0, days_back))
        .group_by(customers.c.id, customers.c.name, customers.c.email)
        .order_by(total_spent.desc())
        .limit(limit)
    )

    with engine.connect() as conn:
        rows = conn.execute(stmt).mappings().all()

    return [
        {
            "name": row["name"],
            "email": row["email"],
            "total_spent": row["total_spent"],
        }
        for row in rows
    ]


if __name__ == "__main__":
    results = get_top_customers_core(DATABASE_URL)
    for row in results:
        print(row)
```
4. **Raw SQL Version**:
```python
from __future__ import annotations

from decimal import Decimal
from typing import List, TypedDict

from sqlalchemy import create_engine, text

DATABASE_URL = "postgresql+psycopg2://username:password@localhost:5432/your_database"


class CustomerSpendingResult(TypedDict):
    name: str
    email: str
    total_spent: Decimal


def get_top_customers_raw_sql(database_url: str) -> List[CustomerSpendingResult]:
    engine = create_engine(database_url, future=True)

    query = text(
        """
        SELECT
            c.name,
            c.email,
            SUM(o.total_amount) AS total_spent
        FROM customers AS c
        JOIN orders AS o
            ON o.customer_id = c.id
        WHERE c.country IN (:country1, :country2)
          AND o.status = :status
          AND o.created_at >= NOW() - make_interval(days => :days_back)
        GROUP BY c.id, c.name, c.email
        ORDER BY total_spent DESC
        LIMIT :limit_value
        """
    )

    params = {
        "country1": "Germany",
        "country2": "France",
        "status": "completed",
        "days_back": 90,
        "limit_value": 10,
    }

    with engine.connect() as conn:
        rows = conn.execute(query, params).mappings().all()

    return [
        {
            "name": row["name"],
            "email": row["email"],
            "total_spent": row["total_spent"],
        }
        for row in rows
    ]


if __name__ == "__main__":
    results = get_top_customers_raw_sql(DATABASE_URL)
    for row in results:
        print(row)
```
5. **Explanation**:
- The query joins `customers` and `orders`, filters to `country IN ('Germany', 'France')`, `status = 'completed'`, and `created_at` within the last 90 days, then aggregates with `SUM(total_amount)` per customer.
- Results are grouped by customer and sorted by total spending descending, returning only the top 10 customers.
- For performance, consider indexes such as `orders(customer_id)`, `orders(status, created_at)`, and `customers(country)`. Since aggregation is done in a single query, there is no N+1 query issue.
### 1. Schema Assumptions
* `customers.id` is the primary key for the `customers` table.
* `orders.id` is the primary key for the `orders` table.
* `orders.customer_id` is a foreign key referencing `customers.id`.
* `orders.total_amount` is a numeric type (e.g., `DECIMAL` or `FLOAT`).
* `orders.created_at` is a timestamp or datetime type (e.g., `TIMESTAMP WITH TIME ZONE`).
---
### 2. SQLAlchemy ORM Version
```python
import os
from datetime import datetime, timedelta, timezone
from typing import List, Tuple, Any
from sqlalchemy import
```
1. **Schema Assumptions**:
- `customers`: `id` (Integer PK), `name` (String), `email` (String), `country` (String).
- `orders`: `id` (Integer PK), `customer_id` (Integer FK to `customers.id`), `total_amount` (Numeric), `created_at` (DateTime), `status` (String).
- Indexes recommended: `orders(customer_id, created_at, status)`, `customers(country)`.
2. **SQLAlchemy ORM Version**:
```python
from datetime import datetime, timedelta
from typing import List, Tuple

from sqlalchemy import create_engine, Column, Integer, String, Numeric, DateTime, ForeignKey, func, desc
from sqlalchemy.orm import declarative_base, relationship, sessionmaker
from sqlalchemy.orm import Session

# Replace with your PostgreSQL connection string
engine = create_engine("postgresql+psycopg2://user:pass@localhost/dbname")

Base = declarative_base()


class Customer(Base):
    __tablename__ = "customers"

    id = Column(Integer, primary_key=True)
    name = Column(String)
    email = Column(String)
    country = Column(String)

    orders = relationship("Order", back_populates="customer")


class Order(Base):
    __tablename__ = "orders"

    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.id"))
    total_amount = Column(Numeric(10, 2))
    created_at = Column(DateTime)
    status = Column(String)

    customer = relationship("Customer", back_populates="orders")


SessionLocal = sessionmaker(bind=engine)

date_filter: datetime = datetime.now() - timedelta(days=90)

with SessionLocal() as session:
    result: List[Tuple[str, str, float]] = (
        session.query(
            Customer.name,
            Customer.email,
            func.sum(Order.total_amount).label("total_spent"),
        )
        .join(Order)
        .filter(
            Customer.country.in_(["Germany", "France"]),
            Order.created_at >= date_filter,
            Order.status == "completed",
        )
        .group_by(Customer.id, Customer.name, Customer.email)
        .order_by(desc("total_spent"))
        .limit(10)
        .all()
    )
    print(result)  # List of (name, email, total_spent) tuples
```
3. **SQLAlchemy Core Version**:
```python
from datetime import datetime, timedelta
from typing import List, Tuple

from sqlalchemy import create_engine, Table, Column, Integer, String, Numeric, DateTime, MetaData, select, func, desc

# Replace with your PostgreSQL connection string
engine = create_engine("postgresql+psycopg2://user:pass@localhost/dbname")

metadata = MetaData()

customers = Table(
    "customers",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("email", String),
    Column("country", String),
)

orders = Table(
    "orders",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("customer_id", Integer),
    Column("total_amount", Numeric(10, 2)),
    Column("created_at", DateTime),
    Column("status", String),
)

date_filter: datetime = datetime.now() - timedelta(days=90)

stmt = (
    select(
        customers.c.name,
        customers.c.email,
        func.sum(orders.c.total_amount).label("total_spent"),
    )
    .select_from(customers.join(orders, customers.c.id == orders.c.customer_id))
    .where(
        customers.c.country.in_(["Germany", "France"]),
        orders.c.created_at >= date_filter,
        orders.c.status == "completed",
    )
    .group_by(customers.c.id, customers.c.name, customers.c.email)
    .order_by(desc("total_spent"))
    .limit(10)
)

with engine.connect() as conn:
    result: List[Tuple[str, str, float]] = conn.execute(stmt).fetchall()
    print(result)  # List of (name, email, total_spent) tuples
```
4. **Raw SQL Version**:
```python
from datetime import datetime, timedelta
from typing import List, Tuple

from sqlalchemy import create_engine, text

# Replace with your PostgreSQL connection string
engine = create_engine("postgresql+psycopg2://user:pass@localhost/dbname")

date_filter: datetime = datetime.now() - timedelta(days=90)

query = text(
    """
    SELECT c.name, c.email, SUM(o.total_amount) AS total_spent
    FROM customers c
    JOIN orders o ON c.id = o.customer_id
    WHERE c.country IN ('Germany', 'France')
      AND o.created_at >= :date_filter
      AND o.status = 'completed'
    GROUP BY c.id, c.name, c.email
    ORDER BY total_spent DESC
    LIMIT 10
    """
)

with engine.connect() as conn:
    result: List[Tuple[str, str, float]] = conn.execute(query, {"date_filter": date_filter}).fetchall()
    print(result)  # List of (name, email, total_spent) tuples
```
5. **Explanation**:
- Joins `customers` to `orders`, filters by `country IN ('Germany', 'France')`, `status='completed'`, and `created_at >= 90 days ago`; groups by customer identifiers, sums `total_amount`, sorts descending, limits to 10.
- Parameterized `:date_filter` prevents injection; single query avoids N+1 issues.
- Performance: Leverage indexes on `orders(created_at, status, customer_id)` and `customers(country)` for fast filtering/joins/grouping on potentially large tables.
What makes these work
1. Always include your schema
The single biggest factor in output quality is giving the model your actual table and column names. Paste a CREATE TABLE statement or a short column list directly into the prompt. Without it, the model invents plausible-sounding names that will not match your database.
2. Specify the library and version
SQLAlchemy 1.x and 2.x have different query syntax, and raw SQL for PostgreSQL differs from SQLite. State 'SQLAlchemy 2.0 select() style' or 'psycopg2 parameterized query' explicitly. Vague prompts produce a mix of styles that may not run without editing.
3. Ask for parameterized output by default
Tell the model to use bound parameters rather than string interpolation. This keeps the output safe to use in production and avoids the habit of formatting user input directly into query strings, which is a SQL injection risk even in internal tooling.
4. Request the surrounding Python boilerplate
Ask for the full executable snippet including imports, session or connection usage, and result handling. A query string alone is half the answer. When you ask for the complete working code block, you can copy it directly into your file and test it immediately.
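As a rough picture of what a complete, parameterized snippet can look like, here is a minimal sketch using the standard-library `sqlite3` module and an in-memory database. The `top_customers` helper and the sample rows are assumptions made for illustration and are not taken from any of the tested outputs. Note that only the `?` markers are formatted into the SQL text; the values themselves are always passed as bound parameters.

```python
import sqlite3
from contextlib import closing
from typing import List, Sequence, Tuple


def top_customers(
    conn: sqlite3.Connection,
    countries: Sequence[str],
    status: str,
    limit: int,
) -> List[Tuple[str, float]]:
    """Return (name, total_spent) rows, highest spender first."""
    if not countries:
        raise ValueError("countries must not be empty")
    # Build one "?" per country so the IN list can vary in length.
    placeholders = ", ".join("?" for _ in countries)
    sql = f"""
        SELECT c.name, SUM(o.total_amount) AS total_spent
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        WHERE c.country IN ({placeholders})
          AND o.status = ?
        GROUP BY c.id, c.name
        ORDER BY total_spent DESC
        LIMIT ?
    """
    return conn.execute(sql, (*countries, status, limit)).fetchall()


# closing() guarantees the connection is released even if a query fails.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, total_amount REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Anna", "Germany"), (2, "Luc", "France"), (3, "Bob", "USA")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 100.0, "completed"),
            (1, 50.0, "completed"),
            (2, 120.0, "completed"),
            (2, 500.0, "cancelled"),
            (3, 999.0, "completed"),
        ],
    )
    rows = top_customers(conn, ["Germany", "France"], "completed", 10)

print(rows)
```

Because the snippet carries its own imports, connection handling, and sample data, it can be pasted into a file and run as-is, which is the property worth asking the model for.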
More example scenarios
Write a SQLAlchemy ORM query using Python. Tables: Order (id, customer_id, created_at, total_amount) and Customer (id, email, name). Return the top 10 customers by total spend in the last 90 days. Use SQLAlchemy 2.0 style with select().
from sqlalchemy import select, func
from datetime import datetime, timedelta
cutoff = datetime.utcnow() - timedelta(days=90)
stmt = (
select(Customer.id, Customer.name, Customer.email, func.sum(Order.total_amount).label('total_spend'))
.join(Order, Order.customer_id == Customer.id)
.where(Order.created_at >= cutoff)
.group_by(Customer.id, Customer.name, Customer.email)
.order_by(func.sum(Order.total_amount).desc())
.limit(10)
)
results = session.execute(stmt).all()
Write raw SQL for PostgreSQL using Python's psycopg2. Table: events (user_id, event_type, occurred_at). Calculate monthly active users for each of the last 6 months, where active means at least one event. Return month, user_count. Use a parameterized query.
query = """
SELECT DATE_TRUNC('month', occurred_at) AS month,
COUNT(DISTINCT user_id) AS user_count
FROM events
WHERE occurred_at >= NOW() - %s * INTERVAL '1 month'
GROUP BY DATE_TRUNC('month', occurred_at)
ORDER BY month DESC;
"""
cursor.execute(query, (6,))  # bind the month window instead of hardcoding it
Write a SQLAlchemy Core query in Python. Tables: employees (id, name, department_id), documents (id, employee_id, doc_type). Find all employees who do NOT have a document with doc_type='tax_form'. Return employee id and name.
from sqlalchemy import select, not_, exists
has_tax_form = (
select(documents.c.id)
.where(documents.c.employee_id == employees.c.id)
.where(documents.c.doc_type == 'tax_form')
.correlate(employees)
)
stmt = select(employees.c.id, employees.c.name).where(not_(exists(has_tax_form)))
results = conn.execute(stmt).fetchall()
Write a raw SQL query for SQLite in Python using the sqlite3 module. Table: stock_movements (id, product_id, quantity, movement_type ['in','out'], moved_at). Return each product_id with its current stock balance using a CTE.
query = """
WITH movements AS (
SELECT product_id,
SUM(CASE WHEN movement_type = 'in' THEN quantity ELSE -quantity END) AS balance
FROM stock_movements
GROUP BY product_id
)
SELECT product_id, balance FROM movements ORDER BY product_id;
"""
conn.execute(query).fetchall()
Write a parameterized SQLAlchemy 2.0 query in Python. Table: tickets (id, tenant_id, status, created_at, priority). Return open tickets for a given tenant_id, ordered by priority descending then created_at ascending. Priority values are 1-5.
from sqlalchemy import select
stmt = (
select(Ticket)
.where(Ticket.tenant_id == tenant_id)
.where(Ticket.status == 'open')
.order_by(Ticket.priority.desc(), Ticket.created_at.asc())
)
results = session.execute(stmt).scalars().all()  # tenant_id is already bound by the == comparison above
Common mistakes to avoid
- Omitting the database dialect
Not specifying PostgreSQL, MySQL, or SQLite causes the model to write generic SQL that may use unsupported functions. For example, DATE_TRUNC works in PostgreSQL but not SQLite. Always name your database backend in the prompt.
- Accepting hallucinated column names
If you describe your schema loosely, the model fills gaps with guesses. The resulting query is syntactically valid but fails at execution with an unknown-column error, or returns no rows. Always verify column names against your actual schema before running the output.
- Skipping the N+1 check on ORM queries
AI-generated SQLAlchemy ORM queries often load related objects lazily, causing N+1 query problems that only appear under real data volume. Check whether the output uses joinedload or selectinload for relationships you access in the result loop.
- Using string formatting instead of parameters
Some model outputs build query strings with f-strings or .format() for convenience. This is unsafe when any variable comes from user input. Reject any output that interpolates variables directly into a SQL string and ask the model to rewrite it with bound parameters.
- Not testing with edge-case data
AI queries are drafted against your description, not your actual data. NULL values, empty sets, and duplicate rows expose logic errors that the model had no way to anticipate. Run the generated query against a sample that includes these cases before deploying.
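The edge-case point is easy to check with a few rows. The sketch below uses the standard-library `sqlite3` module with an illustrative `orders` table (the table contents and the `RAW_SUM` and `SAFE_SUM` names are assumptions for this example). It shows two behaviors that standard SQL aggregates share across most backends: `SUM` skips NULL values, and `SUM` over zero matching rows yields NULL rather than 0 unless the query wraps it in `COALESCE`.

```python
import sqlite3

RAW_SUM = "SELECT SUM(total_amount) FROM orders WHERE customer_id = ? AND status = ?"
SAFE_SUM = (
    "SELECT COALESCE(SUM(total_amount), 0) FROM orders "
    "WHERE customer_id = ? AND status = ?"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, total_amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 40.0, "completed"),
        (1, None, "completed"),  # NULL amount: skipped by SUM
        (2, 15.0, "pending"),    # customer 2 has no completed orders
    ],
)

# Customer 1: the NULL row is ignored, so only the 40.0 row is summed.
with_null = conn.execute(RAW_SUM, (1, "completed")).fetchone()[0]

# Customer 2: no rows match, so SUM returns NULL (None in Python).
empty_raw = conn.execute(RAW_SUM, (2, "completed")).fetchone()[0]

# COALESCE turns the empty-set NULL into a usable zero.
empty_safe = conn.execute(SAFE_SUM, (2, "completed")).fetchone()[0]

print(with_null, empty_raw, empty_safe)
conn.close()
```

Code that later calls `float(total)` or formats the total as currency will fail on the `None` case, which is why a sample with an empty group is worth including in any test run.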
Frequently asked questions
Can AI generate SQLAlchemy ORM queries from a plain English description?
Yes, and it works well when you provide your model class definitions or table schema alongside the description. The model maps your column names to the correct ORM attributes and chains .where(), .join(), and .order_by() calls accurately. Without the schema, expect column name errors that require manual correction.
Is it safe to use AI-generated SQL in a production Python app?
It is safe as a starting point if you review the output before deploying. Specifically check that variables are passed as bound parameters, not interpolated into strings, and that WHERE clauses are restrictive enough for your access control requirements. Treat AI output as a senior developer's first draft, not as reviewed, production-ready code.
What is the best way to prompt AI to write a complex JOIN query in Python?
Include the CREATE TABLE statements or column lists for every table involved, describe the join condition in plain English, and state the exact output columns you need. Adding a sentence like 'use SQLAlchemy 2.0 with select() and explicit join conditions' removes ambiguity about syntax style and produces cleaner output.
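If the CREATE TABLE statements are not at hand, they can usually be pulled from the database itself. The following is a small sketch for SQLite, which keeps the original DDL text in its `sqlite_master` catalog; the `schema_for_prompt` helper is a hypothetical name, and the two tables are sample data. For PostgreSQL, a schema-only dump (for example with `pg_dump --schema-only`) serves the same purpose.

```python
import sqlite3


def schema_for_prompt(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements of all user tables, sorted by name."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    # Each row holds the DDL text SQLite stored when the table was created.
    return "\n\n".join(f"{row[0]};" for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, country TEXT)"
)
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "total_amount NUMERIC, created_at TEXT, status TEXT)"
)

prompt_schema = schema_for_prompt(conn)
print(prompt_schema)
conn.close()
```

Pasting this text ahead of the plain-English request gives the model exact table names, column names, and foreign keys to work from.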
Can I use AI to convert raw SQL into a SQLAlchemy query?
Yes, this is one of the cleanest use cases. Paste the raw SQL, state your SQLAlchemy version, and ask for the equivalent ORM or Core query. The model handles most standard SQL patterns accurately. Complex subqueries and database-specific functions occasionally need manual adjustment.
Which AI model writes the best SQL for Python, ChatGPT or Claude?
Both perform well on standard queries. The comparison table on this page shows how the tested models scored on quality, cost, and latency for one aggregation task; differences tend to be larger on edge cases like window functions, CTEs, and ORM relationship loading. The prompt quality matters more than the model choice for most practical use cases.
How do I get AI to write a query that avoids SQL injection in Python?
Include the instruction 'use parameterized queries, do not interpolate variables into the SQL string' explicitly in your prompt. Follow up by checking the output for any f-string or .format() usage inside the query string. Both psycopg2 and SQLAlchemy support bound parameters natively and the model knows the correct syntax for each.
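To see why the check matters, the sketch below runs the same lookup twice against an in-memory SQLite database using the standard-library `sqlite3` module: once with the value spliced into the SQL text and once with a bound parameter. The table, the sample rows, and the `malicious` input are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Anna", "Germany"), ("Luc", "France")],
)

# Input crafted to change the meaning of the WHERE clause.
malicious = "Germany' OR '1'='1"

# Unsafe: the value becomes part of the SQL text, so the OR clause is executed
# and every row matches.
unsafe_sql = f"SELECT name FROM customers WHERE country = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value is sent as a bound parameter and compared as a plain string,
# so no country matches it.
safe_rows = conn.execute(
    "SELECT name FROM customers WHERE country = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
conn.close()
```

The interpolated version returns every customer, while the parameterized version returns none, which is the behavior to confirm in any AI-generated query that accepts outside input.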