Toasty Architecture Overview
Project Structure
Toasty is an ORM for Rust that supports SQL and NoSQL databases. The codebase is a Cargo workspace with separate crates for each layer.
Crates
1. toasty
User-facing crate with query engine and runtime.
Key Components:
- engine/: Multi-phase query compilation and execution pipeline. See Query Engine Architecture for detailed documentation.
- stmt/: Typed statement builders (wrappers around toasty_core::stmt types)
- relation/: Relationship abstractions (HasMany, BelongsTo, HasOne)
- model.rs: Model trait and ID generation
Query Execution Pipeline (high-level):
Statement AST → Simplify → Lower → Plan → Execute → Results
The engine compiles queries into a mini-program of actions executed by an interpreter. For details on HIR, MIR, and the full compilation pipeline, see Query Engine Architecture.
2. toasty-core
Shared types used by all other crates: schema representations, statement AST, and driver interface.
Key Components:
- schema/: Model and database schema definitions
  - app/: Model-level definitions (fields, relations, constraints)
  - db/: Database-level table and column definitions
  - mapping/: Maps between models and database tables
  - builder/: Schema construction utilities
  - verify/: Schema validation
- stmt/: Statement AST nodes for queries, inserts, updates, deletes
- driver/: Driver interface, capabilities, and operations
3. toasty-macros (code generation)
The toasty-macros crate contains both the proc-macro entry points and the code generation logic. It generates Rust code from the #[derive(Model)] and #[derive(Embed)] macros.
Key Components:
- schema/: Parses model attributes into schema representation
- expand/: Generates implementations for models
  - model.rs: Model trait implementation
  - query.rs: Query builder methods
  - create.rs: Create/insert builders
  - update.rs: Update builders
  - relation.rs: Relationship methods
  - fields.rs: Field accessors
  - filters.rs: Filter method generation
  - schema.rs: Runtime schema generation
4. toasty-driver-*
Database-specific driver implementations.
Supported Databases:
- toasty-driver-sqlite: SQLite implementation
- toasty-driver-postgresql: PostgreSQL implementation
- toasty-driver-mysql: MySQL implementation
- toasty-driver-dynamodb: DynamoDB implementation
5. toasty-sql
Converts statement AST to SQL strings. Used by SQL-based drivers.
Key Components:
- serializer/: SQL generation with dialect support
  - flavor.rs: Database-specific SQL dialects
  - statement.rs: Statement serialization
  - expr.rs: Expression serialization
  - ty.rs: Type serialization
- stmt/: SQL-specific statement types
Further Reading
- Query Engine Architecture - Query compilation and execution pipeline
- Type System - Type system design and conversions
Toasty Query Engine
This document provides a high-level overview of the Toasty query execution engine for developers working on engine internals. It describes the multi-phase pipeline that transforms user queries into database operations.
Overview
The Toasty engine is a multi-database query compiler and runtime that executes ORM operations across SQL and NoSQL databases. It transforms a user’s query (represented as a Statement AST) into a sequence of executable actions through multiple compilation phases.
Execution Model
The final output is a mini program executed by an interpreter. Think of it like a small virtual machine or bytecode interpreter, though there is no control flow (yet):
- Instructions (Actions): Operations like “execute this SQL”, “filter these results”, “merge child records into parents”
- Variables: Storage slots, or registers, that hold intermediate results between instructions
- Linear Execution: Instructions run in sequence (no control flow - no branches or loops, yet). Eventually, the interpreter will be smart about concurrency and execute independent operations in parallel when possible.
- Interpreter: The engine executor reads each instruction, fetches inputs from variables, performs the operation, and stores outputs back to variables
For example, loading users with their todos:
SELECT users.id, users.name, (
SELECT todos.id, todos.title
FROM todos
WHERE todos.user_id = users.id
) FROM users WHERE ...
compiles to a program like:
$0 = ExecSQL("SELECT * FROM users WHERE ...")
$1 = ExecSQL("SELECT * FROM todos WHERE user_id IN ...")
$2 = NestedMerge($0, $1, by: user_id)
return $2
The compilation pipeline below transforms user queries into this instruction/variable representation. Each phase brings the query closer to this final executable form.
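The instruction/variable model above can be sketched in a few lines. All names here (Action, Value, run) are illustrative stand-ins, not Toasty's real engine types; the "SQL" execution just produces a canned row set:

```rust
// Toy sketch of the mini-program model: actions read and write variable slots.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Rows(Vec<String>),
}

enum Action {
    // Stand-in for "execute this SQL and store the rows in `output`".
    ExecSql { sql: &'static str, output: usize },
    // Append one variable's rows after another's, storing into `output`.
    NestedMerge { parent: usize, child: usize, output: usize },
}

// Linear interpreter: run each action in sequence, then return the
// contents of the last action's output slot.
fn run(actions: &[Action], num_vars: usize) -> Option<Value> {
    let mut vars: Vec<Option<Value>> = vec![None; num_vars];
    let mut last = 0;
    for action in actions {
        match action {
            Action::ExecSql { sql, output } => {
                vars[*output] = Some(Value::Rows(vec![format!("rows for: {sql}")]));
                last = *output;
            }
            Action::NestedMerge { parent, child, output } => {
                let mut rows = match vars[*parent].clone() {
                    Some(Value::Rows(r)) => r,
                    None => Vec::new(),
                };
                if let Some(Value::Rows(c)) = vars[*child].clone() {
                    rows.extend(c);
                }
                vars[*output] = Some(Value::Rows(rows));
                last = *output;
            }
        }
    }
    vars[last].take()
}
```

The real interpreter streams values and frees variables eagerly; this sketch only shows the instruction-plus-slots shape.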
Compilation Pipeline
User Query (Statement AST)
↓
[Verification] - Validate statement structure (debug builds only)
↓
[Simplification] - Normalize and optimize the statement AST
↓
[Lowering] - Convert to HIR for dependency analysis
↓
[Planning] - Build MIR operation graph
↓
[Execution Planning] - Convert to action sequence with variables
↓
[Execution] - Run actions against database driver
↓
Result Stream
Phase 1: Simplification
Location: engine/simplify.rs
The simplification phase normalizes and optimizes the statement AST before planning.
Key Transformations
- Association Rewriting: Converts relationship navigation (e.g., user.todos()) into explicit subqueries with foreign key filters
- Subquery Lifting: Transforms IN (SELECT ...) expressions into more efficient join-like operations
- Expression Normalization: Simplifies complex expressions (e.g., flattening nested ANDs/ORs, constant folding)
- Path Expression Rewriting: Resolves field paths and relationship traversals into explicit column references
- Empty Query Detection: Identifies queries that will return no results
Example: Association Simplification
// user.todos().delete() generates:
Delete {
    from: Todo,
    via: User::todos, // relationship traversal
    ...
}

// After simplification:
Delete {
    from: Todo,
    filter: todo.user_id IN (SELECT id FROM users WHERE ...)
}
Converting relationship navigation into explicit filters early means downstream phases only need to handle standard query patterns with filters and subqueries - no special-case logic for each relationship type.
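One of the normalization rules listed above, flattening nested ANDs, can be sketched concretely. The Expr type here is a toy stand-in for the real statement AST:

```rust
// Toy expression type: just enough to show AND-flattening.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(&'static str),
    And(Vec<Expr>),
}

// AND(a, AND(b, c)) becomes AND(a, b, c), recursively.
fn flatten_and(expr: Expr) -> Expr {
    match expr {
        Expr::And(operands) => {
            let mut flat = Vec::new();
            for op in operands {
                match flatten_and(op) {
                    Expr::And(inner) => flat.extend(inner),
                    other => flat.push(other),
                }
            }
            Expr::And(flat)
        }
        other => other,
    }
}
```

Normalizing to a flat operand list means later phases can match on a single AND node instead of every possible nesting.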
Phase 2: Lowering
Location: engine/lower.rs
Lowering converts a simplified statement into HIR (High-level Intermediate Representation) - a collection of related statements with tracked dependencies.
Toasty tries to maximize what the target database can handle natively, only decomposing queries when necessary. For example, a query like User::find_by_name("John").todos().all() contains a subquery. SQL databases can execute this as SELECT * FROM todos WHERE user_id IN (SELECT id FROM users WHERE name = 'John'). DynamoDB cannot handle subqueries, so lowering splits this into two statements: first fetch user IDs, then query todos with those IDs.
The HIR tracks a dependency graph between statements - which statements depend on results from others, and which columns flow between them. This graph can contain cycles when preloading associations. For example:
SELECT users.id, users.name, (
SELECT todos.id, todos.title
FROM todos
WHERE todos.user_id = users.id
) FROM users WHERE ...
The users query must execute first to provide IDs for the todos subquery, but the todos results must be merged back into the user records. This creates a cycle: users → todos → users.
The lowering phase handles:
- Statement Decomposition: Breaking queries into sub-statements when the database can’t handle them directly
- Dependency Tracking: Which statements must execute before others
- Argument Extraction: Identifying values passed between statements (e.g., a loaded model’s ID used in a child query’s filter)
- Relationship Handling: Processing relationship loads and nested queries
Lowering Algorithm
Lowering transforms model-level statements to table-level statements through a visitor pattern that rewrites each part of the statement AST:
- Table Resolution: InsertTarget::Model, UpdateTarget::Model, etc. become their corresponding table references
- Returning Clause Transformation: Returning::Model is replaced with Returning::Expr containing the expanded column expressions
- Field Reference Resolution: Model field references are converted to table column references
- Include Expansion: Association includes become subqueries in the returning clause
The TableToModel mapping (built during schema construction) drives the transformation. It contains an expression for each model field that maps to its corresponding table column(s). This supports more than a 1-1 mapping—a model field can be derived from multiple columns or a column can map to multiple fields. Association fields are initialized to Null in this mapping.
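The TableToModel idea can be sketched as follows. All names here are illustrative: each model field maps to an expression over table columns, a derived field can span multiple columns, and association fields start as Null placeholders:

```rust
// Toy sketch of a model-field-to-column-expression mapping.
#[derive(Debug, Clone, PartialEq)]
enum ColumnExpr {
    // 1-1: the field reads a single column.
    Column(&'static str),
    // A field derived from multiple columns.
    Concat(Vec<ColumnExpr>),
    // Placeholder for association fields, replaced with a subquery at lowering.
    Null,
}

struct TableToModel {
    field_exprs: Vec<(&'static str, ColumnExpr)>,
}

impl TableToModel {
    fn expr_for(&self, field: &str) -> Option<&ColumnExpr> {
        self.field_exprs
            .iter()
            .find(|(name, _)| *name == field)
            .map(|(_, expr)| expr)
    }
}
```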
When lowering encounters a Returning::Model { include } clause:
- Call table_to_model.lower_returning_model() to get the base column expressions
- For each path in the include list, call build_include_subquery() to generate a subquery that selects the associated records
- Replace the Null placeholder in the returning expression with the generated subquery
Lowering Examples
Example 1: Simple query
Given a model with a renamed column:
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    #[column(name = "first_and_last_name")]
    name: String,
    email: String,
}

// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
// Note: At model-level, no specific fields are selected

// After lowering
SELECT id, first_and_last_name, email FROM users WHERE id = ?
Example 2: Query with association
// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
INCLUDE todos

// After lowering
SELECT id, first_and_last_name, email, (
    SELECT id, title, user_id FROM todos WHERE todos.user_id = users.id
) FROM users WHERE id = ?
Phase 3: Planning
Location: engine/plan.rs
Planning converts HIR into MIR (Middle-level Intermediate Representation) - a directed acyclic graph of operations, both database queries and in-memory transformations. Edges represent data dependencies: an operation cannot execute until all operations it depends on have completed and produced their results.
Since the HIR graph can contain cycles, planning must break them to produce a DAG. This is done by introducing intermediate operations that batch-load data and merge results (e.g., NestedMerge).
Operation Types
The MIR supports various operation types (see engine/mir.rs for details):
SQL operations:
- ExecStatement - Execute a SQL query (SELECT, INSERT, UPDATE, DELETE)
- ReadModifyWrite - Optimistic locking (read, modify, conditional write). Exists as a separate operation because the read result must be processed in-memory to compute the write, which ExecStatement cannot express.
Key-value operations (NoSQL):
- GetByKey, DeleteByKey, UpdateByKey - Direct key access
- QueryPk, FindPkByIndex - Key lookups via queries or indexes
In-memory operations:
- Filter, Project - Transform and filter results
- NestedMerge - Merge child records into parent records
- Const - Constant values
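What an in-memory NestedMerge operation does at runtime can be sketched with plain tuples (the types and function name are illustrative, not the MIR's real representation): group child records by foreign key, then attach each group to its parent record.

```rust
use std::collections::HashMap;

// Merge child (user_id, todo title) rows into parent (id, name) rows.
fn nested_merge(
    parents: Vec<(u64, &'static str)>,
    children: Vec<(u64, &'static str)>,
) -> Vec<(u64, &'static str, Vec<&'static str>)> {
    // Group children by the foreign key in one pass.
    let mut by_parent: HashMap<u64, Vec<&'static str>> = HashMap::new();
    for (user_id, title) in children {
        by_parent.entry(user_id).or_default().push(title);
    }
    // Attach each group to its parent; parents without children get an empty list.
    parents
        .into_iter()
        .map(|(id, name)| {
            let todos = by_parent.remove(&id).unwrap_or_default();
            (id, name, todos)
        })
        .collect()
}
```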
Phase 4: Execution Planning
Location: engine/plan/execution.rs
Execution planning converts the MIR logical plan into a concrete sequence of actions that can be executed. This phase:
- Assigns variable slots for storing intermediate results
- Converts each MIR operation into an execution action
- Maintains topological ordering to ensure dependencies execute first
Action Types
Actions mirror MIR operations but include concrete variable bindings:
SQL actions:
- ExecStatement: Execute a SQL query (SELECT, INSERT, UPDATE, DELETE)
- ReadModifyWrite: Optimistic locking (read, modify, conditional write)
Key-value actions (NoSQL):
- GetByKey: Batch fetch by primary key
- DeleteByKey: Delete records by primary key
- UpdateByKey: Update records by primary key
- QueryPk: Query primary keys
- FindPkByIndex: Find primary keys via secondary index
In-memory actions:
- Filter: Apply in-memory filter to a variable’s data
- Project: Transform records
- NestedMerge: Merge child records into parent records
- SetVar: Set a variable to a constant value
Phase 5: Execution
Location: engine/exec.rs
The execution phase is the interpreter that runs the compiled program. It iterates through actions, reading inputs from variables, performing operations, and storing outputs back to variables.
Execution Loop
The interpreter follows a simple pattern:
- Initialize variable storage
- For each action in sequence:
  - Load input data from variables
  - Perform the operation (database query or in-memory transform)
  - Store the result in the output variable
- Return the result from the final variable (the last action’s output) to the user
Variable Lifetime
The engine tracks how many times each variable is referenced by downstream actions. A variable may be used by multiple actions (e.g., the same user records merged with both todos and comments). When the last action that needs a variable completes, the variable’s value is dropped to free memory.
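The reference-counted lifetime scheme can be sketched like this (Slot, VarStore, and the eager-drop-on-last-use policy are illustrative names, not the engine's actual code):

```rust
// Each slot records how many downstream actions still need its value.
struct Slot {
    value: Option<Vec<u64>>,
    remaining_uses: usize,
}

struct VarStore {
    slots: Vec<Slot>,
}

impl VarStore {
    // `uses[i]` is the number of actions that will read variable i.
    fn new(uses: &[usize]) -> Self {
        VarStore {
            slots: uses
                .iter()
                .map(|&u| Slot { value: None, remaining_uses: u })
                .collect(),
        }
    }

    fn store(&mut self, idx: usize, value: Vec<u64>) {
        self.slots[idx].value = Some(value);
    }

    // Earlier loads clone; the final load takes ownership, freeing the slot.
    fn load(&mut self, idx: usize) -> Vec<u64> {
        let slot = &mut self.slots[idx];
        slot.remaining_uses -= 1;
        if slot.remaining_uses == 0 {
            slot.value.take().expect("variable not set")
        } else {
            slot.value.clone().expect("variable not set")
        }
    }

    fn is_dropped(&self, idx: usize) -> bool {
        self.slots[idx].value.is_none()
    }
}
```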
Driver Interaction
The execution phase is the only part of the engine that communicates with database drivers. The driver interface is intentionally simple: a single exec() method that accepts an Operation enum. This enum includes variants for both SQL operations (QuerySql, Insert) and key-value operations (GetByKey, QueryPk, FindPkByIndex, DeleteByKey, UpdateByKey).
Each driver implements whichever operations it supports. SQL drivers handle QuerySql natively while key-value drivers handle GetByKey, QueryPk, etc. The planner uses driver.capability() to determine which operations to generate for each database type.
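The shape of this boundary can be sketched as follows. This is a simplified, synchronous stand-in with made-up names (Operation, Driver, FakeSqlDriver); the real interface lives in toasty-core's driver module and is async:

```rust
// Two of the operation families mentioned above, collapsed into one enum.
enum Operation {
    QuerySql(String),
    GetByKey(Vec<u64>),
}

// A single entry point: drivers handle the variants they support.
trait Driver {
    fn exec(&self, op: Operation) -> Result<Vec<String>, String>;
}

// A SQL-style driver supports QuerySql but not key-value operations.
struct FakeSqlDriver;

impl Driver for FakeSqlDriver {
    fn exec(&self, op: Operation) -> Result<Vec<String>, String> {
        match op {
            Operation::QuerySql(sql) => Ok(vec![format!("rows for: {sql}")]),
            Operation::GetByKey(_) => Err("operation not supported".to_string()),
        }
    }
}
```

Because the planner consults driver capabilities up front, an unsupported variant reaching a driver indicates a planning bug rather than a user error.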
Toasty Type System Architecture
Overview
Toasty uses Rust’s type system in the public API with both concrete types and generics. The query engine tracks the type of value each statement evaluates to using stmt::Type. This document describes how types flow through the system and the key components involved.
Type System Boundaries
Toasty has two distinct type systems with different responsibilities:
1. Rust-Level Type System (Compile-Time Safety)
At the Rust level, each model is a distinct type:
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    name: String,
    email: String,
}

#[derive(Model)]
struct Todo {
    #[key]
    #[auto]
    id: u64,
    user_id: u64,
    title: String,
}

// Toasty generates type-safe field access preventing type mismatches:
User::get_by_email(&db, "john@example.com").await?; // ✓ String matches email field
User::filter_by_id(&user_id).filter(User::FIELDS.name().eq("John")).all(&db).await?; // ✓ String matches name field

// Type system prevents field/model confusion:
// User::FIELDS.title()             // ← Compile error! User has no title field
// Todo::FIELDS.email()             // ← Compile error! Todo has no email field
// User::FIELDS.name().eq(&todo_id) // ← Compile error! u64 doesn't match String
The query builder API maintains this type safety through generics and traits, preventing you from accidentally mixing model types or referencing non-existent fields. The API uses generic types (Statement<M>, Select<M>, etc.) that wrap toasty_core::stmt types.
2. Query Engine Type System (Runtime)
When db.exec(statement) is called, the generic <M> parameter is erased:
#![allow(unused)]
fn main() {
// Generated query builder returns a typed wrapper
let query: FindUserById = User::find_by_id(&id);
// .into() converts to Statement<User>
let statement: Statement<User> = query.into();
// At db.exec() - generic is erased, .untyped is extracted
pub async fn exec<M: Model>(&self, statement: Statement<M>) -> Result<ValueStream> {
engine::exec(self, statement.untyped).await // <- Only toasty_core::stmt::Statement
}
}
At this boundary, the statement becomes untyped (no Rust generic), but the engine tracks the type of value the statement evaluates to using stmt::Type. Initially, this remains at the model-level—a query for User evaluates to Type::List(Type::Model(user_model_id)). During lowering, these convert to structural record types for database execution.
Type Flow Through the System
Rust API  →  Query Builder  →  Engine Entry  →  Lowering/Planning  →  Execution
    ↓              ↓                ↓                   ↓                 ↓
 Distinct      Type-Safe       Type::Model        Type::Record      stmt::Value
  Types         Generics      (no generics)                           (typed)
(compile)      (compile)        (runtime)          (runtime)         (runtime)
At lowering, statements that evaluate to Type::Model(model_id) are converted to evaluate to Type::Record([field_types...]). This conversion enables the engine to work with concrete field types for database operations.
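This conversion can be sketched with a simplified Type enum. The variant names follow the document, but the enum shape and the lower_type helper are illustrative, not the real stmt::Type API:

```rust
// Simplified stand-in for the engine's runtime type representation.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Id,
    String,
    Model(usize),      // model-level: identified by model id
    Record(Vec<Type>), // table-level: concrete field types
    List(Box<Type>),
}

// Hypothetical lowering step: replace every Model(id) with the record
// layout of that model's fields, recursing through lists.
fn lower_type(ty: Type, fields_of: &dyn Fn(usize) -> Vec<Type>) -> Type {
    match ty {
        Type::Model(id) => Type::Record(fields_of(id)),
        Type::List(inner) => Type::List(Box::new(lower_type(*inner, fields_of))),
        other => other,
    }
}
```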
Detailed Architecture
Query Engine Entry Point
When the engine receives a toasty_core::stmt::Statement, it processes through verification, lowering, planning, and execution:
pub(crate) async fn exec(&self, stmt: Statement) -> Result<ValueStream> {
    if cfg!(debug_assertions) {
        self.verify(&stmt);
    }

    // Lower the statement to the high-level intermediate representation
    let hir = self.lower_stmt(stmt)?;

    // Translate into a series of driver operations
    let plan = self.plan_hir_statement(hir)?;

    // Execute the plan
    self.exec_plan(plan).await
}
Lowering Phase (Model-to-Table Transformation)
The lowering phase transforms statements from model-level to table-level representations.
Example 1: Simple query
// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
// Evaluates to: Type::List(Type::Model(user_model_id))
// Note: At model-level, no specific fields are selected

// After lowering
SELECT id, name, email FROM users WHERE id = ?
// Evaluates to: Type::List(Type::Record([Type::Id, Type::String, Type::String]))
Example 2: Query with association
// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User INCLUDE todos WHERE id = ?
// Evaluates to: Type::List(Type::Model(user_model_id))
// where todos field is Type::List(Type::Model(todo_model_id))

// After lowering
SELECT id, name, email, (
    SELECT id, title, user_id FROM todos WHERE todos.user_id = users.id
) FROM users WHERE id = ?
// Evaluates to: Type::List(Type::Record([
//     Type::Id, Type::String, Type::String,
//     Type::List(Type::Record([Type::Id, Type::String, Type::Id]))
// ]))
Planning and Variable Types
During planning, the engine assigns variables to hold intermediate results (see Query Engine Architecture for details on the execution model). Each variable is registered with its type, which is always Type::List(...) or Type::Unit.
Execution
At execution time, the VarStore holds the type information from planning. When storing a value stream in a variable, the store associates the expected type with it. The value stream ensures each value it yields conforms to that type. This type information carries through to the final result returned to the user.
Type Inference
While statements entering the engine have known types, planning constructs new expressions—projections, filters, and merge qualifications—whose types aren’t explicitly declared. The engine must infer these types from the expression structure to register variables correctly.
Type inference is handled by ExprContext, which walks expression trees and determines their result types based on the schema. For example, a column reference’s type comes from the schema definition, and a record expression’s type is built from its field types.
// Create context for type inference
let cx = stmt::ExprContext::new_with_target(&*self.engine.schema, stmt);

// Infer type of an expression reference
let ty = cx.infer_expr_reference_ty(expr_reference);

// Infer type of a full expression with argument types
let ret = ExprContext::new_free().infer_expr_ty(expr.as_expr(), &args);
Design
Design documents for Toasty.
Batch Query Execution
Overview
Batch queries let users send multiple independent queries to the database in a single round-trip. The results come back as a typed tuple matching the input queries.
let (active_users, recent_posts) = toasty::batch((
    User::find_by_active(true),
    Post::find_recent(100),
)).exec(&db).await?;

// active_users: Vec<User>
// recent_posts: Vec<Post>
The batch composes all queries into a single Statement whose returning
expression is a record of subqueries. This means batch execution flows through
the existing exec path — no new executor methods, no new driver operations.
This design covers SQL databases only. DynamoDB support is out of scope.
New Trait: IntoStatement<T>
A single new trait bridges query builders to Statement<T>:
pub trait IntoStatement<T> {
    fn into_statement(self) -> Statement<T>;
}
Query builders implement this for their model type. For example, UserQuery
implements IntoStatement<User>:
impl IntoStatement<User> for UserQuery {
    fn into_statement(self) -> Statement<User> {
        self.stmt.into()
    }
}
The codegen already produces IntoSelect impls for query builders.
IntoStatement can be blanket-implemented for anything that implements
IntoSelect:
impl<T: IntoSelect> IntoStatement<T::Model> for T {
    fn into_statement(self) -> Statement<T::Model> {
        self.into_select().into()
    }
}
Tuple implementations
Tuples of IntoStatement types implement IntoStatement by composing their
inner statements into a single select whose returning expression is a record of
subqueries:
impl<T1, T2, A, B> IntoStatement<(Vec<T1>, Vec<T2>)> for (A, B)
where
    A: IntoStatement<T1>,
    B: IntoStatement<T2>,
{
    fn into_statement(self) -> Statement<(Vec<T1>, Vec<T2>)> {
        let stmt_a = self.0.into_statement().untyped;
        let stmt_b = self.1.into_statement().untyped;

        // Build: SELECT (stmt_a), (stmt_b)
        let query = stmt::Query::values(stmt::Expr::record([
            stmt::Expr::subquery(stmt_a),
            stmt::Expr::subquery(stmt_b),
        ]));

        Statement::from_raw(query.into())
    }
}
The resulting statement is equivalent to SELECT (subquery_1), (subquery_2).
At the Toasty AST level this is a Query whose returning body is a
Record([Expr::Stmt, Expr::Stmt]). The engine handles each subquery
independently during execution and packs the results into a single
Value::Record.
Tuple impls for arities 2 through 8 are generated with a macro.
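The per-arity macro expansion pattern can be sketched with a toy trait (Describe stands in for IntoStatement/Load; only the macro shape matters here):

```rust
// Toy trait standing in for the real per-tuple trait.
trait Describe {
    fn describe() -> String;
}

impl Describe for u64 {
    fn describe() -> String { "u64".to_string() }
}

impl Describe for String {
    fn describe() -> String { "String".to_string() }
}

// One invocation per arity expands to one tuple impl.
macro_rules! tuple_describe {
    ($($name:ident),+) => {
        impl<$($name: Describe),+> Describe for ($($name,)+) {
            fn describe() -> String {
                let parts: Vec<String> = vec![$($name::describe()),+];
                format!("({})", parts.join(", "))
            }
        }
    };
}

tuple_describe!(A, B);
tuple_describe!(A, B, C);
```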
Load for Tuples and Vec<T>
To deserialize the composed result, Load is implemented for Vec<T> and
for tuples:
impl<T: Load> Load for Vec<T> {
    fn load(value: stmt::Value) -> Result<Self> {
        match value {
            Value::List(items) => items.into_iter().map(T::load).collect(),
            _ => Err(Error::type_conversion(value, "Vec<T>")),
        }
    }
}

impl<A: Load, B: Load> Load for (A, B) {
    fn load(value: stmt::Value) -> Result<Self> {
        match value {
            Value::Record(mut record) => Ok((
                A::load(record[0].take())?,
                B::load(record[1].take())?,
            )),
            _ => Err(Error::type_conversion(value, "(A, B)")),
        }
    }
}
With these impls, Load for (Vec<User>, Vec<Post>) works automatically:
the outer tuple impl splits the record, then each Vec<T> impl iterates
the list and loads individual model instances.
User-Facing API
pub fn batch<T, Q: IntoStatement<T>>(queries: Q) -> Batch<T>
where
    T: Load,
{
    Batch {
        stmt: queries.into_statement(),
    }
}

pub struct Batch<T> {
    stmt: Statement<T>,
}

impl<T: Load> Batch<T> {
    pub async fn exec(self, executor: &mut dyn Executor) -> Result<T> {
        use ExecutorExt;

        let stream = executor.exec(self.stmt).await?;
        let value = stream
            .next()
            .await
            .ok_or_else(|| Error::record_not_found("batch returned no results"))??;
        T::load(value)
    }
}
Batch::exec calls the regular ExecutorExt::exec method. The composed
statement flows through the standard engine pipeline. The result is a single
value (a record of lists) that T::load deserializes into the typed tuple.
Execution Flow
User code:
toasty::batch((UserQuery, PostQuery)).exec(&db)
IntoStatement for (A, B):
SELECT (SELECT ... FROM users WHERE ...), (SELECT ... FROM posts ...)
Engine pipeline (standard exec path):
lower → plan → exec
The engine recognizes Expr::Stmt subqueries in the returning
expression and executes each independently.
Result:
Value::Record([
Value::List([user1, user2, ...]),
Value::List([post1, post2, ...]),
])
Load for (Vec<User>, Vec<Post>):
(A::load(record[0]), B::load(record[1]))
→ (Vec<User>::load(list), Vec<Post>::load(list))
→ (vec![User::load(v1), ...], vec![Post::load(v1), ...])
Statement Changes
Statement<M> needs a way to construct from a raw stmt::Statement without
requiring M: Model:
impl<M> Statement<M> {
    /// Build a statement from a raw untyped statement.
    ///
    /// Used by batch composition where M may be a tuple, not a model.
    pub(crate) fn from_raw(untyped: stmt::Statement) -> Self {
        Self {
            untyped,
            _p: PhantomData,
        }
    }
}
The existing Statement::from_untyped requires M: Model (via IntoSelect).
from_raw has no bound on M and is pub(crate) so only internal code uses
it.
Engine Support
The engine needs to handle a Query whose returning expression is a record
of Expr::Stmt subqueries where each subquery returns multiple rows.
The lowerer already handles Expr::Stmt for association preloading (INCLUDE),
where subqueries get added to the dependency graph and executed as part of the
plan. Batch queries follow the same pattern: each Expr::Stmt in the returning
record becomes an independent subquery in the plan, and the exec phase collects
results into a Value::Record of Value::Lists.
If the existing lowerer does not handle bare subqueries in a returning record
(outside of an INCLUDE context), a small extension is needed to recognize this
pattern and plan it the same way.
Implementation Plan
Phase 1: IntoStatement trait and Load impls
- Add IntoStatement<T> trait to crates/toasty/src/stmt/
- Add blanket impl IntoStatement<T::Model> for T: IntoSelect
- Add Load for Vec<T> and Load for (A, B) (and higher arities via macro)
- Add Statement::from_raw
- Export IntoStatement from lib.rs and codegen_support
Phase 2: Batch API
- Add toasty::batch() function and Batch<T> struct
- Add tuple impls of IntoStatement<(Vec<T1>, Vec<T2>, ...)> (via macro)
- Wire Batch::exec through the standard ExecutorExt::exec path
Phase 3: Engine support
- Verify that the lowerer handles Expr::Stmt subqueries in a returning record correctly (it may already work via the INCLUDE path)
- If not, extend the lowerer to plan bare record-of-subqueries statements
- Verify the exec phase packs subquery results into Value::Record of Value::Lists
Phase 4: Integration tests
- Batch two selects on different models
- Batch a select that returns rows with a select that returns empty
- Batch with filters, ordering, and limits
- Batch inside a transaction
- Batch of a single query (degenerates to normal execution)
Files Modified
| File | Change |
|---|---|
| crates/toasty/src/stmt/into_statement.rs | New: IntoStatement<T> trait, blanket impl |
| crates/toasty/src/stmt.rs | Add Statement::from_raw, re-export IntoStatement |
| crates/toasty/src/load.rs | Add Load impls for Vec<T> and tuples |
| crates/toasty/src/batch.rs | Add batch(), Batch<T>, tuple IntoStatement impls |
| crates/toasty/src/lib.rs | Re-export batch, Batch, IntoStatement |
| crates/toasty/src/engine/lower.rs | Handle record-of-subqueries in returning (if needed) |
DynamoDB: OR Predicates in Index Key Conditions
Problem
DynamoDB’s KeyConditionExpression does not support OR — neither for partition keys nor
sort keys. This means queries like WHERE user_id = 1 OR user_id = 2 on an indexed field
are currently broken for DynamoDB.
The engine must detect OR in index key conditions and fan them out into N individual
DynamoDB Query calls — one per OR branch — then concatenate the results.
A secondary motivation: the batch-load mechanism used for nested association preloads
(rewrite_stmt_query_for_batch_load_nosql) produces ANY(MAP(arg[input], pred)), which
at exec time expands to OR via simplify_expr_any. This hits the same DynamoDB
restriction and is addressed by the same fix.
Where OR Can Reach a Key Condition
Only two engine actions use KeyConditionExpression:
- QueryPk — queries the primary table when exact PK keys cannot be extracted
- FindPkByIndex — queries a GSI to retrieve primary keys
GetByKey uses BatchGetItem (explicit key values, no expression), so OR is never
relevant there. pk = v1 OR pk = v2 on the primary key produces
IndexPlan.key_values = Some([v1, v2]), routing to GetByKey directly — no issue.
QueryPk
OR reaches QueryPk.pk_filter when IndexPlan.key_values is None:
- User-specified OR on sort key: WHERE pk = v AND (sk >= s1 OR sk >= s2) — range predicates have no extractable key values.
- Batch-load (e.g. a HasMany where the FK is the partition key of the child’s composite primary key): rewrite_stmt_query_for_batch_load_nosql produces ANY(MAP(arg[input], fk = arg[0])). The list is a runtime input, so key_values is None. At exec time simplify_expr_any expands it to OR.
FindPkByIndex
FindPkByIndex.filter is the output of partition_filter, which isolates index key
conditions from non-key conditions. partition_filter on AND distributes cleanly:
status = active AND (user_id = 1 OR user_id = 2) produces
index_filter = user_id = 1 OR user_id = 2 and result_filter = status = active.
OR reaches it in the same two ways as QueryPk:
- User-specified OR: WHERE user_id = 1 OR user_id = 2 on a GSI partition key.
- Batch-load: same ANY(MAP(arg[input], pred)) expansion path as above.
Mixed OR Operands
partition_filter currently has a todo!() for OR operands that contain both index and
non-index parts — e.g. (pk = 1 AND status = a) OR pk = 2.
This is in scope. Strategy:
- Extract key conditions from each OR branch to build the fan-out: ANY(MAP([1, 2], pk = arg[0]))
- Apply the full original predicate as an in-memory post-filter: (pk = 1 AND status = a) OR pk = 2
This is conservative but correct, and consistent with how post_filter is already used.
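The fan-out-plus-post-filter strategy can be sketched as follows. The Row type, fetch callback, and function name are all illustrative, not the planner's actual types:

```rust
// Toy row with a partition key and one non-key attribute.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    pk: u64,
    status: &'static str,
}

// One fetch per extracted key (one "Query call" per OR branch), then
// re-apply the full original predicate in memory. Conservative but correct:
// rows matched by a branch's key condition but not the full predicate are
// filtered out here.
fn fan_out_with_post_filter(
    keys: &[u64],
    fetch: &dyn Fn(u64) -> Vec<Row>,
    post_filter: &dyn Fn(&Row) -> bool,
) -> Vec<Row> {
    keys.iter()
        .flat_map(|&k| fetch(k))
        .filter(|row| post_filter(row))
        .collect()
}
```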
Canonical Form: ANY(MAP(key_list, per_call_pred))
All OR cases are represented uniformly as ANY(MAP(key_list, per_call_pred)):
- key_list — one entry per required Query call; each entry has one value per key column (scalar for partition-key-only, Value::Record for partition + sort key)
- per_call_pred — the key condition for one call, referencing element fields as arg[0], arg[1], …
Single key column — user_id = 1 OR user_id = 2:
ANY(MAP([1, 2], user_id = arg[0]))
Composite key — (todo_id = t1 AND step_id >= s1) OR (todo_id = t2 AND step_id >= s2):
ANY(MAP([(t1, s1), (t2, s2)], todo_id = arg[0] AND step_id >= arg[1]))
Batch-load — ANY(MAP(arg[input], todo_id = arg[0])) — already in canonical form;
no structural change needed, only the exec fan-out behavior changes.
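Expanding the canonical form into one key condition per Query call amounts to substituting each key_list entry for the arg[i] placeholders. A sketch with a toy expression type (not the real stmt::Expr):

```rust
// Toy expression type: just enough to model per_call_pred with arg[i] holes.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(&'static str),
    Arg(usize), // arg[i]: field i of the current key_list entry
    Value(i64),
    Eq(Box<Expr>, Box<Expr>),
    And(Vec<Expr>),
}

// Replace each Arg(i) with the i-th value of one key_list entry.
fn substitute(pred: &Expr, entry: &[i64]) -> Expr {
    match pred {
        Expr::Arg(i) => Expr::Value(entry[*i]),
        Expr::Eq(a, b) => Expr::Eq(
            Box::new(substitute(a, entry)),
            Box::new(substitute(b, entry)),
        ),
        Expr::And(ops) => Expr::And(ops.iter().map(|e| substitute(e, entry)).collect()),
        other => other.clone(),
    }
}

// One concrete key condition per key_list entry, i.e. one per Query call.
fn fan_out(key_list: &[Vec<i64>], per_call_pred: &Expr) -> Vec<Expr> {
    key_list
        .iter()
        .map(|entry| substitute(per_call_pred, entry))
        .collect()
}
```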
Design
1. Capability Flag
/// Whether OR is supported in index key conditions (e.g. DynamoDB KeyConditionExpression).
pub index_or_predicate: bool,
DynamoDB: false. All other backends: true (SQL backends never use these actions).
2. IndexPlan Output Contract
pub(crate) struct IndexPlan<'a> {
    pub(crate) index: &'a Index,

    /// Filter to push to the index. Guaranteed form:
    ///
    /// | Condition                          | Form                                              |
    /// |------------------------------------|---------------------------------------------------|
    /// | No OR                              | plain expr — `user_id = 1`                        |
    /// | OR, `index_or_predicate = true`    | `Expr::Or([branch1, branch2, ...])`               |
    /// | OR, `index_or_predicate = false`   | `ANY(MAP(Value::List([v1, ...]), per_call_pred))` |
    /// | Batch-load (any capability)        | `ANY(MAP(arg[input], per_call_pred))`             |
    pub(crate) index_filter: stmt::Expr,

    /// Non-index conditions applied in-memory after results return from each call.
    pub(crate) result_filter: Option<stmt::Expr>,

    /// Full original predicate applied after all fan-out results are collected.
    /// Set for mixed OR operands — see §"Mixed OR Operands".
    pub(crate) post_filter: Option<stmt::Expr>,

    /// Literal key values for direct lookup: a `Value::List` of `Value::Record` entries,
    /// one per lookup. Set by `partition_filter` when all key columns have literal equality
    /// matches. When `Some`, the planner routes to `GetByKey` and ignores `index_filter`.
    /// May coexist with a canonical `ANY(MAP(...))` `index_filter` — both are produced
    /// simultaneously by `partition_filter`; the planner always prefers `GetByKey`.
    pub(crate) key_values: Option<stmt::Value>,
}
Planner routing (primary key path):
key_values.is_some() → GetByKey (BatchGetItem)
index_filter = ANY(MAP(...)) → fan-out via QueryPk × N
otherwise → single QueryPk call
3. Key Value Extraction in index_match
partition_filter extracts literal key values during filter partitioning, setting
key_values when all key columns have literal equality matches. This replaces the
current try_build_key_filter (kv.rs) post-hoc re-analysis of index_filter.
What moves into index_match: walking each OR branch, reading the RHS of each key
column’s equality predicate, assembling Value::List([Value::Record([v0, ...]), ...]).
What stays in the planner: constructing eval::Func from key_values to drive the
GetByKey operation — a mechanical wrap requiring no further expression analysis.
Why this matters for ordering: if partition_filter produced the canonical
ANY(MAP([1,2], pk=arg[0])) form first, the downstream try_build_key_filter Or arm
would never fire, silently breaking the GetByKey path for primary key OR queries.
Extracting key values inside partition_filter eliminates this conflict — both outputs
are produced together.
4. Planner Invariant
When !capability.index_or_predicate, neither FindPkByIndex.filter nor
QueryPk.pk_filter contains Expr::Or. OR is always restructured into
ANY(MAP(arg[i], per_call_pred)) by partition_filter before reaching the exec layer.
Batch-load path — ANY(MAP(...)) is already produced upstream; the invariant holds.
Only the exec fan-out needs fixing.
User-specified OR path — partition_filter produces canonical form directly. The
planner consumes IndexPlan.index_filter as-is; no rewrite in plan_secondary_index_execution
or plan_primary_key_execution. For mixed OR operands, partition_filter additionally
sets IndexPlan.post_filter to the full original predicate.
5. Exec Fan-out
Both action_find_pk_by_index and action_query_pk receive the same treatment.
After substituting inputs into the filter, check for ANY(MAP(arg[i], per_call_pred)):
- If present: iterate over input[i] element by element; substitute each into per_call_pred and issue one driver call; concatenate results. Do not call simplify_expr_any — it would re-expand to OR.
- Otherwise: unchanged single-call path.
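The fan-out loop can be sketched as follows (hypothetical stand-ins for the real substitution and driver machinery):

```rust
/// Hypothetical per-call predicate: (key column, arg index) equality pairs.
struct PerCallPred {
    eq_args: Vec<(&'static str, usize)>,
}

/// Stand-in for one driver call: resolve each `arg[i]` against the current row
/// and return the concrete (column, value) conditions the call would use.
fn driver_call(pred: &PerCallPred, row: &[i64]) -> Vec<(&'static str, i64)> {
    pred.eq_args.iter().map(|&(col, i)| (col, row[i])).collect()
}

/// Fan out over ANY(MAP(rows, pred)): one driver call per row, results concatenated.
fn fan_out(rows: &[Vec<i64>], pred: &PerCallPred) -> Vec<Vec<(&'static str, i64)>> {
    rows.iter().map(|row| driver_call(pred, row)).collect()
}

fn main() {
    // ANY(MAP([(t1, s1), (t2, s2)], todo_id = arg[0] AND step_id = arg[1]))
    let pred = PerCallPred { eq_args: vec![("todo_id", 0), ("step_id", 1)] };
    let calls = fan_out(&[vec![1, 10], vec![2, 20]], &pred);
    assert_eq!(calls.len(), 2); // one driver call per MAP element
    assert_eq!(calls[0], vec![("todo_id", 1), ("step_id", 10)]);
}
```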
6. DynamoDB Driver
Revert the temporary OR-splitting workaround in exec_find_pk_by_index. The driver
is a dumb executor of a single valid key condition.
Summary of Changes
| Location | Change |
|---|---|
Capability | Add index_or_predicate: bool; false for DynamoDB |
IndexPlan | Add key_values: Option<stmt::Value> field |
index_match / partition_filter | Or arm: produce canonical ANY(MAP(...)) when !index_or_predicate; extract key_values; fix mixed OR todo!() |
plan_primary_key_execution | Route on key_values / ANY(MAP(...)) instead of calling try_build_key_filter |
plan_secondary_index_execution | No rewrite needed; consumes IndexPlan.index_filter as-is |
kv.rs / try_build_key_filter | Remove (literal case now handled by index_match) |
action_find_pk_by_index | Fan out over ANY(MAP(...)) — one driver call per element; skip simplify_expr_any |
action_query_pk | Same fan-out treatment |
DynamoDB exec_find_pk_by_index | Revert OR-splitting workaround |
Data-Carrying Enum Implementation Design
Builds on unit enum support (#355). See docs/design/enums-and-embedded-structs.md
for the user-facing design.
Value Stream Encoding
Unit and data variants are encoded differently in the value stream:
- Unit variant: Value::I64(discriminant) — unchanged from unit enum encoding
- Data variant: Value::Record([I64(discriminant), ...active_field_values])
Only the active variant’s fields appear in the record; inactive variant columns (NULL
in the DB) are not included. Primitive::load dispatches on the value type:
I64(d) => unit variant with discriminant d
Record(r) => data variant; r[0] is the discriminant, r[1..] are fields
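The dispatch can be sketched against toy types (hypothetical names; the generated Primitive::load follows the same shape):

```rust
/// Toy value-stream type; hypothetical stand-in for `toasty_core::stmt::Value`.
#[derive(Debug, PartialEq)]
enum Value {
    I64(i64),
    Record(Vec<Value>),
    String(String),
}

#[derive(Debug, PartialEq)]
enum Status {
    Pending,                  // unit variant, discriminant 1
    Active { since: String }, // data variant, discriminant 2
}

/// Sketch of the generated load: dispatch on the value type first,
/// then on the discriminant within each branch.
fn load(value: Value) -> Result<Status, String> {
    match value {
        Value::I64(1) => Ok(Status::Pending),
        Value::Record(mut r) if r.len() == 2 && matches!(r[0], Value::I64(2)) => {
            match r.remove(1) {
                Value::String(since) => Ok(Status::Active { since }),
                other => Err(format!("expected string field, got {other:?}")),
            }
        }
        other => Err(format!("unknown discriminant or shape: {other:?}")),
    }
}

fn main() {
    assert_eq!(load(Value::I64(1)), Ok(Status::Pending));
    let rec = Value::Record(vec![Value::I64(2), Value::String("2024-01-01".into())]);
    assert_eq!(load(rec), Ok(Status::Active { since: "2024-01-01".into() }));
    assert!(load(Value::I64(99)).is_err()); // unknown discriminant fails at load time
}
```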
Schema Changes
EnumVariant gains a fields: Vec<Field> — the same Field type used by
EmbeddedStruct. Field indices are assigned globally across all variants within the
enum, keeping FieldId { model: enum_id, index } as a unique identifier consistent
with how EmbeddedStruct works. The primary_key, auto, and constraints
members of Field are always false/None/[] for variant fields.
Primitive::ty() changes based on variant content:
- Unit-only enum → Type::I64 (unchanged)
- Any data variant present → Type::Model(Self::id()), same as embedded structs
Codegen Changes
Parsing: toasty-macros/src/schema/ parses variant fields and includes them
in EmbeddedEnum registration so the runtime schema is complete.
Primitive::load: generated arms dispatch on value type first (I64 vs Record),
then on the discriminant within each branch. Data variant arms load each field from
its positional index in the record.
IntoExpr: unit variants emit Value::I64(disc) as today; data variants emit
Value::Record([I64(disc), field_exprs...]).
{Enum}Fields struct: all enums (unit-only and data-carrying) generate a
{Enum}Fields struct with is_{variant}() methods for discriminant-only filtering.
For data-carrying enums, is_{variant}() uses project(path, [0]) to extract the
discriminant from the record representation. For unit-only enums, it compares the
path directly. The struct also delegates comparison methods (eq, ne, etc.) to
Path<Self>.
Engine: Expr::Match
Both table_to_model and model_to_table are expressed using:
Match { subject: Expr, arms: [(pattern: Value, expr: Expr)], else_expr: Expr }
Expr::Match is never serialized to SQL — it is either evaluated in the engine
(for writes) or eliminated by the simplifier before the plan stage (for reads/queries).
table_to_model
For an enum field, table_to_model emits a Match on the discriminator column.
Each arm produces the value shape Primitive::load expects: unit arms emit
I64(disc), data arms emit Record([I64(disc), ...field_col_refs]).
else branch: Expr::Error
The else branch of an enum Match represents the case where the discriminant column
holds an unrecognized value — semantically unreachable for well-formed data.
For data-carrying enums, the else branch is Record([disc_col, Error, ...Error]) —
the same Record shape as data arms, but with Expr::Error in every field slot. This
design is critical for the simplifier: projections distribute uniformly into the else
branch, and field-slot projections yield Expr::Error (correct: accessing a field
on an unknown variant is an error), while discriminant projections ([0]) yield
disc_col (the same as every arm). This enables the uniform-arms optimization to
fire after projection.
For unit-only enums (no data variants), else is Expr::Error directly.
model_to_table
Runs the inverse: the incoming value (I64 or Record) is matched on its
discriminant, and each arm emits a flat record of all enum columns in DB order —
setting the discriminator and active variant fields, and NULLing every inactive
variant column. This NULL-out is mandatory: because writes may not have a loaded
model, the engine has no knowledge of the prior variant and must clear all
non-active columns unconditionally.
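A sketch of what one model_to_table arm emits, assuming a hypothetical flat column layout of [discriminator, email_address, phone_number]:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Col {
    I64(i64),
    Text(String),
    Null,
}

enum ContactInfo {
    Email { address: String }, // discriminant 1
    Phone { number: String },  // discriminant 2
}

/// model_to_table sketch: each arm emits the full flat column record,
/// setting the discriminator + active fields and NULLing every inactive column.
fn model_to_table(v: ContactInfo) -> Vec<Col> {
    match v {
        ContactInfo::Email { address } => vec![Col::I64(1), Col::Text(address), Col::Null],
        ContactInfo::Phone { number } => vec![Col::I64(2), Col::Null, Col::Text(number)],
    }
}

fn main() {
    let row = model_to_table(ContactInfo::Phone { number: "555".into() });
    // The email column is cleared unconditionally: the engine may not know
    // the prior variant, so every non-active column is NULLed.
    assert_eq!(row, vec![Col::I64(2), Col::Null, Col::Text("555".into())]);
}
```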
Simplifier Rules
Project into Match (expr_project.rs)
Distributes a projection into each Match arm AND the else branch:
project(Match(subj, [p => e, ...], else), [i])
→ Match(subj, [p => project(e, [i]), ...], else: project(else, [i]))
Projection is pushed into the else branch unconditionally — Expr::Error inside
a Record is handled naturally (projecting [0] out of Record([disc, Error])
yields disc; projecting [1] yields Error).
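The Record-projection step this relies on can be sketched with toy types (hypothetical; the real simplifier operates on stmt::Expr):

```rust
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(&'static str),
    Error,
    Record(Vec<Expr>),
}

/// project(Record([e0, e1, ...]), [i]) → e_i: the rule that lets a projection
/// pushed into an else branch pick `disc` out of `Record([disc, Error, ...])`.
fn project(expr: Expr, i: usize) -> Expr {
    match expr {
        Expr::Record(mut fields) => fields.swap_remove(i),
        // No simplification applies; the real code keeps Project(other, [i]).
        other => other,
    }
}

fn main() {
    let else_branch = Expr::Record(vec![Expr::Column("disc"), Expr::Error]);
    // [0] out of the else Record yields the discriminant column…
    assert_eq!(project(else_branch.clone(), 0), Expr::Column("disc"));
    // …while a field slot yields Error, marking the unreachable access.
    assert_eq!(project(else_branch, 1), Expr::Error);
}
```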
Uniform arms (expr_match.rs)
When all arms AND the else branch produce the same expression, the Match is redundant:
Match(subj, [1 => disc, 2 => disc], else: disc) → disc
The else branch MUST equal the common arm expression for this rule to fire. This makes the transformation provably correct — no branch is dropped that could produce a different value.
Match elimination in binary ops (expr_binary_op.rs)
Distributes a binary op over match arms, producing an OR of guarded comparisons. The else branch is included with a negated guard:
Match(subj, [p1 => e1, p2 => e2], else: e3) == rhs
→ OR(subj == p1 AND e1 == rhs,
subj == p2 AND e2 == rhs,
subj != p1 AND subj != p2 AND e3 == rhs)
Each term is fully simplified inline. Terms that fold to false/null are pruned.
No special handling is needed for the else branch — it is always included and
existing simplification rules handle Expr::Error naturally (see below).
Expr::Error semantics
Expr::Error is treated as “unreachable” — not as a poison value that propagates.
No special Error propagation rules exist. Instead, existing rules eliminate Error
through the surrounding context:
- Data-carrying enum else: Record([disc, Error, ...]). After tuple decomposition, the guard disc != p1 AND disc != p2 contradicts the decomposed disc == c from the comparison target. The contradicting equality rule (a == c AND a != c → false) folds the AND to false.
- false AND (Error == x): the false short-circuit in AND eliminates the term without needing to simplify Error == x.
- Record([1, Error]) == Record([0, "alice"]): tuple decomposition produces 1 == 0 AND Error == "alice". The 1 == 0 → false folds the AND to false.
In all well-formed cases, the guard constraints around Error cause the branch to be pruned without requiring Error-specific rules.
Type inference for Expr::Error
Expr::Error infers as Type::Unknown. TypeUnion::insert skips Unknown, so
an Error branch in a Match doesn’t widen the inferred type union.
Variant-only filter flow
is_email() generates eq(project(path, [0]), I64(1)). After lowering:
eq(project(Match(disc, [1 => Record([disc, addr]), 2 => Record([disc, num])],
else: Record([disc, Error])), [0]),
I64(1))
- Project-into-Match distributes [0] into all branches, including else:
  - project(Record([disc, addr]), [0]) → disc (for each arm)
  - project(Record([disc, Error]), [0]) → disc (for else)
- Uniform-arms fires: all arms AND else produce disc → folds to disc
- Result: eq(disc, I64(1)) — a clean disc_col = 1 predicate
Full-value equality filter flow
contact().eq(ContactInfo::Email { address: "alice@example.com" }) generates
eq(path, Record([I64(1), "alice@example.com"])). After lowering:
eq(Match(disc, [1 => Record([disc, addr]), 2 => Record([disc, num])],
else: Record([disc, Error])),
Record([I64(1), "alice@example.com"]))
- Match elimination distributes eq into each arm AND else as an OR:
  - disc == 1 AND Record([disc, addr]) == Record([I64(1), "alice"]) → simplifies
  - disc == 2 AND Record([disc, num]) == Record([I64(1), "alice"]) → false (pruned)
  - Else: disc != 1 AND disc != 2 AND Record([disc, Error]) == Record([I64(1), "alice"]) → tuple decomposition: disc != 1 AND disc != 2 AND disc == 1 AND Error == "alice" → contradicting equality (disc == 1 AND disc != 1) → false (pruned)
- Result: disc_col = 1 AND addr_col = 'alice@example.com'
Correctness Sharp Edges
Whole-variant replacement must NULL all inactive columns. The engine has no
knowledge of the prior variant for query-based updates, so the model_to_table arms
unconditionally NULL every column they do not own.
NULL discriminators are disallowed. The discriminator column carries NOT NULL,
consistent with unit enums today. Option<Enum> support is a future concern.
Unknown discriminants fail at load time. An unrecognized discriminant (e.g. from
a newer schema version) produces a runtime error via Expr::Error. Removing a
variant requires a data migration.
No DB-level integrity for active variant fields. All variant columns are nullable
(to accommodate inactive variants), so a NULL in a required active field is caught
only at load time by Primitive::load, not at write time.
DynamoDB
Equivalent encoding to be determined when implementing the DynamoDB driver phase.
Implementation Status
Completed
- Schema: fields: Vec<Field> on EnumVariant; codegen parsing; Primitive::ty() returns Type::Model for data-carrying enums.
- Value encoding: Primitive::load() dispatches on I64 vs Record; IntoExpr emits Record for data variants.
- Expr::Match + Expr::Error: Match/MatchArm AST nodes with visitors, eval, and simplifier integration. Expr::Error for unreachable branches. build_table_to_model_field_enum uses Record([disc, Error, ...]) for the else branch.
- Simplifier: project-into-Match distribution; uniform-arms folding (with else-branch check); Match-to-OR elimination in binary ops; case distribution for binary ops with Match operands.
- {Enum}Fields codegen: all enums generate a fields struct with is_{variant}() methods and delegated comparison methods.
- Integration tests: CRUD for data-carrying enums; full-value equality filter; variant-only filter (is_email()); unit enum variant filter (is_pending()).
- Variant+field filter (contact().email().matches(|e| e.address().eq("x"))): per-variant field accessors with closure-based .matches() API.
- OR tautology elimination: is_variant(x, 0) or is_variant(x, 1) covering all variants of an enum folds to true in the OR simplifier.
Remaining
- Partial updates: within-variant partial update builder.
- DynamoDB: equivalent encoding in the DynamoDB driver.
Open Questions
- SparseRecord / reload: within-variant partial updates are supported, so SparseRecord and reload are needed for enum variant fields. Determine how reload should handle a SparseRecord scoped to a specific variant’s fields — the in-memory model must update only the changed fields without disturbing the discriminant or other variant columns.
- Shared columns: letting variants share a column via #[column("name")] is part of the user-facing design. Schema parsing should record shared columns in Phase 1; full query support is a follow-on.
Enum and Embedded Struct Support
Addresses Issue #280.
Scope
Add support for:
- Enum types as model fields (unit, tuple, struct variants)
- Embedded structs (no separate table, stored inline)
Both use #[derive(toasty::Embed)].
Storage Strategy
Flattened storage:
- Enums: discriminator column + nullable columns per variant field
  - INTEGER discriminator with required #[column(variant = N)] on each variant
  - Works uniformly across all databases (PostgreSQL, MySQL, SQLite, DynamoDB)
- Embedded structs: no discriminator, just flattened fields
- Newtype structs (struct Email(String)): single unnamed field, maps to one column with the parent field’s name (no prefix). Supports #[key], #[unique], and #[index] on the parent model field.
Unit-only enums: No columns - stored as the INTEGER value itself.
Post-MVP: Native ENUM types for PostgreSQL/MySQL discriminators (optimization).
Column Naming
Newtype structs: {field} — no suffix. A newtype has one unnamed field, so
the column uses the parent field name directly (e.g., email: Email → column
email).
Multi-field embedded structs: {field}_{name} (e.g., address: Address with
field city → column address_city).
Enums: {field}_{variant}_{name}
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
critter: Creature, // field name
}
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human { profession: String }, // variant, field
#[column(variant = 2)]
Lizard { habitat: String },
}
// Columns:
// - critter (discriminator)
// - critter_human_profession
// - critter_lizard_habitat
}
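The rule above can be stated as a tiny helper (hypothetical; actual name generation lives in toasty-macros):

```rust
/// {field}_{variant}_{name}, all lowercased: hypothetical helper mirroring
/// the enum column-naming rule described above.
fn enum_column(field: &str, variant: &str, name: &str) -> String {
    format!(
        "{}_{}_{}",
        field.to_lowercase(),
        variant.to_lowercase(),
        name.to_lowercase()
    )
}

fn main() {
    assert_eq!(enum_column("critter", "Human", "profession"), "critter_human_profession");
    assert_eq!(enum_column("critter", "Lizard", "habitat"), "critter_lizard_habitat");
}
```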
Customization
Rename field (at enum definition):
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human { profession: String },
#[column(variant = 2)]
Lizard {
#[column("lizard_env")] // Must include variant scope
habitat: String,
},
}
// → critter_lizard_env (field prefix "critter" is prepended)
}
Custom column names for enum variant fields must include the variant scope. The pattern becomes {field}_{custom_name} where custom_name should include the variant portion.
Rename field prefix (per use):
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
#[column("creature_type")]
critter: Creature,
}
// → creature_type (discriminator)
// → creature_type_human_profession (field prefix replaced for all columns)
// → creature_type_lizard_habitat
}
The #[column("name")] attribute on the parent struct’s field replaces the field prefix for all generated columns.
Customize discriminator type (on enum definition):
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
#[column(type = "bigint")]
enum Creature { ... }
}
The #[column(type = "...")] attribute on the enum type customizes the database type for the discriminator column (e.g., “bigint”, “smallint”, “tinyint”).
Tuple Variants
Numeric field naming: {field}_{variant}_{index}
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Contact {
#[column(variant = 1)]
Phone(String, String),
}
// Columns: contact, contact_phone_0, contact_phone_1
}
Customize with #[column("...")]:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Contact {
#[column(variant = 1)]
Phone(
#[column("phone_country")]
String,
#[column("phone_number")]
String,
),
}
// Columns: contact, contact_phone_country, contact_phone_number
}
Nested Types
Path flattened with underscores:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactInfo {
#[column(variant = 1)]
Mail { address: Address },
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
}
// → contact_mail_address_city
// → contact_mail_address_street
}
Shared Columns Across Variants
Multiple variants can share the same column by specifying the same #[column("name")]:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Character {
#[key]
#[auto]
id: u64,
creature: Creature,
}
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human {
#[column("name")]
name: String,
profession: String,
},
#[column(variant = 2)]
Animal {
#[column("name")]
name: String,
species: String,
},
}
// Columns:
// - creature (discriminator)
// - creature_name (shared between Human and Animal)
// - creature_human_profession
// - creature_animal_species
}
Requirements:
- Fields sharing a column must have compatible types (validated at schema build time)
- The shared column name must be identical across variants
- Compatible types: same primitive type, or compatible type conversions
- Shared columns are still nullable at the database level (NULL when variant doesn’t use that field)
Discriminator Types
MVP: INTEGER discriminator for all databases
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human { profession: String },
#[column(variant = 2)]
Lizard { habitat: String },
}
}
All variants require #[column(variant = N)] with unique integer values. Compile error if missing.
Customize discriminator type:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
#[column(type = "bigint")] // Or "smallint", "tinyint", etc.
enum Creature {
#[column(variant = 1)]
Human { profession: String },
#[column(variant = 2)]
Lizard { habitat: String },
}
}
The #[column(type = "...")] attribute on the enum customizes the database type for the discriminator column.
Post-MVP: Native ENUM types for PostgreSQL/MySQL
CREATE TYPE creature AS ENUM ('Human', 'Lizard');
Can customize with #[column(variant = "name")] on variants.
NULL Handling
Inactive variant fields are NULL.
-- When critter = 'Human':
critter_human_profession = 'Knight'
critter_lizard_habitat = NULL
For Option<T> fields: Check discriminator first, then interpret NULL.
Usage
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
address: Address, // embedded struct
status: Status, // embedded enum
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
}
#[derive(toasty::Embed)]
enum Status {
#[column(variant = 1)]
Pending,
#[column(variant = 2)]
Active { since: DateTime },
}
}
Registration: Automatic. Registering a model transitively registers all models reachable through its fields, including nested embedded types and relation targets.
Relations: Forbidden in embedded types (compile error).
Examples
Basic Enum
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Task {
#[key]
#[auto]
id: u64,
status: Status,
}
#[derive(toasty::Embed)]
enum Status {
#[column(variant = 1)]
Pending,
#[column(variant = 2)]
Active,
#[column(variant = 3)]
Done,
}
}
Schema:
CREATE TABLE task (
id INTEGER PRIMARY KEY,
status INTEGER NOT NULL
);
-- 1=Pending, 2=Active, 3=Done (requires #[column(variant = N)])
Data-Carrying Enum
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
contact: ContactMethod,
}
#[derive(toasty::Embed)]
enum ContactMethod {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Phone { country: String, number: String },
}
}
Schema:
CREATE TABLE user (
id INTEGER PRIMARY KEY,
contact INTEGER NOT NULL,
contact_email_address TEXT,
contact_phone_country TEXT,
contact_phone_number TEXT
);
Embedded Struct
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
address: Address,
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
zip: String,
}
}
Schema:
CREATE TABLE user (
id INTEGER PRIMARY KEY,
address_street TEXT NOT NULL,
address_city TEXT NOT NULL,
address_zip TEXT NOT NULL
);
Nested Enum + Embedded
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactInfo {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Mail { address: Address },
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
}
}
Schema:
-- contact: ContactInfo
contact INTEGER NOT NULL,
contact_email_address TEXT,
contact_mail_address_street TEXT,
contact_mail_address_city TEXT
Querying
Basic variant checks
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Task {
#[key]
#[auto]
id: u64,
status: Status,
}
#[derive(toasty::Embed)]
enum Status {
#[column(variant = 1)]
Pending,
#[column(variant = 2)]
Active,
#[column(variant = 3)]
Done,
}
// Query by variant (shorthand)
Task::all().filter(Task::FIELDS.status().is_pending())
Task::all().filter(Task::FIELDS.status().is_active())
// Equivalent using .matches() without field conditions
Task::all().filter(
Task::FIELDS.status().matches(Status::VARIANTS.pending())
)
}
Field access on variant fields
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
contact: ContactMethod,
}
#[derive(toasty::Embed)]
enum ContactMethod {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Phone { country: String, number: String },
}
// Match specific variants and access their fields
User::all().filter(
User::FIELDS.contact().matches(
ContactMethod::VARIANTS.email().address().contains("@gmail")
)
)
User::all().filter(
User::FIELDS.contact().matches(
ContactMethod::VARIANTS.phone().country().eq("US")
)
)
// Shorthand for variant-only checks (no field conditions)
User::all().filter(User::FIELDS.contact().is_email())
User::all().filter(User::FIELDS.contact().is_phone())
// Equivalent using .matches()
User::all().filter(
User::FIELDS.contact().matches(ContactMethod::VARIANTS.email())
)
}
Embedded struct field constraints
Embedded struct fields can be accessed directly for filtering, ordering, and other query operations:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
address: Address,
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
zip: String,
}
// Filter by embedded struct fields
User::all().filter(User::FIELDS.address().city().eq("Seattle"))
User::all().filter(User::FIELDS.address().zip().like("98%"))
// Multiple constraints on embedded struct
User::all().filter(
User::FIELDS.address().city().eq("Seattle")
.and(User::FIELDS.address().zip().like("98%"))
)
// Order by embedded struct fields
User::all().order_by(User::FIELDS.address().city().asc())
// Select embedded struct fields (projection)
User::all()
.select(User::FIELDS.id())
.select(User::FIELDS.address().city())
}
Nested embedded structs
For nested embedded types, continue chaining field accessors:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Company {
#[key]
#[auto]
id: u64,
headquarters: Office,
}
#[derive(toasty::Embed)]
struct Office {
name: String,
location: Address,
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
zip: String,
}
// Access nested embedded struct fields
Company::all().filter(
Company::FIELDS.headquarters().location().city().eq("Seattle")
)
Company::all().filter(
Company::FIELDS.headquarters().name().eq("Main Office")
.and(Company::FIELDS.headquarters().location().zip().like("98%"))
)
}
Combining enum and embedded struct constraints
When an enum variant contains an embedded struct, use .matches() to specify the variant, then access the embedded struct’s fields:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
contact: ContactInfo,
}
#[derive(toasty::Embed)]
enum ContactInfo {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Mail { address: Address },
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
}
// Filter by embedded struct fields within enum variant
User::all().filter(
User::FIELDS.contact().matches(
ContactInfo::VARIANTS.mail().address().city().eq("Seattle")
)
)
// Multiple constraints on embedded struct within variant
User::all().filter(
User::FIELDS.contact().matches(
ContactInfo::VARIANTS.mail()
.address().city().eq("Seattle")
.address().street().contains("Main")
)
)
}
Constraints with shared columns
When enum variants share columns, constraints apply based on the variant being matched:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Character {
#[key]
#[auto]
id: u64,
creature: Creature,
}
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human {
#[column("name")]
name: String,
profession: String,
},
#[column(variant = 2)]
Animal {
#[column("name")]
name: String,
species: String,
},
}
// Query the shared "name" field for a specific variant
Character::all().filter(
Character::FIELDS.creature().matches(
Creature::VARIANTS.human().name().eq("Alice")
)
)
// Query across variants using the shared column
// (finds any creature with this name, regardless of variant)
Character::all().filter(
Character::FIELDS.creature().name().eq("Bob")
)
// Variant-specific field
Character::all().filter(
Character::FIELDS.creature().matches(
Creature::VARIANTS.human().profession().eq("Knight")
)
)
}
Updating
Update builders provide two methods per field:
- .field(value) — direct value assignment
- .with_field(|f| ...) — closure-based update
The .with_* methods provide a uniform API across all field types and enable:
- Embedded types: Partial updates (only set specific nested fields)
- Primitives: future type-specific operations (e.g., NumericUpdate::increment())
- Enums: update variant fields without changing the discriminator
Whole replacement
Setting an embedded struct field on an update replaces all of its columns:
#![allow(unused)]
fn main() {
// Loaded model update — sets address_street, address_city, address_zip
user.update()
.address(Address { street: "123 Main", city: "Seattle", zip: "98101" })
.exec(&db).await?;
// Query-based update — same behavior, no model loaded
User::filter_by_id(id).update()
.address(Address { street: "123 Main", city: "Seattle", zip: "98101" })
.exec(&db).await?;
}
Partial updates
Each field (primitive or embedded) generates a companion {Type}Update<'a> type that
provides a view into the update statement’s assignments. These update types hold a
reference to the statement and a projection path, allowing them to directly mutate
the statement as fields are set. This enables efficient nested updates without intermediate
allocations.
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
zip: String,
}
// AddressUpdate<'a> is generated automatically by `#[derive(toasty::Embed)]`
// StringUpdate<'a> is generated for primitive String fields
}
Embedded types:
#![allow(unused)]
fn main() {
// Whole replacement — sets all address columns
user.update()
.address(Address { street: "123 Main", city: "Seattle", zip: "98101" })
.exec(&db).await?;
// Partial update — only address_city is SET
user.update()
.with_address(|a| {
a.set_city("Seattle");
})
.exec(&db).await?;
// Multiple sub-fields — only address_city and address_zip are SET
user.update()
.with_address(|a| {
a.set_city("Seattle");
a.set_zip("98101");
})
.exec(&db).await?;
// Query-based partial update
User::filter_by_id(id).update()
.with_address(|a| a.set_city("Seattle"))
.exec(&db).await?;
}
Primitive types:
#![allow(unused)]
fn main() {
// Direct value
user.update()
.name("Alice")
.exec(&db).await?;
// Via closure (enables future type-specific operations)
user.update()
.with_name(|n| {
n.set("Alice");
})
.exec(&db).await?;
}
For now, primitive update builders only provide .set(). Future enhancements could add
type-specific operations like NumericUpdate::increment(), StringUpdate::append(), etc.
Partial updates with nested embedded structs
Nested embedded structs also generate {Type}Update<'a> types. The .with_* methods
can be nested naturally:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
struct Office {
name: String,
location: Address,
}
// Update only headquarters_location_city
company.update()
.with_headquarters(|h| {
h.with_location(|a| {
a.set_city("Seattle");
});
})
.exec(&db).await?;
// Update headquarters_name and headquarters_location_zip
company.update()
.with_headquarters(|h| {
h.with_name(|n| n.set("West Coast HQ"));
h.with_location(|a| {
a.set_zip("98101");
});
})
.exec(&db).await?;
}
Enum updates
Enums use whole-variant replacement. Setting an enum field replaces the discriminator and all variant columns:
#![allow(unused)]
fn main() {
// Replace the entire enum value — sets discriminator + variant fields,
// NULLs out fields from the previous variant
user.update()
.contact(ContactMethod::Email { address: "new@example.com".into() })
.exec(&db).await?;
}
For data-carrying variants, use .with_contact() to update fields within the current
variant without changing the discriminator:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactMethod {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Phone { country: String, number: String },
}
// Update only the phone number, leave country and discriminator unchanged
user.update()
.with_contact(|c| {
c.phone(|p| {
p.with_number(|n| n.set("555-1234"));
});
})
.exec(&db).await?;
// Update email variant
User::filter_by_id(id).update()
.with_contact(|c| {
c.email(|e| {
e.with_address(|a| a.set("new@example.com"));
});
})
.exec(&db).await?;
}
ContactMethodUpdate<'a> has one method per variant (e.g., .phone(), .email()). Each
method accepts a closure that receives a builder scoped to that variant’s fields. The
discriminator is not changed by partial updates.
Mapping Layer Formalization
Problem
Toasty’s mapping layer connects model-level fields to database-level columns.
A model field’s type may differ from its storage type (e.g., Timestamp stored
as i64 or text). The mapping must be a bijection — every model value
encodes to exactly one stored value and decodes back losslessly. The bijection
operates at the record level, not per-field: n model fields may map to m
database columns (e.g., multiple fields JSON-encoded into a single column).
The bijection alone is not sufficient. When lowering expressions (filters, ORDER BY, arithmetic) to the database, we need to know whether a given operator can be pushed through the encoding. This is the question of whether the encoding is a homomorphism with respect to that operator:
- For arithmetic: encode(a ⊕ b) = encode(a) ⊕' encode(b)
- For comparisons: a < b ⟺ encode(a) <' encode(b)
If yes, the operator can be evaluated in storage space (efficient, index-friendly). If no, the database must first decode to the model type (SQL CAST) or the operation must be evaluated application-side.
These are two decoupled concerns:
- Bijection — can we round-trip values? (required for correctness)
- Operator homomorphism — which operators preserve semantics through the encoding? (determines what can be pushed to the DB)
A mapping with no homomorphic operators is still valid — you can store and retrieve. You just can’t push any filters or ordering to the database.
Examples
Timestamp as i64 (epoch seconds)
encode(ts) = ts.epoch_seconds()
decode(n) = Timestamp::from_epoch_seconds(n)
Bijection: ✓ — lossless round-trip.
== homomorphic: ✓ — ts1 == ts2 ⟺ encode(ts1) == encode(ts2)
< homomorphic: ✓ — ts1 < ts2 ⟺ encode(ts1) < encode(ts2)
Epoch seconds preserve temporal ordering under integer comparison, so range
queries (<, >, BETWEEN) can operate directly on the raw column.
+ homomorphic: ✓ — encode(ts + 234s) = encode(ts) + 234
Integer addition over epoch seconds preserves timestamp arithmetic.
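This mapping can be sketched in a few lines. The `Ts` type below is a stand-in for Toasty's actual timestamp type; only the algebra matters — the round-trip is lossless, and `==`, `<`, and `+` all push through the encoding:

```rust
// Stand-in timestamp type; a single epoch-seconds field.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
struct Ts {
    epoch_seconds: i64,
}

// encode: model value -> storage value (i64 column).
fn encode(ts: Ts) -> i64 {
    ts.epoch_seconds
}

// decode: storage value -> model value. Inverse of encode.
fn decode(n: i64) -> Ts {
    Ts { epoch_seconds: n }
}

// Model-level arithmetic: add a number of seconds to a timestamp.
fn add_secs(ts: Ts, secs: i64) -> Ts {
    Ts { epoch_seconds: ts.epoch_seconds + secs }
}
```

The homomorphism claims are then directly checkable: `encode(add_secs(a, 234)) == encode(a) + 234`, and `a < b` exactly when `encode(a) < encode(b)`.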
Timestamp as text (ISO 8601)
encode(ts) = ts.to_iso8601()
decode(s) = Timestamp::parse_iso8601(s)
Bijection: ✓ — lossless round-trip (assuming canonical formatting).
== homomorphic: ✓ — injective encoding preserves equality.
< homomorphic: fragile — lexicographic order matches temporal order only
for fixed-width UTC formats. Not generally safe.
+ homomorphic: ✗ — text + 234 is meaningless.
String with case inversion
encode(s) = s.invert_case() // "Hello" → "hELLO"
decode(s) = s.invert_case() // "hELLO" → "Hello"
Bijection: ✓ — case inversion is its own inverse.
== homomorphic: ✓ — injective, so equality is preserved. Encode the
search term the same way and compare.
< homomorphic: ✗ — ordering is reversed between cases:
"ABC" < "abc" (A=65 < a=97)
encode("ABC") = "abc"
encode("abc") = "ABC"
"abc" > "ABC" — ordering reversed
A valid mapping, but useless for range queries in storage space.
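A sketch of this mapping (restricted to ASCII for simplicity) makes the properties concrete — applying the function twice round-trips, equality survives, but ordering flips:

```rust
// Case inversion: uppercase -> lowercase, lowercase -> uppercase,
// everything else unchanged. Its own inverse.
fn invert_case(s: &str) -> String {
    s.chars()
        .map(|c| {
            if c.is_ascii_uppercase() {
                c.to_ascii_lowercase()
            } else if c.is_ascii_lowercase() {
                c.to_ascii_uppercase()
            } else {
                c
            }
        })
        .collect()
}
```

Equality queries still work by encoding the search term the same way; range queries do not, because `"ABC" < "abc"` in the model space becomes `"abc" > "ABC"` in storage space.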
Bijection by Construction
For arbitrary functions, bijectivity is undecidable. Instead of detecting it, we construct mappings from known-bijective primitives and composition rules that preserve bijectivity. If a mapping is built entirely from these, it is guaranteed valid.
Composition rules
- Sequential: f ∘ g is a bijection if both f and g are.
- Parallel/product: (f(a), g(b)) is a bijection if both f and g are.
These compose freely — complex mappings built from simple bijective pieces are automatically valid. Homomorphism properties, however, may be lost at each composition step and must be tracked separately.
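The sequential rule can be sketched with encode/decode closure pairs (this `Bij` struct is illustrative, not Toasty's representation). Note the decode halves compose in reverse order:

```rust
// A bijection as an encode/decode pair of closures.
struct Bij<A, B> {
    encode: Box<dyn Fn(A) -> B>,
    decode: Box<dyn Fn(B) -> A>,
}

// Sequential composition: encode applies f then g;
// decode must undo in the opposite order (g first, then f).
fn compose<A: 'static, B: 'static, C: 'static>(f: Bij<A, B>, g: Bij<B, C>) -> Bij<A, C> {
    let (fe, fd, ge, gd) = (f.encode, f.decode, g.encode, g.decode);
    Bij {
        encode: Box::new(move |a| ge(fe(a))),
        decode: Box::new(move |c| fd(gd(c))),
    }
}
```

Composing an affine shift (`x + 10`) with an i64↔String cast still round-trips, because each piece does.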
Dimensionality: multiple fields → one column
Two fields may map to the same column if and only if the model constrains them to always hold the same value (an equivalence class). In this case no information is lost and the mapping remains a bijection — but only over the restricted domain where the constraint holds. Without such a constraint, collapsing two independent fields into one column destroys injectivity.
This gives us computed fields as a natural consequence. Two fields can reference the same column through different bijective transformations:
regular: String → column (identity)
inverted: String → invert_case(column) (bijection)
Because the transformations are bijections, both fields are readable AND writable.
Writing regular = "Hello" stores "Hello" in the column; inverted
automatically becomes "hELLO". Writing inverted = "hELLO" applies the inverse
to store "Hello"; regular is automatically "Hello". Data flow in both
directions is fully determined by the bijection — no special computed-field
machinery needed.
Computed Fields
Storage is the source of truth. Each field is a view of the underlying column(s) through its bijection. Computed fields are a direct consequence: when multiple fields reference the same column through different bijections, each field is a different view of the same stored data.
Schema representation
Each field stores a bijection pair:
- field_to_column: encode — compute column value from field value (inverse)
- column_to_field: decode — compute field value from column value (forward)
A reverse index maps each column to the set of fields that reference it.
Write propagation
When a field is set, the column value is determined, which determines all sibling fields:
- Compute column value: col = field_a.field_to_column(new_value)
- For each sibling field on the same column: field_b = field_b.column_to_field(col)
The composed transform between two fields sharing a column is:
field_b.column_to_field(field_a.field_to_column(value))
Conflict detection
If the user sets two fields that share a column in the same operation, the
resulting column values must agree. If
field_a.field_to_column(val_a) ≠ field_b.field_to_column(val_b), the write is
invalid and must be rejected.
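Write propagation and conflict detection for the regular/inverted example above can be sketched as follows (function names are illustrative, not Toasty API):

```rust
// Case inversion, the `inverted` field's bijection (its own inverse).
fn invert_case(s: &str) -> String {
    s.chars()
        .map(|c| {
            if c.is_ascii_uppercase() { c.to_ascii_lowercase() }
            else if c.is_ascii_lowercase() { c.to_ascii_uppercase() }
            else { c }
        })
        .collect()
}

// Setting `regular` determines the column (identity bijection),
// which in turn determines the sibling field `inverted`.
fn set_regular(new: &str) -> (String, String) {
    let col = new.to_string();        // field_to_column = identity
    let inverted = invert_case(&col); // sibling's column_to_field
    (col, inverted)
}

// If both fields are set in one operation, their column values must agree.
fn check_conflict(regular: &str, inverted: &str) -> bool {
    regular == invert_case(inverted)
}
```

Setting `regular = "Hello"` yields column `"Hello"` and `inverted = "hELLO"`; setting `regular = "Hello"` together with `inverted = "HELLO"` is rejected because the two imply different column values.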
Bijective Primitives
Three categories of bijective primitives, each with encode/decode halves:
Type reinterpretation
Converts a single value between two types with the same information content.
Implemented as Expr::Cast in both directions.
Current pairs:
- Timestamp ↔ String (ISO 8601)
- Uuid ↔ String
- Uuid ↔ Bytes
- Date ↔ String
- Time ↔ String
- DateTime ↔ String
- Zoned ↔ String
- Timestamp ↔ DateTime
- Timestamp ↔ Zoned
- Zoned ↔ DateTime
- Decimal ↔ String
- BigDecimal ↔ String
- Integer widening/narrowing (i8 ↔ i16 ↔ i32 ↔ i64, etc.)
Affine transformations
Arithmetic transformations by a constant. Each is a bijection with a known inverse.
- x + k — inverse: x - k
- x * k (k ≠ 0) — inverse: x / k
- x * k + c (k ≠ 0) — inverse: (x - c) / k
Homomorphism properties (for x + k as representative):
- == homomorphic: ✓ — a == b ⟺ (a+k) == (b+k)
- < homomorphic: ✓ — a < b ⟺ (a+k) < (b+k)
- + homomorphic: ✗ — encode(a+b) = a+b+k ≠ encode(a)+encode(b) = a+b+2k
Note: x * k for negative k reverses ordering (< not homomorphic).
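These properties are easy to check numerically. A minimal sketch:

```rust
// Affine shift: x + k.
fn encode_add(x: i64, k: i64) -> i64 {
    x + k
}

// Scaling: x * k. For k < 0 this reverses ordering.
fn encode_mul(x: i64, k: i64) -> i64 {
    x * k
}
```

With `k = 7`: ordering survives the shift, but `encode(2 + 5) = 14` while `encode(2) + encode(5) = 21`, so `+` does not distribute. And with a negative multiplier, `2 < 5` but `encode(2) > encode(5)`.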
Product (record)
Packs/unpacks multiple independent values into a fixed-size tuple.
- Encode: Expr::Record — combine values into a tuple
- Decode: Expr::Project — extract by index
Bijective because each component is independent and individually recoverable. Used for embedded structs (fields flattened into columns).
Coproduct (tagged union)
Encodes/decodes a discriminated union (enum) where the discriminant partitions the domain into disjoint subsets.
- Encode: Expr::Project — extract discriminant and per-variant fields
- Decode: Expr::Match — branch on discriminant, reconstruct variant via Expr::Record
Bijective if and only if:
- Arms are exhaustive (cover all discriminant values)
- Arms are disjoint (no overlapping discriminants)
- Each arm’s body is individually a bijection
This is a coproduct of bijections: if f_i: A_i → B_i is a bijection for each
variant i, the combined mapping on the tagged union Σ_i A_i → Σ_i B_i is
also a bijection.
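A concrete sketch, using a hypothetical three-column layout (discriminant, phone column, email column) for a two-variant enum:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Contact {
    Phone(String),
    Email(String),
}

// Encode into (disc, phone_col, email_col). Columns for the
// non-active variant are NULL (None).
fn encode(c: &Contact) -> (i64, Option<String>, Option<String>) {
    match c {
        Contact::Phone(n) => (0, Some(n.clone()), None),
        Contact::Email(a) => (1, None, Some(a.clone())),
    }
}

// Decode: branch on the discriminant. The arms are exhaustive and
// disjoint, and each arm's body is itself a bijection, so the whole
// mapping round-trips.
fn decode(row: (i64, Option<String>, Option<String>)) -> Contact {
    match row.0 {
        0 => Contact::Phone(row.1.unwrap()),
        _ => Contact::Email(row.2.unwrap()),
    }
}
```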
Operator Homomorphism
Operator inventory
Current Toasty binary operators (BinaryOp): ==, !=, <, <=, >, >=.
Arithmetic operators (+, -) are not yet in the AST but are needed for
computed fields and interval arithmetic.
For homomorphism analysis, != is the negation of ==, and >=/<= are
derivable from </>. So the independent set is: ==, <, +.
Per-primitive homomorphism
Type reinterpretation:
| Encoding | == | < | + |
|---|---|---|---|
| Timestamp ↔ String | ✓ | ✓ (¹) | ✗ |
| Uuid ↔ String | ✓ | ✗ | n/a |
| Uuid ↔ Bytes | ✓ | ✗ | n/a |
| Date ↔ String | ✓ | ✓ (¹) | ✗ |
| Time ↔ String | ✓ | ✓ (¹) | ✗ |
| DateTime ↔ String | ✓ | ✓ (¹) | ✗ |
| Zoned ↔ String | ✓ | ✗ | ✗ |
| Timestamp ↔ DateTime | ✓ | ✓ | ✓ |
| Timestamp ↔ Zoned | ✓ | ✓ | ✓ |
| Zoned ↔ DateTime | ✓ | ✓ | ✓ |
| Decimal ↔ String | ✓ | ✗ | ✗ |
| BigDecimal ↔ String | ✓ | ✗ | ✗ |
| Integer widening | ✓ | ✓ | ✓ |
(¹) Requires canonical fixed-width serialization format. Lexicographic ordering matches semantic ordering only if Toasty guarantees consistent formatting (no variable-length subsecond digits, no timezone offset variations, etc.).
All type reinterpretations are injective, so == is always preserved. < and
+ depend on whether the target type’s native operations align with the source
type’s semantics.
Affine transformations:
| Encoding | == | < | + |
|---|---|---|---|
| x + k | ✓ | ✓ | ✗ |
| x * k (k>0) | ✓ | ✓ | ✗ |
| x * k (k<0) | ✓ | ✗ (reversed) | ✗ |
| x * k + c | ✓ | ✓ if k>0 | ✗ |
Product (record):
| Operator | Homomorphic? |
|---|---|
| == | ✓ — if each component preserves == |
| < | conditional — requires lexicographic comparison and each component preserving < |
| + | ✓ — if each component preserves + (component-wise) |
Coproduct (tagged union):
| Operator | Homomorphic? |
|---|---|
| == | ✓ — if discriminant + each arm preserves == |
| < | generally ✗ — cross-variant comparison is usually meaningless |
| + | ✗ — arithmetic across variants undefined |
Homomorphism under composition
Sequential (g ∘ f): if both f and g are homomorphic for an operator,
so is the composition. Proof: a op b ⟺ f(a) op f(b) ⟺ g(f(a)) op g(f(b)).
Parallel/product ((f(a), g(b))): preserves == if both f and g do.
Preserves < only if tuple comparison is lexicographic and both preserve <.
Coproduct: preserves == if each arm does. Does not generally preserve <.
Cross-encoding comparisons
When two operands use different encodings (e.g., field₁ uses Timestamp→i64,
field₂ uses Timestamp→i64+offset), can_distribute does not directly apply.
The comparison encode₁(a) op encode₂(b) mixes two encodings and may not
preserve semantics.
Fallback: decode both to model space and compare there.
decode₁(col₁) op decode₂(col₂)
This always produces correct results but may require SQL CAST or application-side evaluation.
Database independence
can_distribute does not take a database parameter. Database capabilities
determine which bijection is selected (e.g., PostgreSQL has native timestamps
→ identity mapping; SQLite does not → Timestamp↔i64). Once the bijection is
chosen, can_distribute is purely a property of that bijection and the operator.
The only edge case is if two databases use the same types but their operators
behave differently (e.g., string collation affecting <). This can be handled by
treating such behavioral differences as part of the encoding rather than adding a
database parameter.
Precision / Domain Restriction
Lossy encodings like #[column(type = timestamp(2))] involve two distinct steps:
1. Domain restriction (lossy, write-time): the user’s full-precision value is truncated to the representable domain. This is many-to-one — multiple inputs collapse to the same output. It is not part of the mapping.
2. Encoding (bijective): over the restricted domain (values with ≤2 fractional digits), the mapping is a perfect bijection — lossless round-trip.
The mapping framework only governs step 2. Step 1 is a write-time concern:
when the user assigns a value, it gets projected into the representable domain.
Analogous to integer narrowing (i64 → i32): the mapping between i32 values
and the stored column is bijective; the loss happens if you store a value outside
i32 range.
Nullability
Option<T> with None → NULL is a coproduct bijection:
- Domain partition: Option<T> = None | Some(T) — two disjoint cases.
- Encoding: None → NULL, Some(v) → encode(v) — each arm is individually bijective (unit ↔ NULL is trivially so; Some delegates to T’s encoding).
- Decoding: NULL → None, non-NULL → Some(decode(v)).
This satisfies the coproduct conditions (exhaustive, disjoint, per-arm bijective).
NULL breaks standard ==
SQL uses three-valued logic: NULL = NULL evaluates to NULL (falsy), not
TRUE. This means the standard == operator is not homomorphic over the
nullable encoding — the model-level None == None is true, but
NULL = NULL is not.
NULL-safe operators
All Toasty target databases provide a NULL-safe equality operator:
| Database | Operator |
|---|---|
| PostgreSQL | IS NOT DISTINCT FROM |
| MySQL | <=> |
| SQLite | IS |
Using the NULL-safe operator restores == homomorphism:
a == b ⟺ encode(a) IS NOT DISTINCT FROM encode(b).
Operator mapping
This means homomorphism is not just a property of (encoding, operator) — it is
a property of the triple (encoding, model_op, storage_op). The lowerer may need
to emit a different SQL operator than the one the user wrote:
- Non-nullable field: model == → SQL =
- Nullable field: model == → SQL IS NOT DISTINCT FROM (or <=>, IS)
can_distribute should return the storage-level operator to use, not just a
boolean. Signature sketch:
can_distribute(encoding, model_op) -> Option<storage_op>
None means the operator cannot be pushed to the DB. Some(op) means it can,
using the specified storage operator.
Ordering
NULL ordering is also database-specific (NULLS FIRST vs NULLS LAST). The
lowerer must ensure consistent behavior across backends, potentially by emitting
explicit NULLS FIRST/NULLS LAST clauses.
Lowering Algorithm
The lowerer transforms a model-level expression tree into a storage-level expression tree. The input contains field references and model-level literals. The output contains column references and storage-level values.
Core: lowering a binary operator
lower_binary_op(op, lhs, rhs):
// 1. Identify field references and look up their encodings
// from the schema/mapping.
lhs_encoding = lookup_encoding(lhs) if lhs is FieldRef, else None
rhs_encoding = lookup_encoding(rhs) if rhs is FieldRef, else None
// 2. Determine if the operator can distribute through the encoding.
// For single-column primitive encodings:
if both are FieldRefs with same encoding:
match can_distribute(encoding, op):
Some(storage_op):
// Both fields share the encoding — compare columns directly.
emit: column_lhs storage_op column_rhs
None:
// Decode both to model space.
emit: decode(column_lhs) op decode(column_rhs)
if one is FieldRef, other is Literal:
match can_distribute(field_encoding, op):
Some(storage_op):
// Encode the literal, compare in storage space.
emit: column storage_op encode(literal)
None:
// Decode the column to model space.
emit: decode(column) op literal
if both are Literals:
// Const-evaluate in model space.
emit: literal_lhs op literal_rhs
Encoding the literal
encode(literal) applies the field’s field_to_column bijection to the
model-level value, producing a storage-level value. For a UUID↔text encoding:
encode(UUID("abc-123")) → "abc-123".
Decoding the column
decode(column_ref) applies the field’s column_to_field bijection to the
column reference, wrapping it in the appropriate SQL expression. For UUID↔text:
decode(uuid_col) → CAST(uuid_col AS UUID).
If the database lacks the model type (e.g., SQLite has no UUID), decode is not expressible in SQL. The operation must be evaluated application-side or the query rejected.
Multi-column encodings (product / coproduct)
For fields that span multiple columns, == expands structurally:
lower_binary_op(==, coproduct_field, literal):
encoded = encode(literal)
// encoded is a tuple: (disc_val, col1_val, col2_val, ...)
// Expand into per-column comparisons:
result = TRUE
for each (column, encoded_value) in zip(field.columns, encoded):
col_encoding = encoding_for(column) // e.g., nullable text
match can_distribute(col_encoding, ==):
Some(storage_op):
result = result AND (column storage_op encoded_value)
None:
result = result AND (decode(column) == encoded_value)
emit: result
ORDER BY
lower_order_by(field):
encoding = lookup_encoding(field)
match can_distribute(encoding, <):
Some(_):
// Ordering is preserved in storage space.
emit: ORDER BY column
None:
// Must decode to model space for correct ordering.
emit: ORDER BY decode(column)
SELECT returning
Always decode — application needs model-level values:
lower_select_returning(field):
emit: decode(column) // column_to_field bijection
INSERT / UPDATE
Always encode — database needs storage-level values:
lower_insert_value(field, value):
emit: encode(value) // field_to_column bijection
Examples
WHERE uuid_col == UUID("abc-123"), UUID stored as text:
- LHS is FieldRef → encoding: UUID↔text, column: uuid_col
- RHS is literal: UUID("abc-123")
- can_distribute(UUID↔text, ==) → Some(=)
- Encode literal: "abc-123"
- Output: uuid_col = 'abc-123'
WHERE uuid_col < UUID("abc-123"), UUID stored as text:
- LHS is FieldRef → encoding: UUID↔text, column: uuid_col
- RHS is literal: UUID("abc-123")
- can_distribute(UUID↔text, <) → None
- Decode column: CAST(uuid_col AS UUID)
- Output: CAST(uuid_col AS UUID) < UUID('abc-123')
- (If DB lacks UUID type → application-side evaluation or reject)
WHERE contact == Contact::Phone { number: "123" }, coproduct encoding:
- LHS is FieldRef → coproduct encoding, columns: disc, phone_number, email_address
- RHS is literal → encode: (0, "123", NULL)
- Expand per-column:
  - disc = 0 (can_distribute(i64, ==) → Some(=))
  - phone_number = '123' (can_distribute(nullable text, ==) → Some(=))
  - email_address IS NULL (can_distribute(nullable text, ==) → Some(IS))
- Output: disc = 0 AND phone_number = '123' AND email_address IS NULL
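The first example can be traced with a toy lowerer that builds SQL strings directly. The `can_distribute` table here is a hard-coded stand-in for the real per-pair lookup, covering only the UUID↔text encoding:

```rust
// Stand-in homomorphism table for the UUID↔text encoding:
// equality survives the encoding; ordering does not.
fn can_distribute(op: &str) -> Option<&'static str> {
    match op {
        "==" => Some("="),
        _ => None,
    }
}

// Lower `column op literal` to a SQL fragment. If the operator
// distributes, encode the literal and compare in storage space;
// otherwise decode the column back to the model type.
fn lower(column: &str, op: &str, literal_text: &str) -> String {
    match can_distribute(op) {
        Some(storage_op) => format!("{column} {storage_op} '{literal_text}'"),
        None => format!("CAST({column} AS UUID) {op} UUID('{literal_text}')"),
    }
}
```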
Schema Representation
Each field’s mapping is stored as a structured Bijection tree. This is the
single source of truth — encode/decode expressions are derived from it.
Bijection enum
#![allow(unused)]
fn main() {
enum Bijection {
/// No transformation — field type == column type.
Identity,
/// Lossless cast between two types with the same information content.
/// e.g., UUID↔text, Timestamp↔i64, integer widening.
Cast { from: Type, to: Type },
/// x*k + c (k ≠ 0). Inverse: (x - c) / k.
Affine { k: Value, c: Value },
/// Option<T> → nullable column.
/// Wraps an inner bijection with None↔NULL.
Nullable(Box<Bijection>),
/// Embedded struct → multiple columns.
/// Each component is an independent bijection on one field↔column pair.
Product(Vec<Bijection>),
/// Enum → discriminant column + per-variant columns.
Coproduct {
discriminant: Box<Bijection>,
variants: Vec<CoproductArm>,
},
/// Composition: apply `inner` first, then `outer`.
/// encode = outer.encode(inner.encode(x))
/// decode = inner.decode(outer.decode(x))
Compose {
inner: Box<Bijection>,
outer: Box<Bijection>,
},
}
struct CoproductArm {
discriminant_value: Value,
body: Bijection, // typically Product for data-carrying variants
}
}
Methods on Bijection
#![allow(unused)]
fn main() {
impl Bijection {
/// Encode a model-level value to a storage-level value.
fn encode(&self, value: Value) -> Value;
/// Produce a decode expression: given a column reference (or tuple of
/// column references), return a model-level expression.
fn decode(&self, column_expr: Expr) -> Expr;
/// Query whether `model_op` can be pushed through this encoding.
/// Returns the storage-level operator to use, or None if the
/// operation must fall back to model space.
fn can_distribute(&self, model_op: BinaryOp) -> Option<StorageOp>;
/// Number of columns this bijection spans.
fn column_count(&self) -> usize;
}
}
can_distribute is defined recursively:
- Identity: always Some(model_op) — no transformation.
- Cast: lookup in the per-pair homomorphism table.
- Affine: == → Some(=). < → Some(<) if k > 0, None if k < 0.
- Nullable: delegates to inner, may change op (e.g., == → IS NOT DISTINCT FROM).
- Product: == → Some(=) if all components return Some. < → only if lexicographic and all components support <.
- Coproduct: == → Some if discriminant + each arm returns Some. < → generally None.
- Compose: Some only if both inner and outer return Some.
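A cut-down recursive implementation over a reduced `Bijection` tree (two operators, a handful of variants; the real enum and `StorageOp` type differ):

```rust
#[derive(Clone, Copy, PartialEq)]
enum Op {
    Eq,
    Lt,
}

enum Bijection {
    Identity,
    Affine { k: i64 },
    Nullable(Box<Bijection>),
    Compose { inner: Box<Bijection>, outer: Box<Bijection> },
}

// Storage operators are strings here for illustration only.
fn can_distribute(b: &Bijection, op: Op) -> Option<&'static str> {
    match b {
        Bijection::Identity => Some(if op == Op::Eq { "=" } else { "<" }),
        Bijection::Affine { k } => match op {
            Op::Eq => Some("="),
            Op::Lt if *k > 0 => Some("<"),
            Op::Lt => None, // negative k reverses ordering
        },
        // Nullable swaps equality for the NULL-safe operator.
        Bijection::Nullable(inner) => match (op, can_distribute(inner, op)) {
            (Op::Eq, Some(_)) => Some("IS NOT DISTINCT FROM"),
            (_, r) => r,
        },
        // Both halves must distribute; use the outer's storage op.
        Bijection::Compose { inner, outer } => {
            can_distribute(inner, op).and(can_distribute(outer, op))
        }
    }
}
```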
Per-field mapping
#![allow(unused)]
fn main() {
struct FieldMapping {
bijection: Bijection,
columns: Vec<ColumnId>, // columns this field maps to (1 for primitive, N for product/coproduct)
}
}
The model-level mapping::Model holds a FieldMapping per field, plus a
reverse index from columns to fields (for computed field propagation).
Verification
The framework should be formally verified using Lean 4 + Mathlib. Mathlib already provides the algebraic vocabulary (bijections, homomorphisms, products, coproducts). The plan:
- Define the primitives and composition rules in Lean
- Prove the general theorems once (composition preserves bijection, coproduct conditions, etc.)
- For each concrete primitive, state and prove its homomorphism properties
- Lean checks everything mechanically
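As a taste of step 2, the sequential-composition theorem is already in Mathlib; stating it in Lean 4 is a one-liner (a sketch, assuming a standard Mathlib import):

```lean
import Mathlib.Logic.Function.Basic

-- If f and g are bijections, so is g ∘ f.
-- Mathlib provides the proof as Function.Bijective.comp.
example {α β γ : Type} {f : α → β} {g : β → γ}
    (hf : Function.Bijective f) (hg : Function.Bijective g) :
    Function.Bijective (g ∘ f) :=
  hg.comp hf
```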
Engine-Level Pagination Design
Overview
This document describes the implementation of engine-level pagination in Toasty. The key principle is that pagination logic (limit+1 strategy, cursor extraction, etc.) should be handled by the engine, not in application-level code. This allows the engine to leverage database-specific capabilities (e.g., DynamoDB’s native cursor support) while providing compatibility for databases that don’t have native support (e.g., SQL databases).
Architecture Context
Statement System
- toasty_core::stmt::Statement represents a superset of SQL - “Toasty-flavored SQL”
- Contains both SQL concepts AND Toasty application-level concepts (models, paths, pagination)
- Limit::PaginateForward is a Toasty-level concept that must be transformed by the engine before reaching SQL generation
- By the time statements reach toasty-sql, they must contain ONLY valid SQL
Engine Pipeline
- Planner: Transforms Toasty statements into a pipeline of actions
- Actions: Executed by the engine, store results in VarStore
- VarStore: Stores intermediate results between pipeline steps
- ExecResponse: Final result containing values and optional metadata
Existing Patterns
- eval::Func: Pre-computed transformations that execute during pipeline execution
- partition_returning: Separates database-handled expressions from in-memory evaluations
- Output::project: Transforms raw database results before storing in VarStore
Design
Core Types
#![allow(unused)]
fn main() {
// In engine.rs
pub struct ExecResponse {
pub values: ValueStream,
pub metadata: Option<Metadata>,
}
pub struct Metadata {
pub next_cursor: Option<Expr>,
pub prev_cursor: Option<Expr>,
pub query: Query,
}
// In engine/plan/exec_statement.rs
pub struct ExecStatement {
pub input: Option<Input>,
pub output: Option<Output>,
pub stmt: stmt::Statement,
pub conditional_update_with_no_returning: bool,
/// Pagination configuration for this query
pub pagination: Option<Pagination>,
}
pub struct Pagination {
/// Original limit before +1 transformation
pub limit: u64,
/// Function to extract cursor from a row
/// Takes row as arg[0], returns cursor value(s)
pub extract_cursor: eval::Func,
}
}
VarStore Changes
The VarStore needs to be updated to store ExecResponse instead of ValueStream:
#![allow(unused)]
fn main() {
pub(crate) struct VarStore {
slots: Vec<Option<ExecResponse>>,
}
}
This allows pagination metadata to flow through the pipeline and be returned from engine::exec.
Implementation Plan
Phase 1: Update VarStore to ExecResponse [Mechanical Change]
This phase is a purely mechanical change to update the VarStore infrastructure. No pagination logic yet.
1. Update VarStore (engine/exec/var_store.rs):
   - Change storage type from ValueStream to ExecResponse
   - Update load() to return ExecResponse
   - Update store() to accept ExecResponse
   - Update dup() to clone the entire ExecResponse (including metadata)
2. Update all action executors to wrap their results in ExecResponse:
   - For now, all actions will use metadata: None
   - Each action’s result becomes: ExecResponse { values, metadata: None }
   - Actions to update: action_associate, action_batch_write, action_delete_by_key, action_exec_statement, action_find_pk_by_index, action_get_by_key, action_insert, action_query_pk, action_update_by_key, action_set_var
3. Update pipeline execution (engine/exec.rs):
   - exec_pipeline returns ExecResponse
   - Handle VarStore returning ExecResponse
4. Update main engine (engine.rs):
   - exec::exec now returns ExecResponse directly
   - Remove the temporary wrapping logic
This phase establishes the infrastructure without any behavioral changes. All existing tests should continue to pass.
Phase 2: Add Pagination to ExecStatement [Task 2]
- Add Pagination struct to engine/plan/exec_statement.rs
- Add pagination: Option<Pagination> field to ExecStatement
- No execution changes yet — just the structure
Phase 3: Planner Support for SQL Pagination [Task 3]
In planner/select.rs, add pagination planning logic:
#![allow(unused)]
fn main() {
impl Planner<'_> {
fn plan_select_sql(...) {
// ... existing logic ...
// Handle pagination
let pagination = if let Some(Limit::PaginateForward { limit, after }) = &stmt.limit {
Some(self.plan_pagination(&mut stmt, &mut project, limit)?)
} else {
None
};
self.push_action(plan::ExecStatement {
input,
output: Some(plan::Output { var: output, project }),
stmt: stmt.into(),
conditional_update_with_no_returning: false,
pagination,
});
}
fn plan_pagination(
&mut self,
stmt: &mut stmt::Query,
project: &mut eval::Func,
limit_expr: &stmt::Expr,
) -> Result<Pagination> {
let original_limit = self.extract_limit_value(limit_expr)?;
// Get ORDER BY clause (required for pagination)
let order_by = stmt.order_by.as_ref()
.ok_or_else(|| anyhow!("Pagination requires ORDER BY"))?;
// Check if ORDER BY is unique
let is_unique = self.is_order_by_unique(order_by, stmt);
// If not unique, append primary key as tie-breaker
if !is_unique {
self.append_pk_to_order_by(stmt)?;
}
// Ensure ORDER BY fields are in returning clause
let (added_indices, original_field_count) =
self.ensure_order_by_in_returning(stmt)?;
// Build cursor extraction function
let extract_cursor = self.build_cursor_extraction_func(
stmt,
&added_indices,
)?;
// Modify project function if we added fields
if !added_indices.is_empty() {
self.adjust_project_for_pagination(
project,
original_field_count,
added_indices.len(),
);
}
// Transform limit to +1 for next page detection
*stmt.limit.as_mut().unwrap() = Limit::Offset {
limit: (original_limit + 1).into(),
offset: None,
};
Ok(Pagination {
limit: original_limit,
extract_cursor,
})
}
}
}
Key helper methods:
- is_order_by_unique: Checks if ORDER BY fields form a unique constraint
- append_pk_to_order_by: Adds primary key as tie-breaker
- ensure_order_by_in_returning: Adds ORDER BY fields to SELECT if missing
- build_cursor_extraction_func: Creates eval::Func to extract cursor
- adjust_project_for_pagination: Modifies project to filter out added fields
Phase 4: Executor Implementation [Task 4]
In engine/exec/exec_statement.rs:
#![allow(unused)]
fn main() {
impl Exec<'_> {
pub(super) async fn action_exec_statement(
&mut self,
action: &plan::ExecStatement,
) -> Result<()> {
// ... existing logic to execute statement ...
let res = if let Some(pagination) = &action.pagination {
self.handle_paginated_query(res, pagination, &action.stmt).await?
} else {
ExecResponse {
values: /* normal value stream */,
metadata: None,
}
};
self.vars.store(out.var, res);
Ok(())
}
async fn handle_paginated_query(
&mut self,
rows: Rows,
pagination: &Pagination,
stmt: &Statement,
) -> Result<ExecResponse> {
// Collect limit+1 rows
let mut buffer = Vec::new();
let mut count = 0;
match rows {
Rows::Values(mut stream) => {
while let Some(value) = stream.next().await {
buffer.push(value?);
count += 1;
if count > pagination.limit {
break;
}
}
}
_ => return Err(anyhow!("Pagination requires row results")),
}
// Check if there's a next page
let has_next = buffer.len() > pagination.limit as usize;
// Extract cursor if there's a next page
let next_cursor = if has_next {
// Get cursor from the LAST item we're keeping
let last_kept = &buffer[pagination.limit as usize - 1];
let cursor_value = pagination.extract_cursor.eval(&[last_kept.clone()])?;
// Truncate buffer to requested limit
buffer.truncate(pagination.limit as usize);
Some(stmt::Expr::Value(cursor_value))
} else {
None
};
Ok(ExecResponse {
values: ValueStream::from_vec(buffer),
metadata: Some(Metadata {
next_cursor,
prev_cursor: None, // TODO: implement in future
query: stmt.as_query().cloned().unwrap_or_default(),
}),
})
}
}
}
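The limit+1 strategy in handle_paginated_query can be isolated into a small sketch. Plain `Vec<i64>` rows stand in for the value stream, and the row value itself stands in for the extracted cursor:

```rust
// Fetch up to limit+1 rows; the extra row only signals that a next
// page exists. Keep `limit` rows and take the cursor from the last
// row kept.
fn paginate(rows: Vec<i64>, limit: usize) -> (Vec<i64>, Option<i64>) {
    let mut buffer: Vec<i64> = rows.into_iter().take(limit + 1).collect();
    let has_next = buffer.len() > limit;
    let next_cursor = if has_next {
        buffer.truncate(limit);
        buffer.last().copied() // cursor comes from the last kept row
    } else {
        None
    };
    (buffer, next_cursor)
}
```

With five rows and a limit of three, the caller gets three rows plus a cursor; with two rows, both rows and no cursor.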
Phase 5: Clean Up Application Layer [Task 5]
Remove the limit+1 logic from Paginate::collect:
#![allow(unused)]
fn main() {
pub async fn collect(self, db: &Db) -> Result<Page<M>> {
// Simply delegate to db.paginate - engine handles pagination
db.paginate(self.query).await
}
}
Update Db::paginate to use the metadata from ExecResponse:
#![allow(unused)]
fn main() {
pub async fn paginate<M: Model>(&self, statement: stmt::Select<M>) -> Result<Page<M>> {
let exec_response = engine::exec(self, statement.untyped.clone().into()).await?;
// Convert value stream to models
let mut cursor = Cursor::new(self.schema.clone(), exec_response.values);
let mut items = Vec::new();
while let Some(item) = cursor.next().await {
items.push(item?);
}
// Extract pagination metadata
let (next_cursor, prev_cursor) = match exec_response.metadata {
Some(metadata) => (metadata.next_cursor, metadata.prev_cursor),
None => (None, None),
};
Ok(Page::new(items, statement, next_cursor, prev_cursor))
}
}
Key Design Decisions
- Single Source of Truth: The extract_cursor function is the only place that knows how to extract cursors. No redundant order_by_indices.
- Type Safety: Cursor extraction function uses actual inferred types from the schema, not Type::Any.
- Automatic Tie-Breaking: The planner automatically appends primary key to ORDER BY when needed for uniqueness.
- Transparent Field Addition: ORDER BY fields are added to returning clause transparently, and filtered out via the project function.
- Metadata Threading: ExecResponse flows through VarStore, preserving metadata through the pipeline.
Testing Strategy
- Unit Tests: Test cursor extraction function generation
- Integration Tests: Test pagination with various ORDER BY configurations
- Database Tests: Ensure SQL generation is correct (no PaginateForward in SQL)
- End-to-End Tests: Verify pagination works across different databases
Future Enhancements
- Previous Page Support: Implement prev_cursor extraction and PaginateBackward
- DynamoDB Native Pagination: Leverage LastEvaluatedKey instead of limit+1
- Complex ORDER BY: Support expressions beyond simple column references
- Optimization: Cache cursor extraction functions for common patterns
Serialized Field Implementation Design
Builds on the #[serialize] bookkeeping already in place (attribute parsing,
SerializeFormat enum, FieldPrimitive.serialize field). This document covers
the runtime serialization/deserialization codegen.
User-Facing API
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: uuid::Uuid,
name: String,
#[serialize(json)]
tags: Vec<String>,
// nullable: the column may be NULL. The Rust type must be Option<T>.
// None maps to NULL; Some(v) is serialized as JSON.
#[serialize(json, nullable)]
metadata: Option<HashMap<String, String>>,
// Non-nullable Option: the entire Option value is serialized as JSON.
// Some(v) → `v` as JSON, None → `null` as JSON text (column is NOT NULL).
#[serialize(json)]
extra: Option<String>,
}
}
Fields annotated with #[serialize(json)] are stored as JSON text in a single
database column. The field’s Rust type must implement serde::Serialize and
serde::DeserializeOwned. The database column type defaults to String/TEXT.
Nullability
By default, serialized fields are not nullable. The entire Rust value —
including Option<T> — is serialized as-is into JSON text stored in a NOT NULL
column. This means None becomes the JSON text null, and Some(v) becomes
the JSON serialization of v.
To make the database column nullable, add nullable to the attribute:
#[serialize(json, nullable)]. When nullable is set:
- The Rust type must be Option<T>.
- None maps to a SQL NULL (no value stored).
- Some(v) serializes v as JSON text.
This is an explicit opt-in because the two behaviors are meaningfully different:
a user may legitimately want to serialize None as JSON null text in a NOT
NULL column (e.g., for a JSON API field where null is a valid value distinct
from “no row”).
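The distinction can be sketched with a toy encoder. `DbValue` and the naive, escape-free JSON string encoding below are illustrative stand-ins for the real Value type and serde serialization:

```rust
#[derive(Clone, Debug, PartialEq)]
enum DbValue {
    Null,
    Text(String),
}

// Naive JSON encoding of a string value (no escaping; sketch only).
fn to_json(s: &str) -> String {
    format!("\"{}\"", s)
}

// Default (non-nullable): the whole Option is serialized.
// None becomes the JSON text `null` in a NOT NULL column.
fn encode_non_nullable(v: &Option<String>) -> DbValue {
    match v {
        None => DbValue::Text("null".to_string()),
        Some(s) => DbValue::Text(to_json(s)),
    }
}

// #[serialize(json, nullable)]: None maps to a SQL NULL instead.
fn encode_nullable(v: &Option<String>) -> DbValue {
    match v {
        None => DbValue::Null,
        Some(s) => DbValue::Text(to_json(s)),
    }
}
```

The same `None` value thus produces two different column states depending on the attribute, which is why the opt-in is explicit.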
Value Encoding
A serialized field stores a JSON string in the database. The value stream uses
Value::String for serialized fields, not the field’s logical Rust type.
Rust value ──serde_json::to_string──► Value::String(json) ──► DB column (TEXT)
DB column (TEXT) ──► Value::String(json) ──serde_json::from_str──► Rust value
Schema Changes
For serialized fields, field_ty bypasses <T as Primitive>::field_ty() and
constructs FieldPrimitive directly with ty: Type::String. The user’s Rust
type T does not need to implement Primitive — it only needs Serialize +
DeserializeOwned.
Nullability is determined by the nullable flag in the attribute, not by
inspecting the Rust type.
Remove serialize from Primitive::field_ty
Today Primitive::field_ty accepts a serialize argument so it can thread
SerializeFormat into the FieldPrimitive it builds. With this design,
serialized fields never go through Primitive::field_ty — codegen constructs
the FieldPrimitive directly. That means the serialize parameter is dead
for all callers and should be removed.
#![allow(unused)]
fn main() {
// Primitive trait (before):
fn field_ty(
    storage_ty: Option<db::Type>,
    serialize: Option<SerializeFormat>,
) -> FieldTy;
// Primitive trait (after):
fn field_ty(storage_ty: Option<db::Type>) -> FieldTy;
}
The default implementation drops the serialize field from the constructed
FieldPrimitive (it is always None when going through the trait). Embedded
type overrides (Embed, enum) already ignore both parameters.
Codegen changes:
#![allow(unused)]
fn main() {
// Non-serialized field (calls through the trait):
field_ty = quote!(<#ty as Primitive>::field_ty(#storage_ty));
nullable = quote!(<#ty as Primitive>::NULLABLE);
// Serialized field (constructed directly):
field_ty = quote!(FieldTy::Primitive(FieldPrimitive {
    ty: Type::String,
    storage_ty: #storage_ty,
    serialize: Some(SerializeFormat::Json),
}));
nullable = #serialize_nullable; // literal bool from attribute
}
No type-level hack is needed — the nullable flag is parsed from the attribute
at macro expansion time and threaded through to schema registration as a
literal bool.
Codegen Changes
Primitive::load / Model::load
For serialized fields, the generated load code reads a String from the record
and deserializes it. The behavior depends on whether nullable is set:
#![allow(unused)]
fn main() {
// Non-nullable (default) — works for any T including Option<T>:
field_name: {
    let json_str = <String as Primitive>::load(record[i].take())?;
    serde_json::from_str(&json_str)
        .map_err(|e| Error::from_args(
            format_args!("failed to deserialize field '{}': {}", "field_name", e)
        ))?
},
// Nullable (#[serialize(json, nullable)]) — T must be Option<U>:
field_name: {
    let value = record[i].take();
    if value.is_null() {
        None
    } else {
        let json_str = <String as Primitive>::load(value)?;
        Some(serde_json::from_str(&json_str)
            .map_err(|e| Error::from_args(
                format_args!("failed to deserialize field '{}': {}", "field_name", e)
            ))?)
    }
},
}
Non-serialized fields are unchanged: <T as Primitive>::load(record[i].take())?.
Reload (root model and embedded)
Reload match arms follow the same pattern: load as String, then deserialize.
For nullable fields, check null first.
Create builder setters
Serialized field setters accept the concrete Rust type (not impl IntoExpr<T>,
since T does not implement IntoExpr) and serialize to a String expression:
#![allow(unused)]
fn main() {
// Non-nullable (default) — accepts T directly (including Option<T>):
pub fn field_name(mut self, field_name: FieldType) -> Self {
    let json = serde_json::to_string(&field_name).expect("failed to serialize");
    self.stmt.set(index, <String as IntoExpr<String>>::into_expr(json));
    self
}
// Nullable (#[serialize(json, nullable)]) — accepts Option<InnerType>:
pub fn field_name(mut self, field_name: Option<InnerType>) -> Self {
    match &field_name {
        Some(v) => {
            let json = serde_json::to_string(v).expect("failed to serialize");
            self.stmt.set(index, <String as IntoExpr<String>>::into_expr(json));
        }
        None => {
            self.stmt.set(index, Expr::<String>::from_value(Value::Null));
        }
    }
    self
}
}
Update builder setters
Same pattern as create: accept the concrete type, serialize to JSON, store as
String expression.
Dependencies
serde_json is added as an optional dependency of the toasty crate, gated
behind the existing serde feature:
# crates/toasty/Cargo.toml
[features]
serde = ["dep:serde_core", "dep:serde_json"]
[dependencies]
serde_json = { workspace = true, optional = true }
Generated code references serde_json through the codegen support module:
#![allow(unused)]
fn main() {
// crates/toasty/src/lib.rs, in codegen_support
#[cfg(feature = "serde")]
pub use serde_json;
}
If a user writes #[serialize(json)] without enabling the serde feature, the
generated code fails to compile because codegen_support::serde_json does not
exist. The compiler error points at the generated serde_json::from_str call.
Files Modified
| File | Change |
|---|---|
| crates/toasty/Cargo.toml | Add serde_json optional dep, update serde feature |
| crates/toasty/src/lib.rs | Re-export serde_json in codegen_support |
| crates/toasty/src/stmt/primitive.rs | Remove serialize param from Primitive::field_ty |
| crates/toasty-macros/src/schema/field.rs | Parse nullable flag from #[serialize(...)] attribute |
| crates/toasty-macros/src/expand.rs | Update Embed/enum field_ty overrides to drop serialize param |
| crates/toasty-macros/src/expand/schema.rs | Construct FieldPrimitive directly for serialized fields; remove serialize arg from non-serialized field_ty call |
| crates/toasty-macros/src/expand/embedded_enum.rs | Drop serialize arg from field_ty call |
| crates/toasty-macros/src/expand/model.rs | Deserialize in expand_load_body() and expand_embedded_reload_body() |
| crates/toasty-macros/src/expand/create.rs | Serialize in create setter for serialized fields |
| crates/toasty-macros/src/expand/update.rs | Serialize in update setter, deserialize in reload arms |
| crates/toasty-driver-integration-suite/Cargo.toml | Add serde, serde_json deps, enable serde feature |
| crates/toasty-driver-integration-suite/src/tests/serialize.rs | Integration tests |
Integration Tests
New file serialize.rs in the driver integration suite. Test cases:
- Round-trip a Vec<String> field through create and read-back
- Round-trip a nullable Option<T> field with Some and None (SQL NULL) values
- Non-nullable Option<T> field: None round-trips as JSON null text (not SQL NULL)
- Update a serialized field and verify the new value persists
- Round-trip a custom struct with serde::Serialize + DeserializeOwned
Static Assertions for create! Required Fields
The create! macro does not check that all required fields are specified.
Missing a required field compiles successfully but fails at runtime when the
database rejects a NULL value in a NOT NULL column. This design adds
compile-time checking so that omitting a required field is a compilation error.
Problem
Given these models:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: Id<User>,
    name: String,
    #[has_many]
    todos: HasMany<Todo>,
}
#[derive(Model)]
struct Todo {
    #[key]
    #[auto]
    id: Id<Todo>,
    #[index]
    user_id: Id<User>,
    #[belongs_to(key = user_id, references = id)]
    user: BelongsTo<User>,
    title: String,
}
}
This compiles today but panics at runtime:
#![allow(unused)]
fn main() {
// Missing `name` — no compile error
let user = toasty::create!(User { }).exec(&mut db).await?;
}
Approach
Per-level validation with monomorphization
Each model carries a flat CreateMeta constant that lists only its own
fields — no pointers to other models’ metadata. Validation happens one
nesting level at a time, using the compiler’s type inference at each level
to resolve the target model.
This avoids const evaluation cycles entirely. A const in Rust must be
fully evaluated before it exists, so if User::CREATE_META contained a
&'static reference to Todo::CREATE_META and vice versa, the compiler
would detect a cycle and reject it. By keeping each model’s metadata flat
and resolving cross-model references at each nesting level through
monomorphization, no model’s const ever needs to reference another.
CreateMeta struct
A simple struct in toasty::schema::create_meta (re-exported through
codegen_support) describes the fields a model exposes on creation:
#![allow(unused)]
fn main() {
pub struct CreateMeta {
    pub fields: &'static [CreateField],
    pub model_name: &'static str,
}
pub struct CreateField {
    pub name: &'static str,
    pub required: bool,
}
}
Each field’s required flag is computed at compile time using the
Field::NULLABLE trait constant, so the proc macro does not need to parse
Option<T> syntactically:
#![allow(unused)]
fn main() {
// generated by #[derive(Model)]
const CREATE_META: CreateMeta = CreateMeta {
    fields: &[
        CreateField { name: "name", required: !<String as Field>::NULLABLE },
        CreateField { name: "bio", required: !<Option<String> as Field>::NULLABLE },
    ],
    model_name: "User",
};
}
<String as Field>::NULLABLE is false, so required is true.
<Option<String> as Field>::NULLABLE is true, so required is false.
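A minimal sketch shows how such a NULLABLE associated constant can distinguish Option<T> without any syntactic parsing. The trait and impls here are illustrative stand-ins, not Toasty's actual Field trait:

```rust
// Illustrative stand-in for the Field trait described above.
trait Field {
    const NULLABLE: bool;
}

// Concrete primitives are not nullable...
impl Field for String {
    const NULLABLE: bool = false;
}

// ...while a blanket impl marks any Option<T> as nullable,
// however the user spelled the type.
impl<T: Field> Field for Option<T> {
    const NULLABLE: bool = true;
}

// `required` is just the negation, evaluated at compile time.
const NAME_REQUIRED: bool = !<String as Field>::NULLABLE;
const BIO_REQUIRED: bool = !<Option<String> as Field>::NULLABLE;
```

Because the check goes through trait resolution, a type alias for Option<String> would be classified correctly, which a syntactic check in the proc macro could not guarantee.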
A const fn helper performs the actual checking:
#![allow(unused)]
fn main() {
pub const fn assert_create_fields(meta: &CreateMeta, provided: &[&str]) {
    // panics at compile time listing the missing field
}
}
This uses byte-level string comparison (str::as_bytes() in a while
loop) since const fn cannot call trait methods like PartialEq.
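A sketch of what that byte-level comparison could look like (the helper names here are invented for illustration; the real implementation lives in the create_meta module):

```rust
// const-compatible string equality: compare the byte slices in a while
// loop, since PartialEq::eq is not callable inside a const fn.
const fn str_eq(a: &str, b: &str) -> bool {
    let a = a.as_bytes();
    let b = b.as_bytes();
    if a.len() != b.len() {
        return false;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return false;
        }
        i += 1;
    }
    true
}

// const-compatible membership test over the provided field names.
const fn contains(provided: &[&str], name: &str) -> bool {
    let mut i = 0;
    while i < provided.len() {
        if str_eq(provided[i], name) {
            return true;
        }
        i += 1;
    }
    false
}
```

`assert_create_fields` would then loop over `meta.fields`, and for each entry with `required: true` panic unless `contains(provided, field.name)` holds.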
ValidateCreate trait
A #[doc(hidden)] trait carries the CreateMeta reference. This trait
is the single mechanism used for validation at every level — typed creates,
scoped creates, and nested creates all use it through monomorphization:
#![allow(unused)]
fn main() {
#[doc(hidden)]
pub trait ValidateCreate {
    const CREATE_META: &'static CreateMeta;
}
}
The derive macro generates ValidateCreate impls for:
- Fields structs (UserFields<Origin>, TodoFieldsList<Origin>) — so that nested field accessors like User::fields().todos() return a type that carries the target model’s metadata.
- Relation scope types (Many, One, OptionOne) — so that scoped expressions like user.todos() return a type that carries the target model’s metadata.
Each impl simply references the target model’s CREATE_META:
#![allow(unused)]
fn main() {
// On the fields struct for Todo (generated by derive)
impl<__Origin> ValidateCreate for TodoFieldsList<__Origin> {
    const CREATE_META: &'static CreateMeta = &Todo::CREATE_META;
}
// On the relation scope type (generated by derive)
impl ValidateCreate for Many {
    const CREATE_META: &'static CreateMeta = &Todo::CREATE_META;
}
}
Because ValidateCreate is separate from Scope and Model, it carries
no other obligations and can be implemented on any generated type without
affecting the existing trait hierarchy.
Model trait
CREATE_META remains an associated constant on Model as well. This is
the canonical owned constant — the ValidateCreate impls reference it:
#![allow(unused)]
fn main() {
pub trait Model {
    // ...existing associated types and methods...
    const CREATE_META: CreateMeta;
}
}
CREATE_META is removed from the Scope trait. Scoped validation now
goes through ValidateCreate instead.
Which fields are included
CreateMeta.fields contains all primitive fields that are:
- Not #[auto]
- Not #[default(...)]
- Not #[update(...)]
Each of these fields has required set to !<T as Field>::NULLABLE, so
Option<T> fields are included but marked as not required.
These fields are always excluded from the list entirely:
- Relation fields (BelongsTo, HasMany, HasOne)
- FK source fields (fields referenced by a #[belongs_to(key = ...)] on the same model)
FK source fields are excluded from CreateMeta.fields because in a
top-level create they are set implicitly when you provide the BelongsTo
relation. In a nested or scoped create the parent context fills them in.
For the models above:
| Model | Required | Not required | Excluded |
|---|---|---|---|
| User | name | (none) | id (auto), todos (relation) |
| Todo | title | (none) | id (auto), user_id (FK source), user (relation) |
File layout
crates/toasty/src/schema/create_meta.rs — CreateMeta, CreateField, const fn helpers
crates/toasty/src/schema.rs — pub mod create_meta; pub use ...
crates/toasty/src/lib.rs — codegen_support re-exports
Typed creates
create!(User { name: "Alice" }) expands to:
#![allow(unused)]
fn main() {
{
    const _CREATE: () = toasty::codegen_support::assert_create_fields(
        &<User as toasty::codegen_support::Model>::CREATE_META,
        &["name"],
    );
    User::create().name("Alice")
}
}
The const _CREATE: () block forces compile-time evaluation. If
assert_create_fields panics, the compiler reports the panic message as
an error at the create! call site.
Scoped creates
create!(in user.todos() { title: "buy milk" }) is harder because the
macro does not know the scope type — it only has the expression
user.todos().
The workaround uses monomorphization-time const evaluation. The macro
generates a local generic struct bounded on ValidateCreate whose
associated constant contains the assertion, then forces monomorphization by
calling a helper function that infers the type from the expression:
#![allow(unused)]
fn main() {
{
    let __scope = user.todos();
    struct __Check<__S: toasty::codegen_support::ValidateCreate>(
        std::marker::PhantomData<__S>,
    );
    impl<__S: toasty::codegen_support::ValidateCreate> __Check<__S> {
        const __ASSERT: () = toasty::codegen_support::assert_create_fields(
            __S::CREATE_META,
            &["title"],
        );
    }
    fn __force_check<__S: toasty::codegen_support::ValidateCreate>(_: &__S) {
        let _ = __Check::<__S>::__ASSERT;
    }
    __force_check(&__scope);
    let __scope_fields = toasty::codegen_support::scope_fields(&__scope);
    __scope.create().title("buy milk")
}
}
This works because user.todos() returns a type (e.g. todo::Many) that
implements ValidateCreate. When the compiler monomorphizes
__Check<todo::Many>::__ASSERT, it evaluates the const expression. If it
panics, the error points at the create! call site. No unstable features
required.
Nested creates
Nested creates use the same monomorphization trick, but through the fields structs rather than the scope expression. Consider:
#![allow(unused)]
fn main() {
create!(User { name: "Alice", todos: [{ title: "Do it" }] })
}
The create! macro expands this to:
#![allow(unused)]
fn main() {
{
    // Level 0: validate User's fields directly (type is known)
    const _CREATE: () = {
        toasty::codegen_support::assert_create_fields(
            &<User as toasty::codegen_support::Model>::CREATE_META,
            &["name", "todos"],
        );
    };
    let __fields = User::fields();
    // Level 1: validate Todo's fields via monomorphization
    // __fields.todos() returns TodoFieldsList<User>, which impls ValidateCreate
    {
        let __nested = __fields.todos();
        struct __Check<__S: toasty::codegen_support::ValidateCreate>(
            std::marker::PhantomData<__S>,
        );
        impl<__S: toasty::codegen_support::ValidateCreate> __Check<__S> {
            const __ASSERT: () = toasty::codegen_support::assert_create_fields(
                __S::CREATE_META,
                &["title"],
            );
        }
        fn __force<__S: toasty::codegen_support::ValidateCreate>(_: &__S) {
            let _ = __Check::<__S>::__ASSERT;
        }
        __force(&__nested);
    }
    User::create()
        .name("Alice")
        .todos([__fields.todos().create().title("Do it")])
}
}
The key: User::fields().todos() returns TodoFieldsList<User>, which
implements ValidateCreate with CREATE_META = &Todo::CREATE_META. The
monomorphization trick infers the concrete type and evaluates the const
assertion for Todo’s fields.
Arbitrary nesting depth
Each nesting level is an independent const evaluation. For deeper nesting:
#![allow(unused)]
fn main() {
create!(User {
    name: "Alice",
    todos: [{
        title: "Do it",
        categories: [{ name: "Work" }]
    }]
})
}
The macro emits three independent validation blocks:
- Level 0: assert_create_fields(&User::CREATE_META, &["name", "todos"]) — direct const, no monomorphization needed.
- Level 1: monomorphize on User::fields().todos() (which is TodoFieldsList<User>, targeting Todo) to check ["title", "categories"].
- Level 2: monomorphize on Todo::fields().categories() to check ["name"].
No model’s CREATE_META ever references another model’s CREATE_META.
Each level resolves the target model through the type system at
monomorphization time, not through &'static pointers at const evaluation
time.
Why this avoids const cycles
The previous design embedded &'static CreateMeta pointers in a
CreateNested struct, so User::CREATE_META contained a reference to
Todo::CREATE_META and vice versa. This creates a const evaluation
cycle: the compiler must fully evaluate a const before it exists, but
evaluating User::CREATE_META requires Todo::CREATE_META which requires
User::CREATE_META.
The new design eliminates cross-model references entirely:
#![allow(unused)]
fn main() {
// User::CREATE_META — only knows about User's own fields
const CREATE_META: CreateMeta = CreateMeta {
    fields: &[CreateField { name: "name", required: true }],
    model_name: "User",
};
// Todo::CREATE_META — only knows about Todo's own fields
const CREATE_META: CreateMeta = CreateMeta {
    fields: &[CreateField { name: "title", required: true }],
    model_name: "Todo",
};
}
Cross-model resolution happens at monomorphization time through
ValidateCreate impls on the fields structs. Function definitions don’t
create const evaluation cycles — only const definitions that reference each
other do. So even for self-referential models:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Person {
    #[key] #[auto] id: Id<Person>,
    name: String,
    #[has_many]
    children: HasMany<Person>,
}
}
Person::CREATE_META contains only [CreateField { name: "name", ... }].
The derive generates ValidateCreate for PersonFieldsList<Origin> pointing
at &Person::CREATE_META. When the create! macro validates a nested
children: [{ name: "Kid" }], it monomorphizes through
Person::fields().children() which returns PersonFieldsList<Person>,
evaluating Person::CREATE_META — no cycle because Person::CREATE_META
doesn’t reference itself.
Batch and tuple creates
TypedBatch (User::[{ name: "A" }, { name: "B" }]): Each item in the
batch gets its own assertion since different items can specify different
field sets.
Tuple ((User { name: "A" }, Todo { title: "x" })): Each element is
a CreateItem and is checked independently.
Code generation changes
#[derive(Model)] changes
The derive macro generates:
- CREATE_META on impl Model — a flat CreateMeta containing only the model’s own primitive fields (filtered as described in “Which fields are included”).
- ValidateCreate impls on the fields structs (UserFields<Origin> and UserFieldsList<Origin>) referencing &<Model>::CREATE_META.
- ValidateCreate impls on the relation scope types (Many, One, OptionOne) referencing &<Model>::CREATE_META.
The Scope trait no longer carries CREATE_META.
create! macro changes
The expand function in create/expand.rs emits validation at each
nesting level:
- Typed top-level: a plain const assertion using <Path as Model>::CREATE_META directly.
- Scoped top-level: a monomorphization block bounded on ValidateCreate, inferring the type from the scope expression.
- Each nested level: a monomorphization block bounded on ValidateCreate, inferring the type from the fields struct accessor (e.g. User::fields().todos()).
The macro walks the parsed FieldSet tree recursively, emitting one
validation block per nesting level.
Example error messages
Missing a top-level field:
error[E0080]: evaluation panicked: missing required field `name` in create! for `User`
  --> src/main.rs:10:5
   |
10 |     toasty::create!(User { })
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^ evaluation of `_CREATE` failed inside this call
Missing a nested field:
error[E0080]: evaluation panicked: missing required field `title` in create! for `Todo`
  --> src/main.rs:12:5
   |
12 |     toasty::create!(User { name: "Alice", todos: [{ }] })
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ evaluation of `_CREATE` failed inside this call
Limitations and future work
- Embedded model fields are not included in CreateMeta. Fields whose type implements Embed (via #[derive(Embed)]) are skipped because they are not FieldTy::Primitive. A future enhancement should include them.
- #[serialize] fields are excluded because their Rust types (e.g. Vec<String>, HashMap<K, V>, custom structs) do not implement the Field trait, so <T as Field>::NULLABLE cannot be evaluated. A future enhancement could infer nullability syntactically or introduce a separate trait bound for serialized fields.
- BelongsTo relation fields themselves are not checked. If you write create!(Todo { title: "x" }) without providing user or user_id, it compiles but fails at the database. A future enhancement could add disjunction checking (require user OR user_id in top-level creates). In nested and scoped creates this is not a problem because the parent context provides the FK.
- Error messages include the field name but not a file/line pointer to the model definition. The Rust compiler’s error output shows the create! call site, which is the actionable location.
Database Enum Types
Overview
Embedded enums with string labels use the best available enum representation for the target database by default. On databases with native enum types, Toasty uses them. On databases without native enums, Toasty falls back to string columns with constraints where possible, or plain string columns as a last resort.
No annotation is needed to get this behavior — the simplest enum definition gets the best storage automatically:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Status {
    Pending,
    Active,
    Done,
}
}
On PostgreSQL this creates a named enum type via CREATE TYPE status AS ENUM (...). On MySQL it
uses an inline ENUM(...) column. On SQLite it uses a TEXT column with a
CHECK constraint. On DynamoDB it stores a plain string.
Discriminant types
Toasty supports three discriminant storage strategies for embedded enums:
| Enum definition | Storage strategy |
|---|---|
| String labels (default or explicit) | Native enum representation per backend |
| #[column(type = varchar)] or #[column(type = text)] | Plain string column, no DB-level enum enforcement |
| #[column(variant = N)] with integers | INTEGER column |
Default: native enum
When an enum uses string labels (either default identifiers or explicit
#[column(variant = "label")]), Toasty uses native enum storage:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Status {
    Pending, // label: 'pending'
    Active,  // label: 'active'
    Done,    // label: 'done'
}
}
This is equivalent to writing #[column(type = enum)] explicitly.
Opting out: plain string column
Use #[column(type = varchar)] or #[column(type = text)] to store the
discriminant as a plain string column with no database-level enum type or
constraint:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
#[column(type = text)]
enum Status {
    Pending,
    Active,
    Done,
}
}
This stores discriminants in a TEXT column. The database accepts any string value; Toasty is responsible for writing correct values. Use this when you need to interoperate with external tools that write directly to the table, or when you want to avoid database-level enum machinery for any reason.
Integer discriminants
Integer discriminants remain unchanged from existing behavior:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Status {
    #[column(variant = 1)]
    Pending,
    #[column(variant = 2)]
    Active,
    #[column(variant = 3)]
    Done,
}
}
This stores discriminants as an INTEGER column. Integer and string discriminants cannot be mixed in the same enum.
Variant labels
Toasty converts Rust variant identifiers to snake_case for database labels by default, following the same convention used for table and column names:
| Rust variant | Default label |
|---|---|
| Pending | 'pending' |
| InProgress | 'in_progress' |
| AlmostDone | 'almost_done' |
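The default conversion from a variant identifier to its label could be sketched as follows. This is a hypothetical helper written for illustration, not Toasty's actual implementation:

```rust
// Convert a CamelCase variant identifier to a snake_case label:
// insert '_' before each uppercase letter (except the first) and lowercase it.
fn to_snake_case(ident: &str) -> String {
    let mut out = String::new();
    for (i, ch) in ident.chars().enumerate() {
        if ch.is_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.extend(ch.to_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}
```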
Use #[column(variant = "label")] on individual variants to override the
default:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Status {
    #[column(variant = "pending")]
    Pending,
    #[column(variant = "active")]
    Active,
    #[column(variant = "done")]
    Done,
}
}
Explicit labels and defaults can coexist:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Status {
    #[column(variant = "in_progress")]
    InProgress, // stored as 'in_progress' (explicit)
    Done,       // stored as 'done' (default snake_case)
}
}
Database Support
The default native enum strategy adapts to each backend’s capabilities:
| Backend | Representation | Validation |
|---|---|---|
| PostgreSQL | CREATE TYPE ... AS ENUM (named type) | Database rejects invalid values |
| MySQL | Inline ENUM('a', 'b', 'c') column type | Database rejects invalid values |
| SQLite | TEXT column + CHECK constraint | Database rejects invalid values |
| DynamoDB | String attribute | No database-level validation (Toasty validates at the application level) |
PostgreSQL
Toasty creates a standalone named type with CREATE TYPE ... AS ENUM and
references it from column definitions.
MySQL
Toasty generates ENUM('a', 'b', 'c') as the column type. There is no
standalone named type. When the same Rust enum is used in multiple tables,
each table gets its own inline ENUM(...) definition.
SQLite
SQLite has no native enum type. Toasty stores the discriminant as a TEXT
column with a CHECK constraint that restricts values to the declared
labels:
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL CHECK (status IN ('pending', 'active', 'done'))
);
This gives database-level validation while remaining compatible with SQLite’s type system.
DynamoDB
DynamoDB has no column type system or constraint mechanism. Toasty stores the discriminant as a string attribute. Validation happens at the Toasty application level only — the database itself accepts any string value.
Generated SQL Schema
PostgreSQL
Toasty creates a PostgreSQL enum type named after the Rust enum in snake_case:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum OrderState {
    #[column(variant = "new")]
    New,
    #[column(variant = "shipped")]
    Shipped,
    #[column(variant = "delivered")]
    Delivered,
}
}
CREATE TYPE order_state AS ENUM ('new', 'shipped', 'delivered');
The discriminant column uses the enum type:
#![allow(unused)]
fn main() {
#[derive(toasty::Model)]
struct Order {
    #[key]
    #[auto]
    id: i64,
    state: OrderState,
}
}
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    state order_state NOT NULL
);
Customizing the PostgreSQL type name
To specify a custom name for the PostgreSQL enum type, use enum with a name
argument in the #[column(type = ...)] attribute:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
#[column(type = enum("order_status"))]
enum OrderState {
    New,
    Shipped,
    Delivered,
}
}
CREATE TYPE order_status AS ENUM ('new', 'shipped', 'delivered');
Without this attribute, Toasty derives the type name from the Rust enum name in snake_case.
MySQL
MySQL enum types are defined inline on the column:
CREATE TABLE orders (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    state ENUM('new', 'shipped', 'delivered') NOT NULL
);
The enum("name") syntax is ignored on MySQL since there is no standalone
type to name.
SQLite
SQLite uses a TEXT column with a CHECK constraint:
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    state TEXT NOT NULL CHECK (state IN ('new', 'shipped', 'delivered'))
);
Data-carrying enums
Data-carrying enums work the same way on all backends. The discriminant column uses the enum representation; variant fields remain as separate nullable columns:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactMethod {
    #[column(variant = "email")]
    Email { address: String },
    #[column(variant = "phone")]
    Phone { country: String, number: String },
}
}
PostgreSQL:
CREATE TYPE contact_method AS ENUM ('email', 'phone');
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    contact contact_method NOT NULL,
    contact_email_address TEXT,
    contact_phone_country TEXT,
    contact_phone_number TEXT
);
MySQL:
CREATE TABLE users (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
contact ENUM('email', 'phone') NOT NULL,
contact_email_address TEXT,
contact_phone_country TEXT,
contact_phone_number TEXT
);
SQLite:
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    contact TEXT NOT NULL CHECK (contact IN ('email', 'phone')),
    contact_email_address TEXT,
    contact_phone_country TEXT,
    contact_phone_number TEXT
);
Migrations
Creating a new enum
When a model with a string-label enum is first migrated, Toasty issues the appropriate DDL.
PostgreSQL:
CREATE TYPE status AS ENUM ('pending', 'active', 'done');
CREATE TABLE tasks (
    id BIGSERIAL PRIMARY KEY,
    status status NOT NULL
);
MySQL:
CREATE TABLE tasks (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    status ENUM('pending', 'active', 'done') NOT NULL
);
SQLite:
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL CHECK (status IN ('pending', 'active', 'done'))
);
Label ordering
Database enum types have a declaration order that affects ORDER BY behavior.
Toasty manages this order with two rules:
- Initial creation: Labels are ordered by the Rust enum’s variant declaration order.
- Subsequent migrations: Toasty preserves the existing label order from the previous schema snapshot. New variants are appended at the end. Reordering variants in the Rust source does not trigger any DDL and does not change the database label order.
This means the label order is a one-time decision made at creation. If you need to change the order later, you must do so manually outside of Toasty.
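The two rules amount to an append-only merge of label lists. A sketch of the idea, as a hypothetical helper operating on the previous snapshot's order and the current Rust declaration order:

```rust
// Labels already in the snapshot keep their stored positions; labels that
// are new in the Rust enum are appended at the end, in declaration order.
fn merge_label_order(snapshot: &[&str], declared: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = snapshot.iter().map(|s| s.to_string()).collect();
    for label in declared {
        if !snapshot.contains(label) {
            out.push(label.to_string());
        }
    }
    out
}
```

Note that labels present in the snapshot but missing from the declaration would indicate a removed variant, which is a migration error (see "Removing a variant").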
Adding a variant
Adding a new variant to the Rust enum:
#![allow(unused)]
fn main() {
// Before
enum Status { Pending, Active, Done }
// After
enum Status { Pending, Active, Done, Cancelled }
}
New variants are appended after all existing labels, regardless of where they appear in the Rust enum definition.
PostgreSQL:
ALTER TYPE status ADD VALUE 'cancelled';
MySQL:
ALTER TABLE tasks MODIFY COLUMN status
ENUM('pending', 'active', 'done', 'cancelled') NOT NULL;
SQLite:
SQLite does not support ALTER TABLE ... ALTER COLUMN. Toasty uses its
existing table recreation strategy (create new table, copy data, drop old,
rename) to update the CHECK constraint with the new label list.
MySQL requires rewriting the full enum definition on every change. Both MySQL and SQLite rewrites are handled automatically, preserving the existing label order and appending the new label at the end.
Renaming a variant
Toasty does not support renaming enum variant labels. Changing a variant’s
#[column(variant = "...")] label is a migration error. To rename a label,
add the new variant, migrate existing data manually, then remove the old
variant (once variant removal is supported).
Removing a variant
Toasty does not support removing enum variants. Removing a variant from the Rust enum while the label still exists in the database schema is a migration error. Destructive schema changes like this require a broader design for handling data loss scenarios and are out of scope for this feature.
Converting from integer discriminants
Switching an existing enum from #[column(variant = N)] (INTEGER) to string
labels requires a migration that converts the column.
PostgreSQL:
CREATE TYPE status AS ENUM ('pending', 'active', 'done');
ALTER TABLE tasks
  ALTER COLUMN status TYPE status USING (
    CASE status
      WHEN 1 THEN 'pending'
      WHEN 2 THEN 'active'
      WHEN 3 THEN 'done'
    END
  )::status;
The integer-to-label mapping comes from the previous schema snapshot stored in the migration state.
MySQL:
ALTER TABLE tasks MODIFY COLUMN status
ENUM('pending', 'active', 'done') NOT NULL;
MySQL’s MODIFY COLUMN handles the type change. For integer conversions,
Toasty issues an intermediate step to map integers to their label strings
before converting the column type.
Converting from plain string to native enum
Switching from #[column(type = text)] (plain string) to native enum
storage (removing the type override) requires converting the column.
PostgreSQL:
CREATE TYPE status AS ENUM ('pending', 'active', 'done');
ALTER TABLE tasks
ALTER COLUMN status TYPE status USING status::status;
MySQL:
ALTER TABLE tasks MODIFY COLUMN status
ENUM('pending', 'active', 'done') NOT NULL;
SQLite uses its table recreation strategy to replace the TEXT column with a TEXT + CHECK column.
Querying
The query API is the same regardless of discriminant type. Toasty handles the type casting internally:
#![allow(unused)]
fn main() {
// All of these work identically across all discriminant types
Task::filter(Task::fields().status().eq(Status::Active))
Task::filter(Task::fields().status().is_pending())
Task::filter(Task::fields().status().ne(Status::Done))
Task::filter(Task::fields().status().in_list([Status::Pending, Status::Active]))
}
SQL generated for queries
Queries compare against the enum label as a string literal:
-- .eq(Status::Active)
SELECT * FROM tasks WHERE status = 'active';
-- .in_list([Status::Pending, Status::Active])
SELECT * FROM tasks WHERE status IN ('pending', 'active');
This works across all backends. On PostgreSQL and MySQL the database casts the string literal to the enum type automatically. On SQLite and DynamoDB the column is already a string.
Ordering
Toasty does not support ordering comparisons (>, <, etc.) on enum fields.
The query API provides eq, ne, in_list, and variant checks only.
PostgreSQL and MySQL define a sort order for enum values based on their
position in the type definition, not alphabetically. SQLite and DynamoDB
sort enum columns as plain strings (lexicographic). Toasty does not expose
or manage this ordering. Users who query the database directly should be
aware that ORDER BY behavior on enum columns varies by backend.
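The backend difference is easy to see with plain string sorting, which is what SQLite and DynamoDB effectively do:

```rust
// Illustration of the ordering caveat: enum labels in definition order
// vs. the lexicographic order a plain-string backend would use.
fn main() {
    let definition_order = ["pending", "active", "done"]; // PostgreSQL/MySQL sort by position
    let mut lexicographic = definition_order;
    lexicographic.sort(); // SQLite/DynamoDB sort as plain strings
    println!("{:?}", definition_order);
    println!("{:?}", lexicographic); // ["active", "done", "pending"]: a different order
}
```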
Inserting
Inserts supply the label as a string literal on all backends:
INSERT INTO tasks (status) VALUES ('pending');
Compile-Time Validation
| Condition | Result |
|---|---|
| All string or default labels | Valid (native enum storage) |
| #[column(type = text)] or #[column(type = varchar)] | Valid (plain string storage) |
| #[column(variant = N)] with integers | Valid (integer storage) |
| Mix of integer and string variant values | Compile error |
| Duplicate labels (including derived defaults) | Compile error |
| Empty string label #[column(variant = "")] | Compile error |
| Label longer than 63 bytes | Compile error (PostgreSQL’s NAMEDATALEN limit) |
Portability
Native enum storage works across all backends. Each backend uses its best available representation (see Database Support). You can develop against SQLite locally and deploy to PostgreSQL or MySQL without changing the enum definition.
The difference between native enum storage and plain string storage
(#[column(type = text)]) is that native enum adds database-level validation
where the backend supports it. The stored values are string labels in both
cases — there is no data incompatibility between them.
Shared enum types
Multiple models can reference the same enum.
On PostgreSQL, Toasty creates the CREATE TYPE once and reuses it across
tables:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Priority { Low, Medium, High }
#[derive(toasty::Model)]
struct Task {
#[key] #[auto] id: i64,
priority: Priority,
}
#[derive(toasty::Model)]
struct Bug {
#[key] #[auto] id: i64,
priority: Priority,
}
}
PostgreSQL:
CREATE TYPE priority AS ENUM ('low', 'medium', 'high');
CREATE TABLE tasks (
id BIGSERIAL PRIMARY KEY,
priority priority NOT NULL
);
CREATE TABLE bugs (
id BIGSERIAL PRIMARY KEY,
priority priority NOT NULL
);
MySQL:
CREATE TABLE tasks (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
priority ENUM('low', 'medium', 'high') NOT NULL
);
CREATE TABLE bugs (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
priority ENUM('low', 'medium', 'high') NOT NULL
);
Toasty tracks that the PostgreSQL type already exists and does not attempt to create it twice during migrations. On MySQL each table carries its own inline definition.
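The once-only CREATE TYPE behavior amounts to deduplicating by type name during planning. A minimal sketch, assuming a hypothetical `plan_pg_enum` helper rather than the actual migration code:

```rust
use std::collections::HashSet;

// Hypothetical sketch of how a migration planner can emit CREATE TYPE only
// once per shared enum; MySQL tables instead carry an inline ENUM definition.
fn plan_pg_enum(created: &mut HashSet<String>, name: &str, labels: &[&str]) -> Option<String> {
    if created.insert(name.to_string()) {
        let list = labels
            .iter()
            .map(|l| format!("'{}'", l))
            .collect::<Vec<_>>()
            .join(", ");
        Some(format!("CREATE TYPE {} AS ENUM ({});", name, list))
    } else {
        None // type already exists; reuse it for the next table
    }
}

fn main() {
    let mut created = HashSet::new();
    let labels = ["low", "medium", "high"];
    assert!(plan_pg_enum(&mut created, "priority", &labels).is_some()); // first table creates it
    assert!(plan_pg_enum(&mut created, "priority", &labels).is_none()); // second table reuses it
}
```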
Examples
Unit enum with defaults
#![allow(unused)]
fn main() {
#[derive(Debug, PartialEq, toasty::Embed)]
enum Color {
Red,
Green,
Blue,
}
#[derive(Debug, toasty::Model)]
struct Widget {
#[key]
#[auto]
id: i64,
name: String,
color: Color,
}
}
PostgreSQL:
CREATE TYPE color AS ENUM ('red', 'green', 'blue');
CREATE TABLE widgets (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
color color NOT NULL
);
-- Insert
INSERT INTO widgets (name, color) VALUES ('Sprocket', 'red');
-- Query
SELECT * FROM widgets WHERE color = 'green';
Unit enum with explicit labels
#![allow(unused)]
fn main() {
#[derive(Debug, PartialEq, toasty::Embed)]
enum Status {
#[column(variant = "pending")]
Pending,
#[column(variant = "active")]
Active,
#[column(variant = "done")]
Done,
}
#[derive(Debug, toasty::Model)]
struct Task {
#[key]
#[auto]
id: i64,
title: String,
status: Status,
}
}
PostgreSQL:
CREATE TYPE status AS ENUM ('pending', 'active', 'done');
CREATE TABLE tasks (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
status status NOT NULL
);
MySQL:
CREATE TABLE tasks (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
title TEXT NOT NULL,
status ENUM('pending', 'active', 'done') NOT NULL
);
Unit enum with plain string storage
#![allow(unused)]
fn main() {
#[derive(Debug, PartialEq, toasty::Embed)]
#[column(type = text)]
enum Status {
#[column(variant = "pending")]
Pending,
#[column(variant = "active")]
Active,
#[column(variant = "done")]
Done,
}
}
-- Same on all SQL backends
CREATE TABLE tasks (
id ... PRIMARY KEY,
status TEXT NOT NULL
);
No enum type or CHECK constraint is created. The column is a plain TEXT.
Data-carrying enum
#![allow(unused)]
fn main() {
#[derive(Debug, PartialEq, toasty::Embed)]
enum ContactMethod {
#[column(variant = "email")]
Email { address: String },
#[column(variant = "phone")]
Phone { country: String, number: String },
}
#[derive(Debug, toasty::Model)]
struct User {
#[key]
#[auto]
id: i64,
name: String,
contact: ContactMethod,
}
}
#![allow(unused)]
fn main() {
// Create
let user = User::create()
.name("Alice")
.contact(ContactMethod::Email { address: "alice@example.com".into() })
.exec(&mut db)
.await?;
// Query
let email_users = User::filter(User::fields().contact().is_email())
.exec(&mut db)
.await?;
// Update
user.update()
.contact(ContactMethod::Phone {
country: "US".into(),
number: "555-0100".into(),
})
.exec(&mut db)
.await?;
}
Toasty ORM - Development Roadmap
This roadmap outlines potential enhancements and missing features for the Toasty ORM.
Overview
Toasty is an easy-to-use ORM for Rust that supports both SQL and NoSQL databases. This roadmap documents potential future work and feature gaps.
Feature Areas
Composite Keys
Composite Key Support (partial implementation)
- Composite foreign key optimization in query simplification
- Composite PK handling in expression rewriting and IN-list operations
- HasMany/BelongsTo relationships with composite foreign keys referencing composite primary keys
- Junction table / many-to-many patterns with composite keys
- DynamoDB driver: batch delete/update with composite keys, composite unique indexes
- Comprehensive test coverage for all composite key combinations
Query Capabilities
Query Ordering, Limits & Pagination
- Multi-column ordering convenience method (.then_by())
- Direct .limit() method for non-paginated queries
- .last() convenience method
- String operations: contains, starts with, ends with, LIKE (partial AST support)
- NOT IN
- Case-insensitive matching
- BETWEEN / range queries
- Relation filtering (filter by associated model fields)
- Field-to-field comparison
- Arithmetic operations in queries (add, subtract, multiply, divide, modulo)
- Aggregate queries and GROUP BY / HAVING
Data Types
Extended Data Types
- Embedded struct & enum support (partial implementation)
- Serde-serialized types (JSON/JSONB columns for arbitrary Rust types)
- Embedded collections (arrays, maps, sets, etc.)
Relationships & Loading
Partial Model Loading
- Allow models to have fields that are not loaded by default (e.g. a large body column on an Article model)
- Fields opt in via a #[deferred] attribute and must be wrapped in a Deferred<T> type
- By default, queries skip deferred fields; callers opt in with .include(Article::body) (same API as relation preloading)
- Accessing a Deferred<T> that was not loaded either returns an error or panics with a clear message
- Works with primitive types, embedded structs, and embedded enums — just a subset of columns in the same table
#![allow(unused)]
fn main() {
    #[toasty::model]
    struct Article {
        #[key]
        #[auto]
        id: u64,
        title: String,
        author: BelongsTo<User>,
        #[deferred]
        body: Deferred<String>, // not loaded unless explicitly included
    }
    // Load metadata only (no body column fetched)
    let articles = Article::all().collect(&db).await?;
    // Load with body
    let articles = Article::all().include(Article::body).collect(&db).await?;
}
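One way such a wrapper could behave, sketched with std only (the `Deferred` type and its API here are hypothetical, not the final design):

```rust
// Hypothetical sketch of a Deferred<T> wrapper: the field is either loaded
// with a value or skipped, and access to a skipped field fails loudly.
#[derive(Debug)]
enum Deferred<T> {
    Loaded(T),
    NotLoaded,
}

impl<T> Deferred<T> {
    fn get(&self) -> Result<&T, &'static str> {
        match self {
            Deferred::Loaded(v) => Ok(v),
            Deferred::NotLoaded => {
                Err("deferred field was not loaded; add .include(...) to the query")
            }
        }
    }
}

fn main() {
    let body: Deferred<String> = Deferred::NotLoaded;
    assert!(body.get().is_err()); // not fetched: accessing it is an error

    let body = Deferred::Loaded(String::from("article text"));
    assert_eq!(body.get().unwrap(), "article text");
}
```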
Relationships
- Many-to-many relationships
- Polymorphic associations
- Nested preloading (multi-level .include() support)
Query Building
Query Features
- Subquery improvements
- Better conditional/dynamic query building ergonomics
Database Function Expressions
- Allow database-side functions (e.g. NOW(), CURRENT_TIMESTAMP) as expressions in create and update operations
- User API: field setters accept toasty::stmt helpers like toasty::stmt::now() that resolve to core::stmt::ExprFunc variants
#![allow(unused)]
fn main() {
    // Set updated_at to the database's current time instead of a Rust-side value
    user.update()
        .updated_at(toasty::stmt::now())
        .exec(&db)
        .await?;
    // Also usable in create operations
    User::create()
        .name("Alice")
        .created_at(toasty::stmt::now())
        .exec(&db)
        .await?;
}
- Extend the ExprFunc enum in toasty-core with new function variants (e.g. Now)
- SQL serialization for each function across supported databases (NOW() for PostgreSQL/MySQL, datetime('now') for SQLite)
- Codegen: update field setter generation to accept both value types and function expressions
- Future: support additional scalar functions (e.g. COALESCE, LOWER, UPPER, LENGTH)
Raw SQL Support
- Execute arbitrary SQL statements directly
- Parameterized queries with type-safe bindings
- Raw SQL fragments within typed queries (escape hatch for complex expressions)
Data Modification
Upsert
- Insert-or-update: atomic INSERT ... ON CONFLICT DO UPDATE (PostgreSQL/SQLite), ON DUPLICATE KEY UPDATE (MySQL), MERGE (SQL Server/Oracle)
- Insert-or-ignore (DO NOTHING / INSERT IGNORE)
- Conflict target: by column(s), by constraint name, partial indexes (PostgreSQL)
- Column update control: update all non-key columns, named subset, or raw SQL expression
- Access to the proposed row via the EXCLUDED pseudo-table in the update expression
- Bulk upsert (multi-row VALUES)
- DynamoDB: PutItem (unconditional replace) vs. UpdateItem with condition expression
Mutation Result Information
- Return affected row counts from update operations (how many records were updated)
- Return affected row counts from delete operations (how many records were deleted)
- Better result types that provide operation metadata
- Distinguish between “no rows matched” vs “rows matched but no changes needed”
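The matched-vs-modified distinction could be modeled with a small result type; a sketch with illustrative names, not a proposed Toasty API:

```rust
// Illustrative result type distinguishing "no rows matched" from
// "rows matched but nothing changed" from "rows changed".
#[derive(Debug, PartialEq)]
enum MutationOutcome {
    NoMatch,
    MatchedUnchanged { matched: u64 },
    Changed { matched: u64, modified: u64 },
}

fn classify(matched: u64, modified: u64) -> MutationOutcome {
    match (matched, modified) {
        (0, _) => MutationOutcome::NoMatch,
        (m, 0) => MutationOutcome::MatchedUnchanged { matched: m },
        (m, c) => MutationOutcome::Changed { matched: m, modified: c },
    }
}

fn main() {
    assert_eq!(classify(0, 0), MutationOutcome::NoMatch);
    assert_eq!(classify(3, 0), MutationOutcome::MatchedUnchanged { matched: 3 });
    assert_eq!(classify(3, 2), MutationOutcome::Changed { matched: 3, modified: 2 });
}
```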
Transactions
Atomic Batch Operations
- Cross-database atomic batch API
- Supported across SQL and NoSQL databases
- Type-safe operation batching
- All-or-nothing semantics
SQL Transaction API
- Manual transaction control for SQL databases
- BEGIN/COMMIT/ROLLBACK support
- Savepoints and nested transactions
- Isolation level configuration
Schema Management
Migrations
- Schema migration system
- Migration generation
- Rollback support
- Schema versioning
- CLI tools for schema management
Toasty Runtime Improvements
Concurrent Task Execution
- Replace the current ad-hoc background task with a proper in-flight task manager
- Execute independent parts of an execution plan concurrently
- Track and coordinate multiple in-flight tasks within a single query execution
Cancellation & Cleanup
- Detect when the caller drops the future representing query completion
- Perform clean cancellation on drop (rollback any incomplete transactions)
- Ensure no resource leaks or orphaned database state on cancellation
Internal Instrumentation & Metrics
- Instrument time spent in each execution phase (planning, simplification, execution, serialization)
- Track CPU time consumed by query planning to detect expensive plans
- Provide internal metrics for diagnosing performance bottlenecks
Performance
- Dedicated post-lowering optimization pass for expensive predicate analysis (run once, not per-node)
- Equivalence classes for transitive constraint reasoning (a = b AND b = 5 implies a = 5)
- Structured constraint representation (constant bindings, range bounds, exclusion sets)
- Targeted predicate normalization without full DNF conversion
Stored Procedures (Pre-Compiled Query Plans)
- Compile query plans once and execute them many times with different parameter values
- Skip the full compilation pipeline (simplification, lowering, HIR/MIR planning) on repeated calls
- Parameterized statement AST with Param slots for value substitution at execution time
- Pairs with database-level prepared statements for end-to-end optimization
Optimization Features
- Bulk inserts/updates
- Query caching
- Connection pooling improvements
Developer Experience
Ergonomic Macros
toasty::query!() - Succinct query syntax that translates to the builder DSL
#![allow(unused)]
fn main() {
    // Instead of: User::all().filter(...).order_by(...).collect(&db).await
    toasty::query!(User, filter: ..., order_by: ...).collect(&db).await
}
toasty::create!() - Concise record creation syntax
#![allow(unused)]
fn main() {
    // Instead of: User::create().name("Alice").age(30).exec(&db).await
    toasty::create!(User, name: "Alice", age: 30).exec(&db).await
}
toasty::update!() - Simplified update syntax
#![allow(unused)]
fn main() {
    // Instead of: user.update().name("Bob").age(31).exec(&db).await
    toasty::update!(user, name: "Bob", age: 31).exec(&db).await
}
Tooling & Debugging
- Query logging
Safety & Security
Sensitive Value Flagging
- Flag sensitive fields (e.g. passwords, tokens, secrets) so they are automatically redacted in logs and debug output
- Attribute-based opt-in: #[sensitive] on model fields marks values that must never appear in plaintext outside the database
- All logging, query tracing, and error messages strip or mask flagged values
- Prevents accidental credential leakage in application logs, query dumps, and diagnostics
Trusted vs Untrusted Input
- Distinguish between values originating from untrusted user input and values produced internally by the query engine (e.g. literal numbers, generated keys)
- Engine-produced values can skip escaping/parameterization since they are known-safe, reducing unnecessary overhead
- Untrusted input continues to be parameterized or escaped to prevent SQL injection
- Enables more efficient SQL generation without weakening safety guarantees for external data
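The trusted/untrusted split can be sketched as a tagged value type whose rendering either inlines or parameterizes (the types and `$n` placeholder syntax here are illustrative):

```rust
// Sketch: engine-produced values are inlined directly; external input is
// always turned into a bound parameter. Types are illustrative only.
enum Value {
    Trusted(i64),      // produced internally (e.g. a generated key)
    Untrusted(String), // originated from user input
}

fn render(values: &[Value]) -> (String, Vec<String>) {
    let mut params = Vec::new();
    let parts: Vec<String> = values
        .iter()
        .map(|v| match v {
            Value::Trusted(n) => n.to_string(), // known-safe: inline directly
            Value::Untrusted(s) => {
                params.push(s.clone()); // never inlined: parameterize
                format!("${}", params.len())
            }
        })
        .collect();
    (parts.join(", "), params)
}

fn main() {
    let (sql, params) = render(&[Value::Trusted(42), Value::Untrusted("Alice".into())]);
    assert_eq!(sql, "42, $1");
    assert_eq!(params, vec!["Alice".to_string()]);
}
```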
Notes
The roadmap documents describe potential enhancements and missing features. For information about what’s currently implemented, refer to the user guide or test the API directly.
Composite Key Support
Overview
Toasty has partial composite key support. Basic CRUD operations work for models with composite primary keys (both field-level #[key] and model-level #[key(partition = ..., local = ...)]), but several engine optimizations, relationship patterns, and driver operations panic or fall back when encountering composite keys.
This document catalogs the gaps, surveys how other ORMs handle composite keys, identifies common SQL patterns that require composite key support, and proposes a phased implementation plan.
Current State
What Works
Schema definition — Two syntaxes for composite keys:
#![allow(unused)]
fn main() {
// Field-level: multiple #[key] attributes
#[derive(Debug, toasty::Model)]
struct Foo {
#[key]
one: String,
#[key]
two: String,
}
// Model-level: partition/local keys (designed for DynamoDB compatibility)
#[derive(Debug, toasty::Model)]
#[key(partition = user_id, local = id)]
struct Todo {
#[auto]
id: uuid::Uuid,
user_id: uuid::Uuid,
title: String,
}
}
Generated query methods for composite keys:
- filter_by_<field1>_and_<field2>() — filter by both key fields
- get_by_<field1>_and_<field2>() — get a single record by both keys
- filter_by_<partition_field>() — filter by partition key alone
- Comparison operators on local keys: gt(), ge(), lt(), le(), ne(), eq()
Database support:
- SQL databases (SQLite, PostgreSQL, MySQL): composite primary keys via field-level #[key]
- DynamoDB: partition/local key syntax (max 2 keys: 1 partition + 1 local)
Test coverage:
- one_model_query — partition/local key queries with range operators
- has_many_crud_basic::has_many_when_fk_is_composite — HasMany with composite FK (working)
- embedded — composite keys with embedded struct fields
- examples/composite-key/ — end-to-end example application
What Does Not Work
The following locations contain todo!(), assert!(), or panic!() that block composite key usage:
Engine Simplification (5 locations)
| File | Line | Issue |
|---|---|---|
| engine/simplify/expr_binary_op.rs | 23-25 | todo!("handle composite keys") when simplifying equality on model references with composite PKs |
| engine/simplify/expr_binary_op.rs | 43-45 | todo!("handle composite keys") when simplifying binary ops on composite FK fields |
| engine/simplify/expr_in_list.rs | 30-32 | todo!() when optimizing IN-list expressions for models with composite PKs |
| engine/simplify/lift_in_subquery.rs | 92-96 | assert_eq!(len, 1, "TODO: composite keys") — subquery lifting restricted to single-field FKs |
| engine/simplify/lift_in_subquery.rs | 109-111, 145-148, 154-157 | Three more todo!("composite keys") in BelongsTo and HasOne subquery lifting |
| engine/simplify/rewrite_root_path_expr.rs | 18-19 | todo!("composite primary keys") when rewriting path expressions with key constraints |
Engine Lowering (2 locations)
| File | Line | Issue |
|---|---|---|
| engine/lower/insert.rs | 90-92 | todo!() when lowering inserts with BelongsTo relations that have composite FKs |
| engine/lower.rs | 893-896 | Unhandled else branch when lowering relationships with composite FKs |
DynamoDB Driver (4 locations)
| File | Line | Issue |
|---|---|---|
| driver-dynamodb/op/update_by_key.rs | 197 | assert!(op.keys.len() == 1) — batch update limited to single key |
| driver-dynamodb/op/delete_by_key.rs | 119-121 | panic!("only 1 key supported so far") — batch delete limited to single key |
| driver-dynamodb/op/delete_by_key.rs | 33 | panic!("TODO: support more than 1 unique index") |
| driver-dynamodb/op/create_table.rs | 113 | assert_eq!(1, index.columns.len()) — composite unique indexes unsupported |
Stubbed Tests (2 tests)
| File | Test | Status |
|---|---|---|
| has_many_crud_basic.rs | has_many_when_pk_is_composite | Empty — not implemented |
| has_many_crud_basic.rs | has_many_when_fk_and_pk_are_composite | Empty — not implemented |
Design Constraints
- Auto-increment is intentionally forbidden with composite keys. The schema verifier rejects #[auto(increment)] on composite PK tables. UUID auto-generation is the supported alternative.
- DynamoDB limits composite keys to 2 columns (1 partition + 1 local). This is a DynamoDB limitation, not a Toasty limitation.
How Other ORMs Handle Composite Keys
Rust ORMs
Diesel — First-class composite key support. #[diesel(primary_key(col1, col2))] on the struct; find() accepts a tuple (val1, val2); Identifiable returns a tuple reference. BelongsTo works with composite keys via explicit foreign_key attribute. Compile-time type checking through generated code.
SeaORM — Supports composite keys via multiple #[sea_orm(primary_key)] field attributes. PrimaryKeyTrait::ValueType is a tuple. find_by_id() and delete_by_id() accept tuples. DAO pattern works fully. Composite foreign keys are less ergonomic but functional.
Python ORMs
SQLAlchemy — Gold standard for composite key support. Multiple primary_key=True columns define a composite PK. session.get(Model, (a, b)) for lookups. ForeignKeyConstraint at the table level handles composite FKs cleanly. Identity map uses tuples. All features (eager/lazy loading, cascades, relationships) work uniformly with composite keys.
Django — Added CompositePrimaryKey in Django 5.2 (2025) after years of surrogate-key-only design. pk returns a tuple. Model.objects.get(pk=(1, 2)) works. Composite FK support is still limited. Ecosystem (admin, REST frameworks, third-party packages) is catching up.
Tortoise ORM — No composite PK support. Surrogate key + unique constraint is the only option.
JavaScript/TypeScript ORMs
Prisma — @@id([field1, field2]) defines composite PKs. Auto-generates compound field names (field1_field2) for findUnique/update/delete. Multi-field @relation(fields: [...], references: [...]) for composite FKs. Fully type-safe generated client.
TypeORM — Multiple @PrimaryColumn() decorators. All operations use object-based where clauses ({ field1: val1, field2: val2 }). @JoinColumn accepts an array for composite FKs. save() does upsert based on all PK fields.
Sequelize — Supports composite PK definition but findByPk() does not work with composite keys (must use findOne({ where })). Composite FK support requires workarounds or raw SQL.
Drizzle — primaryKey({ columns: [col1, col2] }) in the table config callback. foreignKey({ columns: [...], foreignColumns: [...] }) for composite FKs. No special find-by-PK method; all queries use explicit where + and(). SQL-first philosophy.
Java/Kotlin
Hibernate/JPA — Two approaches: @IdClass (flat fields + separate ID class) and @EmbeddedId (nested object). PK class must implement Serializable, equals(), hashCode(). @JoinColumns (plural) for composite FKs. @MapsId connects relationship fields to embedded ID fields. Full relationship support.
Exposed (Kotlin) — PrimaryKey(col1, col2) in the table object. Only the DSL (SQL-like) API supports composite keys; the DAO (EntityClass) API does not. Relationships require manual joins.
Go ORMs
GORM — Multiple gorm:"primaryKey" tags. Composite FKs via foreignKey:Col1,Col2;references:Col1,Col2. Zero-value problem: PK column with value 0 is treated as “not set.”
Ent — No composite PK support by design (graph semantics, every node has a single ID). Unique composite indexes are the workaround.
Ruby
ActiveRecord (Rails 7.1+) — primary_key: [:col1, :col2] in migrations, self.primary_key = [:col1, :col2] in model. find([a, b]) for lookups. query_constraints: [:col1, :col2] for composite FK associations. Pre-7.1 required the composite_primary_keys gem.
Cross-ORM Summary
| ORM | Composite PK | Composite FK | Find by PK | Relationship Support |
|---|---|---|---|---|
| Diesel (Rust) | Yes | Yes | Tuple | Full |
| SeaORM (Rust) | Yes | Partial | Tuple | Full |
| SQLAlchemy (Python) | Yes | Yes | Tuple | Full |
| Django (Python) | 5.2+ | Limited | Tuple | Partial |
| Prisma (TS) | Yes | Yes | Generated compound | Full |
| TypeORM (TS) | Yes | Yes | Object | Full |
| Sequelize (JS) | Yes | Partial | Broken | Partial |
| Drizzle (TS) | Yes | Yes | Manual where | Manual |
| Hibernate/JPA | Yes | Yes | ID class | Full |
| GORM (Go) | Yes | Yes | Where clause | Full |
| ActiveRecord (Ruby) | 7.1+ | 7.1+ | Array | Partial |
Key takeaway: Mature ORMs (Diesel, SQLAlchemy, Hibernate) treat composite keys as first-class citizens where all operations work uniformly. The most common API pattern is tuple-based identity (find((a, b))). Composite foreign keys are universally harder than composite PKs — even established ORMs have rougher edges there.
Common SQL Patterns Requiring Composite Keys
1. Junction Tables (Many-to-Many)
The most common use case. The junction table’s PK is the combination of FKs to both related tables.
CREATE TABLE enrollment (
student_id INTEGER NOT NULL REFERENCES student(id),
course_id INTEGER NOT NULL REFERENCES course(id),
enrolled_at TIMESTAMP DEFAULT NOW(),
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id)
);
Junction tables often accumulate extra attributes (grade, enrolled_at, role) that make them first-class entities requiring full CRUD support, not just a hidden link table.
Toasty gap: Many-to-many relationships are listed as a separate roadmap item. Composite key support is a prerequisite — junction tables are inherently composite-keyed.
2. Multi-Tenant Data Isolation
Tenant ID appears as the first column in every composite PK, enabling partition-level isolation and efficient tenant-scoped queries.
CREATE TABLE tenant_document (
tenant_id UUID NOT NULL REFERENCES tenant(id),
document_id UUID NOT NULL DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
PRIMARY KEY (tenant_id, document_id)
);
-- All queries are scoped: WHERE tenant_id = $1 AND ...
Why composite PKs: Enforces isolation at the database level. PK index prefix enables efficient tenant-scoped queries. Maps directly to DynamoDB’s partition/local key model.
Toasty gap: The #[key(partition = ..., local = ...)] syntax already models this. The gaps are in relationship handling when both sides use composite keys.
3. Time-Series Data
CREATE TABLE sensor_reading (
sensor_id INTEGER NOT NULL,
recorded_at TIMESTAMP NOT NULL,
value DOUBLE PRECISION NOT NULL,
PRIMARY KEY (sensor_id, recorded_at)
);
Why composite PKs: Natural ordering by sensor then time. Range scans on recorded_at within a sensor are efficient. Supports table partitioning by time ranges.
4. Hierarchical Data (Closure Table)
CREATE TABLE category_closure (
ancestor_id INTEGER NOT NULL REFERENCES category(id),
descendant_id INTEGER NOT NULL REFERENCES category(id),
depth INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (ancestor_id, descendant_id)
);
5. Composite Foreign Keys Referencing Composite PKs
A child table references a parent with a composite PK — all parent PK columns appear in the child as FK columns.
CREATE TABLE order_item (
order_id INTEGER NOT NULL REFERENCES "order"(id),
item_number INTEGER NOT NULL,
PRIMARY KEY (order_id, item_number)
);
CREATE TABLE order_item_shipment (
id SERIAL PRIMARY KEY,
order_id INTEGER NOT NULL,
item_number INTEGER NOT NULL,
shipment_id INTEGER NOT NULL REFERENCES shipment(id),
FOREIGN KEY (order_id, item_number)
REFERENCES order_item(order_id, item_number)
);
Toasty gap: This is the hardest pattern. The engine simplification and lowering layers assume single-field FKs in multiple places. Fixing this is the core of the composite key work.
6. Versioned Records
CREATE TABLE document_version (
document_id INTEGER NOT NULL REFERENCES document(id),
version INTEGER NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (document_id, version)
);
7. Composite Unique Constraints vs Composite Primary Keys
Some applications prefer a surrogate PK with a composite unique constraint:
-- Surrogate PK + composite unique
CREATE TABLE enrollment (
id SERIAL PRIMARY KEY,
student_id INTEGER NOT NULL,
course_id INTEGER NOT NULL,
UNIQUE (student_id, course_id)
);
Trade-offs: surrogate PKs simplify FKs (single column) and URL design, but composite PKs are more storage-efficient and semantically meaningful. ORMs that don’t support composite PKs (Django pre-5.2, Tortoise, Ent) force the surrogate pattern.
Toasty should support both patterns — composite PKs for direct use and composite unique constraints for the surrogate approach.
Implementation Plan
Phase 1: Engine Simplification — Composite PK/FK Handling
Fix the todo!() panics in the engine simplification layer so that queries involving composite keys pass through without crashing, even if not fully optimized.
Files:
- engine/simplify/expr_binary_op.rs — Handle composite PKs and FKs in equality simplification. For composite keys, generate an AND of per-field comparisons.
- engine/simplify/expr_in_list.rs — Handle IN-list for composite PKs. Generate (col1, col2) IN ((v1, v2), (v3, v4)) or an equivalent AND/OR tree.
- engine/simplify/rewrite_root_path_expr.rs — Rewrite path expressions for composite PKs.
Approach: Where a single-field operation currently destructures let [field] = &fields[..], extend to iterate over all fields and combine with AND expressions.
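The approach can be sketched with a toy expression type (`Expr` here is a stand-in for the engine's real statement AST):

```rust
// Sketch of the described approach: instead of destructuring a single key
// field, iterate all key fields and fold per-field equalities into one AND.
#[derive(Debug, PartialEq)]
enum Expr {
    Eq(String, String), // field = value
    And(Vec<Expr>),
}

fn key_eq(fields: &[&str], values: &[&str]) -> Expr {
    let parts: Vec<Expr> = fields
        .iter()
        .zip(values)
        .map(|(f, v)| Expr::Eq(f.to_string(), v.to_string()))
        .collect();
    if parts.len() == 1 {
        parts.into_iter().next().unwrap() // single-field keys stay a plain equality
    } else {
        Expr::And(parts)
    }
}

fn main() {
    assert_eq!(key_eq(&["id"], &["7"]), Expr::Eq("id".into(), "7".into()));
    let composite = key_eq(&["order_id", "item_number"], &["7", "2"]);
    assert_eq!(
        composite,
        Expr::And(vec![
            Expr::Eq("order_id".into(), "7".into()),
            Expr::Eq("item_number".into(), "2".into()),
        ])
    );
}
```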
Phase 2: Subquery Lifting for Composite FKs
Extend the subquery lifting optimization to handle composite foreign keys in BelongsTo and HasOne relationships.
Files:
- engine/simplify/lift_in_subquery.rs — Remove the assert_eq!(len, 1) and handle multi-field FKs. For the optimization path, generate AND of per-field comparisons. For the fallback IN subquery path, generate tuple-based IN expressions or multiple correlated conditions.
Approach: The existing single-field logic maps fk_field.source -> fk_field.target. For composite keys, do the same for each field pair and combine with AND.
Phase 3: Engine Lowering — Composite FK Relationships
Fix insert and relationship lowering to handle composite FKs.
Files:
- engine/lower/insert.rs — When lowering BelongsTo in insert operations, set all FK fields from the related record’s PK fields, not just one.
- engine/lower.rs — Handle composite FKs in relationship lowering. Generate multi-column join conditions.
Phase 4: DynamoDB Driver — Batch Operations with Composite Keys
Files:
- driver-dynamodb/op/update_by_key.rs — Support batch updates with multiple keys (iterate and issue individual UpdateItem calls if needed).
- driver-dynamodb/op/delete_by_key.rs — Support batch deletes. Remove the single-key panic.
- driver-dynamodb/op/create_table.rs — Support composite unique indexes (Global Secondary Indexes with multiple key columns where DynamoDB allows it).
Phase 5: Test Coverage
Fill in the stubbed tests and add new ones covering all composite key combinations:
Existing stubs to implement:
- has_many_when_pk_is_composite — Parent has composite PK, child has single FK pointing to it
- has_many_when_fk_and_pk_are_composite — Both sides have composite keys
New tests to add:
| Test | Description |
|---|---|
| composite_pk_crud | Full CRUD (create, read, update, delete) on a model with 2+ key fields |
| composite_pk_three_fields | Composite PK with 3 fields to test beyond the 2-field case |
| composite_fk_belongs_to | BelongsTo where the FK is composite (references a composite PK) |
| composite_fk_has_one | HasOne with composite FK |
| composite_key_pagination | Cursor-based pagination with composite PK ordering |
| composite_key_scoped_queries | Scoped queries (e.g., user.todos().filter_by_id(...)) with composite keys |
| composite_key_update_non_key_fields | Update non-key fields on a composite-keyed model |
| composite_key_unique_constraint | Composite unique constraint (not PK) behavior |
| junction_table_pattern | Many-to-many junction table with composite PK and extra attributes |
| multi_tenant_pattern | Tenant-scoped models with (tenant_id, entity_id) composite PKs |
Design Decisions
Tuple-Based Identity
Following Diesel and SQLAlchemy’s lead, composite key identity should be represented as tuples. The current generated methods (get_by_field1_and_field2(val1, val2)) are a good API.
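In Rust terms, tuple-based identity means the composite key is literally a tuple, so it works directly as a map key and a lookup argument. A std-only illustration:

```rust
use std::collections::HashMap;

// Illustration of tuple-based identity: a composite key is naturally a
// tuple, usable as an identity-map key and as a find-by-PK argument.
fn main() {
    let mut identity_map: HashMap<(u64, u64), &str> = HashMap::new();
    identity_map.insert((7, 2), "order 7, item 2");

    // get_by_order_id_and_item_number(7, 2) conceptually resolves to this lookup
    assert_eq!(identity_map.get(&(7, 2)), Some(&"order 7, item 2"));
    assert_eq!(identity_map.get(&(7, 3)), None);
}
```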
AND Composition for Multi-Field Conditions
When a single-field operation like pk_field = value needs to become a composite operation, the standard approach is:
pk_field1 = value1 AND pk_field2 = value2
This maps cleanly to SQL WHERE clauses and DynamoDB key conditions. The engine’s stmt::ExprAnd already supports this.
IN-List with Composite Keys
For batch lookups, composite IN can be expressed as:
-- Row-value syntax (PostgreSQL, MySQL 8.0+, SQLite)
WHERE (col1, col2) IN ((v1a, v2a), (v1b, v2b))
-- Equivalent OR-of-ANDs (universal)
WHERE (col1 = v1a AND col2 = v2a) OR (col1 = v1b AND col2 = v2b)
The OR-of-ANDs form is more portable across databases. The engine should generate this form and let the SQL serializer optimize to row-value syntax where supported.
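The OR-of-ANDs expansion is mechanical; a sketch of the string form (the real engine would build AST nodes, not strings):

```rust
// Sketch of the portable OR-of-ANDs expansion for a composite IN-list:
// each row becomes a parenthesized AND of per-column equalities,
// and the rows are joined with OR.
fn composite_in(cols: &[&str], rows: &[Vec<&str>]) -> String {
    rows.iter()
        .map(|row| {
            let conj = cols
                .iter()
                .zip(row)
                .map(|(c, v)| format!("{} = {}", c, v))
                .collect::<Vec<_>>()
                .join(" AND ");
            format!("({})", conj)
        })
        .collect::<Vec<_>>()
        .join(" OR ")
}

fn main() {
    let sql = composite_in(&["col1", "col2"], &[vec!["v1a", "v2a"], vec!["v1b", "v2b"]]);
    assert_eq!(sql, "(col1 = v1a AND col2 = v2a) OR (col1 = v1b AND col2 = v2b)");
}
```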
Composite FK Optimization
The subquery lifting optimization (lift_in_subquery.rs) currently rewrites:
-- Before: subquery
user_id IN (SELECT id FROM users WHERE name = 'Alice')
-- After: direct comparison
user_id = <alice_id>
For composite FKs, the rewrite becomes:
-- Before: correlated subquery
(order_id, item_number) IN (SELECT order_id, item_number FROM order_items WHERE ...)
-- After: direct comparison
order_id = <val1> AND item_number = <val2>
The same optimization logic applies — just iterated over each FK field pair.
Testing Strategy
- All new tests go in the integration suite (toasty-driver-integration-suite) to run against all database backends
- Use the existing #[driver_test] macro for multi-database testing
- Use the matrix testing infrastructure (composite dimension) where appropriate
- Each phase should have passing tests before moving to the next phase
- No unit tests in source code per project convention
Query Ordering, Limits & Pagination
Overview
Toasty provides cursor-based (keyset) pagination, which offers consistent performance and works well across both SQL and NoSQL databases. The implementation converts pagination cursors into WHERE clauses rather than using OFFSET, avoiding the performance issues of traditional offset-based pagination.
Potential Future Work
Multi-column Ordering Convenience
Add .then_by() method for chaining multiple order clauses:
#![allow(unused)]
fn main() {
let users = User::all()
.order_by(User::FIELDS.status().asc())
.then_by(User::FIELDS.created_at().desc())
.paginate(10)
.collect(&db)
.await?;
}
Current workaround requires manual construction:
#![allow(unused)]
fn main() {
use toasty::stmt::OrderBy;
let order = OrderBy::from([
Post::FIELDS.status().asc(),
Post::FIELDS.created_at().desc(),
]);
let posts = Post::all()
.order_by(order)
.collect(&db)
.await?;
}
Implementation:
- File: toasty-macros/src/expand/query.rs
- Add .then_by() method to the query builder
- Complexity: Medium
Direct Limit Method
Expose .limit() for non-paginated queries:
```rust
let recent_posts: Vec<Post> = Post::all()
    .order_by(Post::FIELDS.created_at().desc())
    .limit(5)
    .collect(&db)
    .await?;
```
Implementation:
- File: `toasty-macros/src/expand/query.rs`
- Generate a `.limit()` method
- Complexity: Low
Last Convenience Method
Get the last matching record:
```rust
let last_user: Option<User> = User::all()
    .order_by(User::FIELDS.created_at().desc())
    .last(&db)
    .await?;
```
Implementation:
- File: `toasty-macros/src/expand/query.rs`
- Generate a `.last()` method
- Complexity: Low
Testing
Additional Test Coverage
Tests that could be added:
- Multi-column ordering
  - Verify correct ordering with multiple columns
  - Test tie-breaking behavior
- Direct `.limit()` method (when implemented)
  - Non-paginated queries with limit
  - Verify correct number of results
- `.last()` convenience method (when implemented)
  - Returns the last matching record
  - Returns `None` when there are no matches
- Edge cases
  - Empty results with pagination
  - Single-page results (no next/prev cursors)
  - Pagination beyond the last page
  - Large page sizes
  - Page size of 1
Database-Specific Considerations
SQL Databases
- MySQL: Uses `LIMIT n` for pagination (keyset approach via WHERE)
- PostgreSQL: Uses `LIMIT n` for pagination (keyset approach via WHERE)
- SQLite: Uses `LIMIT n` for pagination (keyset approach via WHERE)
All SQL databases use keyset pagination (WHERE clauses with cursors) rather than OFFSET for consistent performance.
NoSQL Databases
- DynamoDB:
- Limited ordering support (only on sort keys)
- Pagination via `LastEvaluatedKey`
- Cursor-based approach maps well to DynamoDB’s native pagination
- Needs validation and testing
How Keyset Pagination Works
Instead of using OFFSET, Toasty converts cursors to WHERE clauses:
```sql
-- Traditional OFFSET (slow for large offsets)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 10 OFFSET 10000;

-- Toasty's cursor approach (always fast)
SELECT * FROM posts
WHERE (created_at, id) < ('2024-01-15 10:30:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 10;
```
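For databases without row-value comparison support, the cursor comparison can be expanded into the portable OR-of-ANDs form. A string-based sketch of that lexicographic expansion, assuming all columns share the same sort direction (the engine builds typed expressions rather than SQL text):

```rust
/// Expand `(c1, c2, ...) < (v1, v2, ...)` into OR-of-ANDs:
/// c1 < v1 OR (c1 = v1 AND c2 < v2) OR ...
/// Illustrative helper; column/value inputs are plain strings here.
fn keyset_predicate(cols: &[&str], vals: &[&str]) -> String {
    assert_eq!(cols.len(), vals.len());
    let mut terms = Vec::new();
    for i in 0..cols.len() {
        // Equalities on all earlier columns, strict comparison on column i.
        let mut conj: Vec<String> = (0..i)
            .map(|j| format!("{} = {}", cols[j], vals[j]))
            .collect();
        conj.push(format!("{} < {}", cols[i], vals[i]));
        terms.push(format!("({})", conj.join(" AND ")));
    }
    terms.join(" OR ")
}
```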
This provides:
- Consistent Performance: O(log n) regardless of page number
- Stable Results: New records don’t shift pagination boundaries
- Database Agnostic: Works efficiently on NoSQL databases
- Real-time Friendly: Handles concurrent insertions gracefully
Notes
- Cursors (`stmt::Expr`) can be serialized at the application level if needed for web APIs
- Pagination requires an explicit ORDER BY clause to ensure consistent results
- Multi-column ordering works today via manual `OrderBy` construction
- The `.then_by()` convenience method would improve ergonomics but isn't essential
Query Constraints & Filtering
Overview
This document identifies gaps in Toasty’s query constraint support compared to mature ORMs, and outlines potential additions for building web applications.
Terminology
A “query constraint” refers to any predicate used in the WHERE clause of a query. In Toasty, constraints are built using:
- Generated filter methods (`Model::filter_by_<field>()`) for indexed/key fields
- The generic `.filter()` method, which accepts `Expr<bool>` for arbitrary conditions
- `Model::FIELDS.<field>()` paths combined with comparison methods (`.eq()`, `.gt()`, etc.)
Core AST Support Without User API
These expression types exist in toasty-core (crates/toasty-core/src/stmt/expr.rs) and have SQL serialization, but lack a typed user-facing API on Path<T> or Expr<T>:
| Expression | Core AST | SQL Serialized | User API | Notes |
|---|---|---|---|---|
| LIKE | `ExprPattern::Like` | Yes | None | SQL serialization exists |
| Begins With | `ExprPattern::BeginsWith` | Yes | None | Converted to `LIKE 'prefix%'` in SQL |
| EXISTS | `ExprExists` | Yes | None | Used internally by engine |
| COUNT | `ExprFunc::Count` | Yes | None | Internal use only |
ORM Comparison
The following table compares Toasty’s constraint support against seven mature ORMs, highlighting missing features:
| Feature | Toasty | Prisma | Drizzle | Django | SQLAlchemy | Diesel | SeaORM | Hibernate |
|---|---|---|---|---|---|---|---|---|
| Set Operations | | | | | | | | |
| NOT IN | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Range | | | | | | | | |
| BETWEEN | No | Via gt+lt | Yes | Yes | Yes | Yes | Yes | Yes |
| String Operations | | | | | | | | |
| LIKE | AST only | Via contains | Yes | Yes | Yes | Yes | Yes | Yes |
| Contains (substring) | No | Yes | Manual | Yes | Yes | Manual | Yes | Manual |
| Starts with | AST only | Yes | Manual | Yes | Yes | Manual | Yes | Manual |
| Ends with | No | Yes | Manual | Yes | Yes | Manual | Yes | Manual |
| Case-insensitive (ILIKE) | No | Yes | Yes | Yes | Yes | Pg only | No | Manual |
| Regex | No | No | No | Yes | Yes | No | No | No |
| Full-text search | No | Preview | No | Yes (Pg) | Dialect | Crate | No | Extension |
| Relation Filtering | | | | | | | | |
| Filter by related fields | No | Yes | Via join | Yes | Yes | Via join | Via join | Via join |
| Has related (some/none/every) | No | Yes | Via exists | Via exists | Yes | Via exists | Via join | Via exists |
| Aggregation | | | | | | | | |
| COUNT / SUM / AVG / etc. | No | Limited | Yes | Yes | Yes | Yes | Yes | Yes |
| GROUP BY | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| HAVING | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Advanced | | | | | | | | |
| Field-to-field comparison | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Arithmetic in queries | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Raw SQL escape hatch | No | Full query | Inline | Multiple | Inline | Inline | Inline | Native query |
| JSON field queries | No | Limited | Via raw | Yes | Yes | Pg | Via raw | No |
| CASE / WHEN | No | No | No | Yes | Yes | No | No | Yes |
| Dynamic/conditional filters | No | Spread undef | Pass undef | Chain | Chain | BoxableExpr | add_option | Build list |
Potential Future Work
Features with Existing Internal Support
These features have core AST and SQL serialization but need user-facing APIs:
String Pattern Matching
- Core AST: `ExprPattern::BeginsWith` and `ExprPattern::Like` exist with SQL serialization
- Needed:
  - Add `ExprPattern::EndsWith` and `ExprPattern::Contains` to the core AST
  - Add `.contains()`, `.starts_with()`, `.ends_with()` on `Path<String>`
  - Add `.like()` for direct pattern matching
  - Handle LIKE special character escaping (`%`, `_`)
- Files: `crates/toasty/src/stmt/path.rs`, `crates/toasty-core/src/stmt/expr.rs`
- Use case: Search functionality (e.g., search users by name fragment)
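The escaping requirement above is mechanical. A minimal sketch, assuming `\` as the escape character (which would then need `ESCAPE '\'` in the generated SQL); the function names are illustrative, not Toasty API:

```rust
/// Escape LIKE metacharacters so user input matches literally.
fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        // `%` and `_` are LIKE wildcards; `\` is our escape character.
        if ch == '%' || ch == '_' || ch == '\\' {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Build a `.contains()`-style pattern from raw user input.
fn contains_pattern(fragment: &str) -> String {
    format!("%{}%", escape_like(fragment))
}
```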
NOT IN
- Current: `IN` exists but has no negated form
- Needed: An `ExprNotInList` (or negation of the `InList` expression), plus a `.not_in_list()` user API
- Files: `crates/toasty/src/stmt/path.rs`, `crates/toasty-core/src/stmt/expr.rs`
- Use case: Exclusion lists (e.g., “exclude these IDs from results”)
Features Needing New Implementation
Case-Insensitive String Matching
- Current: No support at any layer
- Needed: ILIKE support in SQL serialization (PostgreSQL native, LOWER() wrapper for SQLite/MySQL), plus user API
- Design consideration: How to handle cross-database differences (ILIKE is Pg-only, LOWER()+LIKE is universal but slower)
- Reference: Prisma (`mode: 'insensitive'`), Django (`__iexact`, `__icontains`)
- Use case: User-facing search (e.g., email lookup, name search)
BETWEEN / Range Queries
- Current: Users must combine `.ge()` and `.le()` manually
- Needed: Syntactic sugar over `AND(ge, le)`, or a dedicated `ExprBetween`
- File: `crates/toasty/src/stmt/path.rs`
- Reference: Drizzle (`between()`), Django (`__range`), Diesel (`.between()`)
- Use case: Date ranges, price ranges, numeric filtering
Relation/Association Filtering
- Current: Scoped queries exist but no way to filter a top-level query by related model fields
- Needed: JOIN or EXISTS subquery generation in the engine, plus user API design
- Complexity: High - requires significant engine work
- Reference: Prisma (`some`/`none`/`every`), Django (`__` traversal), SQLAlchemy (`.any()`/`.has()`)
- Use case: Filtering parents by child attributes (e.g., “users who have at least one order over $100”)
Field-to-Field Comparison
- Current: `Path::eq()` requires `IntoExpr<T>`, which accepts values but should also accept paths
- Needed: Ensure `Path<T>` implements `IntoExpr<T>` and that codegen supports cross-field comparisons
- Reference: Django (`F()` expressions), SQLAlchemy (column comparison)
- Use case: Comparing two columns (e.g., `updated_at > created_at`, `balance > minimum_balance`)
Arithmetic Operations in Queries
- Current: No support; `BinaryOp` only includes comparison operators (`Eq`, `Ne`, `Gt`, `Ge`, `Lt`, `Le`)
- Needed:
  - Add arithmetic operators to the AST: `Add`, `Subtract`, `Multiply`, `Divide`, `Modulo`
  - SQL serialization for arithmetic expressions (standard across databases)
  - User API to build arithmetic expressions (e.g., `.add()`, `.multiply()`, operator overloading, or an expression builder)
  - Type handling for arithmetic results (ensure type safety)
- Files: `crates/toasty-core/src/stmt/op_binary.rs`, `crates/toasty-core/src/stmt/expr.rs`, `crates/toasty/src/stmt/path.rs`
- Reference:
  - Django: `F('price') * F('quantity') > 100`
  - SQLAlchemy: `column('price') * column('quantity') > 100`
  - Diesel: `price.eq(quantity * 2)`
  - Drizzle: ``sql`price * quantity > 100` ``
- Use cases:
  - Computed comparisons: `WHERE age <= 2 * years_in_school`
  - Price calculations: `WHERE price * quantity > 1000`
  - Time differences: `WHERE (end_time - start_time) > 3600`
  - Percentage calculations: `WHERE (actual / budget) * 100 > 110`
  - Complex business rules: `WHERE (base_price * (1 - discount_rate)) > minimum_price`
- Design considerations:
  - Should arithmetic create new expression types or extend `BinaryOp`?
  - How to handle type coercion (int vs. float, time arithmetic)?
  - Support for parentheses and operator precedence
  - Whether to support arithmetic on the SELECT side (computed columns) or just in WHERE clauses initially
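The operator-overloading option can be illustrated with a minimal stand-in expression type. This is a sketch only: Toasty's real `Expr` type and rendering are richer, and the names below are hypothetical.

```rust
use std::ops::Mul;

/// Minimal stand-in for a typed expression node, just enough to show
/// how `std::ops` overloading could surface arithmetic in a builder.
#[derive(Debug)]
enum Expr {
    Column(&'static str),
    Mul(Box<Expr>, Box<Expr>),
}

impl Mul for Expr {
    type Output = Expr;
    fn mul(self, rhs: Expr) -> Expr {
        Expr::Mul(Box::new(self), Box::new(rhs))
    }
}

/// Render to SQL-ish text, parenthesizing to preserve precedence.
fn render(e: &Expr) -> String {
    match e {
        Expr::Column(name) => name.to_string(),
        Expr::Mul(a, b) => format!("({} * {})", render(a), render(b)),
    }
}
```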
Aggregate Queries
- Current: `ExprFunc::Count` exists internally but is not user-facing
- Needed: User-facing API, return type handling, integration with GROUP BY
- Complexity: High - requires significant API design
- Reference: Django’s annotation system, SQLAlchemy’s `func`
- Use case: Dashboards, analytics, summary views, pagination metadata
GROUP BY / HAVING
- Current: No support at any layer
- Needed: AST additions, SQL generation, engine support, user API
- Complexity: High
- Use case: Aggregate queries, reports, analytics, dashboards
Raw SQL Escape Hatch
- Current: No support
- Needed: Safe API for parameterized raw SQL fragments within typed queries
- Design consideration: Full raw queries vs. raw fragments within typed queries vs. both
- Reference: Drizzle (``sql`...` `` templates), SQLAlchemy (`text()`), Diesel (`sql()`)
- Use case: Edge cases that the ORM can’t express
Dynamic / Conditional Query Building
- Current: Users can chain `.filter()` calls, but there is no ergonomic way to skip filters when parameters are `None`
- Needed: A pattern for optional filters
- Reference: SeaORM (`Condition::add_option()`), Prisma (spread `undefined`), Diesel (`BoxableExpression`)
- Use case: Search forms, filter UIs, API endpoints with optional parameters
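The pattern itself is simple to sketch. The version below uses plain strings to stay self-contained; a Toasty version would push typed `Expr<bool>` filters onto the query builder instead (function and parameter names here are illustrative):

```rust
/// Collect optional search-form parameters into WHERE fragments,
/// skipping any that are `None` -- the same ergonomics SeaORM's
/// `Condition::add_option()` provides for typed conditions.
fn build_filters(name: Option<&str>, min_age: Option<u32>) -> Vec<String> {
    let mut filters = Vec::new();
    if let Some(n) = name {
        filters.push(format!("name = '{n}'"));
    }
    if let Some(a) = min_age {
        filters.push(format!("age >= {a}"));
    }
    filters
}
```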
Full-Text Search
- Current: No support
- Complexity: High - database-specific implementations (PostgreSQL tsvector, MySQL FULLTEXT, SQLite FTS5)
- Design consideration: May be best as database-specific extensions rather than a unified API
- Use case: Content-heavy applications (blogs, e-commerce, documentation sites)
JSON Field Queries
- Current: No support
- Complexity: High - needs path traversal syntax, type handling, database-specific operators
- Dependency: Depends on JSON/JSONB data type support
- Reference: Django (`field__key__subkey`), SQLAlchemy (`column['key']`)
- Use case: Flexible/schemaless data within relational databases
Advanced / Niche Features
Regex Matching
- Use case: Power-user filtering, data validation queries
- Reference: Django (`__regex`, `__iregex`), SQLAlchemy (`regexp_match()`)
Array/Collection Operations
- Use case: PostgreSQL array columns, MongoDB array fields
- Dependency: Requires array type support first
- Reference: Prisma (`has`, `hasEvery`, `hasSome`), Django (`ArrayField` lookups)
CASE/WHEN Expressions
- Use case: Conditional logic within queries for complex business rules
- Reference: Django (`When()`/`Case()`), SQLAlchemy (`case()`)
Subquery Comparisons (ALL/ANY/SOME)
- Use case: Advanced filtering like “price > ALL(SELECT price FROM competitors)”
- Reference: Hibernate, SQLAlchemy (`all_()`, `any_()`)
IS DISTINCT FROM
- Use case: NULL-safe comparisons without special-casing IS NULL
- Reference: SQLAlchemy (only ORM with native support)
Implementation Considerations
Recommended Approach
Based on the analysis above, the following groupings maximize user value:
Group 1: Expose Existing Internals
Items with core AST and SQL serialization that only need user-facing methods:
- `.not_in_list()` on `Path<T>` (negate the existing `InList`)

Estimated scope: ~50 lines of user-facing API code + integration tests
Group 2: String Operations
Partial AST support that needs completion and exposure:
- Add `ExprPattern::EndsWith` and `ExprPattern::Contains` to the core AST
- Add SQL serialization for the new pattern variants
- Add `.contains()`, `.starts_with()`, `.ends_with()` to `Path<String>`
- Handle LIKE special character escaping

Estimated scope: ~200 lines across core + SQL + user API
Group 3: Ergonomic Improvements
- Case-insensitive matching (ILIKE / LOWER() wrapper)
- A `.between()` convenience method
- Direct `.like()` exposure
- Conditional/optional filter building helpers
Group 4: Structural Features
Requires deeper engine work:
- Relation filtering (JOIN/EXISTS generation)
- Aggregate functions (user-facing COUNT/SUM/etc.)
- GROUP BY / HAVING
- Raw SQL escape hatch
Reference Implementation Goals
A comprehensive query constraint system would allow users to:
- Search strings by substring, prefix, and suffix (case-sensitive and case-insensitive)
- Use NOT IN with literal lists and subqueries
- Filter by related model attributes
- Use at least basic aggregate queries (COUNT)
- Fall back to raw SQL for anything the ORM can’t express
This would put Toasty on par with the filtering capabilities of Diesel and SeaORM, and cover the vast majority of queries needed by typical web applications.
Query Engine Optimization Roadmap
Overview
The query engine currently performs simplification as a single VisitMut pass that
applies local rewrite rules bottom-up. This works well for straightforward
transformations (constant folding, tuple decomposition, association rewriting),
but it has structural limitations as the optimizer takes on more complex work.
This document tracks improvements to the query engine’s optimization infrastructure, focusing on predicate simplification and the compilation pipeline.
Current State
Simplification Pass
The simplifier (engine/simplify.rs) implements VisitMut and applies rules in
a single bottom-up traversal. Each node is visited once, simplified, and then
its parent is simplified with the updated children.
What works well:
- Local rewrites: constant folding, boolean identity, tuple decomposition
- Association rewriting and subquery lifting
- Match elimination (distributing binary ops over match arms)
Structural limitations:
- Rules fire during the walk, so ordering matters. A rule that produces expressions consumable by another rule only works if the consumer fires later in the same walk or the walk is re-run.
- Global analysis (e.g., detecting contradictions across an entire AND conjunction) must be done inline during the walk, mixing local and global concerns.
- Expensive analyses run on every AND node encountered, even when only a small fraction would benefit.
Contradicting Equality Detection
The simplifier currently detects a = c1 AND a = c2 (where c1 != c2) inline in
simplify_expr_and. This is O(n^2) in the number of equality predicates within a
single AND. While operand lists are typically small, the analysis runs on every
AND node during the walk, including intermediate nodes that are about to be
restructured by other rules.
Planned Improvements
Phase 1: Post-Lowering Optimization Pass
Move expensive predicate analysis out of the per-node simplifier and into a dedicated pass that runs once after lowering, against the HIR representation. At this point the statement is fully resolved to table-level expressions and the predicate tree is stable — no more association rewrites or field resolution changes will restructure it.
This pass would handle:
- Contradicting equality pruning
- Redundant predicate elimination
- Tautology detection
- `ExprLet` inlining (currently done at the end of `lower_returning`; should move here so all post-lowering expression rewrites live in one place)
Why after lowering: Before lowering, predicates reference model-level fields and contain relationship navigation that the lowering phase rewrites. Running global analysis before this rewriting is wasted work — the predicate tree will change. After lowering, the predicates are in their final structural form (column references, subqueries), so analysis results are stable.
Phase 2: Equivalence Classes
Build equivalence classes from equality predicates before running constraint
analysis. When the optimizer sees a = b AND b = c, it should know that a,
b, and c are all equivalent, enabling:
- Transitive contradiction detection: `a = b AND b = 5 AND a = 7` is a contradiction (`a` must be both 5 and 7), even though no single pair of predicates directly conflicts.
- Predicate implication: In `a = 5 AND a > 3`, the second predicate is implied and can be dropped.
- Join predicate inference: If `a = b` and a filter constrains `a`, the same constraint applies to `b`.
Equivalence classes are a standard technique in query optimizers. The idea is to union-find expressions that are constrained to be equal, then check each class for conflicting constant bindings or range constraints.
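A compact sketch of that union-find approach, tracking one constant binding per class (identifier names and the `i64` constant type are illustrative; the engine would key on expression nodes and `stmt::Value`):

```rust
use std::collections::HashMap;

/// Union-find over expression identifiers with one constant binding per
/// class. Detects transitive contradictions such as
/// `a = b AND b = 5 AND a = 7`.
struct EquivClasses {
    parent: HashMap<String, String>,
    constant: HashMap<String, i64>, // class root -> bound constant
}

impl EquivClasses {
    fn new() -> Self {
        Self { parent: HashMap::new(), constant: HashMap::new() }
    }

    /// Find the class root, compressing paths along the way.
    fn find(&mut self, x: &str) -> String {
        let p = self.parent.get(x).cloned().unwrap_or_else(|| x.to_string());
        if p == x {
            return p;
        }
        let root = self.find(&p);
        self.parent.insert(x.to_string(), root.clone());
        root
    }

    /// Record `a = b`. Returns false if merging exposes a contradiction.
    fn union(&mut self, a: &str, b: &str) -> bool {
        let ra = self.find(a);
        let rb = self.find(b);
        if ra == rb {
            return true;
        }
        if let Some(cb) = self.constant.remove(&rb) {
            if let Some(&ca) = self.constant.get(&ra) {
                if ca != cb {
                    return false; // classes bound to different constants
                }
            } else {
                self.constant.insert(ra.clone(), cb);
            }
        }
        self.parent.insert(rb, ra);
        true
    }

    /// Record `a = c`. Returns false if `a`'s class is already bound
    /// to a different constant.
    fn bind(&mut self, a: &str, c: i64) -> bool {
        let r = self.find(a);
        if let Some(&existing) = self.constant.get(&r) {
            existing == c
        } else {
            self.constant.insert(r, c);
            true
        }
    }
}
```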
Phase 3: Structured Constraint Analysis
Replace ad-hoc pairwise comparisons with a more structured representation of constraints. For each expression (or equivalence class), maintain:
- Constant binding: The expression must equal a specific value
- Range bounds: Upper/lower bounds from inequality predicates
- NOT-equal set: Values the expression cannot be (from `!=` predicates)
With this structure, contradiction detection becomes a property check rather than a search: an expression with two different constant bindings, or a constant binding outside its range bounds, is immediately contradictory.
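A minimal sketch of that property check, covering the constant-binding and range-bound cases over `i64` values (the NOT-equal set is omitted for brevity; the engine would use its own value type):

```rust
/// Per-expression constraint summary. `lower`/`upper` are exclusive
/// bounds accumulated from `> c` / `< c` predicates.
#[derive(Default)]
struct Constraints {
    constants: Vec<i64>, // all `= c` bindings seen
    lower: Option<i64>,
    upper: Option<i64>,
}

impl Constraints {
    fn eq(&mut self, c: i64) {
        self.constants.push(c);
    }
    fn gt(&mut self, c: i64) {
        // Keep the tightest (largest) lower bound.
        self.lower = Some(self.lower.map_or(c, |l| l.max(c)));
    }
    fn lt(&mut self, c: i64) {
        // Keep the tightest (smallest) upper bound.
        self.upper = Some(self.upper.map_or(c, |u| u.min(c)));
    }

    fn is_contradictory(&self) -> bool {
        // Two different constant bindings.
        if self.constants.windows(2).any(|w| w[0] != w[1]) {
            return true;
        }
        // A constant binding outside the range bounds.
        if let Some(&c) = self.constants.first() {
            if self.lower.map_or(false, |l| c <= l)
                || self.upper.map_or(false, |u| c >= u)
            {
                return true;
            }
        }
        // An empty open interval (lower bound meets or exceeds upper).
        matches!((self.lower, self.upper), (Some(l), Some(u)) if l >= u)
    }
}
```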
Predicate Normalization (Not Full DNF)
Full conversion to disjunctive normal form (DNF) — where the entire predicate becomes an OR of ANDs — risks exponential blowup. A predicate with N AND-connected clauses of M OR-options each expands to M^N terms. This makes full DNF impractical as a general-purpose transformation.
Instead, apply targeted normalization:
- Flatten associative operators: Merge nested `AND(AND(...), ...)` and `OR(OR(...), ...)` into flat lists (already done).
- Canonicalize comparison direction: Ensure constants are on the right side of comparisons (already done).
- Limited distribution: Distribute AND over OR only in specific cases where it enables index utilization or constraint extraction, with a size budget to prevent blowup.
- OR-of-equalities to IN-list: Convert `a = 1 OR a = 2 OR a = 3` to `a IN (1, 2, 3)` for more efficient execution.
The goal is to normalize enough for the constraint analysis to work without paying the exponential cost of full DNF.
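The OR-of-equalities rewrite can be sketched as follows, representing each disjunct as a (target, value) pair of strings for illustration (the real pass would pattern-match equality nodes in the expression tree):

```rust
/// Collapse `a = v1 OR a = v2 OR ...` into `a IN (v1, v2, ...)` when
/// every disjunct is an equality on the same target; otherwise leave
/// the OR alone by returning None.
fn or_to_in_list(disjuncts: &[(&str, &str)]) -> Option<String> {
    let first_target = disjuncts.first()?.0;
    if disjuncts.iter().any(|d| d.0 != first_target) {
        return None; // mixed targets: rewrite does not apply
    }
    let vals: Vec<&str> = disjuncts.iter().map(|d| d.1).collect();
    Some(format!("{} IN ({})", first_target, vals.join(", ")))
}
```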
Design Principles
- Run expensive analysis once, not per-node. The current simplifier intermixes cheap local rewrites with expensive global analysis. Separate them.
- Analyze after the predicate tree is stable. Post-lowering is the right point — predicates are resolved to columns and won’t be restructured.
- Build structure, then query it. Constructing equivalence classes and constraint summaries up front makes individual checks cheap.
- Budget-limited transformations. Any rewrite that can expand expression size (distribution, case expansion) must have a size limit.