Toasty Architecture Overview
Project Structure
Toasty is an ORM for Rust that supports SQL and NoSQL databases. The codebase is a Cargo workspace with separate crates for each layer.
Crates
1. toasty
User-facing crate with query engine and runtime.
Key Components:
- engine/: Multi-phase query compilation and execution pipeline. See Query Engine Architecture for detailed documentation.
- stmt/: Typed statement builders (wrappers around toasty_core::stmt types)
- relation/: Relationship abstractions (HasMany, BelongsTo, HasOne)
- model.rs: Model trait and ID generation
Query Execution Pipeline (high-level):
Statement AST → Simplify → Lower → Plan → Execute → Results
The engine compiles queries into a mini-program of actions executed by an interpreter. For details on HIR, MIR, and the full compilation pipeline, see Query Engine Architecture.
2. toasty-core
Shared types used by all other crates: schema representations, statement AST, and driver interface.
Key Components:
- schema/: Model and database schema definitions
  - app/: Model-level definitions (fields, relations, constraints)
  - db/: Database-level table and column definitions
  - mapping/: Maps between models and database tables
  - builder/: Schema construction utilities
  - verify/: Schema validation
- stmt/: Statement AST nodes for queries, inserts, updates, deletes
- driver/: Driver interface, capabilities, and operations
3. toasty-macros (code generation)
The toasty-macros crate contains both the proc-macro entry points and the code generation logic. It generates Rust code from the #[derive(Model)] and #[derive(Embed)] macros.
Key Components:
- schema/: Parses model attributes into schema representation
- expand/: Generates implementations for models
  - model.rs: Model trait implementation
  - query.rs: Query builder methods
  - create.rs: Create/insert builders
  - update.rs: Update builders
  - relation.rs: Relationship methods
  - fields.rs: Field accessors
  - filters.rs: Filter method generation
  - schema.rs: Runtime schema generation
4. toasty-driver-*
Database-specific driver implementations.
Supported Databases:
- toasty-driver-sqlite: SQLite implementation
- toasty-driver-postgresql: PostgreSQL implementation
- toasty-driver-mysql: MySQL implementation
- toasty-driver-dynamodb: DynamoDB implementation
5. toasty-sql
Converts statement AST to SQL strings. Used by SQL-based drivers.
Key Components:
- serializer/: SQL generation with dialect support
  - flavor.rs: Database-specific SQL dialects
  - statement.rs: Statement serialization
  - expr.rs: Expression serialization
  - ty.rs: Type serialization
- stmt/: SQL-specific statement types
Further Reading
- Query Engine Architecture - Query compilation and execution pipeline
- Type System - Type system design and conversions
Toasty Query Engine
This document provides a high-level overview of the Toasty query execution engine for developers working on engine internals. It describes the multi-phase pipeline that transforms user queries into database operations.
Overview
The Toasty engine is a multi-database query compiler and runtime that executes ORM operations across SQL and NoSQL databases. It transforms a user’s query (represented as a Statement AST) into a sequence of executable actions through multiple compilation phases.
Execution Model
The final output is a mini program executed by an interpreter. Think of it like a small virtual machine or bytecode interpreter, though there is no control flow (yet):
- Instructions (Actions): Operations like “execute this SQL”, “filter these results”, “merge child records into parents”
- Variables: Storage slots, or registers, that hold intermediate results between instructions
- Linear Execution: Instructions run in sequence (no control flow - no branches or loops, yet). Eventually, the interpreter will be smart about concurrency and execute independent operations in parallel when possible.
- Interpreter: The engine executor reads each instruction, fetches inputs from variables, performs the operation, and stores outputs back to variables
For example, loading users with their todos:
SELECT users.id, users.name, (
SELECT todos.id, todos.title
FROM todos
WHERE todos.user_id = users.id
) FROM users WHERE ...
compiles to a program like:
$0 = ExecSQL("SELECT * FROM users WHERE ...")
$1 = ExecSQL("SELECT * FROM todos WHERE user_id IN ...")
$2 = NestedMerge($0, $1, by: user_id)
return $2
The compilation pipeline below transforms user queries into this instruction/variable representation. Each phase brings the query closer to this final executable form.
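The instruction/variable model above can be sketched in a few lines. All names here (Action, Value, run) are illustrative stand-ins, not Toasty's real engine types; the "SQL" execution just produces a canned row set:

```rust
// Toy sketch of the mini-program model: actions read and write variable slots.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Rows(Vec<String>),
}

enum Action {
    // Stand-in for "execute this SQL and store the rows in `output`".
    ExecSql { sql: &'static str, output: usize },
    // Append one variable's rows after another's, storing into `output`.
    NestedMerge { parent: usize, child: usize, output: usize },
}

// Linear interpreter: run each action in sequence, then return the
// contents of the last action's output slot.
fn run(actions: &[Action], num_vars: usize) -> Option<Value> {
    let mut vars: Vec<Option<Value>> = vec![None; num_vars];
    let mut last = 0;
    for action in actions {
        match action {
            Action::ExecSql { sql, output } => {
                vars[*output] = Some(Value::Rows(vec![format!("rows for: {sql}")]));
                last = *output;
            }
            Action::NestedMerge { parent, child, output } => {
                let mut rows = match vars[*parent].clone() {
                    Some(Value::Rows(r)) => r,
                    None => Vec::new(),
                };
                if let Some(Value::Rows(c)) = vars[*child].clone() {
                    rows.extend(c);
                }
                vars[*output] = Some(Value::Rows(rows));
                last = *output;
            }
        }
    }
    vars[last].take()
}
```

The real interpreter streams values and frees variables eagerly; this sketch only shows the instruction-plus-slots shape.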
Compilation Pipeline
User Query (Statement AST)
↓
[Verification] - Validate statement structure (debug builds only)
↓
[Simplification] - Normalize and optimize the statement AST
↓
[Lowering] - Convert to HIR for dependency analysis
↓
[Planning] - Build MIR operation graph
↓
[Execution Planning] - Convert to action sequence with variables
↓
[Execution] - Run actions against database driver
↓
Result Stream
Phase 1: Simplification
Location: engine/simplify.rs
The simplification phase normalizes and optimizes the statement AST before planning.
Key Transformations
- Association Rewriting: Converts relationship navigation (e.g., user.todos()) into explicit subqueries with foreign key filters
- Subquery Lifting: Transforms IN (SELECT ...) expressions into more efficient join-like operations
- Expression Normalization: Simplifies complex expressions (e.g., flattening nested ANDs/ORs, constant folding)
- Path Expression Rewriting: Resolves field paths and relationship traversals into explicit column references
- Empty Query Detection: Identifies queries that will return no results
Example: Association Simplification
// user.todos().delete() generates:
Delete {
    from: Todo,
    via: User::todos, // relationship traversal
    ...
}

// After simplification:
Delete {
    from: Todo,
    filter: todo.user_id IN (SELECT id FROM users WHERE ...)
}
Converting relationship navigation into explicit filters early means downstream phases only need to handle standard query patterns with filters and subqueries - no special-case logic for each relationship type.
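One of the normalization rules listed above, flattening nested ANDs, can be sketched concretely. The Expr type here is a toy stand-in for the real statement AST:

```rust
// Toy expression type: just enough to show AND-flattening.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(&'static str),
    And(Vec<Expr>),
}

// AND(a, AND(b, c)) becomes AND(a, b, c), recursively.
fn flatten_and(expr: Expr) -> Expr {
    match expr {
        Expr::And(operands) => {
            let mut flat = Vec::new();
            for op in operands {
                match flatten_and(op) {
                    Expr::And(inner) => flat.extend(inner),
                    other => flat.push(other),
                }
            }
            Expr::And(flat)
        }
        other => other,
    }
}
```

Normalizing to a flat operand list means later phases can match on a single AND node instead of every possible nesting.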
Phase 2: Lowering
Location: engine/lower.rs
Lowering converts a simplified statement into HIR (High-level Intermediate Representation) - a collection of related statements with tracked dependencies.
Toasty tries to maximize what the target database can handle natively, only decomposing queries when necessary. For example, a query like User::find_by_name("John").todos().all() contains a subquery. SQL databases can execute this as SELECT * FROM todos WHERE user_id IN (SELECT id FROM users WHERE name = 'John'). DynamoDB cannot handle subqueries, so lowering splits this into two statements: first fetch user IDs, then query todos with those IDs.
The HIR tracks a dependency graph between statements - which statements depend on results from others, and which columns flow between them. This graph can contain cycles when preloading associations. For example:
SELECT users.id, users.name, (
SELECT todos.id, todos.title
FROM todos
WHERE todos.user_id = users.id
) FROM users WHERE ...
The users query must execute first to provide IDs for the todos subquery, but the todos results must be merged back into the user records. This creates a cycle: users → todos → users.
The lowering phase handles:
- Statement Decomposition: Breaking queries into sub-statements when the database can’t handle them directly
- Dependency Tracking: Which statements must execute before others
- Argument Extraction: Identifying values passed between statements (e.g., a loaded model’s ID used in a child query’s filter)
- Relationship Handling: Processing relationship loads and nested queries
Lowering Algorithm
Lowering transforms model-level statements to table-level statements through a visitor pattern that rewrites each part of the statement AST:
- Table Resolution: InsertTarget::Model, UpdateTarget::Model, etc. become their corresponding table references
- Returning Clause Transformation: Returning::Model is replaced with Returning::Expr containing the expanded column expressions
- Field Reference Resolution: Model field references are converted to table column references
- Include Expansion: Association includes become subqueries in the returning clause
The TableToModel mapping (built during schema construction) drives the transformation. It contains an expression for each model field that maps to its corresponding table column(s). This supports more than a 1-1 mapping—a model field can be derived from multiple columns or a column can map to multiple fields. Association fields are initialized to Null in this mapping.
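The TableToModel idea can be sketched as follows. All names here are illustrative: each model field maps to an expression over table columns, a derived field can span multiple columns, and association fields start as Null placeholders:

```rust
// Toy sketch of a model-field-to-column-expression mapping.
#[derive(Debug, Clone, PartialEq)]
enum ColumnExpr {
    // 1-1: the field reads a single column.
    Column(&'static str),
    // A field derived from multiple columns.
    Concat(Vec<ColumnExpr>),
    // Placeholder for association fields, replaced with a subquery at lowering.
    Null,
}

struct TableToModel {
    field_exprs: Vec<(&'static str, ColumnExpr)>,
}

impl TableToModel {
    fn expr_for(&self, field: &str) -> Option<&ColumnExpr> {
        self.field_exprs
            .iter()
            .find(|(name, _)| *name == field)
            .map(|(_, expr)| expr)
    }
}
```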
When lowering encounters a Returning::Model { include } clause:
- Call table_to_model.lower_returning_model() to get the base column expressions
- For each path in the include list, call build_include_subquery() to generate a subquery that selects the associated records
- Replace the Null placeholder in the returning expression with the generated subquery
Lowering Examples
Example 1: Simple query
Given a model with a renamed column:
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    #[column(name = "first_and_last_name")]
    name: String,
    email: String,
}

// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
// Note: At model-level, no specific fields are selected

// After lowering
SELECT id, first_and_last_name, email FROM users WHERE id = ?
Example 2: Query with association
// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
INCLUDE todos

// After lowering
SELECT id, first_and_last_name, email, (
    SELECT id, title, user_id FROM todos WHERE todos.user_id = users.id
) FROM users WHERE id = ?
Phase 3: Planning
Location: engine/plan.rs
Planning converts HIR into MIR (Middle-level Intermediate Representation) - a directed acyclic graph of operations, both database queries and in-memory transformations. Edges represent data dependencies: an operation cannot execute until all operations it depends on have completed and produced their results.
Since the HIR graph can contain cycles, planning must break them to produce a DAG. This is done by introducing intermediate operations that batch-load data and merge results (e.g., NestedMerge).
Operation Types
The MIR supports various operation types (see engine/mir.rs for details):
SQL operations:
- ExecStatement - Execute a SQL query (SELECT, INSERT, UPDATE, DELETE)
- ReadModifyWrite - Optimistic locking (read, modify, conditional write). Exists as a separate operation because the read result must be processed in-memory to compute the write, which ExecStatement cannot express.
Key-value operations (NoSQL):
- GetByKey, DeleteByKey, UpdateByKey - Direct key access
- QueryPk, FindPkByIndex - Key lookups via queries or indexes
In-memory operations:
- Filter, Project - Transform and filter results
- NestedMerge - Merge child records into parent records
- Const - Constant values
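What an in-memory NestedMerge operation does at runtime can be sketched with plain tuples (the types and function name are illustrative, not the MIR's real representation): group child records by foreign key, then attach each group to its parent record.

```rust
use std::collections::HashMap;

// Merge child (user_id, todo title) rows into parent (id, name) rows.
fn nested_merge(
    parents: Vec<(u64, &'static str)>,
    children: Vec<(u64, &'static str)>,
) -> Vec<(u64, &'static str, Vec<&'static str>)> {
    // Group children by the foreign key in one pass.
    let mut by_parent: HashMap<u64, Vec<&'static str>> = HashMap::new();
    for (user_id, title) in children {
        by_parent.entry(user_id).or_default().push(title);
    }
    // Attach each group to its parent; parents without children get an empty list.
    parents
        .into_iter()
        .map(|(id, name)| {
            let todos = by_parent.remove(&id).unwrap_or_default();
            (id, name, todos)
        })
        .collect()
}
```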
Phase 4: Execution Planning
Location: engine/plan/execution.rs
Execution planning converts the MIR logical plan into a concrete sequence of actions that can be executed. This phase:
- Assigns variable slots for storing intermediate results
- Converts each MIR operation into an execution action
- Maintains topological ordering to ensure dependencies execute first
Action Types
Actions mirror MIR operations but include concrete variable bindings:
SQL actions:
- ExecStatement: Execute a SQL query (SELECT, INSERT, UPDATE, DELETE)
- ReadModifyWrite: Optimistic locking (read, modify, conditional write)
Key-value actions (NoSQL):
- GetByKey: Batch fetch by primary key
- DeleteByKey: Delete records by primary key
- UpdateByKey: Update records by primary key
- QueryPk: Query primary keys
- FindPkByIndex: Find primary keys via secondary index
In-memory actions:
- Filter: Apply in-memory filter to a variable’s data
- Project: Transform records
- NestedMerge: Merge child records into parent records
- SetVar: Set a variable to a constant value
Phase 5: Execution
Location: engine/exec.rs
The execution phase is the interpreter that runs the compiled program. It iterates through actions, reading inputs from variables, performing operations, and storing outputs back to variables.
Execution Loop
The interpreter follows a simple pattern:
- Initialize variable storage
- For each action in sequence:
  - Load input data from variables
  - Perform the operation (database query or in-memory transform)
  - Store the result in the output variable
- Return the result from the final variable (the last action’s output) to the user
Variable Lifetime
The engine tracks how many times each variable is referenced by downstream actions. A variable may be used by multiple actions (e.g., the same user records merged with both todos and comments). When the last action that needs a variable completes, the variable’s value is dropped to free memory.
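The reference-counted lifetime scheme can be sketched like this (Slot, VarStore, and the eager-drop-on-last-use policy are illustrative names, not the engine's actual code):

```rust
// Each slot records how many downstream actions still need its value.
struct Slot {
    value: Option<Vec<u64>>,
    remaining_uses: usize,
}

struct VarStore {
    slots: Vec<Slot>,
}

impl VarStore {
    // `uses[i]` is the number of actions that will read variable i.
    fn new(uses: &[usize]) -> Self {
        VarStore {
            slots: uses
                .iter()
                .map(|&u| Slot { value: None, remaining_uses: u })
                .collect(),
        }
    }

    fn store(&mut self, idx: usize, value: Vec<u64>) {
        self.slots[idx].value = Some(value);
    }

    // Earlier loads clone; the final load takes ownership, freeing the slot.
    fn load(&mut self, idx: usize) -> Vec<u64> {
        let slot = &mut self.slots[idx];
        slot.remaining_uses -= 1;
        if slot.remaining_uses == 0 {
            slot.value.take().expect("variable not set")
        } else {
            slot.value.clone().expect("variable not set")
        }
    }

    fn is_dropped(&self, idx: usize) -> bool {
        self.slots[idx].value.is_none()
    }
}
```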
Driver Interaction
The execution phase is the only part of the engine that communicates with database drivers. The driver interface is intentionally simple: a single exec() method that accepts an Operation enum. This enum includes variants for both SQL operations (QuerySql, Insert) and key-value operations (GetByKey, QueryPk, FindPkByIndex, DeleteByKey, UpdateByKey).
Each driver implements whichever operations it supports. SQL drivers handle QuerySql natively while key-value drivers handle GetByKey, QueryPk, etc. The planner uses driver.capability() to determine which operations to generate for each database type.
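The shape of this boundary can be sketched as follows. This is a simplified, synchronous stand-in with made-up names (Operation, Driver, FakeSqlDriver); the real interface lives in toasty-core's driver module and is async:

```rust
// Two of the operation families mentioned above, collapsed into one enum.
enum Operation {
    QuerySql(String),
    GetByKey(Vec<u64>),
}

// A single entry point: drivers handle the variants they support.
trait Driver {
    fn exec(&self, op: Operation) -> Result<Vec<String>, String>;
}

// A SQL-style driver supports QuerySql but not key-value operations.
struct FakeSqlDriver;

impl Driver for FakeSqlDriver {
    fn exec(&self, op: Operation) -> Result<Vec<String>, String> {
        match op {
            Operation::QuerySql(sql) => Ok(vec![format!("rows for: {sql}")]),
            Operation::GetByKey(_) => Err("operation not supported".to_string()),
        }
    }
}
```

Because the planner consults driver capabilities up front, an unsupported variant reaching a driver indicates a planning bug rather than a user error.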
Toasty Type System Architecture
Overview
Toasty uses Rust’s type system in the public API with both concrete types and generics. The query engine tracks the type of value each statement evaluates to using stmt::Type. This document describes how types flow through the system and the key components involved.
Type System Boundaries
Toasty has two distinct type systems with different responsibilities:
1. Rust-Level Type System (Compile-Time Safety)
At the Rust level, each model is a distinct type:
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    name: String,
    email: String,
}

#[derive(Model)]
struct Todo {
    #[key]
    #[auto]
    id: u64,
    user_id: u64,
    title: String,
}

// Toasty generates type-safe field access preventing type mismatches:
User::get_by_email(&db, "john@example.com").await?; // ✓ String matches email field
User::filter_by_id(&user_id).filter(User::FIELDS.name().eq("John")).all(&db).await?; // ✓ String matches name field

// Type system prevents field/model confusion:
// User::FIELDS.title()             // ← Compile error! User has no title field
// Todo::FIELDS.email()             // ← Compile error! Todo has no email field
// User::FIELDS.name().eq(&todo_id) // ← Compile error! u64 doesn't match String
The query builder API maintains this type safety through generics and traits, preventing you from accidentally mixing model types or referencing non-existent fields. The API uses generic types (Statement<M>, Select<M>, etc.) that wrap toasty_core::stmt types.
2. Query Engine Type System (Runtime)
When db.exec(statement) is called, the generic <M> parameter is erased:
#![allow(unused)]
fn main() {
// Generated query builder returns a typed wrapper
let query: FindUserById = User::find_by_id(&id);
// .into() converts to Statement<User>
let statement: Statement<User> = query.into();
// At db.exec() - generic is erased, .untyped is extracted
pub async fn exec<M: Model>(&self, statement: Statement<M>) -> Result<ValueStream> {
engine::exec(self, statement.untyped).await // <- Only toasty_core::stmt::Statement
}
}
At this boundary, the statement becomes untyped (no Rust generic), but the engine tracks the type of value the statement evaluates to using stmt::Type. Initially, this remains at the model-level—a query for User evaluates to Type::List(Type::Model(user_model_id)). During lowering, these convert to structural record types for database execution.
Type Flow Through the System
Rust API  →  Query Builder  →  Engine Entry  →  Lowering/Planning  →  Execution
    ↓              ↓                ↓                   ↓                 ↓
 Distinct      Type-Safe       Type::Model        Type::Record      stmt::Value
  Types         Generics      (no generics)                           (typed)
(compile)      (compile)        (runtime)          (runtime)         (runtime)
At lowering, statements that evaluate to Type::Model(model_id) are converted to evaluate to Type::Record([field_types...]). This conversion enables the engine to work with concrete field types for database operations.
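This conversion can be sketched with a simplified Type enum. The variant names follow the document, but the enum shape and the lower_type helper are illustrative, not the real stmt::Type API:

```rust
// Simplified stand-in for the engine's runtime type representation.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Id,
    String,
    Model(usize),      // model-level: identified by model id
    Record(Vec<Type>), // table-level: concrete field types
    List(Box<Type>),
}

// Hypothetical lowering step: replace every Model(id) with the record
// layout of that model's fields, recursing through lists.
fn lower_type(ty: Type, fields_of: &dyn Fn(usize) -> Vec<Type>) -> Type {
    match ty {
        Type::Model(id) => Type::Record(fields_of(id)),
        Type::List(inner) => Type::List(Box::new(lower_type(*inner, fields_of))),
        other => other,
    }
}
```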
Detailed Architecture
Query Engine Entry Point
When the engine receives a toasty_core::stmt::Statement, it processes through verification, lowering, planning, and execution:
pub(crate) async fn exec(&self, stmt: Statement) -> Result<ValueStream> {
    if cfg!(debug_assertions) {
        self.verify(&stmt);
    }

    // Lower the statement to the high-level intermediate representation
    let hir = self.lower_stmt(stmt)?;

    // Translate into a series of driver operations
    let plan = self.plan_hir_statement(hir)?;

    // Execute the plan
    self.exec_plan(plan).await
}
Lowering Phase (Model-to-Table Transformation)
The lowering phase transforms statements from model-level to table-level representations.
Example 1: Simple query
// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
// Evaluates to: Type::List(Type::Model(user_model_id))
// Note: At model-level, no specific fields are selected

// After lowering
SELECT id, name, email FROM users WHERE id = ?
// Evaluates to: Type::List(Type::Record([Type::Id, Type::String, Type::String]))
Example 2: Query with association
// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User INCLUDE todos WHERE id = ?
// Evaluates to: Type::List(Type::Model(user_model_id))
// where todos field is Type::List(Type::Model(todo_model_id))

// After lowering
SELECT id, name, email, (
    SELECT id, title, user_id FROM todos WHERE todos.user_id = users.id
) FROM users WHERE id = ?
// Evaluates to: Type::List(Type::Record([
//     Type::Id, Type::String, Type::String,
//     Type::List(Type::Record([Type::Id, Type::String, Type::Id]))
// ]))
Planning and Variable Types
During planning, the engine assigns variables to hold intermediate results (see Query Engine Architecture for details on the execution model). Each variable is registered with its type, which is always Type::List(...) or Type::Unit.
Execution
At execution time, the VarStore holds the type information from planning. When storing a value stream in a variable, the store associates the expected type with it. The value stream ensures each value it yields conforms to that type. This type information carries through to the final result returned to the user.
Type Inference
While statements entering the engine have known types, planning constructs new expressions—projections, filters, and merge qualifications—whose types aren’t explicitly declared. The engine must infer these types from the expression structure to register variables correctly.
Type inference is handled by ExprContext, which walks expression trees and determines their result types based on the schema. For example, a column reference’s type comes from the schema definition, and a record expression’s type is built from its field types.
// Create context for type inference
let cx = stmt::ExprContext::new_with_target(&*self.engine.schema, stmt);

// Infer type of an expression reference
let ty = cx.infer_expr_reference_ty(expr_reference);

// Infer type of a full expression with argument types
let ret = ExprContext::new_free().infer_expr_ty(expr.as_expr(), &args);
Design
Design documents for Toasty.
Batch Query Execution
Overview
Batch queries let users send multiple independent queries to the database in a single round-trip. The results come back as a typed tuple matching the input queries.
let (active_users, recent_posts) = toasty::batch((
    User::find_by_active(true),
    Post::find_recent(100),
)).exec(&db).await?;

// active_users: Vec<User>
// recent_posts: Vec<Post>
The batch composes all queries into a single Statement whose returning
expression is a record of subqueries. This means batch execution flows through
the existing exec path — no new executor methods, no new driver operations.
This design covers SQL databases only. DynamoDB support is out of scope.
New Trait: IntoStatement<T>
A single new trait bridges query builders to Statement<T>:
pub trait IntoStatement<T> {
    fn into_statement(self) -> Statement<T>;
}
Query builders implement this for their model type. For example, UserQuery
implements IntoStatement<User>:
impl IntoStatement<User> for UserQuery {
    fn into_statement(self) -> Statement<User> {
        self.stmt.into()
    }
}
The codegen already produces IntoSelect impls for query builders.
IntoStatement can be blanket-implemented for anything that implements
IntoSelect:
impl<T: IntoSelect> IntoStatement<T::Model> for T {
    fn into_statement(self) -> Statement<T::Model> {
        self.into_select().into()
    }
}
Tuple implementations
Tuples of IntoStatement types implement IntoStatement by composing their
inner statements into a single select whose returning expression is a record of
subqueries:
impl<T1, T2, A, B> IntoStatement<(Vec<T1>, Vec<T2>)> for (A, B)
where
    A: IntoStatement<T1>,
    B: IntoStatement<T2>,
{
    fn into_statement(self) -> Statement<(Vec<T1>, Vec<T2>)> {
        let stmt_a = self.0.into_statement().untyped;
        let stmt_b = self.1.into_statement().untyped;

        // Build: SELECT (stmt_a), (stmt_b)
        let query = stmt::Query::values(stmt::Expr::record([
            stmt::Expr::subquery(stmt_a),
            stmt::Expr::subquery(stmt_b),
        ]));

        Statement::from_raw(query.into())
    }
}
The resulting statement is equivalent to SELECT (subquery_1), (subquery_2).
At the Toasty AST level this is a Query whose returning body is a
Record([Expr::Stmt, Expr::Stmt]). The engine handles each subquery
independently during execution and packs the results into a single
Value::Record.
Tuple impls for arities 2 through 8 are generated with a macro.
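The per-arity macro expansion pattern can be sketched with a toy trait (Describe stands in for IntoStatement/Load; only the macro shape matters here):

```rust
// Toy trait standing in for the real per-tuple trait.
trait Describe {
    fn describe() -> String;
}

impl Describe for u64 {
    fn describe() -> String { "u64".to_string() }
}

impl Describe for String {
    fn describe() -> String { "String".to_string() }
}

// One invocation per arity expands to one tuple impl.
macro_rules! tuple_describe {
    ($($name:ident),+) => {
        impl<$($name: Describe),+> Describe for ($($name,)+) {
            fn describe() -> String {
                let parts: Vec<String> = vec![$($name::describe()),+];
                format!("({})", parts.join(", "))
            }
        }
    };
}

tuple_describe!(A, B);
tuple_describe!(A, B, C);
```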
Load for Tuples and Vec<T>
To deserialize the composed result, Load is implemented for Vec<T> and
for tuples:
impl<T: Load> Load for Vec<T> {
    fn load(value: stmt::Value) -> Result<Self> {
        match value {
            Value::List(items) => items.into_iter().map(T::load).collect(),
            _ => Err(Error::type_conversion(value, "Vec<T>")),
        }
    }
}

impl<A: Load, B: Load> Load for (A, B) {
    fn load(value: stmt::Value) -> Result<Self> {
        match value {
            Value::Record(mut record) => Ok((
                A::load(record[0].take())?,
                B::load(record[1].take())?,
            )),
            _ => Err(Error::type_conversion(value, "(A, B)")),
        }
    }
}
With these impls, Load for (Vec<User>, Vec<Post>) works automatically:
the outer tuple impl splits the record, then each Vec<T> impl iterates
the list and loads individual model instances.
User-Facing API
pub fn batch<T, Q: IntoStatement<T>>(queries: Q) -> Batch<T>
where
    T: Load,
{
    Batch {
        stmt: queries.into_statement(),
    }
}

pub struct Batch<T> {
    stmt: Statement<T>,
}

impl<T: Load> Batch<T> {
    pub async fn exec(self, executor: &mut dyn Executor) -> Result<T> {
        use ExecutorExt;

        let stream = executor.exec(self.stmt).await?;
        let value = stream
            .next()
            .await
            .ok_or_else(|| Error::record_not_found("batch returned no results"))??;
        T::load(value)
    }
}
Batch::exec calls the regular ExecutorExt::exec method. The composed
statement flows through the standard engine pipeline. The result is a single
value (a record of lists) that T::load deserializes into the typed tuple.
Execution Flow
User code:
toasty::batch((UserQuery, PostQuery)).exec(&db)
IntoStatement for (A, B):
SELECT (SELECT ... FROM users WHERE ...), (SELECT ... FROM posts ...)
Engine pipeline (standard exec path):
lower → plan → exec
The engine recognizes Expr::Stmt subqueries in the returning
expression and executes each independently.
Result:
Value::Record([
Value::List([user1, user2, ...]),
Value::List([post1, post2, ...]),
])
Load for (Vec<User>, Vec<Post>):
(A::load(record[0]), B::load(record[1]))
→ (Vec<User>::load(list), Vec<Post>::load(list))
→ (vec![User::load(v1), ...], vec![Post::load(v1), ...])
Statement Changes
Statement<M> needs a way to construct from a raw stmt::Statement without
requiring M: Model:
impl<M> Statement<M> {
    /// Build a statement from a raw untyped statement.
    ///
    /// Used by batch composition where M may be a tuple, not a model.
    pub(crate) fn from_raw(untyped: stmt::Statement) -> Self {
        Self {
            untyped,
            _p: PhantomData,
        }
    }
}
The existing Statement::from_untyped requires M: Model (via IntoSelect).
from_raw has no bound on M and is pub(crate) so only internal code uses
it.
Engine Support
The engine needs to handle a Query whose returning expression is a record
of Expr::Stmt subqueries where each subquery returns multiple rows.
The lowerer already handles Expr::Stmt for association preloading (INCLUDE),
where subqueries get added to the dependency graph and executed as part of the
plan. Batch queries follow the same pattern: each Expr::Stmt in the returning
record becomes an independent subquery in the plan, and the exec phase collects
results into a Value::Record of Value::Lists.
If the existing lowerer does not handle bare subqueries in a returning record
(outside of an INCLUDE context), a small extension is needed to recognize this
pattern and plan it the same way.
Implementation Plan
Phase 1: IntoStatement trait and Load impls
- Add IntoStatement<T> trait to crates/toasty/src/stmt/
- Add blanket impl IntoStatement<T::Model> for T: IntoSelect
- Add Load for Vec<T> and Load for (A, B) (and higher arities via macro)
- Add Statement::from_raw
- Export IntoStatement from lib.rs and codegen_support
Phase 2: Batch API
- Add toasty::batch() function and Batch<T> struct
- Add tuple impls of IntoStatement<(Vec<T1>, Vec<T2>, ...)> (via macro)
- Wire Batch::exec through the standard ExecutorExt::exec path
Phase 3: Engine support
- Verify that the lowerer handles Expr::Stmt subqueries in a returning record correctly (it may already work via the INCLUDE path)
- If not, extend the lowerer to plan bare record-of-subqueries statements
- Verify the exec phase packs subquery results into Value::Record of Value::Lists
Phase 4: Integration tests
- Batch two selects on different models
- Batch a select that returns rows with a select that returns empty
- Batch with filters, ordering, and limits
- Batch inside a transaction
- Batch of a single query (degenerates to normal execution)
Files Modified
| File | Change |
|---|---|
| crates/toasty/src/stmt/into_statement.rs | New: IntoStatement<T> trait, blanket impl |
| crates/toasty/src/stmt.rs | Add Statement::from_raw, re-export IntoStatement |
| crates/toasty/src/load.rs | Add Load impls for Vec<T> and tuples |
| crates/toasty/src/batch.rs | Add batch(), Batch<T>, tuple IntoStatement impls |
| crates/toasty/src/lib.rs | Re-export batch, Batch, IntoStatement |
| crates/toasty/src/engine/lower.rs | Handle record-of-subqueries in returning (if needed) |
DynamoDB: OR Predicates in Index Key Conditions
Problem
DynamoDB’s KeyConditionExpression does not support OR — neither for partition keys nor
sort keys. This means queries like WHERE user_id = 1 OR user_id = 2 on an indexed field
are currently broken for DynamoDB.
The engine must detect OR in index key conditions and fan them out into N individual
DynamoDB Query calls — one per OR branch — then concatenate the results.
A secondary motivation: the batch-load mechanism used for nested association preloads
(rewrite_stmt_query_for_batch_load_nosql) produces ANY(MAP(arg[input], pred)), which
at exec time expands to OR via simplify_expr_any. This hits the same DynamoDB
restriction and is addressed by the same fix.
Where OR Can Reach a Key Condition
Only two engine actions use KeyConditionExpression:
- QueryPk — queries the primary table when exact PK keys cannot be extracted
- FindPkByIndex — queries a GSI to retrieve primary keys
GetByKey uses BatchGetItem (explicit key values, no expression), so OR is never
relevant there. pk = v1 OR pk = v2 on the primary key produces
IndexPlan.key_values = Some([v1, v2]), routing to GetByKey directly — no issue.
QueryPk
OR reaches QueryPk.pk_filter when IndexPlan.key_values is None:
- User-specified OR on sort key: WHERE pk = v AND (sk >= s1 OR sk >= s2) — range predicates have no extractable key values.
- Batch-load (e.g. a HasMany where the FK is the partition key of the child’s composite primary key): rewrite_stmt_query_for_batch_load_nosql produces ANY(MAP(arg[input], fk = arg[0])). The list is a runtime input, so key_values is None. At exec time simplify_expr_any expands it to OR.
FindPkByIndex
FindPkByIndex.filter is the output of partition_filter, which isolates index key
conditions from non-key conditions. partition_filter on AND distributes cleanly:
status = active AND (user_id = 1 OR user_id = 2) produces
index_filter = user_id = 1 OR user_id = 2 and result_filter = status = active.
OR reaches it in the same two ways as QueryPk:
- User-specified OR: WHERE user_id = 1 OR user_id = 2 on a GSI partition key.
- Batch-load: same ANY(MAP(arg[input], pred)) expansion path as above.
Mixed OR Operands
partition_filter currently has a todo!() for OR operands that contain both index and
non-index parts — e.g. (pk = 1 AND status = a) OR pk = 2.
This is in scope. Strategy:
- Extract key conditions from each OR branch to build the fan-out: ANY(MAP([1, 2], pk = arg[0]))
- Apply the full original predicate as an in-memory post-filter: (pk = 1 AND status = a) OR pk = 2
This is conservative but correct, and consistent with how post_filter is already used.
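The fan-out-plus-post-filter strategy can be sketched as follows. The Row type, fetch callback, and function name are all illustrative, not the planner's actual types:

```rust
// Toy row with a partition key and one non-key attribute.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    pk: u64,
    status: &'static str,
}

// One fetch per extracted key (one "Query call" per OR branch), then
// re-apply the full original predicate in memory. Conservative but correct:
// rows matched by a branch's key condition but not the full predicate are
// filtered out here.
fn fan_out_with_post_filter(
    keys: &[u64],
    fetch: &dyn Fn(u64) -> Vec<Row>,
    post_filter: &dyn Fn(&Row) -> bool,
) -> Vec<Row> {
    keys.iter()
        .flat_map(|&k| fetch(k))
        .filter(|row| post_filter(row))
        .collect()
}
```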
Canonical Form: ANY(MAP(key_list, per_call_pred))
All OR cases are represented uniformly as ANY(MAP(key_list, per_call_pred)):
- key_list — one entry per required Query call; each entry has one value per key column (scalar for partition-key-only, Value::Record for partition + sort key)
- per_call_pred — the key condition for one call, referencing element fields as arg[0], arg[1], …
Single key column — user_id = 1 OR user_id = 2:
ANY(MAP([1, 2], user_id = arg[0]))
Composite key — (todo_id = t1 AND step_id >= s1) OR (todo_id = t2 AND step_id >= s2):
ANY(MAP([(t1, s1), (t2, s2)], todo_id = arg[0] AND step_id >= arg[1]))
Batch-load — ANY(MAP(arg[input], todo_id = arg[0])) — already in canonical form;
no structural change needed, only the exec fan-out behavior changes.
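Expanding the canonical form into one key condition per Query call amounts to substituting each key_list entry for the arg[i] placeholders. A sketch with a toy expression type (not the real stmt::Expr):

```rust
// Toy expression type: just enough to model per_call_pred with arg[i] holes.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(&'static str),
    Arg(usize), // arg[i]: field i of the current key_list entry
    Value(i64),
    Eq(Box<Expr>, Box<Expr>),
    And(Vec<Expr>),
}

// Replace each Arg(i) with the i-th value of one key_list entry.
fn substitute(pred: &Expr, entry: &[i64]) -> Expr {
    match pred {
        Expr::Arg(i) => Expr::Value(entry[*i]),
        Expr::Eq(a, b) => Expr::Eq(
            Box::new(substitute(a, entry)),
            Box::new(substitute(b, entry)),
        ),
        Expr::And(ops) => Expr::And(ops.iter().map(|e| substitute(e, entry)).collect()),
        other => other.clone(),
    }
}

// One concrete key condition per key_list entry, i.e. one per Query call.
fn fan_out(key_list: &[Vec<i64>], per_call_pred: &Expr) -> Vec<Expr> {
    key_list
        .iter()
        .map(|entry| substitute(per_call_pred, entry))
        .collect()
}
```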
Design
1. Capability Flag
/// Whether OR is supported in index key conditions (e.g. DynamoDB KeyConditionExpression).
pub index_or_predicate: bool,
DynamoDB: false. All other backends: true (SQL backends never use these actions).
2. IndexPlan Output Contract
pub(crate) struct IndexPlan<'a> {
    pub(crate) index: &'a Index,

    /// Filter to push to the index. Guaranteed form:
    ///
    /// | Condition                          | Form                                              |
    /// |------------------------------------|---------------------------------------------------|
    /// | No OR                              | plain expr — `user_id = 1`                        |
    /// | OR, `index_or_predicate = true`    | `Expr::Or([branch1, branch2, ...])`               |
    /// | OR, `index_or_predicate = false`   | `ANY(MAP(Value::List([v1, ...]), per_call_pred))` |
    /// | Batch-load (any capability)        | `ANY(MAP(arg[input], per_call_pred))`             |
    pub(crate) index_filter: stmt::Expr,

    /// Non-index conditions applied in-memory after results return from each call.
    pub(crate) result_filter: Option<stmt::Expr>,

    /// Full original predicate applied after all fan-out results are collected.
    /// Set for mixed OR operands — see §"Mixed OR Operands".
    pub(crate) post_filter: Option<stmt::Expr>,

    /// Literal key values for direct lookup: a `Value::List` of `Value::Record` entries,
    /// one per lookup. Set by `partition_filter` when all key columns have literal equality
    /// matches. When `Some`, the planner routes to `GetByKey` and ignores `index_filter`.
    /// May coexist with a canonical `ANY(MAP(...))` `index_filter` — both are produced
    /// simultaneously by `partition_filter`; the planner always prefers `GetByKey`.
    pub(crate) key_values: Option<stmt::Value>,
}
Planner routing (primary key path):
key_values.is_some() → GetByKey (BatchGetItem)
index_filter = ANY(MAP(...)) → fan-out via QueryPk × N
otherwise → single QueryPk call
3. Key Value Extraction in index_match
partition_filter extracts literal key values during filter partitioning, setting
key_values when all key columns have literal equality matches. This replaces the
current try_build_key_filter (kv.rs) post-hoc re-analysis of index_filter.
What moves into index_match: walking each OR branch, reading the RHS of each key
column’s equality predicate, assembling Value::List([Value::Record([v0, ...]), ...]).
What stays in the planner: constructing eval::Func from key_values to drive the
GetByKey operation — a mechanical wrap requiring no further expression analysis.
Why this matters for ordering: if partition_filter produced the canonical
ANY(MAP([1,2], pk=arg[0])) form first, the downstream try_build_key_filter Or arm
would never fire, silently breaking the GetByKey path for primary key OR queries.
Extracting key values inside partition_filter eliminates this conflict — both outputs
are produced together.
4. Planner Invariant
When !capability.index_or_predicate, neither FindPkByIndex.filter nor
QueryPk.pk_filter contains Expr::Or. OR is always restructured into
ANY(MAP(arg[i], per_call_pred)) by partition_filter before reaching the exec layer.
Batch-load path — ANY(MAP(...)) is already produced upstream; the invariant holds.
Only the exec fan-out needs fixing.
User-specified OR path — partition_filter produces canonical form directly. The
planner consumes IndexPlan.index_filter as-is; no rewrite in plan_secondary_index_execution
or plan_primary_key_execution. For mixed OR operands, partition_filter additionally
sets IndexPlan.post_filter to the full original predicate.
5. Exec Fan-out
Both action_find_pk_by_index and action_query_pk receive the same treatment.
After substituting inputs into the filter, check for ANY(MAP(arg[i], per_call_pred)):
- If present: iterate over input[i] element by element; substitute each into per_call_pred and issue one driver call; concatenate results. Do not call simplify_expr_any — it would re-expand to OR.
- Otherwise: unchanged single-call path.
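The fan-out loop can be sketched as follows (hypothetical stand-ins for the real substitution and driver machinery):

```rust
/// Hypothetical per-call predicate: (key column, arg index) equality pairs.
struct PerCallPred {
    eq_args: Vec<(&'static str, usize)>,
}

/// Stand-in for one driver call: resolve each `arg[i]` against the current row
/// and return the concrete (column, value) conditions the call would use.
fn driver_call(pred: &PerCallPred, row: &[i64]) -> Vec<(&'static str, i64)> {
    pred.eq_args.iter().map(|&(col, i)| (col, row[i])).collect()
}

/// Fan out over ANY(MAP(rows, pred)): one driver call per row, results concatenated.
fn fan_out(rows: &[Vec<i64>], pred: &PerCallPred) -> Vec<Vec<(&'static str, i64)>> {
    rows.iter().map(|row| driver_call(pred, row)).collect()
}

fn main() {
    // ANY(MAP([(t1, s1), (t2, s2)], todo_id = arg[0] AND step_id = arg[1]))
    let pred = PerCallPred { eq_args: vec![("todo_id", 0), ("step_id", 1)] };
    let calls = fan_out(&[vec![1, 10], vec![2, 20]], &pred);
    assert_eq!(calls.len(), 2); // one driver call per MAP element
    assert_eq!(calls[0], vec![("todo_id", 1), ("step_id", 10)]);
}
```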
6. DynamoDB Driver
Revert the temporary OR-splitting workaround in exec_find_pk_by_index. The driver
is a dumb executor of a single valid key condition.
Summary of Changes
| Location | Change |
|---|---|
Capability | Add index_or_predicate: bool; false for DynamoDB |
IndexPlan | Add key_values: Option<stmt::Value> field |
index_match / partition_filter | Or arm: produce canonical ANY(MAP(...)) when !index_or_predicate; extract key_values; fix mixed OR todo!() |
plan_primary_key_execution | Route on key_values / ANY(MAP(...)) instead of calling try_build_key_filter |
plan_secondary_index_execution | No rewrite needed; consumes IndexPlan.index_filter as-is |
kv.rs / try_build_key_filter | Remove (literal case now handled by index_match) |
action_find_pk_by_index | Fan out over ANY(MAP(...)) — one driver call per element; skip simplify_expr_any |
action_query_pk | Same fan-out treatment |
DynamoDB exec_find_pk_by_index | Revert OR-splitting workaround |
Data-Carrying Enum Implementation Design
Builds on unit enum support (#355). See docs/design/enums-and-embedded-structs.md
for the user-facing design.
Value Stream Encoding
Unit and data variants are encoded differently in the value stream:
- Unit variant: Value::I64(discriminant) — unchanged from unit enum encoding
- Data variant: Value::Record([I64(discriminant), ...active_field_values])
Only the active variant’s fields appear in the record; inactive variant columns (NULL
in the DB) are not included. Primitive::load dispatches on the value type:
I64(d) => unit variant with discriminant d
Record(r) => data variant; r[0] is the discriminant, r[1..] are fields
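The dispatch can be sketched against toy types (hypothetical names; the generated Primitive::load follows the same shape):

```rust
/// Toy value-stream type; hypothetical stand-in for `toasty_core::stmt::Value`.
#[derive(Debug, PartialEq)]
enum Value {
    I64(i64),
    Record(Vec<Value>),
    String(String),
}

#[derive(Debug, PartialEq)]
enum Status {
    Pending,                  // unit variant, discriminant 1
    Active { since: String }, // data variant, discriminant 2
}

/// Sketch of the generated load: dispatch on the value type first,
/// then on the discriminant within each branch.
fn load(value: Value) -> Result<Status, String> {
    match value {
        Value::I64(1) => Ok(Status::Pending),
        Value::Record(mut r) if r.len() == 2 && matches!(r[0], Value::I64(2)) => {
            match r.remove(1) {
                Value::String(since) => Ok(Status::Active { since }),
                other => Err(format!("expected string field, got {other:?}")),
            }
        }
        other => Err(format!("unknown discriminant or shape: {other:?}")),
    }
}

fn main() {
    assert_eq!(load(Value::I64(1)), Ok(Status::Pending));
    let rec = Value::Record(vec![Value::I64(2), Value::String("2024-01-01".into())]);
    assert_eq!(load(rec), Ok(Status::Active { since: "2024-01-01".into() }));
    assert!(load(Value::I64(99)).is_err()); // unknown discriminant fails at load time
}
```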
Schema Changes
EnumVariant gains a fields: Vec<Field> — the same Field type used by
EmbeddedStruct. Field indices are assigned globally across all variants within the
enum, keeping FieldId { model: enum_id, index } as a unique identifier consistent
with how EmbeddedStruct works. The primary_key, auto, and constraints
members of Field are always false/None/[] for variant fields.
Primitive::ty() changes based on variant content:
- Unit-only enum → Type::I64 (unchanged)
- Any data variant present → Type::Model(Self::id()), same as embedded structs
Codegen Changes
Parsing: toasty-macros/src/schema/ parses variant fields and includes them
in EmbeddedEnum registration so the runtime schema is complete.
Primitive::load: generated arms dispatch on value type first (I64 vs Record),
then on the discriminant within each branch. Data variant arms load each field from
its positional index in the record.
IntoExpr: unit variants emit Value::I64(disc) as today; data variants emit
Value::Record([I64(disc), field_exprs...]).
{Enum}Fields struct: all enums (unit-only and data-carrying) generate a
{Enum}Fields struct with is_{variant}() methods for discriminant-only filtering.
For data-carrying enums, is_{variant}() uses project(path, [0]) to extract the
discriminant from the record representation. For unit-only enums, it compares the
path directly. The struct also delegates comparison methods (eq, ne, etc.) to
Path<Self>.
Engine: Expr::Match
Both table_to_model and model_to_table are expressed using:
Match { subject: Expr, arms: [(pattern: Value, expr: Expr)], else_expr: Expr }
Expr::Match is never serialized to SQL — it is either evaluated in the engine
(for writes) or eliminated by the simplifier before the plan stage (for reads/queries).
table_to_model
For an enum field, table_to_model emits a Match on the discriminator column.
Each arm produces the value shape Primitive::load expects: unit arms emit
I64(disc), data arms emit Record([I64(disc), ...field_col_refs]).
else branch: Expr::Error
The else branch of an enum Match represents the case where the discriminant column
holds an unrecognized value — semantically unreachable for well-formed data.
For data-carrying enums, the else branch is Record([disc_col, Error, ...Error]) —
the same Record shape as data arms, but with Expr::Error in every field slot. This
design is critical for the simplifier: projections distribute uniformly into the else
branch, and field-slot projections yield Expr::Error (correct: accessing a field
on an unknown variant is an error), while discriminant projections ([0]) yield
disc_col (the same as every arm). This enables the uniform-arms optimization to
fire after projection.
For unit-only enums (no data variants), else is Expr::Error directly.
model_to_table
Runs the inverse: the incoming value (I64 or Record) is matched on its
discriminant, and each arm emits a flat record of all enum columns in DB order —
setting the discriminator and active variant fields, and NULLing every inactive
variant column. This NULL-out is mandatory: because writes may not have a loaded
model, the engine has no knowledge of the prior variant and must clear all
non-active columns unconditionally.
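A sketch of what one model_to_table arm emits, assuming a hypothetical flat column layout of [discriminator, email_address, phone_number]:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Col {
    I64(i64),
    Text(String),
    Null,
}

enum ContactInfo {
    Email { address: String }, // discriminant 1
    Phone { number: String },  // discriminant 2
}

/// model_to_table sketch: each arm emits the full flat column record,
/// setting the discriminator + active fields and NULLing every inactive column.
fn model_to_table(v: ContactInfo) -> Vec<Col> {
    match v {
        ContactInfo::Email { address } => vec![Col::I64(1), Col::Text(address), Col::Null],
        ContactInfo::Phone { number } => vec![Col::I64(2), Col::Null, Col::Text(number)],
    }
}

fn main() {
    let row = model_to_table(ContactInfo::Phone { number: "555".into() });
    // The email column is cleared unconditionally: the engine may not know
    // the prior variant, so every non-active column is NULLed.
    assert_eq!(row, vec![Col::I64(2), Col::Null, Col::Text("555".into())]);
}
```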
Simplifier Rules
Project into Match (expr_project.rs)
Distributes a projection into each Match arm AND the else branch:
project(Match(subj, [p => e, ...], else), [i])
→ Match(subj, [p => project(e, [i]), ...], else: project(else, [i]))
Projection is pushed into the else branch unconditionally — Expr::Error inside
a Record is handled naturally (projecting [0] out of Record([disc, Error])
yields disc; projecting [1] yields Error).
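The Record-projection step this relies on can be sketched with toy types (hypothetical; the real simplifier operates on stmt::Expr):

```rust
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(&'static str),
    Error,
    Record(Vec<Expr>),
}

/// project(Record([e0, e1, ...]), [i]) → e_i: the rule that lets a projection
/// pushed into an else branch pick `disc` out of `Record([disc, Error, ...])`.
fn project(expr: Expr, i: usize) -> Expr {
    match expr {
        Expr::Record(mut fields) => fields.swap_remove(i),
        // No simplification applies; the real code keeps Project(other, [i]).
        other => other,
    }
}

fn main() {
    let else_branch = Expr::Record(vec![Expr::Column("disc"), Expr::Error]);
    // [0] out of the else Record yields the discriminant column…
    assert_eq!(project(else_branch.clone(), 0), Expr::Column("disc"));
    // …while a field slot yields Error, marking the unreachable access.
    assert_eq!(project(else_branch, 1), Expr::Error);
}
```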
Uniform arms (expr_match.rs)
When all arms AND the else branch produce the same expression, the Match is redundant:
Match(subj, [1 => disc, 2 => disc], else: disc) → disc
The else branch MUST equal the common arm expression for this rule to fire. This makes the transformation provably correct — no branch is dropped that could produce a different value.
Match elimination in binary ops (expr_binary_op.rs)
Distributes a binary op over match arms, producing an OR of guarded comparisons. The else branch is included with a negated guard:
Match(subj, [p1 => e1, p2 => e2], else: e3) == rhs
→ OR(subj == p1 AND e1 == rhs,
subj == p2 AND e2 == rhs,
subj != p1 AND subj != p2 AND e3 == rhs)
Each term is fully simplified inline. Terms that fold to false/null are pruned.
No special handling is needed for the else branch — it is always included and
existing simplification rules handle Expr::Error naturally (see below).
Expr::Error semantics
Expr::Error is treated as “unreachable” — not as a poison value that propagates.
No special Error propagation rules exist. Instead, existing rules eliminate Error
through the surrounding context:
- Data-carrying enum else: Record([disc, Error, ...]). After tuple decomposition, the guard disc != p1 AND disc != p2 contradicts the decomposed disc == c from the comparison target. The contradicting equality rule (a == c AND a != c → false) folds the AND to false.
- false AND (Error == x): the false short-circuit in AND eliminates the term without needing to simplify Error == x.
- Record([1, Error]) == Record([0, "alice"]): tuple decomposition produces 1 == 0 AND Error == "alice". The 1 == 0 → false folds the AND to false.
In all well-formed cases, the guard constraints around Error cause the branch to be pruned without requiring Error-specific rules.
Type inference for Expr::Error
Expr::Error infers as Type::Unknown. TypeUnion::insert skips Unknown, so
an Error branch in a Match doesn’t widen the inferred type union.
Variant-only filter flow
is_email() generates eq(project(path, [0]), I64(1)). After lowering:
eq(project(Match(disc, [1 => Record([disc, addr]), 2 => Record([disc, num])],
else: Record([disc, Error])), [0]),
I64(1))
- Project-into-Match distributes [0] into all branches, including else:
  - project(Record([disc, addr]), [0]) → disc (for each arm)
  - project(Record([disc, Error]), [0]) → disc (for else)
- Uniform-arms fires: all arms AND else produce disc → folds to disc
- Result: eq(disc, I64(1)) — a clean disc_col = 1 predicate
Full-value equality filter flow
contact().eq(ContactInfo::Email { address: "alice@example.com" }) generates
eq(path, Record([I64(1), "alice@example.com"])). After lowering:
eq(Match(disc, [1 => Record([disc, addr]), 2 => Record([disc, num])],
else: Record([disc, Error])),
Record([I64(1), "alice@example.com"]))
- Match elimination distributes eq into each arm AND else as an OR:
  - disc == 1 AND Record([disc, addr]) == Record([I64(1), "alice"]) → simplifies
  - disc == 2 AND Record([disc, num]) == Record([I64(1), "alice"]) → false (pruned)
  - Else: disc != 1 AND disc != 2 AND Record([disc, Error]) == Record([I64(1), "alice"]) → tuple decomposition: disc != 1 AND disc != 2 AND disc == 1 AND Error == "alice" → contradicting equality (disc == 1 AND disc != 1) → false (pruned)
- Result: disc_col = 1 AND addr_col = 'alice@example.com'
Correctness Sharp Edges
Whole-variant replacement must NULL all inactive columns. The engine has no
knowledge of the prior variant for query-based updates, so the model_to_table arms
unconditionally NULL every column they do not own.
NULL discriminators are disallowed. The discriminator column carries NOT NULL,
consistent with unit enums today. Option<Enum> support is a future concern.
Unknown discriminants fail at load time. An unrecognized discriminant (e.g. from
a newer schema version) produces a runtime error via Expr::Error. Removing a
variant requires a data migration.
No DB-level integrity for active variant fields. All variant columns are nullable
(to accommodate inactive variants), so a NULL in a required active field is caught
only at load time by Primitive::load, not at write time.
DynamoDB
Equivalent encoding to be determined when implementing the DynamoDB driver phase.
Implementation Status
Completed
- Schema: fields: Vec<Field> on EnumVariant; codegen parsing; Primitive::ty() returns Type::Model for data-carrying enums.
- Value encoding: Primitive::load() dispatches on I64 vs Record; IntoExpr emits Record for data variants.
- Expr::Match + Expr::Error: Match/MatchArm AST nodes with visitors, eval, and simplifier integration. Expr::Error for unreachable branches. build_table_to_model_field_enum uses Record([disc, Error, ...]) for the else branch.
- Simplifier: project-into-Match distribution; uniform-arms folding (with else-branch check); Match-to-OR elimination in binary ops; case distribution for binary ops with Match operands.
- {Enum}Fields codegen: all enums generate a fields struct with is_{variant}() methods and delegated comparison methods.
- Integration tests: CRUD for data-carrying enums; full-value equality filter; variant-only filter (is_email()); unit enum variant filter (is_pending()).
- Variant+field filter (contact().email().matches(|e| e.address().eq("x"))): per-variant field accessors with closure-based .matches() API.
- OR tautology elimination: is_variant(x, 0) or is_variant(x, 1) covering all variants of an enum folds to true in the OR simplifier.
Remaining
- Partial updates: within-variant partial update builder.
- DynamoDB: equivalent encoding in the DynamoDB driver.
Open Questions
- SparseRecord / reload: within-variant partial updates are supported, so SparseRecord and reload are needed for enum variant fields. Determine how reload should handle a SparseRecord scoped to a specific variant’s fields — the in-memory model must update only the changed fields without disturbing the discriminant or other variant columns.
- Shared columns: letting variants share a column via #[column("name")] is part of the user-facing design. Schema parsing should record shared columns in Phase 1; full query support is a follow-on.
Enum and Embedded Struct Support
Addresses Issue #280.
Scope
Add support for:
- Enum types as model fields (unit, tuple, struct variants)
- Embedded structs (no separate table, stored inline)
Both use #[derive(toasty::Embed)].
Storage Strategy
Flattened storage:
- Enums: discriminator column + nullable columns per variant field
  - INTEGER discriminator with required #[column(variant = N)] on each variant
  - Works uniformly across all databases (PostgreSQL, MySQL, SQLite, DynamoDB)
- Embedded structs: no discriminator, just flattened fields
- Newtype structs (struct Email(String)): single unnamed field, maps to one column with the parent field’s name (no prefix). Supports #[key], #[unique], and #[index] on the parent model field.
Unit-only enums: No columns - stored as the INTEGER value itself.
Post-MVP: Native ENUM types for PostgreSQL/MySQL discriminators (optimization).
Column Naming
Newtype structs: {field} — no suffix. A newtype has one unnamed field, so
the column uses the parent field name directly (e.g., email: Email → column
email).
Multi-field embedded structs: {field}_{name} (e.g., address: Address with
field city → column address_city).
Enums: {field}_{variant}_{name}
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
critter: Creature, // field name
}
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human { profession: String }, // variant, field
#[column(variant = 2)]
Lizard { habitat: String },
}
// Columns:
// - critter (discriminator)
// - critter_human_profession
// - critter_lizard_habitat
}
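The rule above can be stated as a tiny helper (hypothetical; actual name generation lives in toasty-macros):

```rust
/// {field}_{variant}_{name}, all lowercased: hypothetical helper mirroring
/// the enum column-naming rule described above.
fn enum_column(field: &str, variant: &str, name: &str) -> String {
    format!(
        "{}_{}_{}",
        field.to_lowercase(),
        variant.to_lowercase(),
        name.to_lowercase()
    )
}

fn main() {
    assert_eq!(enum_column("critter", "Human", "profession"), "critter_human_profession");
    assert_eq!(enum_column("critter", "Lizard", "habitat"), "critter_lizard_habitat");
}
```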
Customization
Rename field (at enum definition):
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human { profession: String },
#[column(variant = 2)]
Lizard {
#[column("lizard_env")] // Must include variant scope
habitat: String,
},
}
// → critter_lizard_env (field prefix "critter" is prepended)
}
Custom column names for enum variant fields must include the variant scope. The pattern becomes {field}_{custom_name} where custom_name should include the variant portion.
Rename field prefix (per use):
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
#[column("creature_type")]
critter: Creature,
}
// → creature_type (discriminator)
// → creature_type_human_profession (field prefix replaced for all columns)
// → creature_type_lizard_habitat
}
The #[column("name")] attribute on the parent struct’s field replaces the field prefix for all generated columns.
Customize discriminator type (on enum definition):
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
#[column(type = "bigint")]
enum Creature { ... }
}
The #[column(type = "...")] attribute on the enum type customizes the database type for the discriminator column (e.g., “bigint”, “smallint”, “tinyint”).
Tuple Variants
Numeric field naming: {field}_{variant}_{index}
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Contact {
#[column(variant = 1)]
Phone(String, String),
}
// Columns: contact, contact_phone_0, contact_phone_1
}
Customize with #[column("...")]:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Contact {
#[column(variant = 1)]
Phone(
#[column("phone_country")]
String,
#[column("phone_number")]
String,
),
}
// Columns: contact, contact_phone_country, contact_phone_number
}
Nested Types
Path flattened with underscores:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactInfo {
#[column(variant = 1)]
Mail { address: Address },
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
}
// → contact_mail_address_city
// → contact_mail_address_street
}
Shared Columns Across Variants
Multiple variants can share the same column by specifying the same #[column("name")]:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Character {
#[key]
#[auto]
id: u64,
creature: Creature,
}
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human {
#[column("name")]
name: String,
profession: String,
},
#[column(variant = 2)]
Animal {
#[column("name")]
name: String,
species: String,
},
}
// Columns:
// - creature (discriminator)
// - creature_name (shared between Human and Animal)
// - creature_human_profession
// - creature_animal_species
}
Requirements:
- Fields sharing a column must have compatible types (validated at schema build time)
- The shared column name must be identical across variants
- Compatible types: same primitive type, or compatible type conversions
- Shared columns are still nullable at the database level (NULL when variant doesn’t use that field)
Discriminator Types
MVP: INTEGER discriminator for all databases
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human { profession: String },
#[column(variant = 2)]
Lizard { habitat: String },
}
}
All variants require #[column(variant = N)] with unique integer values. Compile error if missing.
Customize discriminator type:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
#[column(type = "bigint")] // Or "smallint", "tinyint", etc.
enum Creature {
#[column(variant = 1)]
Human { profession: String },
#[column(variant = 2)]
Lizard { habitat: String },
}
}
The #[column(type = "...")] attribute on the enum customizes the database type for the discriminator column.
Post-MVP: Native ENUM types for PostgreSQL/MySQL
CREATE TYPE creature AS ENUM ('Human', 'Lizard');
Can customize with #[column(variant = "name")] on variants.
NULL Handling
Inactive variant fields are NULL.
-- When critter = 'Human':
critter_human_profession = 'Knight'
critter_lizard_habitat = NULL
For Option<T> fields: Check discriminator first, then interpret NULL.
Usage
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
address: Address, // embedded struct
status: Status, // embedded enum
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
}
#[derive(toasty::Embed)]
enum Status {
#[column(variant = 1)]
Pending,
#[column(variant = 2)]
Active { since: DateTime },
}
}
Registration: Automatic. Registering a model transitively registers all models reachable through its fields, including nested embedded types and relation targets.
Relations: Forbidden in embedded types (compile error).
Examples
Basic Enum
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Task {
#[key]
#[auto]
id: u64,
status: Status,
}
#[derive(toasty::Embed)]
enum Status {
#[column(variant = 1)]
Pending,
#[column(variant = 2)]
Active,
#[column(variant = 3)]
Done,
}
}
Schema:
CREATE TABLE task (
id INTEGER PRIMARY KEY,
status INTEGER NOT NULL
);
-- 1=Pending, 2=Active, 3=Done (requires #[column(variant = N)])
Data-Carrying Enum
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
contact: ContactMethod,
}
#[derive(toasty::Embed)]
enum ContactMethod {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Phone { country: String, number: String },
}
}
Schema:
CREATE TABLE user (
id INTEGER PRIMARY KEY,
contact INTEGER NOT NULL,
contact_email_address TEXT,
contact_phone_country TEXT,
contact_phone_number TEXT
);
Embedded Struct
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
address: Address,
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
zip: String,
}
}
Schema:
CREATE TABLE user (
id INTEGER PRIMARY KEY,
address_street TEXT NOT NULL,
address_city TEXT NOT NULL,
address_zip TEXT NOT NULL
);
Nested Enum + Embedded
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactInfo {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Mail { address: Address },
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
}
}
Schema:
-- contact: ContactInfo
contact INTEGER NOT NULL,
contact_email_address TEXT,
contact_mail_address_street TEXT,
contact_mail_address_city TEXT
Querying
Basic variant checks
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Task {
#[key]
#[auto]
id: u64,
status: Status,
}
#[derive(toasty::Embed)]
enum Status {
#[column(variant = 1)]
Pending,
#[column(variant = 2)]
Active,
#[column(variant = 3)]
Done,
}
// Query by variant (shorthand)
Task::all().filter(Task::FIELDS.status().is_pending())
Task::all().filter(Task::FIELDS.status().is_active())
// Equivalent using .matches() without field conditions
Task::all().filter(
Task::FIELDS.status().matches(Status::VARIANTS.pending())
)
}
Field access on variant fields
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
contact: ContactMethod,
}
#[derive(toasty::Embed)]
enum ContactMethod {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Phone { country: String, number: String },
}
// Match specific variants and access their fields
User::all().filter(
User::FIELDS.contact().matches(
ContactMethod::VARIANTS.email().address().contains("@gmail")
)
)
User::all().filter(
User::FIELDS.contact().matches(
ContactMethod::VARIANTS.phone().country().eq("US")
)
)
// Shorthand for variant-only checks (no field conditions)
User::all().filter(User::FIELDS.contact().is_email())
User::all().filter(User::FIELDS.contact().is_phone())
// Equivalent using .matches()
User::all().filter(
User::FIELDS.contact().matches(ContactMethod::VARIANTS.email())
)
}
Embedded struct field constraints
Embedded struct fields can be accessed directly for filtering, ordering, and other query operations:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
address: Address,
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
zip: String,
}
// Filter by embedded struct fields
User::all().filter(User::FIELDS.address().city().eq("Seattle"))
User::all().filter(User::FIELDS.address().zip().like("98%"))
// Multiple constraints on embedded struct
User::all().filter(
User::FIELDS.address().city().eq("Seattle")
.and(User::FIELDS.address().zip().like("98%"))
)
// Order by embedded struct fields
User::all().order_by(User::FIELDS.address().city().asc())
// Select embedded struct fields (projection)
User::all()
.select(User::FIELDS.id())
.select(User::FIELDS.address().city())
}
Nested embedded structs
For nested embedded types, continue chaining field accessors:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Company {
#[key]
#[auto]
id: u64,
headquarters: Office,
}
#[derive(toasty::Embed)]
struct Office {
name: String,
location: Address,
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
zip: String,
}
// Access nested embedded struct fields
Company::all().filter(
Company::FIELDS.headquarters().location().city().eq("Seattle")
)
Company::all().filter(
Company::FIELDS.headquarters().name().eq("Main Office")
.and(Company::FIELDS.headquarters().location().zip().like("98%"))
)
}
Combining enum and embedded struct constraints
When an enum variant contains an embedded struct, use .matches() to specify the variant, then access the embedded struct’s fields:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
contact: ContactInfo,
}
#[derive(toasty::Embed)]
enum ContactInfo {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Mail { address: Address },
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
}
// Filter by embedded struct fields within enum variant
User::all().filter(
User::FIELDS.contact().matches(
ContactInfo::VARIANTS.mail().address().city().eq("Seattle")
)
)
// Multiple constraints on embedded struct within variant
User::all().filter(
User::FIELDS.contact().matches(
ContactInfo::VARIANTS.mail()
.address().city().eq("Seattle")
.address().street().contains("Main")
)
)
}
Constraints with shared columns
When enum variants share columns, constraints apply based on the variant being matched:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Character {
#[key]
#[auto]
id: u64,
creature: Creature,
}
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human {
#[column("name")]
name: String,
profession: String,
},
#[column(variant = 2)]
Animal {
#[column("name")]
name: String,
species: String,
},
}
// Query the shared "name" field for a specific variant
Character::all().filter(
Character::FIELDS.creature().matches(
Creature::VARIANTS.human().name().eq("Alice")
)
)
// Query across variants using the shared column
// (finds any creature with this name, regardless of variant)
Character::all().filter(
Character::FIELDS.creature().name().eq("Bob")
)
// Variant-specific field
Character::all().filter(
Character::FIELDS.creature().matches(
Creature::VARIANTS.human().profession().eq("Knight")
)
)
}
Updating
Update builders provide two methods per field:
- .field(value) — direct value assignment
- .with_field(|f| ...) — closure-based update
The .with_* methods provide a uniform API across all field types and enable:
- Embedded types: Partial updates (only set specific nested fields)
- Primitives: future type-specific operations (e.g., NumericUpdate::increment())
- Enums: update variant fields without changing the discriminator
Whole replacement
Setting an embedded struct field on an update replaces all of its columns:
#![allow(unused)]
fn main() {
// Loaded model update — sets address_street, address_city, address_zip
user.update()
.address(Address { street: "123 Main", city: "Seattle", zip: "98101" })
.exec(&db).await?;
// Query-based update — same behavior, no model loaded
User::filter_by_id(id).update()
.address(Address { street: "123 Main", city: "Seattle", zip: "98101" })
.exec(&db).await?;
}
Partial updates
Each field (primitive or embedded) generates a companion {Type}Update<'a> type that
provides a view into the update statement’s assignments. These update types hold a
reference to the statement and a projection path, allowing them to directly mutate
the statement as fields are set. This enables efficient nested updates without intermediate
allocations.
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
zip: String,
}
// AddressUpdate<'a> is generated automatically by `#[derive(toasty::Embed)]`
// StringUpdate<'a> is generated for primitive String fields
}
Embedded types:
#![allow(unused)]
fn main() {
// Whole replacement — sets all address columns
user.update()
.address(Address { street: "123 Main", city: "Seattle", zip: "98101" })
.exec(&db).await?;
// Partial update — only address_city is SET
user.update()
.with_address(|a| {
a.set_city("Seattle");
})
.exec(&db).await?;
// Multiple sub-fields — only address_city and address_zip are SET
user.update()
.with_address(|a| {
a.set_city("Seattle");
a.set_zip("98101");
})
.exec(&db).await?;
// Query-based partial update
User::filter_by_id(id).update()
.with_address(|a| a.set_city("Seattle"))
.exec(&db).await?;
}
Primitive types:
#![allow(unused)]
fn main() {
// Direct value
user.update()
.name("Alice")
.exec(&db).await?;
// Via closure (enables future type-specific operations)
user.update()
.with_name(|n| {
n.set("Alice");
})
.exec(&db).await?;
}
For now, primitive update builders only provide .set(). Future enhancements could add
type-specific operations like NumericUpdate::increment(), StringUpdate::append(), etc.
Partial updates with nested embedded structs
Nested embedded structs also generate {Type}Update<'a> types. The .with_* methods
can be nested naturally:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
struct Office {
name: String,
location: Address,
}
// Update only headquarters_location_city
company.update()
.with_headquarters(|h| {
h.with_location(|a| {
a.set_city("Seattle");
});
})
.exec(&db).await?;
// Update headquarters_name and headquarters_location_zip
company.update()
.with_headquarters(|h| {
h.with_name(|n| n.set("West Coast HQ"));
h.with_location(|a| {
a.set_zip("98101");
});
})
.exec(&db).await?;
}
Enum updates
Enums use whole-variant replacement. Setting an enum field replaces the discriminator and all variant columns:
#![allow(unused)]
fn main() {
// Replace the entire enum value — sets discriminator + variant fields,
// NULLs out fields from the previous variant
user.update()
.contact(ContactMethod::Email { address: "new@example.com".into() })
.exec(&db).await?;
}
For data-carrying variants, use .with_contact() to update fields within the current
variant without changing the discriminator:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactMethod {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Phone { country: String, number: String },
}
// Update only the phone number, leave country and discriminator unchanged
user.update()
.with_contact(|c| {
c.phone(|p| {
p.with_number(|n| n.set("555-1234"));
});
})
.exec(&db).await?;
// Update email variant
User::filter_by_id(id).update()
.with_contact(|c| {
c.email(|e| {
e.with_address(|a| a.set("new@example.com"));
});
})
.exec(&db).await?;
}
ContactMethodUpdate<'a> has one method per variant (e.g., .phone(), .email()). Each
method accepts a closure that receives a builder scoped to that variant’s fields. The
discriminator is not changed by partial updates.
Mapping Layer Formalization
Problem
Toasty’s mapping layer connects model-level fields to database-level columns.
A model field’s type may differ from its storage type (e.g., Timestamp stored
as i64 or text). The mapping must be a bijection — every model value
encodes to exactly one stored value and decodes back losslessly. The bijection
operates at the record level, not per-field: n model fields may map to m
database columns (e.g., multiple fields JSON-encoded into a single column).
The bijection alone is not sufficient. When lowering expressions (filters, ORDER BY, arithmetic) to the database, we need to know whether a given operator can be pushed through the encoding. This is the question of whether the encoding is a homomorphism with respect to that operator:
- For arithmetic: encode(a ⊕ b) = encode(a) ⊕' encode(b)
- For comparisons: a < b ⟺ encode(a) <' encode(b)
If yes, the operator can be evaluated in storage space (efficient, index-friendly). If no, the database must first decode to the model type (SQL CAST) or the operation must be evaluated application-side.
These are two decoupled concerns:
- Bijection — can we round-trip values? (required for correctness)
- Operator homomorphism — which operators preserve semantics through the encoding? (determines what can be pushed to the DB)
A mapping with no homomorphic operators is still valid — you can store and retrieve. You just can’t push any filters or ordering to the database.
Examples
Timestamp as i64 (epoch seconds)
encode(ts) = ts.epoch_seconds()
decode(n) = Timestamp::from_epoch_seconds(n)
Bijection: ✓ — lossless round-trip.
== homomorphic: ✓ — ts1 == ts2 ⟺ encode(ts1) == encode(ts2)
< homomorphic: ✓ — ts1 < ts2 ⟺ encode(ts1) < encode(ts2)
Epoch seconds preserve temporal ordering under integer comparison, so range
queries (<, >, BETWEEN) can operate directly on the raw column.
+ homomorphic: ✓ — encode(ts + 234s) = encode(ts) + 234
Integer addition over epoch seconds preserves timestamp arithmetic.
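This mapping can be sketched in a few lines. The `Ts` type below is a stand-in for Toasty's actual timestamp type; only the algebra matters — the round-trip is lossless, and `==`, `<`, and `+` all push through the encoding:

```rust
// Stand-in timestamp type; a single epoch-seconds field.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
struct Ts {
    epoch_seconds: i64,
}

// encode: model value -> storage value (i64 column).
fn encode(ts: Ts) -> i64 {
    ts.epoch_seconds
}

// decode: storage value -> model value. Inverse of encode.
fn decode(n: i64) -> Ts {
    Ts { epoch_seconds: n }
}

// Model-level arithmetic: add a number of seconds to a timestamp.
fn add_secs(ts: Ts, secs: i64) -> Ts {
    Ts { epoch_seconds: ts.epoch_seconds + secs }
}
```

The homomorphism claims are then directly checkable: `encode(add_secs(a, 234)) == encode(a) + 234`, and `a < b` exactly when `encode(a) < encode(b)`.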
Timestamp as text (ISO 8601)
encode(ts) = ts.to_iso8601()
decode(s) = Timestamp::parse_iso8601(s)
Bijection: ✓ — lossless round-trip (assuming canonical formatting).
== homomorphic: ✓ — injective encoding preserves equality.
< homomorphic: fragile — lexicographic order matches temporal order only
for fixed-width UTC formats. Not generally safe.
+ homomorphic: ✗ — text + 234 is meaningless.
String with case inversion
encode(s) = s.invert_case() // "Hello" → "hELLO"
decode(s) = s.invert_case() // "hELLO" → "Hello"
Bijection: ✓ — case inversion is its own inverse.
== homomorphic: ✓ — injective, so equality is preserved. Encode the
search term the same way and compare.
< homomorphic: ✗ — ordering is reversed between cases:
"ABC" < "abc" (A=65 < a=97)
encode("ABC") = "abc"
encode("abc") = "ABC"
"abc" > "ABC" — ordering reversed
A valid mapping, but useless for range queries in storage space.
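A sketch of this mapping (restricted to ASCII for simplicity) makes the properties concrete — applying the function twice round-trips, equality survives, but ordering flips:

```rust
// Case inversion: uppercase -> lowercase, lowercase -> uppercase,
// everything else unchanged. Its own inverse.
fn invert_case(s: &str) -> String {
    s.chars()
        .map(|c| {
            if c.is_ascii_uppercase() {
                c.to_ascii_lowercase()
            } else if c.is_ascii_lowercase() {
                c.to_ascii_uppercase()
            } else {
                c
            }
        })
        .collect()
}
```

Equality queries still work by encoding the search term the same way; range queries do not, because `"ABC" < "abc"` in the model space becomes `"abc" > "ABC"` in storage space.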
Bijection by Construction
For arbitrary functions, bijectivity is undecidable. Instead of detecting it, we construct mappings from known-bijective primitives and composition rules that preserve bijectivity. If a mapping is built entirely from these, it is guaranteed valid.
Composition rules
- Sequential: f ∘ g is a bijection if both f and g are.
- Parallel/product: (f(a), g(b)) is a bijection if both f and g are.
These compose freely — complex mappings built from simple bijective pieces are automatically valid. Homomorphism properties, however, may be lost at each composition step and must be tracked separately.
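The sequential rule can be sketched with encode/decode closure pairs (this `Bij` struct is illustrative, not Toasty's representation). Note the decode halves compose in reverse order:

```rust
// A bijection as an encode/decode pair of closures.
struct Bij<A, B> {
    encode: Box<dyn Fn(A) -> B>,
    decode: Box<dyn Fn(B) -> A>,
}

// Sequential composition: encode applies f then g;
// decode must undo in the opposite order (g first, then f).
fn compose<A: 'static, B: 'static, C: 'static>(f: Bij<A, B>, g: Bij<B, C>) -> Bij<A, C> {
    let (fe, fd, ge, gd) = (f.encode, f.decode, g.encode, g.decode);
    Bij {
        encode: Box::new(move |a| ge(fe(a))),
        decode: Box::new(move |c| fd(gd(c))),
    }
}
```

Composing an affine shift (`x + 10`) with an i64↔String cast still round-trips, because each piece does.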
Dimensionality: multiple fields → one column
Two fields may map to the same column if and only if the model constrains them to always hold the same value (an equivalence class). In this case no information is lost and the mapping remains a bijection — but only over the restricted domain where the constraint holds. Without such a constraint, collapsing two independent fields into one column destroys injectivity.
This gives us computed fields as a natural consequence. Two fields can reference the same column through different bijective transformations:
regular: String → column (identity)
inverted: String → invert_case(column) (bijection)
Because the transformations are bijections, both fields are readable AND writable.
Writing regular = "Hello" stores "Hello" in the column; inverted
automatically becomes "hELLO". Writing inverted = "hELLO" applies the inverse
to store "Hello"; regular is automatically "Hello". Data flow in both
directions is fully determined by the bijection — no special computed-field
machinery needed.
Computed Fields
Storage is the source of truth. Each field is a view of the underlying column(s) through its bijection. Computed fields are a direct consequence: when multiple fields reference the same column through different bijections, each field is a different view of the same stored data.
Schema representation
Each field stores a bijection pair:
- field_to_column: encode — compute column value from field value (inverse)
- column_to_field: decode — compute field value from column value (forward)
A reverse index maps each column to the set of fields that reference it.
Write propagation
When a field is set, the column value is determined, which determines all sibling fields:
- Compute column value: col = field_a.field_to_column(new_value)
- For each sibling field on the same column: field_b = field_b.column_to_field(col)
The composed transform between two fields sharing a column is:
field_b.column_to_field(field_a.field_to_column(value))
Conflict detection
If the user sets two fields that share a column in the same operation, the
resulting column values must agree. If
field_a.field_to_column(val_a) ≠ field_b.field_to_column(val_b), the write is
invalid and must be rejected.
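Write propagation and conflict detection for the regular/inverted example above can be sketched as follows (function names are illustrative, not Toasty API):

```rust
// Case inversion, the `inverted` field's bijection (its own inverse).
fn invert_case(s: &str) -> String {
    s.chars()
        .map(|c| {
            if c.is_ascii_uppercase() { c.to_ascii_lowercase() }
            else if c.is_ascii_lowercase() { c.to_ascii_uppercase() }
            else { c }
        })
        .collect()
}

// Setting `regular` determines the column (identity bijection),
// which in turn determines the sibling field `inverted`.
fn set_regular(new: &str) -> (String, String) {
    let col = new.to_string();        // field_to_column = identity
    let inverted = invert_case(&col); // sibling's column_to_field
    (col, inverted)
}

// If both fields are set in one operation, their column values must agree.
fn check_conflict(regular: &str, inverted: &str) -> bool {
    regular == invert_case(inverted)
}
```

Setting `regular = "Hello"` yields column `"Hello"` and `inverted = "hELLO"`; setting `regular = "Hello"` together with `inverted = "HELLO"` is rejected because the two imply different column values.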
Bijective Primitives
Three categories of bijective primitives, each with encode/decode halves:
Type reinterpretation
Converts a single value between two types with the same information content.
Implemented as Expr::Cast in both directions.
Current pairs:
- Timestamp ↔ String (ISO 8601)
- Uuid ↔ String
- Uuid ↔ Bytes
- Date ↔ String
- Time ↔ String
- DateTime ↔ String
- Zoned ↔ String
- Timestamp ↔ DateTime
- Timestamp ↔ Zoned
- Zoned ↔ DateTime
- Decimal ↔ String
- BigDecimal ↔ String
- Integer widening/narrowing (i8 ↔ i16 ↔ i32 ↔ i64, etc.)
Affine transformations
Arithmetic transformations by a constant. Each is a bijection with a known inverse.
- x + k — inverse: x - k
- x * k (k ≠ 0) — inverse: x / k
- x * k + c (k ≠ 0) — inverse: (x - c) / k
Homomorphism properties (for x + k as representative):
- == homomorphic: ✓ — a == b ⟺ (a+k) == (b+k)
- < homomorphic: ✓ — a < b ⟺ (a+k) < (b+k)
- + homomorphic: ✗ — encode(a+b) = a+b+k ≠ encode(a)+encode(b) = a+b+2k
Note: x * k for negative k reverses ordering (< not homomorphic).
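These properties are easy to check numerically. A minimal sketch:

```rust
// Affine shift: x + k.
fn encode_add(x: i64, k: i64) -> i64 {
    x + k
}

// Scaling: x * k. For k < 0 this reverses ordering.
fn encode_mul(x: i64, k: i64) -> i64 {
    x * k
}
```

With `k = 7`: ordering survives the shift, but `encode(2 + 5) = 14` while `encode(2) + encode(5) = 21`, so `+` does not distribute. And with a negative multiplier, `2 < 5` but `encode(2) > encode(5)`.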
Product (record)
Packs/unpacks multiple independent values into a fixed-size tuple.
- Encode: Expr::Record — combine values into a tuple
- Decode: Expr::Project — extract by index
Bijective because each component is independent and individually recoverable. Used for embedded structs (fields flattened into columns).
Coproduct (tagged union)
Encodes/decodes a discriminated union (enum) where the discriminant partitions the domain into disjoint subsets.
- Encode: Expr::Project — extract discriminant and per-variant fields
- Decode: Expr::Match — branch on discriminant, reconstruct variant via Expr::Record
Bijective if and only if:
- Arms are exhaustive (cover all discriminant values)
- Arms are disjoint (no overlapping discriminants)
- Each arm’s body is individually a bijection
This is a coproduct of bijections: if f_i: A_i → B_i is a bijection for each
variant i, the combined mapping on the tagged union Σ_i A_i → Σ_i B_i is
also a bijection.
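A concrete sketch, using a hypothetical three-column layout (discriminant, phone column, email column) for a two-variant enum:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Contact {
    Phone(String),
    Email(String),
}

// Encode into (disc, phone_col, email_col). Columns for the
// non-active variant are NULL (None).
fn encode(c: &Contact) -> (i64, Option<String>, Option<String>) {
    match c {
        Contact::Phone(n) => (0, Some(n.clone()), None),
        Contact::Email(a) => (1, None, Some(a.clone())),
    }
}

// Decode: branch on the discriminant. The arms are exhaustive and
// disjoint, and each arm's body is itself a bijection, so the whole
// mapping round-trips.
fn decode(row: (i64, Option<String>, Option<String>)) -> Contact {
    match row.0 {
        0 => Contact::Phone(row.1.unwrap()),
        _ => Contact::Email(row.2.unwrap()),
    }
}
```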
Operator Homomorphism
Operator inventory
Current Toasty binary operators (BinaryOp): ==, !=, <, <=, >, >=.
Arithmetic operators (+, -) are not yet in the AST but are needed for
computed fields and interval arithmetic.
For homomorphism analysis, != is the negation of ==, and >=/<= are
derivable from </>. So the independent set is: ==, <, +.
Per-primitive homomorphism
Type reinterpretation:
| Encoding | == | < | + |
|---|---|---|---|
| Timestamp ↔ String | ✓ | ✓ (¹) | ✗ |
| Uuid ↔ String | ✓ | ✗ | n/a |
| Uuid ↔ Bytes | ✓ | ✗ | n/a |
| Date ↔ String | ✓ | ✓ (¹) | ✗ |
| Time ↔ String | ✓ | ✓ (¹) | ✗ |
| DateTime ↔ String | ✓ | ✓ (¹) | ✗ |
| Zoned ↔ String | ✓ | ✗ | ✗ |
| Timestamp ↔ DateTime | ✓ | ✓ | ✓ |
| Timestamp ↔ Zoned | ✓ | ✓ | ✓ |
| Zoned ↔ DateTime | ✓ | ✓ | ✓ |
| Decimal ↔ String | ✓ | ✗ | ✗ |
| BigDecimal ↔ String | ✓ | ✗ | ✗ |
| Integer widening | ✓ | ✓ | ✓ |
(¹) Requires canonical fixed-width serialization format. Lexicographic ordering matches semantic ordering only if Toasty guarantees consistent formatting (no variable-length subsecond digits, no timezone offset variations, etc.).
All type reinterpretations are injective, so == is always preserved. < and
+ depend on whether the target type’s native operations align with the source
type’s semantics.
Affine transformations:
| Encoding | == | < | + |
|---|---|---|---|
| x + k | ✓ | ✓ | ✗ |
| x * k (k>0) | ✓ | ✓ | ✗ |
| x * k (k<0) | ✓ | ✗ (reversed) | ✗ |
| x * k + c | ✓ | ✓ if k>0 | ✗ |
Product (record):
| Operator | Homomorphic? |
|---|---|
| == | ✓ — if each component preserves == |
| < | conditional — requires lexicographic comparison and each component preserving < |
| + | ✓ — if each component preserves + (component-wise) |
Coproduct (tagged union):
| Operator | Homomorphic? |
|---|---|
| == | ✓ — if discriminant + each arm preserves == |
| < | generally ✗ — cross-variant comparison is usually meaningless |
| + | ✗ — arithmetic across variants undefined |
Homomorphism under composition
Sequential (g ∘ f): if both f and g are homomorphic for an operator,
so is the composition. Proof: a op b ⟺ f(a) op f(b) ⟺ g(f(a)) op g(f(b)).
Parallel/product ((f(a), g(b))): preserves == if both f and g do.
Preserves < only if tuple comparison is lexicographic and both preserve <.
Coproduct: preserves == if each arm does. Does not generally preserve <.
Cross-encoding comparisons
When two operands use different encodings (e.g., field₁ uses Timestamp→i64,
field₂ uses Timestamp→i64+offset), can_distribute does not directly apply.
The comparison encode₁(a) op encode₂(b) mixes two encodings and may not
preserve semantics.
Fallback: decode both to model space and compare there.
decode₁(col₁) op decode₂(col₂)
This always produces correct results but may require SQL CAST or application-side evaluation.
Database independence
can_distribute does not take a database parameter. Database capabilities
determine which bijection is selected (e.g., PostgreSQL has native timestamps
→ identity mapping; SQLite does not → Timestamp↔i64). Once the bijection is
chosen, can_distribute is purely a property of that bijection and the operator.
The only edge case is if two databases use the same types but their operators
behave differently (e.g., string collation affecting <). This can be handled by
treating such behavioral differences as part of the encoding rather than adding a
database parameter.
Precision / Domain Restriction
Lossy encodings like #[column(type = timestamp(2))] involve two distinct steps:
1. Domain restriction (lossy, write-time): the user’s full-precision value is truncated to the representable domain. This is many-to-one — multiple inputs collapse to the same output. It is not part of the mapping.
2. Encoding (bijective): over the restricted domain (values with ≤2 fractional digits), the mapping is a perfect bijection — lossless round-trip.
The mapping framework only governs step 2. Step 1 is a write-time concern:
when the user assigns a value, it gets projected into the representable domain.
Analogous to integer narrowing (i64 → i32): the mapping between i32 values
and the stored column is bijective; the loss happens if you store a value outside
i32 range.
Nullability
Option<T> with None → NULL is a coproduct bijection:
- Domain partition: Option<T> = None | Some(T) — two disjoint cases.
- Encoding: None → NULL, Some(v) → encode(v) — each arm is individually bijective (unit ↔ NULL is trivially so; Some delegates to T’s encoding).
- Decoding: NULL → None, non-NULL → Some(decode(v)).
This satisfies the coproduct conditions (exhaustive, disjoint, per-arm bijective).
NULL breaks standard ==
SQL uses three-valued logic: NULL = NULL evaluates to NULL (falsy), not
TRUE. This means the standard == operator is not homomorphic over the
nullable encoding — the model-level None == None is true, but
NULL = NULL is not.
NULL-safe operators
All Toasty target databases provide a NULL-safe equality operator:
| Database | Operator |
|---|---|
| PostgreSQL | IS NOT DISTINCT FROM |
| MySQL | <=> |
| SQLite | IS |
Using the NULL-safe operator restores == homomorphism:
a == b ⟺ encode(a) IS NOT DISTINCT FROM encode(b).
Operator mapping
This means homomorphism is not just a property of (encoding, operator) — it is
a property of the triple (encoding, model_op, storage_op). The lowerer may need
to emit a different SQL operator than the one the user wrote:
- Non-nullable field: model == → SQL =
- Nullable field: model == → SQL IS NOT DISTINCT FROM (or <=>, IS)
can_distribute should return the storage-level operator to use, not just a
boolean. Signature sketch:
can_distribute(encoding, model_op) -> Option<storage_op>
None means the operator cannot be pushed to the DB. Some(op) means it can,
using the specified storage operator.
Ordering
NULL ordering is also database-specific (NULLS FIRST vs NULLS LAST). The
lowerer must ensure consistent behavior across backends, potentially by emitting
explicit NULLS FIRST/NULLS LAST clauses.
Lowering Algorithm
The lowerer transforms a model-level expression tree into a storage-level expression tree. The input contains field references and model-level literals. The output contains column references and storage-level values.
Core: lowering a binary operator
lower_binary_op(op, lhs, rhs):
// 1. Identify field references and look up their encodings
// from the schema/mapping.
lhs_encoding = lookup_encoding(lhs) if lhs is FieldRef, else None
rhs_encoding = lookup_encoding(rhs) if rhs is FieldRef, else None
// 2. Determine if the operator can distribute through the encoding.
// For single-column primitive encodings:
if both are FieldRefs with same encoding:
match can_distribute(encoding, op):
Some(storage_op):
// Both fields share the encoding — compare columns directly.
emit: column_lhs storage_op column_rhs
None:
// Decode both to model space.
emit: decode(column_lhs) op decode(column_rhs)
if one is FieldRef, other is Literal:
match can_distribute(field_encoding, op):
Some(storage_op):
// Encode the literal, compare in storage space.
emit: column storage_op encode(literal)
None:
// Decode the column to model space.
emit: decode(column) op literal
if both are Literals:
// Const-evaluate in model space.
emit: literal_lhs op literal_rhs
Encoding the literal
encode(literal) applies the field’s field_to_column bijection to the
model-level value, producing a storage-level value. For a UUID↔text encoding:
encode(UUID("abc-123")) → "abc-123".
Decoding the column
decode(column_ref) applies the field’s column_to_field bijection to the
column reference, wrapping it in the appropriate SQL expression. For UUID↔text:
decode(uuid_col) → CAST(uuid_col AS UUID).
If the database lacks the model type (e.g., SQLite has no UUID), decode is not expressible in SQL. The operation must be evaluated application-side or the query rejected.
Multi-column encodings (product / coproduct)
For fields that span multiple columns, == expands structurally:
lower_binary_op(==, coproduct_field, literal):
encoded = encode(literal)
// encoded is a tuple: (disc_val, col1_val, col2_val, ...)
// Expand into per-column comparisons:
result = TRUE
for each (column, encoded_value) in zip(field.columns, encoded):
col_encoding = encoding_for(column) // e.g., nullable text
match can_distribute(col_encoding, ==):
Some(storage_op):
result = result AND (column storage_op encoded_value)
None:
result = result AND (decode(column) == encoded_value)
emit: result
ORDER BY
lower_order_by(field):
encoding = lookup_encoding(field)
match can_distribute(encoding, <):
Some(_):
// Ordering is preserved in storage space.
emit: ORDER BY column
None:
// Must decode to model space for correct ordering.
emit: ORDER BY decode(column)
SELECT returning
Always decode — application needs model-level values:
lower_select_returning(field):
emit: decode(column) // column_to_field bijection
INSERT / UPDATE
Always encode — database needs storage-level values:
lower_insert_value(field, value):
emit: encode(value) // field_to_column bijection
Examples
WHERE uuid_col == UUID("abc-123"), UUID stored as text:
- LHS is FieldRef → encoding: UUID↔text, column: uuid_col
- RHS is literal: UUID("abc-123")
- can_distribute(UUID↔text, ==) → Some(=)
- Encode literal: "abc-123"
- Output: uuid_col = 'abc-123'
WHERE uuid_col < UUID("abc-123"), UUID stored as text:
- LHS is FieldRef → encoding: UUID↔text, column: uuid_col
- RHS is literal: UUID("abc-123")
- can_distribute(UUID↔text, <) → None
- Decode column: CAST(uuid_col AS UUID)
- Output: CAST(uuid_col AS UUID) < UUID('abc-123')
- (If DB lacks UUID type → application-side evaluation or reject)
WHERE contact == Contact::Phone { number: "123" }, coproduct encoding:
- LHS is FieldRef → coproduct encoding, columns: disc, phone_number, email_address
- RHS is literal → encode: (0, "123", NULL)
- Expand per-column:
  - disc = 0 (can_distribute(i64, ==) → Some(=))
  - phone_number = '123' (can_distribute(nullable text, ==) → Some(=))
  - email_address IS NULL (can_distribute(nullable text, ==) → Some(IS))
- Output: disc = 0 AND phone_number = '123' AND email_address IS NULL
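The first example can be traced with a toy lowerer that builds SQL strings directly. The `can_distribute` table here is a hard-coded stand-in for the real per-pair lookup, covering only the UUID↔text encoding:

```rust
// Stand-in homomorphism table for the UUID↔text encoding:
// equality survives the encoding; ordering does not.
fn can_distribute(op: &str) -> Option<&'static str> {
    match op {
        "==" => Some("="),
        _ => None,
    }
}

// Lower `column op literal` to a SQL fragment. If the operator
// distributes, encode the literal and compare in storage space;
// otherwise decode the column back to the model type.
fn lower(column: &str, op: &str, literal_text: &str) -> String {
    match can_distribute(op) {
        Some(storage_op) => format!("{column} {storage_op} '{literal_text}'"),
        None => format!("CAST({column} AS UUID) {op} UUID('{literal_text}')"),
    }
}
```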
Schema Representation
Each field’s mapping is stored as a structured Bijection tree. This is the
single source of truth — encode/decode expressions are derived from it.
Bijection enum
#![allow(unused)]
fn main() {
enum Bijection {
/// No transformation — field type == column type.
Identity,
/// Lossless cast between two types with the same information content.
/// e.g., UUID↔text, Timestamp↔i64, integer widening.
Cast { from: Type, to: Type },
/// x*k + c (k ≠ 0). Inverse: (x - c) / k.
Affine { k: Value, c: Value },
/// Option<T> → nullable column.
/// Wraps an inner bijection with None↔NULL.
Nullable(Box<Bijection>),
/// Embedded struct → multiple columns.
/// Each component is an independent bijection on one field↔column pair.
Product(Vec<Bijection>),
/// Enum → discriminant column + per-variant columns.
Coproduct {
discriminant: Box<Bijection>,
variants: Vec<CoproductArm>,
},
/// Composition: apply `inner` first, then `outer`.
/// encode = outer.encode(inner.encode(x))
/// decode = inner.decode(outer.decode(x))
Compose {
inner: Box<Bijection>,
outer: Box<Bijection>,
},
}
struct CoproductArm {
discriminant_value: Value,
body: Bijection, // typically Product for data-carrying variants
}
}
Methods on Bijection
#![allow(unused)]
fn main() {
impl Bijection {
/// Encode a model-level value to a storage-level value.
fn encode(&self, value: Value) -> Value;
/// Produce a decode expression: given a column reference (or tuple of
/// column references), return a model-level expression.
fn decode(&self, column_expr: Expr) -> Expr;
/// Query whether `model_op` can be pushed through this encoding.
/// Returns the storage-level operator to use, or None if the
/// operation must fall back to model space.
fn can_distribute(&self, model_op: BinaryOp) -> Option<StorageOp>;
/// Number of columns this bijection spans.
fn column_count(&self) -> usize;
}
}
can_distribute is defined recursively:
- Identity: always Some(model_op) — no transformation.
- Cast: lookup in the per-pair homomorphism table.
- Affine: == → Some(=). < → Some(<) if k > 0, None if k < 0.
- Nullable: delegates to inner, may change op (e.g., == → IS NOT DISTINCT FROM).
- Product: == → Some(=) if all components return Some. < → only if lexicographic and all components support <.
- Coproduct: == → Some if discriminant + each arm returns Some. < → generally None.
- Compose: Some only if both inner and outer return Some.
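A cut-down recursive implementation over a reduced `Bijection` tree (two operators, a handful of variants; the real enum and `StorageOp` type differ):

```rust
#[derive(Clone, Copy, PartialEq)]
enum Op {
    Eq,
    Lt,
}

enum Bijection {
    Identity,
    Affine { k: i64 },
    Nullable(Box<Bijection>),
    Compose { inner: Box<Bijection>, outer: Box<Bijection> },
}

// Storage operators are strings here for illustration only.
fn can_distribute(b: &Bijection, op: Op) -> Option<&'static str> {
    match b {
        Bijection::Identity => Some(if op == Op::Eq { "=" } else { "<" }),
        Bijection::Affine { k } => match op {
            Op::Eq => Some("="),
            Op::Lt if *k > 0 => Some("<"),
            Op::Lt => None, // negative k reverses ordering
        },
        // Nullable swaps equality for the NULL-safe operator.
        Bijection::Nullable(inner) => match (op, can_distribute(inner, op)) {
            (Op::Eq, Some(_)) => Some("IS NOT DISTINCT FROM"),
            (_, r) => r,
        },
        // Both halves must distribute; use the outer's storage op.
        Bijection::Compose { inner, outer } => {
            can_distribute(inner, op).and(can_distribute(outer, op))
        }
    }
}
```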
Per-field mapping
#![allow(unused)]
fn main() {
struct FieldMapping {
bijection: Bijection,
columns: Vec<ColumnId>, // columns this field maps to (1 for primitive, N for product/coproduct)
}
}
The model-level mapping::Model holds a FieldMapping per field, plus a
reverse index from columns to fields (for computed field propagation).
Verification
The framework should be formally verified using Lean 4 + Mathlib. Mathlib already provides the algebraic vocabulary (bijections, homomorphisms, products, coproducts). The plan:
- Define the primitives and composition rules in Lean
- Prove the general theorems once (composition preserves bijection, coproduct conditions, etc.)
- For each concrete primitive, state and prove its homomorphism properties
- Lean checks everything mechanically
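As a taste of step 2, the sequential-composition theorem is already in Mathlib; stating it in Lean 4 is a one-liner (a sketch, assuming a standard Mathlib import):

```lean
import Mathlib.Logic.Function.Basic

-- If f and g are bijections, so is g ∘ f.
-- Mathlib provides the proof as Function.Bijective.comp.
example {α β γ : Type} {f : α → β} {g : β → γ}
    (hf : Function.Bijective f) (hg : Function.Bijective g) :
    Function.Bijective (g ∘ f) :=
  hg.comp hf
```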
Engine-Level Pagination Design
Overview
This document describes the implementation of engine-level pagination in Toasty. The key principle is that pagination logic (limit+1 strategy, cursor extraction, etc.) should be handled by the engine, not in application-level code. This allows the engine to leverage database-specific capabilities (e.g., DynamoDB’s native cursor support) while providing compatibility for databases that don’t have native support (e.g., SQL databases).
Architecture Context
Statement System
- toasty_core::stmt::Statement represents a superset of SQL - “Toasty-flavored SQL”
- Contains both SQL concepts AND Toasty application-level concepts (models, paths, pagination)
- Limit::PaginateForward is a Toasty-level concept that must be transformed by the engine before reaching SQL generation
- By the time statements reach toasty-sql, they must contain ONLY valid SQL
Engine Pipeline
- Planner: Transforms Toasty statements into a pipeline of actions
- Actions: Executed by the engine, store results in VarStore
- VarStore: Stores intermediate results between pipeline steps
- ExecResponse: Final result containing values and optional metadata
Existing Patterns
- eval::Func: Pre-computed transformations that execute during pipeline execution
- partition_returning: Separates database-handled expressions from in-memory evaluations
- Output::project: Transforms raw database results before storing in VarStore
Design
Core Types
#![allow(unused)]
fn main() {
// In engine.rs
pub struct ExecResponse {
pub values: ValueStream,
pub metadata: Option<Metadata>,
}
pub struct Metadata {
pub next_cursor: Option<Expr>,
pub prev_cursor: Option<Expr>,
pub query: Query,
}
// In engine/plan/exec_statement.rs
pub struct ExecStatement {
pub input: Option<Input>,
pub output: Option<Output>,
pub stmt: stmt::Statement,
pub conditional_update_with_no_returning: bool,
/// Pagination configuration for this query
pub pagination: Option<Pagination>,
}
pub struct Pagination {
/// Original limit before +1 transformation
pub limit: u64,
/// Function to extract cursor from a row
/// Takes row as arg[0], returns cursor value(s)
pub extract_cursor: eval::Func,
}
}
VarStore Changes
The VarStore needs to be updated to store ExecResponse instead of ValueStream:
#![allow(unused)]
fn main() {
pub(crate) struct VarStore {
slots: Vec<Option<ExecResponse>>,
}
}
This allows pagination metadata to flow through the pipeline and be returned from engine::exec.
Implementation Plan
Phase 1: Update VarStore to ExecResponse [Mechanical Change]
This phase is a purely mechanical change to update the VarStore infrastructure. No pagination logic yet.
1. Update VarStore (engine/exec/var_store.rs):
   - Change storage type from ValueStream to ExecResponse
   - Update load() to return ExecResponse
   - Update store() to accept ExecResponse
   - Update dup() to clone the entire ExecResponse (including metadata)
2. Update all action executors to wrap their results in ExecResponse:
   - For now, all actions will use metadata: None
   - Each action’s result becomes: ExecResponse { values, metadata: None }
   - Actions to update: action_associate, action_batch_write, action_delete_by_key, action_exec_statement, action_find_pk_by_index, action_get_by_key, action_insert, action_query_pk, action_update_by_key, action_set_var
3. Update pipeline execution (engine/exec.rs):
   - exec_pipeline returns ExecResponse
   - Handle VarStore returning ExecResponse
4. Update main engine (engine.rs):
   - exec::exec now returns ExecResponse directly
   - Remove the temporary wrapping logic
This phase establishes the infrastructure without any behavioral changes. All existing tests should continue to pass.
Phase 2: Add Pagination to ExecStatement [Task 2]
- Add Pagination struct to engine/plan/exec_statement.rs
- Add pagination: Option<Pagination> field to ExecStatement
- No execution changes yet — just the structure
Phase 3: Planner Support for SQL Pagination [Task 3]
In planner/select.rs, add pagination planning logic:
#![allow(unused)]
fn main() {
impl Planner<'_> {
fn plan_select_sql(...) {
// ... existing logic ...
// Handle pagination
let pagination = if let Some(Limit::PaginateForward { limit, after }) = &stmt.limit {
Some(self.plan_pagination(&mut stmt, &mut project, limit)?)
} else {
None
};
self.push_action(plan::ExecStatement {
input,
output: Some(plan::Output { var: output, project }),
stmt: stmt.into(),
conditional_update_with_no_returning: false,
pagination,
});
}
fn plan_pagination(
&mut self,
stmt: &mut stmt::Query,
project: &mut eval::Func,
limit_expr: &stmt::Expr,
) -> Result<Pagination> {
let original_limit = self.extract_limit_value(limit_expr)?;
// Get ORDER BY clause (required for pagination)
let order_by = stmt.order_by.as_ref()
.ok_or_else(|| anyhow!("Pagination requires ORDER BY"))?;
// Check if ORDER BY is unique
let is_unique = self.is_order_by_unique(order_by, stmt);
// If not unique, append primary key as tie-breaker
if !is_unique {
self.append_pk_to_order_by(stmt)?;
}
// Ensure ORDER BY fields are in returning clause
let (added_indices, original_field_count) =
self.ensure_order_by_in_returning(stmt)?;
// Build cursor extraction function
let extract_cursor = self.build_cursor_extraction_func(
stmt,
&added_indices,
)?;
// Modify project function if we added fields
if !added_indices.is_empty() {
self.adjust_project_for_pagination(
project,
original_field_count,
added_indices.len(),
);
}
// Transform limit to +1 for next page detection
*stmt.limit.as_mut().unwrap() = Limit::Offset {
limit: (original_limit + 1).into(),
offset: None,
};
Ok(Pagination {
limit: original_limit,
extract_cursor,
})
}
}
}
Key helper methods:
- is_order_by_unique: Checks if ORDER BY fields form a unique constraint
- append_pk_to_order_by: Adds primary key as tie-breaker
- ensure_order_by_in_returning: Adds ORDER BY fields to SELECT if missing
- build_cursor_extraction_func: Creates eval::Func to extract cursor
- adjust_project_for_pagination: Modifies project to filter out added fields
Phase 4: Executor Implementation [Task 4]
In engine/exec/exec_statement.rs:
#![allow(unused)]
fn main() {
impl Exec<'_> {
pub(super) async fn action_exec_statement(
&mut self,
action: &plan::ExecStatement,
) -> Result<()> {
// ... existing logic to execute statement ...
let res = if let Some(pagination) = &action.pagination {
self.handle_paginated_query(res, pagination, &action.stmt).await?
} else {
ExecResponse {
values: /* normal value stream */,
metadata: None,
}
};
self.vars.store(out.var, res);
Ok(())
}
async fn handle_paginated_query(
&mut self,
rows: Rows,
pagination: &Pagination,
stmt: &Statement,
) -> Result<ExecResponse> {
// Collect limit+1 rows
let mut buffer = Vec::new();
let mut count = 0;
match rows {
Rows::Values(mut stream) => {
while let Some(value) = stream.next().await {
buffer.push(value?);
count += 1;
if count > pagination.limit {
break;
}
}
}
_ => return Err(anyhow!("Pagination requires row results")),
}
// Check if there's a next page
let has_next = buffer.len() > pagination.limit as usize;
// Extract cursor if there's a next page
let next_cursor = if has_next {
// Get cursor from the LAST item we're keeping
let last_kept = &buffer[pagination.limit as usize - 1];
let cursor_value = pagination.extract_cursor.eval(&[last_kept.clone()])?;
// Truncate buffer to requested limit
buffer.truncate(pagination.limit as usize);
Some(stmt::Expr::Value(cursor_value))
} else {
None
};
Ok(ExecResponse {
values: ValueStream::from_vec(buffer),
metadata: Some(Metadata {
next_cursor,
prev_cursor: None, // TODO: implement in future
query: stmt.as_query().cloned().unwrap_or_default(),
}),
})
}
}
}
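The limit+1 strategy in handle_paginated_query can be isolated into a small sketch. Plain `Vec<i64>` rows stand in for the value stream, and the row value itself stands in for the extracted cursor:

```rust
// Fetch up to limit+1 rows; the extra row only signals that a next
// page exists. Keep `limit` rows and take the cursor from the last
// row kept.
fn paginate(rows: Vec<i64>, limit: usize) -> (Vec<i64>, Option<i64>) {
    let mut buffer: Vec<i64> = rows.into_iter().take(limit + 1).collect();
    let has_next = buffer.len() > limit;
    let next_cursor = if has_next {
        buffer.truncate(limit);
        buffer.last().copied() // cursor comes from the last kept row
    } else {
        None
    };
    (buffer, next_cursor)
}
```

With five rows and a limit of three, the caller gets three rows plus a cursor; with two rows, both rows and no cursor.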
Phase 5: Clean Up Application Layer [Task 5]
Remove the limit+1 logic from Paginate::collect:
#![allow(unused)]
fn main() {
pub async fn collect(self, db: &Db) -> Result<Page<M>> {
// Simply delegate to db.paginate - engine handles pagination
db.paginate(self.query).await
}
}
Update Db::paginate to use the metadata from ExecResponse:
#![allow(unused)]
fn main() {
pub async fn paginate<M: Model>(&self, statement: stmt::Select<M>) -> Result<Page<M>> {
let exec_response = engine::exec(self, statement.untyped.clone().into()).await?;
// Convert value stream to models
let mut cursor = Cursor::new(self.schema.clone(), exec_response.values);
let mut items = Vec::new();
while let Some(item) = cursor.next().await {
items.push(item?);
}
// Extract pagination metadata
let (next_cursor, prev_cursor) = match exec_response.metadata {
Some(metadata) => (metadata.next_cursor, metadata.prev_cursor),
None => (None, None),
};
Ok(Page::new(items, statement, next_cursor, prev_cursor))
}
}
Key Design Decisions
- Single Source of Truth: The extract_cursor function is the only place that knows how to extract cursors. No redundant order_by_indices.
- Type Safety: Cursor extraction function uses actual inferred types from the schema, not Type::Any.
- Automatic Tie-Breaking: The planner automatically appends primary key to ORDER BY when needed for uniqueness.
- Transparent Field Addition: ORDER BY fields are added to returning clause transparently, and filtered out via the project function.
- Metadata Threading: ExecResponse flows through VarStore, preserving metadata through the pipeline.
Testing Strategy
- Unit Tests: Test cursor extraction function generation
- Integration Tests: Test pagination with various ORDER BY configurations
- Database Tests: Ensure SQL generation is correct (no PaginateForward in SQL)
- End-to-End Tests: Verify pagination works across different databases
Future Enhancements
- Previous Page Support: Implement prev_cursor extraction and PaginateBackward
- DynamoDB Native Pagination: Leverage LastEvaluatedKey instead of limit+1
- Complex ORDER BY: Support expressions beyond simple column references
- Optimization: Cache cursor extraction functions for common patterns
Serialized Field Implementation Design
Builds on the #[serialize] bookkeeping already in place (attribute parsing,
SerializeFormat enum, FieldPrimitive.serialize field). This document covers
the runtime serialization/deserialization codegen.
User-Facing API
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: uuid::Uuid,
name: String,
#[serialize(json)]
tags: Vec<String>,
// nullable: the column may be NULL. The Rust type must be Option<T>.
// None maps to NULL; Some(v) is serialized as JSON.
#[serialize(json, nullable)]
metadata: Option<HashMap<String, String>>,
// Non-nullable Option: the entire Option value is serialized as JSON.
// Some(v) → `v` as JSON, None → `null` as JSON text (column is NOT NULL).
#[serialize(json)]
extra: Option<String>,
}
}
Fields annotated with #[serialize(json)] are stored as JSON text in a single
database column. The field’s Rust type must implement serde::Serialize and
serde::DeserializeOwned. The database column type defaults to String/TEXT.
Nullability
By default, serialized fields are not nullable. The entire Rust value —
including Option<T> — is serialized as-is into JSON text stored in a NOT NULL
column. This means None becomes the JSON text null, and Some(v) becomes
the JSON serialization of v.
To make the database column nullable, add nullable to the attribute:
#[serialize(json, nullable)]. When nullable is set:
- The Rust type must be Option<T>.
- None maps to a SQL NULL (no value stored).
- Some(v) serializes v as JSON text.
This is an explicit opt-in because the two behaviors are meaningfully different:
a user may legitimately want to serialize None as JSON null text in a NOT
NULL column (e.g., for a JSON API field where null is a valid value distinct
from “no row”).
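The distinction can be sketched with a toy encoder. `DbValue` and the naive, escape-free JSON string encoding below are illustrative stand-ins for the real Value type and serde serialization:

```rust
#[derive(Clone, Debug, PartialEq)]
enum DbValue {
    Null,
    Text(String),
}

// Naive JSON encoding of a string value (no escaping; sketch only).
fn to_json(s: &str) -> String {
    format!("\"{}\"", s)
}

// Default (non-nullable): the whole Option is serialized.
// None becomes the JSON text `null` in a NOT NULL column.
fn encode_non_nullable(v: &Option<String>) -> DbValue {
    match v {
        None => DbValue::Text("null".to_string()),
        Some(s) => DbValue::Text(to_json(s)),
    }
}

// #[serialize(json, nullable)]: None maps to a SQL NULL instead.
fn encode_nullable(v: &Option<String>) -> DbValue {
    match v {
        None => DbValue::Null,
        Some(s) => DbValue::Text(to_json(s)),
    }
}
```

The same `None` value thus produces two different column states depending on the attribute, which is why the opt-in is explicit.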
Value Encoding
A serialized field stores a JSON string in the database. The value stream uses
Value::String for serialized fields, not the field’s logical Rust type.
Rust value ──serde_json::to_string──► Value::String(json) ──► DB column (TEXT)
DB column (TEXT) ──► Value::String(json) ──serde_json::from_str──► Rust value
Schema Changes
For serialized fields, field_ty bypasses <T as Primitive>::field_ty() and
constructs FieldPrimitive directly with ty: Type::String. The user’s Rust
type T does not need to implement Primitive — it only needs Serialize +
DeserializeOwned.
Nullability is determined by the nullable flag in the attribute, not by
inspecting the Rust type.
Remove serialize from Primitive::field_ty
Today Primitive::field_ty accepts a serialize argument so it can thread
SerializeFormat into the FieldPrimitive it builds. With this design,
serialized fields never go through Primitive::field_ty — codegen constructs
the FieldPrimitive directly. That means the serialize parameter is dead
for all callers and should be removed.
#![allow(unused)]
fn main() {
// Primitive trait (before):
fn field_ty(
    storage_ty: Option<db::Type>,
    serialize: Option<SerializeFormat>,
) -> FieldTy;
// Primitive trait (after):
fn field_ty(storage_ty: Option<db::Type>) -> FieldTy;
}
The default implementation drops the serialize field from the constructed
FieldPrimitive (it is always None when going through the trait). Embedded
type overrides (Embed, enum) already ignore both parameters.
Codegen changes:
#![allow(unused)]
fn main() {
// Non-serialized field (calls through the trait):
field_ty = quote!(<#ty as Primitive>::field_ty(#storage_ty));
nullable = quote!(<#ty as Primitive>::NULLABLE);
// Serialized field (constructed directly):
field_ty = quote!(FieldTy::Primitive(FieldPrimitive {
    ty: Type::String,
    storage_ty: #storage_ty,
    serialize: Some(SerializeFormat::Json),
}));
nullable = #serialize_nullable; // literal bool from attribute
}
No type-level hack is needed — the nullable flag is parsed from the attribute
at macro expansion time and threaded through to schema registration as a
literal bool.
Codegen Changes
Primitive::load / Model::load
For serialized fields, the generated load code reads a String from the record
and deserializes it. The behavior depends on whether nullable is set:
#![allow(unused)]
fn main() {
// Non-nullable (default) — works for any T including Option<T>:
field_name: {
    let json_str = <String as Primitive>::load(record[i].take())?;
    serde_json::from_str(&json_str)
        .map_err(|e| Error::from_args(
            format_args!("failed to deserialize field '{}': {}", "field_name", e)
        ))?
},
// Nullable (#[serialize(json, nullable)]) — T must be Option<U>:
field_name: {
    let value = record[i].take();
    if value.is_null() {
        None
    } else {
        let json_str = <String as Primitive>::load(value)?;
        Some(serde_json::from_str(&json_str)
            .map_err(|e| Error::from_args(
                format_args!("failed to deserialize field '{}': {}", "field_name", e)
            ))?)
    }
},
}
Non-serialized fields are unchanged: <T as Primitive>::load(record[i].take())?.
Reload (root model and embedded)
Reload match arms follow the same pattern: load as String, then deserialize.
For nullable fields, check null first.
Create builder setters
Serialized field setters accept the concrete Rust type (not impl IntoExpr<T>,
since T does not implement IntoExpr) and serialize to a String expression:
#![allow(unused)]
fn main() {
// Non-nullable (default) — accepts T directly (including Option<T>):
pub fn field_name(mut self, field_name: FieldType) -> Self {
    let json = serde_json::to_string(&field_name).expect("failed to serialize");
    self.stmt.set(index, <String as IntoExpr<String>>::into_expr(json));
    self
}
// Nullable (#[serialize(json, nullable)]) — accepts Option<InnerType>:
pub fn field_name(mut self, field_name: Option<InnerType>) -> Self {
    match &field_name {
        Some(v) => {
            let json = serde_json::to_string(v).expect("failed to serialize");
            self.stmt.set(index, <String as IntoExpr<String>>::into_expr(json));
        }
        None => {
            self.stmt.set(index, Expr::<String>::from_value(Value::Null));
        }
    }
    self
}
}
Update builder setters
Same pattern as create: accept the concrete type, serialize to JSON, store as
String expression.
Dependencies
serde_json is added as an optional dependency of the toasty crate, gated
behind the existing serde feature:
# crates/toasty/Cargo.toml
[features]
serde = ["dep:serde_core", "dep:serde_json"]
[dependencies]
serde_json = { workspace = true, optional = true }
Generated code references serde_json through the codegen support module:
#![allow(unused)]
fn main() {
// crates/toasty/src/lib.rs, in codegen_support
#[cfg(feature = "serde")]
pub use serde_json;
}
If a user writes #[serialize(json)] without enabling the serde feature, the
generated code fails to compile because codegen_support::serde_json does not
exist. The compiler error points at the generated serde_json::from_str call.
Files Modified
| File | Change |
|---|---|
| crates/toasty/Cargo.toml | Add serde_json optional dep, update serde feature |
| crates/toasty/src/lib.rs | Re-export serde_json in codegen_support |
| crates/toasty/src/stmt/primitive.rs | Remove serialize param from Primitive::field_ty |
| crates/toasty-macros/src/schema/field.rs | Parse nullable flag from #[serialize(...)] attribute |
| crates/toasty-macros/src/expand.rs | Update Embed/enum field_ty overrides to drop serialize param |
| crates/toasty-macros/src/expand/schema.rs | Construct FieldPrimitive directly for serialized fields; remove serialize arg from non-serialized field_ty call |
| crates/toasty-macros/src/expand/embedded_enum.rs | Drop serialize arg from field_ty call |
| crates/toasty-macros/src/expand/model.rs | Deserialize in expand_load_body() and expand_embedded_reload_body() |
| crates/toasty-macros/src/expand/create.rs | Serialize in create setter for serialized fields |
| crates/toasty-macros/src/expand/update.rs | Serialize in update setter, deserialize in reload arms |
| crates/toasty-driver-integration-suite/Cargo.toml | Add serde, serde_json deps, enable serde feature |
| crates/toasty-driver-integration-suite/src/tests/serialize.rs | Integration tests |
Integration Tests
New file serialize.rs in the driver integration suite. Test cases:
- Round-trip a Vec<String> field through create and read-back
- Round-trip a nullable Option<T> field with Some and None (SQL NULL) values
- Non-nullable Option<T> field: None round-trips as JSON null text (not SQL NULL)
- Update a serialized field and verify the new value persists
- Round-trip a custom struct with serde::Serialize + DeserializeOwned
Static Assertions for create! Required Fields
The create! macro does not check that all required fields are specified.
Missing a required field compiles successfully but fails at runtime when the
database rejects a NULL value in a NOT NULL column. This design adds
compile-time checking so that omitting a required field is a compilation error.
Problem
Given these models:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: Id<User>,
    name: String,
    #[has_many]
    todos: HasMany<Todo>,
}
#[derive(Model)]
struct Todo {
    #[key]
    #[auto]
    id: Id<Todo>,
    #[index]
    user_id: Id<User>,
    #[belongs_to(key = user_id, references = id)]
    user: BelongsTo<User>,
    title: String,
}
}
This compiles today but panics at runtime:
#![allow(unused)]
fn main() {
// Missing `name` — no compile error
let user = toasty::create!(User { }).exec(&mut db).await?;
}
Approach
Per-level validation with monomorphization
Each model carries a flat CreateMeta constant that lists only its own
fields — no pointers to other models’ metadata. Validation happens one
nesting level at a time, using the compiler’s type inference at each level
to resolve the target model.
This avoids const evaluation cycles entirely. A const in Rust must be
fully evaluated before it exists, so if User::CREATE_META contained a
&'static reference to Todo::CREATE_META and vice versa, the compiler
would detect a cycle and reject it. By keeping each model’s metadata flat
and resolving cross-model references at each nesting level through
monomorphization, no model’s const ever needs to reference another.
CreateMeta struct
A simple struct in toasty::schema::create_meta (re-exported through
codegen_support) describes the fields a model exposes on creation:
#![allow(unused)]
fn main() {
pub struct CreateMeta {
    pub fields: &'static [CreateField],
    pub model_name: &'static str,
}
pub struct CreateField {
    pub name: &'static str,
    pub required: bool,
}
}
Each field’s required flag is computed at compile time using the
Field::NULLABLE trait constant, so the proc macro does not need to parse
Option<T> syntactically:
#![allow(unused)]
fn main() {
// generated by #[derive(Model)]
const CREATE_META: CreateMeta = CreateMeta {
    fields: &[
        CreateField { name: "name", required: !<String as Field>::NULLABLE },
        CreateField { name: "bio", required: !<Option<String> as Field>::NULLABLE },
    ],
    model_name: "User",
};
}
<String as Field>::NULLABLE is false, so required is true.
<Option<String> as Field>::NULLABLE is true, so required is false.
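A minimal sketch shows how such a NULLABLE associated constant can distinguish Option<T> without any syntactic parsing. The trait and impls here are illustrative stand-ins, not Toasty's actual Field trait:

```rust
// Illustrative stand-in for the Field trait described above.
trait Field {
    const NULLABLE: bool;
}

// Concrete primitives are not nullable...
impl Field for String {
    const NULLABLE: bool = false;
}

// ...while a blanket impl marks any Option<T> as nullable,
// however the user spelled the type.
impl<T: Field> Field for Option<T> {
    const NULLABLE: bool = true;
}

// `required` is just the negation, evaluated at compile time.
const NAME_REQUIRED: bool = !<String as Field>::NULLABLE;
const BIO_REQUIRED: bool = !<Option<String> as Field>::NULLABLE;
```

Because the check goes through trait resolution, a type alias for Option<String> would be classified correctly, which a syntactic check in the proc macro could not guarantee.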
A const fn helper performs the actual checking:
#![allow(unused)]
fn main() {
pub const fn assert_create_fields(meta: &CreateMeta, provided: &[&str]) {
    // panics at compile time listing the missing field
}
}
This uses byte-level string comparison (str::as_bytes() in a while
loop) since const fn cannot call trait methods like PartialEq.
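A sketch of what that byte-level comparison could look like (the helper names here are invented for illustration; the real implementation lives in the create_meta module):

```rust
// const-compatible string equality: compare the byte slices in a while
// loop, since PartialEq::eq is not callable inside a const fn.
const fn str_eq(a: &str, b: &str) -> bool {
    let a = a.as_bytes();
    let b = b.as_bytes();
    if a.len() != b.len() {
        return false;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return false;
        }
        i += 1;
    }
    true
}

// const-compatible membership test over the provided field names.
const fn contains(provided: &[&str], name: &str) -> bool {
    let mut i = 0;
    while i < provided.len() {
        if str_eq(provided[i], name) {
            return true;
        }
        i += 1;
    }
    false
}
```

`assert_create_fields` would then loop over `meta.fields`, and for each entry with `required: true` panic unless `contains(provided, field.name)` holds.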
ValidateCreate trait
A #[doc(hidden)] trait carries the CreateMeta reference. This trait
is the single mechanism used for validation at every level — typed creates,
scoped creates, and nested creates all use it through monomorphization:
#![allow(unused)]
fn main() {
#[doc(hidden)]
pub trait ValidateCreate {
    const CREATE_META: &'static CreateMeta;
}
}
The derive macro generates ValidateCreate impls for:
- Fields structs (UserFields<Origin>, TodoFieldsList<Origin>) — so that nested field accessors like User::fields().todos() return a type that carries the target model’s metadata.
- Relation scope types (Many, One, OptionOne) — so that scoped expressions like user.todos() return a type that carries the target model’s metadata.
Each impl simply references the target model’s CREATE_META:
#![allow(unused)]
fn main() {
// On the fields struct for Todo (generated by derive)
impl<__Origin> ValidateCreate for TodoFieldsList<__Origin> {
    const CREATE_META: &'static CreateMeta = &Todo::CREATE_META;
}
// On the relation scope type (generated by derive)
impl ValidateCreate for Many {
    const CREATE_META: &'static CreateMeta = &Todo::CREATE_META;
}
}
Because ValidateCreate is separate from Scope and Model, it carries
no other obligations and can be implemented on any generated type without
affecting the existing trait hierarchy.
Model trait
CREATE_META remains an associated constant on Model as well. This is
the canonical owned constant — the ValidateCreate impls reference it:
#![allow(unused)]
fn main() {
pub trait Model {
    // ...existing associated types and methods...
    const CREATE_META: CreateMeta;
}
}
CREATE_META is removed from the Scope trait. Scoped validation now
goes through ValidateCreate instead.
Which fields are included
CreateMeta.fields contains all primitive fields that are:
- Not #[auto]
- Not #[default(...)]
- Not #[update(...)]
Each of these fields has required set to !<T as Field>::NULLABLE, so
Option<T> fields are included but marked as not required.
These fields are always excluded from the list entirely:
- Relation fields (BelongsTo, HasMany, HasOne)
- FK source fields (fields referenced by a #[belongs_to(key = ...)] on the same model)
FK source fields are excluded from CreateMeta.fields because in a
top-level create they are set implicitly when you provide the BelongsTo
relation. In a nested or scoped create the parent context fills them in.
For the models above:
| Model | Required | Not required | Excluded |
|---|---|---|---|
| User | name | (none) | id (auto), todos (relation) |
| Todo | title | (none) | id (auto), user_id (FK source), user (relation) |
File layout
crates/toasty/src/schema/create_meta.rs — CreateMeta, CreateField, const fn helpers
crates/toasty/src/schema.rs — pub mod create_meta; pub use ...
crates/toasty/src/lib.rs — codegen_support re-exports
Typed creates
create!(User { name: "Alice" }) expands to:
#![allow(unused)]
fn main() {
{
    const _CREATE: () = toasty::codegen_support::assert_create_fields(
        &<User as toasty::codegen_support::Model>::CREATE_META,
        &["name"],
    );
    User::create().name("Alice")
}
}
The const _CREATE: () block forces compile-time evaluation. If
assert_create_fields panics, the compiler reports the panic message as
an error at the create! call site.
Scoped creates
create!(in user.todos() { title: "buy milk" }) is harder because the
macro does not know the scope type — it only has the expression
user.todos().
The workaround uses monomorphization-time const evaluation. The macro
generates a local generic struct bounded on ValidateCreate whose
associated constant contains the assertion, then forces monomorphization by
calling a helper function that infers the type from the expression:
#![allow(unused)]
fn main() {
{
    let __scope = user.todos();
    struct __Check<__S: toasty::codegen_support::ValidateCreate>(
        std::marker::PhantomData<__S>,
    );
    impl<__S: toasty::codegen_support::ValidateCreate> __Check<__S> {
        const __ASSERT: () = toasty::codegen_support::assert_create_fields(
            __S::CREATE_META,
            &["title"],
        );
    }
    fn __force_check<__S: toasty::codegen_support::ValidateCreate>(_: &__S) {
        let _ = __Check::<__S>::__ASSERT;
    }
    __force_check(&__scope);
    let __scope_fields = toasty::codegen_support::scope_fields(&__scope);
    __scope.create().title("buy milk")
}
}
This works because user.todos() returns a type (e.g. todo::Many) that
implements ValidateCreate. When the compiler monomorphizes
__Check<todo::Many>::__ASSERT, it evaluates the const expression. If it
panics, the error points at the create! call site. No unstable features
required.
Nested creates
Nested creates use the same monomorphization trick, but through the fields structs rather than the scope expression. Consider:
#![allow(unused)]
fn main() {
create!(User { name: "Alice", todos: [{ title: "Do it" }] })
}
The create! macro expands this to:
#![allow(unused)]
fn main() {
{
    // Level 0: validate User's fields directly (type is known)
    const _CREATE: () = {
        toasty::codegen_support::assert_create_fields(
            &<User as toasty::codegen_support::Model>::CREATE_META,
            &["name", "todos"],
        );
    };
    let __fields = User::fields();
    // Level 1: validate Todo's fields via monomorphization
    // __fields.todos() returns TodoFieldsList<User>, which impls ValidateCreate
    {
        let __nested = __fields.todos();
        struct __Check<__S: toasty::codegen_support::ValidateCreate>(
            std::marker::PhantomData<__S>,
        );
        impl<__S: toasty::codegen_support::ValidateCreate> __Check<__S> {
            const __ASSERT: () = toasty::codegen_support::assert_create_fields(
                __S::CREATE_META,
                &["title"],
            );
        }
        fn __force<__S: toasty::codegen_support::ValidateCreate>(_: &__S) {
            let _ = __Check::<__S>::__ASSERT;
        }
        __force(&__nested);
    }
    User::create()
        .name("Alice")
        .todos([__fields.todos().create().title("Do it")])
}
}
The key: User::fields().todos() returns TodoFieldsList<User>, which
implements ValidateCreate with CREATE_META = &Todo::CREATE_META. The
monomorphization trick infers the concrete type and evaluates the const
assertion for Todo’s fields.
Arbitrary nesting depth
Each nesting level is an independent const evaluation. For deeper nesting:
#![allow(unused)]
fn main() {
create!(User {
    name: "Alice",
    todos: [{
        title: "Do it",
        categories: [{ name: "Work" }]
    }]
})
}
The macro emits three independent validation blocks:
- Level 0: assert_create_fields(&User::CREATE_META, &["name", "todos"]) — direct const, no monomorphization needed.
- Level 1: monomorphize on User::fields().todos() (which is TodoFieldsList<User>, targeting Todo) to check ["title", "categories"].
- Level 2: monomorphize on Todo::fields().categories() to check ["name"].
No model’s CREATE_META ever references another model’s CREATE_META.
Each level resolves the target model through the type system at
monomorphization time, not through &'static pointers at const evaluation
time.
Why this avoids const cycles
The previous design embedded &'static CreateMeta pointers in a
CreateNested struct, so User::CREATE_META contained a reference to
Todo::CREATE_META and vice versa. This creates a const evaluation
cycle: the compiler must fully evaluate a const before it exists, but
evaluating User::CREATE_META requires Todo::CREATE_META which requires
User::CREATE_META.
The new design eliminates cross-model references entirely:
#![allow(unused)]
fn main() {
// User::CREATE_META — only knows about User's own fields
const CREATE_META: CreateMeta = CreateMeta {
    fields: &[CreateField { name: "name", required: true }],
    model_name: "User",
};
// Todo::CREATE_META — only knows about Todo's own fields
const CREATE_META: CreateMeta = CreateMeta {
    fields: &[CreateField { name: "title", required: true }],
    model_name: "Todo",
};
}
Cross-model resolution happens at monomorphization time through
ValidateCreate impls on the fields structs. Function definitions don’t
create const evaluation cycles — only const definitions that reference each
other do. So even for self-referential models:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Person {
    #[key] #[auto] id: Id<Person>,
    name: String,
    #[has_many]
    children: HasMany<Person>,
}
}
Person::CREATE_META contains only [CreateField { name: "name", ... }].
The derive generates ValidateCreate for PersonFieldsList<Origin> pointing
at &Person::CREATE_META. When the create! macro validates a nested
children: [{ name: "Kid" }], it monomorphizes through
Person::fields().children() which returns PersonFieldsList<Person>,
evaluating Person::CREATE_META — no cycle because Person::CREATE_META
doesn’t reference itself.
Batch and tuple creates
TypedBatch (User::[{ name: "A" }, { name: "B" }]): Each item in the
batch gets its own assertion since different items can specify different
field sets.
Tuple ((User { name: "A" }, Todo { title: "x" })): Each element is
a CreateItem and is checked independently.
Code generation changes
#[derive(Model)] changes
The derive macro generates:
- CREATE_META on impl Model — a flat CreateMeta containing only the model’s own primitive fields (filtered as described in “Which fields are included”).
- ValidateCreate impls on the fields structs (UserFields<Origin> and UserFieldsList<Origin>) referencing &<Model>::CREATE_META.
- ValidateCreate impls on the relation scope types (Many, One, OptionOne) referencing &<Model>::CREATE_META.
The Scope trait no longer carries CREATE_META.
create! macro changes
The expand function in create/expand.rs emits validation at each
nesting level:
- Typed top-level: a plain const assertion using <Path as Model>::CREATE_META directly.
- Scoped top-level: a monomorphization block bounded on ValidateCreate, inferring the type from the scope expression.
- Each nested level: a monomorphization block bounded on ValidateCreate, inferring the type from the fields struct accessor (e.g. User::fields().todos()).
The macro walks the parsed FieldSet tree recursively, emitting one
validation block per nesting level.
Example error messages
Missing a top-level field:
error[E0080]: evaluation panicked: missing required field `name` in create! for `User`
  --> src/main.rs:10:5
   |
10 |     toasty::create!(User { })
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^ evaluation of `_CREATE` failed inside this call
Missing a nested field:
error[E0080]: evaluation panicked: missing required field `title` in create! for `Todo`
  --> src/main.rs:12:5
   |
12 |     toasty::create!(User { name: "Alice", todos: [{ }] })
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ evaluation of `_CREATE` failed inside this call
Limitations and future work
- Embedded model fields are not included in CreateMeta. Fields whose type implements Embed (via #[derive(Embed)]) are skipped because they are not FieldTy::Primitive. A future enhancement should include them.
- #[serialize] fields are excluded because their Rust types (e.g. Vec<String>, HashMap<K, V>, custom structs) do not implement the Field trait, so <T as Field>::NULLABLE cannot be evaluated. A future enhancement could infer nullability syntactically or introduce a separate trait bound for serialized fields.
- BelongsTo relation fields themselves are not checked. If you write create!(Todo { title: "x" }) without providing user or user_id, it compiles but fails at the database. A future enhancement could add disjunction checking (require user OR user_id in top-level creates). In nested and scoped creates this is not a problem because the parent context provides the FK.
- Error messages include the field name but not a file/line pointer to the model definition. The Rust compiler’s error output shows the create! call site, which is the actionable location.
Database Enum Types
Overview
Embedded enums with string labels use the best available enum representation for the target database by default. On databases with native enum types, Toasty uses them. On databases without native enums, Toasty falls back to string columns with constraints where possible, or plain string columns as a last resort.
No annotation is needed to get this behavior — the simplest enum definition gets the best storage automatically:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Status {
    Pending,
    Active,
    Done,
}
}
On PostgreSQL this creates a named enum type via CREATE TYPE status AS ENUM (...). On MySQL it
uses an inline ENUM(...) column. On SQLite it uses a TEXT column with a
CHECK constraint. On DynamoDB it stores a plain string.
Discriminant types
Toasty supports three discriminant storage strategies for embedded enums:
| Enum definition | Storage strategy |
|---|---|
| String labels (default or explicit) | Native enum representation per backend |
| #[column(type = varchar)] or #[column(type = text)] | Plain string column, no DB-level enum enforcement |
| #[column(variant = N)] with integers | INTEGER column |
Default: native enum
When an enum uses string labels (either default identifiers or explicit
#[column(variant = "label")]), Toasty uses native enum storage:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Status {
    Pending, // label: 'pending'
    Active,  // label: 'active'
    Done,    // label: 'done'
}
}
This is equivalent to writing #[column(type = enum)] explicitly.
Opting out: plain string column
Use #[column(type = varchar)] or #[column(type = text)] to store the
discriminant as a plain string column with no database-level enum type or
constraint:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
#[column(type = text)]
enum Status {
    Pending,
    Active,
    Done,
}
}
This stores discriminants in a TEXT column. The database accepts any string value; Toasty is responsible for writing correct values. Use this when you need to interoperate with external tools that write directly to the table, or when you want to avoid database-level enum machinery for any reason.
Integer discriminants
Integer discriminants remain unchanged from existing behavior:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Status {
    #[column(variant = 1)]
    Pending,
    #[column(variant = 2)]
    Active,
    #[column(variant = 3)]
    Done,
}
}
This stores discriminants as an INTEGER column. Integer and string discriminants cannot be mixed in the same enum.
Variant labels
Toasty converts Rust variant identifiers to snake_case for database labels by default, following the same convention used for table and column names:
| Rust variant | Default label |
|---|---|
| Pending | 'pending' |
| InProgress | 'in_progress' |
| AlmostDone | 'almost_done' |
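The default conversion from a variant identifier to its label could be sketched as follows. This is a hypothetical helper written for illustration, not Toasty's actual implementation:

```rust
// Convert a CamelCase variant identifier to a snake_case label:
// insert '_' before each uppercase letter (except the first) and lowercase it.
fn to_snake_case(ident: &str) -> String {
    let mut out = String::new();
    for (i, ch) in ident.chars().enumerate() {
        if ch.is_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.extend(ch.to_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}
```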
Use #[column(variant = "label")] on individual variants to override the
default:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Status {
    #[column(variant = "pending")]
    Pending,
    #[column(variant = "active")]
    Active,
    #[column(variant = "done")]
    Done,
}
}
Explicit labels and defaults can coexist:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Status {
    #[column(variant = "in_progress")]
    InProgress, // stored as 'in_progress' (explicit)
    Done,       // stored as 'done' (default snake_case)
}
}
Database Support
The default native enum strategy adapts to each backend’s capabilities:
| Backend | Representation | Validation |
|---|---|---|
| PostgreSQL | CREATE TYPE ... AS ENUM (named type) | Database rejects invalid values |
| MySQL | Inline ENUM('a', 'b', 'c') column type | Database rejects invalid values |
| SQLite | TEXT column + CHECK constraint | Database rejects invalid values |
| DynamoDB | String attribute | No database-level validation (Toasty validates at the application level) |
PostgreSQL
Toasty creates a standalone named type with CREATE TYPE ... AS ENUM and
references it from column definitions.
MySQL
Toasty generates ENUM('a', 'b', 'c') as the column type. There is no
standalone named type. When the same Rust enum is used in multiple tables,
each table gets its own inline ENUM(...) definition.
SQLite
SQLite has no native enum type. Toasty stores the discriminant as a TEXT
column with a CHECK constraint that restricts values to the declared
labels:
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL CHECK (status IN ('pending', 'active', 'done'))
);
This gives database-level validation while remaining compatible with SQLite’s type system.
DynamoDB
DynamoDB has no column type system or constraint mechanism. Toasty stores the discriminant as a string attribute. Validation happens at the Toasty application level only — the database itself accepts any string value.
Generated SQL Schema
PostgreSQL
Toasty creates a PostgreSQL enum type named after the Rust enum in snake_case:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum OrderState {
    #[column(variant = "new")]
    New,
    #[column(variant = "shipped")]
    Shipped,
    #[column(variant = "delivered")]
    Delivered,
}
}
CREATE TYPE order_state AS ENUM ('new', 'shipped', 'delivered');
The discriminant column uses the enum type:
#![allow(unused)]
fn main() {
#[derive(toasty::Model)]
struct Order {
    #[key]
    #[auto]
    id: i64,
    state: OrderState,
}
}
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    state order_state NOT NULL
);
Customizing the PostgreSQL type name
To specify a custom name for the PostgreSQL enum type, use enum with a name
argument in the #[column(type = ...)] attribute:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
#[column(type = enum("order_status"))]
enum OrderState {
    New,
    Shipped,
    Delivered,
}
}
CREATE TYPE order_status AS ENUM ('new', 'shipped', 'delivered');
Without this attribute, Toasty derives the type name from the Rust enum name in snake_case.
MySQL
MySQL enum types are defined inline on the column:
CREATE TABLE orders (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    state ENUM('new', 'shipped', 'delivered') NOT NULL
);
The enum("name") syntax is ignored on MySQL since there is no standalone
type to name.
SQLite
SQLite uses a TEXT column with a CHECK constraint:
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    state TEXT NOT NULL CHECK (state IN ('new', 'shipped', 'delivered'))
);
Data-carrying enums
Data-carrying enums work the same way on all backends. The discriminant column uses the enum representation; variant fields remain as separate nullable columns:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactMethod {
    #[column(variant = "email")]
    Email { address: String },
    #[column(variant = "phone")]
    Phone { country: String, number: String },
}
}
PostgreSQL:
CREATE TYPE contact_method AS ENUM ('email', 'phone');
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    contact contact_method NOT NULL,
    contact_email_address TEXT,
    contact_phone_country TEXT,
    contact_phone_number TEXT
);
MySQL:
CREATE TABLE users (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
contact ENUM('email', 'phone') NOT NULL,
contact_email_address TEXT,
contact_phone_country TEXT,
contact_phone_number TEXT
);
SQLite:
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    contact TEXT NOT NULL CHECK (contact IN ('email', 'phone')),
    contact_email_address TEXT,
    contact_phone_country TEXT,
    contact_phone_number TEXT
);
Migrations
Creating a new enum
When a model with a string-label enum is first migrated, Toasty issues the appropriate DDL.
PostgreSQL:
CREATE TYPE status AS ENUM ('pending', 'active', 'done');
CREATE TABLE tasks (
    id BIGSERIAL PRIMARY KEY,
    status status NOT NULL
);
MySQL:
CREATE TABLE tasks (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    status ENUM('pending', 'active', 'done') NOT NULL
);
SQLite:
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL CHECK (status IN ('pending', 'active', 'done'))
);
Label ordering
Database enum types have a declaration order that affects ORDER BY behavior.
Toasty manages this order with two rules:
- Initial creation: Labels are ordered by the Rust enum’s variant declaration order.
- Subsequent migrations: Toasty preserves the existing label order from the previous schema snapshot. New variants are appended at the end. Reordering variants in the Rust source does not trigger any DDL and does not change the database label order.
This means the label order is a one-time decision made at creation. If you need to change the order later, you must do so manually outside of Toasty.
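The two rules amount to an append-only merge of label lists. A sketch of the idea, as a hypothetical helper operating on the previous snapshot's order and the current Rust declaration order:

```rust
// Labels already in the snapshot keep their stored positions; labels that
// are new in the Rust enum are appended at the end, in declaration order.
fn merge_label_order(snapshot: &[&str], declared: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = snapshot.iter().map(|s| s.to_string()).collect();
    for label in declared {
        if !snapshot.contains(label) {
            out.push(label.to_string());
        }
    }
    out
}
```

Note that labels present in the snapshot but missing from the declaration would indicate a removed variant, which is a migration error (see "Removing a variant").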
Adding a variant
Adding a new variant to the Rust enum:
#![allow(unused)]
fn main() {
// Before
enum Status { Pending, Active, Done }
// After
enum Status { Pending, Active, Done, Cancelled }
}
New variants are appended after all existing labels, regardless of where they appear in the Rust enum definition.
PostgreSQL:
ALTER TYPE status ADD VALUE 'cancelled';
MySQL:
ALTER TABLE tasks MODIFY COLUMN status
ENUM('pending', 'active', 'done', 'cancelled') NOT NULL;
SQLite:
SQLite does not support ALTER TABLE ... ALTER COLUMN. Toasty uses its
existing table recreation strategy (create new table, copy data, drop old,
rename) to update the CHECK constraint with the new label list.
MySQL requires rewriting the full enum definition on every change. Both MySQL and SQLite rewrites are handled automatically, preserving the existing label order and appending the new label at the end.
Renaming a variant
Toasty does not support renaming enum variant labels. Changing a variant’s
#[column(variant = "...")] label is a migration error. To rename a label,
add the new variant, migrate existing data manually, then remove the old
variant (once variant removal is supported).
Removing a variant
Toasty does not support removing enum variants. Removing a variant from the Rust enum while the label still exists in the database schema is a migration error. Destructive schema changes like this require a broader design for handling data loss scenarios and are out of scope for this feature.
Converting from integer discriminants
Switching an existing enum from #[column(variant = N)] (INTEGER) to string
labels requires a migration that converts the column.
PostgreSQL:
CREATE TYPE status AS ENUM ('pending', 'active', 'done');
ALTER TABLE tasks
  ALTER COLUMN status TYPE status USING (
    CASE status
      WHEN 1 THEN 'pending'
      WHEN 2 THEN 'active'
      WHEN 3 THEN 'done'
    END
  )::status;
The integer-to-label mapping comes from the previous schema snapshot stored in the migration state.
MySQL:
ALTER TABLE tasks MODIFY COLUMN status
ENUM('pending', 'active', 'done') NOT NULL;
MySQL’s MODIFY COLUMN handles the type change. For integer conversions,
Toasty issues an intermediate step to map integers to their label strings
before converting the column type.
Converting from plain string to native enum
Switching from #[column(type = text)] (plain string) to native enum
storage (removing the type override) requires converting the column.
PostgreSQL:
CREATE TYPE status AS ENUM ('pending', 'active', 'done');
ALTER TABLE tasks
ALTER COLUMN status TYPE status USING status::status;
MySQL:
ALTER TABLE tasks MODIFY COLUMN status
ENUM('pending', 'active', 'done') NOT NULL;
SQLite uses its table recreation strategy to replace the TEXT column with a TEXT + CHECK column.
Querying
The query API is the same regardless of discriminant type. Toasty handles the type casting internally:
#![allow(unused)]
fn main() {
// All of these work identically across all discriminant types
Task::filter(Task::fields().status().eq(Status::Active))
Task::filter(Task::fields().status().is_pending())
Task::filter(Task::fields().status().ne(Status::Done))
Task::filter(Task::fields().status().in_list([Status::Pending, Status::Active]))
}
SQL generated for queries
Queries compare against the enum label as a string literal:
-- .eq(Status::Active)
SELECT * FROM tasks WHERE status = 'active';
-- .in_list([Status::Pending, Status::Active])
SELECT * FROM tasks WHERE status IN ('pending', 'active');
This works across all backends. On PostgreSQL and MySQL the database casts the string literal to the enum type automatically. On SQLite and DynamoDB the column is already a string.
Ordering
Toasty does not support ordering comparisons (>, <, etc.) on enum fields.
The query API provides eq, ne, in_list, and variant checks only.
PostgreSQL and MySQL define a sort order for enum values based on their
position in the type definition, not alphabetically. SQLite and DynamoDB
sort enum columns as plain strings (lexicographic). Toasty does not expose
or manage this ordering. Users who query the database directly should be
aware that ORDER BY behavior on enum columns varies by backend.
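The backend difference is easy to see with plain string sorting, which is what SQLite and DynamoDB effectively do:

```rust
// Illustration of the ordering caveat: enum labels in definition order
// vs. the lexicographic order a plain-string backend would use.
fn main() {
    let definition_order = ["pending", "active", "done"]; // PostgreSQL/MySQL sort by position
    let mut lexicographic = definition_order;
    lexicographic.sort(); // SQLite/DynamoDB sort as plain strings
    println!("{:?}", definition_order);
    println!("{:?}", lexicographic); // ["active", "done", "pending"]: a different order
}
```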
Inserting
Inserts supply the label as a string literal on all backends:
INSERT INTO tasks (status) VALUES ('pending');
Compile-Time Validation
| Condition | Result |
|---|---|
| All string or default labels | Valid (native enum storage) |
| #[column(type = text)] or #[column(type = varchar)] | Valid (plain string storage) |
| #[column(variant = N)] with integers | Valid (integer storage) |
| Mix of integer and string variant values | Compile error |
| Duplicate labels (including derived defaults) | Compile error |
| Empty string label #[column(variant = "")] | Compile error |
| Label longer than 63 bytes | Compile error (PostgreSQL’s NAMEDATALEN limit) |
Portability
Native enum storage works across all backends. Each backend uses its best available representation (see Database Support). You can develop against SQLite locally and deploy to PostgreSQL or MySQL without changing the enum definition.
The difference between native enum storage and plain string storage
(#[column(type = text)]) is that native enum adds database-level validation
where the backend supports it. The stored values are string labels in both
cases — there is no data incompatibility between them.
Shared enum types
Multiple models can reference the same enum.
On PostgreSQL, Toasty creates the CREATE TYPE once and reuses it across
tables:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Priority { Low, Medium, High }
#[derive(toasty::Model)]
struct Task {
#[key] #[auto] id: i64,
priority: Priority,
}
#[derive(toasty::Model)]
struct Bug {
#[key] #[auto] id: i64,
priority: Priority,
}
}
PostgreSQL:
CREATE TYPE priority AS ENUM ('low', 'medium', 'high');
CREATE TABLE tasks (
id BIGSERIAL PRIMARY KEY,
priority priority NOT NULL
);
CREATE TABLE bugs (
id BIGSERIAL PRIMARY KEY,
priority priority NOT NULL
);
MySQL:
CREATE TABLE tasks (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
priority ENUM('low', 'medium', 'high') NOT NULL
);
CREATE TABLE bugs (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
priority ENUM('low', 'medium', 'high') NOT NULL
);
Toasty tracks that the PostgreSQL type already exists and does not attempt to create it twice during migrations. On MySQL each table carries its own inline definition.
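The once-only CREATE TYPE behavior amounts to deduplicating by type name during planning. A minimal sketch, assuming a hypothetical `plan_pg_enum` helper rather than the actual migration code:

```rust
use std::collections::HashSet;

// Hypothetical sketch of how a migration planner can emit CREATE TYPE only
// once per shared enum; MySQL tables instead carry an inline ENUM definition.
fn plan_pg_enum(created: &mut HashSet<String>, name: &str, labels: &[&str]) -> Option<String> {
    if created.insert(name.to_string()) {
        let list = labels
            .iter()
            .map(|l| format!("'{}'", l))
            .collect::<Vec<_>>()
            .join(", ");
        Some(format!("CREATE TYPE {} AS ENUM ({});", name, list))
    } else {
        None // type already exists; reuse it for the next table
    }
}

fn main() {
    let mut created = HashSet::new();
    let labels = ["low", "medium", "high"];
    assert!(plan_pg_enum(&mut created, "priority", &labels).is_some()); // first table creates it
    assert!(plan_pg_enum(&mut created, "priority", &labels).is_none()); // second table reuses it
}
```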
Examples
Unit enum with defaults
#![allow(unused)]
fn main() {
#[derive(Debug, PartialEq, toasty::Embed)]
enum Color {
Red,
Green,
Blue,
}
#[derive(Debug, toasty::Model)]
struct Widget {
#[key]
#[auto]
id: i64,
name: String,
color: Color,
}
}
PostgreSQL:
CREATE TYPE color AS ENUM ('red', 'green', 'blue');
CREATE TABLE widgets (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
color color NOT NULL
);
-- Insert
INSERT INTO widgets (name, color) VALUES ('Sprocket', 'red');
-- Query
SELECT * FROM widgets WHERE color = 'green';
Unit enum with explicit labels
#![allow(unused)]
fn main() {
#[derive(Debug, PartialEq, toasty::Embed)]
enum Status {
#[column(variant = "pending")]
Pending,
#[column(variant = "active")]
Active,
#[column(variant = "done")]
Done,
}
#[derive(Debug, toasty::Model)]
struct Task {
#[key]
#[auto]
id: i64,
title: String,
status: Status,
}
}
PostgreSQL:
CREATE TYPE status AS ENUM ('pending', 'active', 'done');
CREATE TABLE tasks (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
status status NOT NULL
);
MySQL:
CREATE TABLE tasks (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
title TEXT NOT NULL,
status ENUM('pending', 'active', 'done') NOT NULL
);
Unit enum with plain string storage
#![allow(unused)]
fn main() {
#[derive(Debug, PartialEq, toasty::Embed)]
#[column(type = text)]
enum Status {
#[column(variant = "pending")]
Pending,
#[column(variant = "active")]
Active,
#[column(variant = "done")]
Done,
}
}
-- Same on all SQL backends
CREATE TABLE tasks (
id ... PRIMARY KEY,
status TEXT NOT NULL
);
No enum type or CHECK constraint is created. The column is a plain TEXT.
Data-carrying enum
#![allow(unused)]
fn main() {
#[derive(Debug, PartialEq, toasty::Embed)]
enum ContactMethod {
#[column(variant = "email")]
Email { address: String },
#[column(variant = "phone")]
Phone { country: String, number: String },
}
#[derive(Debug, toasty::Model)]
struct User {
#[key]
#[auto]
id: i64,
name: String,
contact: ContactMethod,
}
}
#![allow(unused)]
fn main() {
// Create
let user = User::create()
.name("Alice")
.contact(ContactMethod::Email { address: "alice@example.com".into() })
.exec(&mut db)
.await?;
// Query
let email_users = User::filter(User::fields().contact().is_email())
.exec(&mut db)
.await?;
// Update
user.update()
.contact(ContactMethod::Phone {
country: "US".into(),
number: "555-0100".into(),
})
.exec(&mut db)
.await?;
}
Toasty ORM - Development Roadmap
This roadmap outlines potential enhancements and missing features for the Toasty ORM.
Overview
Toasty is an easy-to-use ORM for Rust that supports both SQL and NoSQL databases. This roadmap documents potential future work and feature gaps.
Feature Areas
Composite Keys
Composite Key Support (partial implementation)
- Composite foreign key optimization in query simplification
- Composite PK handling in expression rewriting and IN-list operations
- HasMany/BelongsTo relationships with composite foreign keys referencing composite primary keys
- Junction table / many-to-many patterns with composite keys
- DynamoDB driver: batch delete/update with composite keys, composite unique indexes
- Comprehensive test coverage for all composite key combinations
Query Capabilities
Query Ordering, Limits & Pagination
- Multi-column ordering convenience method (.then_by())
- Direct .limit() method for non-paginated queries
- .last() convenience method
- String operations: contains, starts with, ends with, LIKE (partial AST support)
- NOT IN
- Case-insensitive matching
- BETWEEN / range queries
- Relation filtering (filter by associated model fields)
- Field-to-field comparison
- Arithmetic operations in queries (add, subtract, multiply, divide, modulo)
- Aggregate queries and GROUP BY / HAVING
Data Types
Extended Data Types
- Embedded struct & enum support (partial implementation)
- Serde-serialized types (JSON/JSONB columns for arbitrary Rust types)
- Embedded collections (arrays, maps, sets, etc.)
Relationships & Loading
Partial Model Loading
- Allow models to have fields that are not loaded by default (e.g. a large body column on an Article model)
- Fields opt in via a #[deferred] attribute and must be wrapped in a Deferred<T> type
- By default, queries skip deferred fields; callers opt in with .include(Article::body) (same API as relation preloading)
- Accessing a Deferred<T> that was not loaded either returns an error or panics with a clear message
- Works with primitive types, embedded structs, and embedded enums — just a subset of columns in the same table
#![allow(unused)]
fn main() {
    #[toasty::model]
    struct Article {
        #[key]
        #[auto]
        id: u64,
        title: String,
        author: BelongsTo<User>,
        #[deferred]
        body: Deferred<String>, // not loaded unless explicitly included
    }
    // Load metadata only (no body column fetched)
    let articles = Article::all().collect(&db).await?;
    // Load with body
    let articles = Article::all().include(Article::body).collect(&db).await?;
}
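One way such a wrapper could behave, sketched with std only (the `Deferred` type and its API here are hypothetical, not the final design):

```rust
// Hypothetical sketch of a Deferred<T> wrapper: the field is either loaded
// with a value or skipped, and access to a skipped field fails loudly.
#[derive(Debug)]
enum Deferred<T> {
    Loaded(T),
    NotLoaded,
}

impl<T> Deferred<T> {
    fn get(&self) -> Result<&T, &'static str> {
        match self {
            Deferred::Loaded(v) => Ok(v),
            Deferred::NotLoaded => {
                Err("deferred field was not loaded; add .include(...) to the query")
            }
        }
    }
}

fn main() {
    let body: Deferred<String> = Deferred::NotLoaded;
    assert!(body.get().is_err()); // not fetched: accessing it is an error

    let body = Deferred::Loaded(String::from("article text"));
    assert_eq!(body.get().unwrap(), "article text");
}
```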
Relationships
- Many-to-many relationships
- Polymorphic associations
- Nested preloading (multi-level .include() support)
Query Building
Query Features
- Subquery improvements
- Better conditional/dynamic query building ergonomics
Database Function Expressions
- Allow database-side functions (e.g. NOW(), CURRENT_TIMESTAMP) as expressions in create and update operations
- User API: field setters accept toasty::stmt helpers like toasty::stmt::now() that resolve to core::stmt::ExprFunc variants
#![allow(unused)]
fn main() {
    // Set updated_at to the database's current time instead of a Rust-side value
    user.update()
        .updated_at(toasty::stmt::now())
        .exec(&db)
        .await?;
    // Also usable in create operations
    User::create()
        .name("Alice")
        .created_at(toasty::stmt::now())
        .exec(&db)
        .await?;
}
- Extend the ExprFunc enum in toasty-core with new function variants (e.g. Now)
- SQL serialization for each function across supported databases (NOW() for PostgreSQL/MySQL, datetime('now') for SQLite)
- Codegen: update field setter generation to accept both value types and function expressions
- Future: support additional scalar functions (e.g. COALESCE, LOWER, UPPER, LENGTH)
Raw SQL Support
- Execute arbitrary SQL statements directly
- Parameterized queries with type-safe bindings
- Raw SQL fragments within typed queries (escape hatch for complex expressions)
Data Modification
Upsert
- Insert-or-update: atomic INSERT ... ON CONFLICT DO UPDATE (PostgreSQL/SQLite), ON DUPLICATE KEY UPDATE (MySQL), MERGE (SQL Server/Oracle)
- Insert-or-ignore (DO NOTHING / INSERT IGNORE)
- Conflict target: by column(s), by constraint name, partial indexes (PostgreSQL)
- Column update control: update all non-key columns, named subset, or raw SQL expression
- Access to the proposed row via the EXCLUDED pseudo-table in the update expression
- Bulk upsert (multi-row VALUES)
- DynamoDB: PutItem (unconditional replace) vs. UpdateItem with condition expression
Mutation Result Information
- Return affected row counts from update operations (how many records were updated)
- Return affected row counts from delete operations (how many records were deleted)
- Better result types that provide operation metadata
- Distinguish between “no rows matched” vs “rows matched but no changes needed”
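The matched-vs-modified distinction could be modeled with a small result type; a sketch with illustrative names, not a proposed Toasty API:

```rust
// Illustrative result type distinguishing "no rows matched" from
// "rows matched but nothing changed" from "rows changed".
#[derive(Debug, PartialEq)]
enum MutationOutcome {
    NoMatch,
    MatchedUnchanged { matched: u64 },
    Changed { matched: u64, modified: u64 },
}

fn classify(matched: u64, modified: u64) -> MutationOutcome {
    match (matched, modified) {
        (0, _) => MutationOutcome::NoMatch,
        (m, 0) => MutationOutcome::MatchedUnchanged { matched: m },
        (m, c) => MutationOutcome::Changed { matched: m, modified: c },
    }
}

fn main() {
    assert_eq!(classify(0, 0), MutationOutcome::NoMatch);
    assert_eq!(classify(3, 0), MutationOutcome::MatchedUnchanged { matched: 3 });
    assert_eq!(classify(3, 2), MutationOutcome::Changed { matched: 3, modified: 2 });
}
```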
Transactions
Atomic Batch Operations
- Cross-database atomic batch API
- Supported across SQL and NoSQL databases
- Type-safe operation batching
- All-or-nothing semantics
SQL Transaction API
- Manual transaction control for SQL databases
- BEGIN/COMMIT/ROLLBACK support
- Savepoints and nested transactions
- Isolation level configuration
Schema Management
Migrations
- Schema migration system
- Migration generation
- Rollback support
- Schema versioning
- CLI tools for schema management
Toasty Runtime Improvements
Concurrent Task Execution
- Replace the current ad-hoc background task with a proper in-flight task manager
- Execute independent parts of an execution plan concurrently
- Track and coordinate multiple in-flight tasks within a single query execution
Cancellation & Cleanup
- Detect when the caller drops the future representing query completion
- Perform clean cancellation on drop (rollback any incomplete transactions)
- Ensure no resource leaks or orphaned database state on cancellation
Internal Instrumentation & Metrics
- Instrument time spent in each execution phase (planning, simplification, execution, serialization)
- Track CPU time consumed by query planning to detect expensive plans
- Provide internal metrics for diagnosing performance bottlenecks
Performance
- Dedicated post-lowering optimization pass for expensive predicate analysis (run once, not per-node)
- Equivalence classes for transitive constraint reasoning (a = b AND b = 5 implies a = 5)
- Structured constraint representation (constant bindings, range bounds, exclusion sets)
- Targeted predicate normalization without full DNF conversion
Stored Procedures (Pre-Compiled Query Plans)
- Compile query plans once and execute them many times with different parameter values
- Skip the full compilation pipeline (simplification, lowering, HIR/MIR planning) on repeated calls
- Parameterized statement AST with Param slots for value substitution at execution time
- Pairs with database-level prepared statements for end-to-end optimization
Optimization Features
- Bulk inserts/updates
- Query caching
- Connection pooling improvements
Developer Experience
Ergonomic Macros
toasty::query!() - Succinct query syntax that translates to the builder DSL
#![allow(unused)]
fn main() {
    // Instead of: User::all().filter(...).order_by(...).collect(&db).await
    toasty::query!(User, filter: ..., order_by: ...).collect(&db).await
}
toasty::create!() - Concise record creation syntax
#![allow(unused)]
fn main() {
    // Instead of: User::create().name("Alice").age(30).exec(&db).await
    toasty::create!(User, name: "Alice", age: 30).exec(&db).await
}
toasty::update!() - Simplified update syntax
#![allow(unused)]
fn main() {
    // Instead of: user.update().name("Bob").age(31).exec(&db).await
    toasty::update!(user, name: "Bob", age: 31).exec(&db).await
}
Tooling & Debugging
- Query logging
Safety & Security
Sensitive Value Flagging
- Flag sensitive fields (e.g. passwords, tokens, secrets) so they are automatically redacted in logs and debug output
- Attribute-based opt-in: #[sensitive] on model fields marks values that must never appear in plaintext outside the database
- All logging, query tracing, and error messages strip or mask flagged values
- Prevents accidental credential leakage in application logs, query dumps, and diagnostics
Trusted vs Untrusted Input
- Distinguish between values originating from untrusted user input and values produced internally by the query engine (e.g. literal numbers, generated keys)
- Engine-produced values can skip escaping/parameterization since they are known-safe, reducing unnecessary overhead
- Untrusted input continues to be parameterized or escaped to prevent SQL injection
- Enables more efficient SQL generation without weakening safety guarantees for external data
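The trusted/untrusted split can be sketched as a tagged value type whose rendering either inlines or parameterizes (the types and `$n` placeholder syntax here are illustrative):

```rust
// Sketch: engine-produced values are inlined directly; external input is
// always turned into a bound parameter. Types are illustrative only.
enum Value {
    Trusted(i64),      // produced internally (e.g. a generated key)
    Untrusted(String), // originated from user input
}

fn render(values: &[Value]) -> (String, Vec<String>) {
    let mut params = Vec::new();
    let parts: Vec<String> = values
        .iter()
        .map(|v| match v {
            Value::Trusted(n) => n.to_string(), // known-safe: inline directly
            Value::Untrusted(s) => {
                params.push(s.clone()); // never inlined: parameterize
                format!("${}", params.len())
            }
        })
        .collect();
    (parts.join(", "), params)
}

fn main() {
    let (sql, params) = render(&[Value::Trusted(42), Value::Untrusted("Alice".into())]);
    assert_eq!(sql, "42, $1");
    assert_eq!(params, vec!["Alice".to_string()]);
}
```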
Notes
The roadmap documents describe potential enhancements and missing features. For information about what’s currently implemented, refer to the user guide or test the API directly.
Composite Key Support
Overview
Toasty has partial composite key support. Basic CRUD operations work for models with composite primary keys (both field-level #[key] and model-level #[key(partition = ..., local = ...)]), but several engine optimizations, relationship patterns, and driver operations panic or fall back when encountering composite keys.
This document catalogs the gaps, surveys how other ORMs handle composite keys, identifies common SQL patterns that require composite key support, and proposes a phased implementation plan.
Current State
What Works
Schema definition — Two syntaxes for composite keys:
#![allow(unused)]
fn main() {
// Field-level: multiple #[key] attributes
#[derive(Debug, toasty::Model)]
struct Foo {
#[key]
one: String,
#[key]
two: String,
}
// Model-level: partition/local keys (designed for DynamoDB compatibility)
#[derive(Debug, toasty::Model)]
#[key(partition = user_id, local = id)]
struct Todo {
#[auto]
id: uuid::Uuid,
user_id: uuid::Uuid,
title: String,
}
}
Generated query methods for composite keys:
- filter_by_<field1>_and_<field2>() — filter by both key fields
- get_by_<field1>_and_<field2>() — get a single record by both keys
- filter_by_<partition_field>() — filter by partition key alone
- Comparison operators on local keys: gt(), ge(), lt(), le(), ne(), eq()
Database support:
- SQL databases (SQLite, PostgreSQL, MySQL): composite primary keys via field-level #[key]
- DynamoDB: partition/local key syntax (max 2 keys: 1 partition + 1 local)
Test coverage:
- one_model_query — partition/local key queries with range operators
- has_many_crud_basic::has_many_when_fk_is_composite — HasMany with composite FK (working)
- embedded — composite keys with embedded struct fields
- examples/composite-key/ — end-to-end example application
What Does Not Work
The following locations contain todo!(), assert!(), or panic!() that block composite key usage:
Engine Simplification (5 locations)
| File | Line | Issue |
|---|---|---|
| engine/simplify/expr_binary_op.rs | 23-25 | todo!("handle composite keys") when simplifying equality on model references with composite PKs |
| engine/simplify/expr_binary_op.rs | 43-45 | todo!("handle composite keys") when simplifying binary ops on composite FK fields |
| engine/simplify/expr_in_list.rs | 30-32 | todo!() when optimizing IN-list expressions for models with composite PKs |
| engine/simplify/lift_in_subquery.rs | 92-96 | assert_eq!(len, 1, "TODO: composite keys") — subquery lifting restricted to single-field FKs |
| engine/simplify/lift_in_subquery.rs | 109-111, 145-148, 154-157 | Three more todo!("composite keys") in BelongsTo and HasOne subquery lifting |
| engine/simplify/rewrite_root_path_expr.rs | 18-19 | todo!("composite primary keys") when rewriting path expressions with key constraints |
Engine Lowering (2 locations)
| File | Line | Issue |
|---|---|---|
| engine/lower/insert.rs | 90-92 | todo!() when lowering inserts with BelongsTo relations that have composite FKs |
| engine/lower.rs | 893-896 | Unhandled else branch when lowering relationships with composite FKs |
DynamoDB Driver (4 locations)
| File | Line | Issue |
|---|---|---|
| driver-dynamodb/op/update_by_key.rs | 197 | assert!(op.keys.len() == 1) — batch update limited to single key |
| driver-dynamodb/op/delete_by_key.rs | 119-121 | panic!("only 1 key supported so far") — batch delete limited to single key |
| driver-dynamodb/op/delete_by_key.rs | 33 | panic!("TODO: support more than 1 unique index") |
| driver-dynamodb/op/create_table.rs | 113 | assert_eq!(1, index.columns.len()) — composite unique indexes unsupported |
Stubbed Tests (2 tests)
| File | Test | Status |
|---|---|---|
| has_many_crud_basic.rs | has_many_when_pk_is_composite | Empty — not implemented |
| has_many_crud_basic.rs | has_many_when_fk_and_pk_are_composite | Empty — not implemented |
Design Constraints
- Auto-increment is intentionally forbidden with composite keys. The schema verifier rejects #[auto(increment)] on composite PK tables. UUID auto-generation is the supported alternative.
- DynamoDB limits composite keys to 2 columns (1 partition + 1 local). This is a DynamoDB limitation, not a Toasty limitation.
How Other ORMs Handle Composite Keys
Rust ORMs
Diesel — First-class composite key support. #[diesel(primary_key(col1, col2))] on the struct; find() accepts a tuple (val1, val2); Identifiable returns a tuple reference. BelongsTo works with composite keys via explicit foreign_key attribute. Compile-time type checking through generated code.
SeaORM — Supports composite keys via multiple #[sea_orm(primary_key)] field attributes. PrimaryKeyTrait::ValueType is a tuple. find_by_id() and delete_by_id() accept tuples. DAO pattern works fully. Composite foreign keys are less ergonomic but functional.
Python ORMs
SQLAlchemy — Gold standard for composite key support. Multiple primary_key=True columns define a composite PK. session.get(Model, (a, b)) for lookups. ForeignKeyConstraint at the table level handles composite FKs cleanly. Identity map uses tuples. All features (eager/lazy loading, cascades, relationships) work uniformly with composite keys.
Django — Added CompositePrimaryKey in Django 5.2 (2025) after years of surrogate-key-only design. pk returns a tuple. Model.objects.get(pk=(1, 2)) works. Composite FK support is still limited. Ecosystem (admin, REST frameworks, third-party packages) is catching up.
Tortoise ORM — No composite PK support. Surrogate key + unique constraint is the only option.
JavaScript/TypeScript ORMs
Prisma — @@id([field1, field2]) defines composite PKs. Auto-generates compound field names (field1_field2) for findUnique/update/delete. Multi-field @relation(fields: [...], references: [...]) for composite FKs. Fully type-safe generated client.
TypeORM — Multiple @PrimaryColumn() decorators. All operations use object-based where clauses ({ field1: val1, field2: val2 }). @JoinColumn accepts an array for composite FKs. save() does upsert based on all PK fields.
Sequelize — Supports composite PK definition but findByPk() does not work with composite keys (must use findOne({ where })). Composite FK support requires workarounds or raw SQL.
Drizzle — primaryKey({ columns: [col1, col2] }) in the table config callback. foreignKey({ columns: [...], foreignColumns: [...] }) for composite FKs. No special find-by-PK method; all queries use explicit where + and(). SQL-first philosophy.
Java/Kotlin
Hibernate/JPA — Two approaches: @IdClass (flat fields + separate ID class) and @EmbeddedId (nested object). PK class must implement Serializable, equals(), hashCode(). @JoinColumns (plural) for composite FKs. @MapsId connects relationship fields to embedded ID fields. Full relationship support.
Exposed (Kotlin) — PrimaryKey(col1, col2) in the table object. Only the DSL (SQL-like) API supports composite keys; the DAO (EntityClass) API does not. Relationships require manual joins.
Go ORMs
GORM — Multiple gorm:"primaryKey" tags. Composite FKs via foreignKey:Col1,Col2;references:Col1,Col2. Zero-value problem: PK column with value 0 is treated as “not set.”
Ent — No composite PK support by design (graph semantics, every node has a single ID). Unique composite indexes are the workaround.
Ruby
ActiveRecord (Rails 7.1+) — primary_key: [:col1, :col2] in migrations, self.primary_key = [:col1, :col2] in model. find([a, b]) for lookups. query_constraints: [:col1, :col2] for composite FK associations. Pre-7.1 required the composite_primary_keys gem.
Cross-ORM Summary
| ORM | Composite PK | Composite FK | Find by PK | Relationship Support |
|---|---|---|---|---|
| Diesel (Rust) | Yes | Yes | Tuple | Full |
| SeaORM (Rust) | Yes | Partial | Tuple | Full |
| SQLAlchemy (Python) | Yes | Yes | Tuple | Full |
| Django (Python) | 5.2+ | Limited | Tuple | Partial |
| Prisma (TS) | Yes | Yes | Generated compound | Full |
| TypeORM (TS) | Yes | Yes | Object | Full |
| Sequelize (JS) | Yes | Partial | Broken | Partial |
| Drizzle (TS) | Yes | Yes | Manual where | Manual |
| Hibernate/JPA | Yes | Yes | ID class | Full |
| GORM (Go) | Yes | Yes | Where clause | Full |
| ActiveRecord (Ruby) | 7.1+ | 7.1+ | Array | Partial |
Key takeaway: Mature ORMs (Diesel, SQLAlchemy, Hibernate) treat composite keys as first-class citizens where all operations work uniformly. The most common API pattern is tuple-based identity (find((a, b))). Composite foreign keys are universally harder than composite PKs — even established ORMs have rougher edges there.
Common SQL Patterns Requiring Composite Keys
1. Junction Tables (Many-to-Many)
The most common use case. The junction table’s PK is the combination of FKs to both related tables.
CREATE TABLE enrollment (
student_id INTEGER NOT NULL REFERENCES student(id),
course_id INTEGER NOT NULL REFERENCES course(id),
enrolled_at TIMESTAMP DEFAULT NOW(),
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id)
);
Junction tables often accumulate extra attributes (grade, enrolled_at, role) that make them first-class entities requiring full CRUD support, not just a hidden link table.
Toasty gap: Many-to-many relationships are listed as a separate roadmap item. Composite key support is a prerequisite — junction tables are inherently composite-keyed.
2. Multi-Tenant Data Isolation
Tenant ID appears as the first column in every composite PK, enabling partition-level isolation and efficient tenant-scoped queries.
CREATE TABLE tenant_document (
tenant_id UUID NOT NULL REFERENCES tenant(id),
document_id UUID NOT NULL DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
PRIMARY KEY (tenant_id, document_id)
);
-- All queries are scoped: WHERE tenant_id = $1 AND ...
Why composite PKs: Enforces isolation at the database level. PK index prefix enables efficient tenant-scoped queries. Maps directly to DynamoDB’s partition/local key model.
Toasty gap: The #[key(partition = ..., local = ...)] syntax already models this. The gaps are in relationship handling when both sides use composite keys.
3. Time-Series Data
CREATE TABLE sensor_reading (
sensor_id INTEGER NOT NULL,
recorded_at TIMESTAMP NOT NULL,
value DOUBLE PRECISION NOT NULL,
PRIMARY KEY (sensor_id, recorded_at)
);
Why composite PKs: Natural ordering by sensor then time. Range scans on recorded_at within a sensor are efficient. Supports table partitioning by time ranges.
4. Hierarchical Data (Closure Table)
CREATE TABLE category_closure (
ancestor_id INTEGER NOT NULL REFERENCES category(id),
descendant_id INTEGER NOT NULL REFERENCES category(id),
depth INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (ancestor_id, descendant_id)
);
5. Composite Foreign Keys Referencing Composite PKs
A child table references a parent with a composite PK — all parent PK columns appear in the child as FK columns.
CREATE TABLE order_item (
order_id INTEGER NOT NULL REFERENCES "order"(id),
item_number INTEGER NOT NULL,
PRIMARY KEY (order_id, item_number)
);
CREATE TABLE order_item_shipment (
id SERIAL PRIMARY KEY,
order_id INTEGER NOT NULL,
item_number INTEGER NOT NULL,
shipment_id INTEGER NOT NULL REFERENCES shipment(id),
FOREIGN KEY (order_id, item_number)
REFERENCES order_item(order_id, item_number)
);
Toasty gap: This is the hardest pattern. The engine simplification and lowering layers assume single-field FKs in multiple places. Fixing this is the core of the composite key work.
6. Versioned Records
CREATE TABLE document_version (
document_id INTEGER NOT NULL REFERENCES document(id),
version INTEGER NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (document_id, version)
);
7. Composite Unique Constraints vs Composite Primary Keys
Some applications prefer a surrogate PK with a composite unique constraint:
-- Surrogate PK + composite unique
CREATE TABLE enrollment (
id SERIAL PRIMARY KEY,
student_id INTEGER NOT NULL,
course_id INTEGER NOT NULL,
UNIQUE (student_id, course_id)
);
Trade-offs: surrogate PKs simplify FKs (single column) and URL design, but composite PKs are more storage-efficient and semantically meaningful. ORMs that don’t support composite PKs (Django pre-5.2, Tortoise, Ent) force the surrogate pattern.
Toasty should support both patterns — composite PKs for direct use and composite unique constraints for the surrogate approach.
Implementation Plan
Phase 1: Engine Simplification — Composite PK/FK Handling
Fix the todo!() panics in the engine simplification layer so that queries involving composite keys pass through without crashing, even if not fully optimized.
Files:
- engine/simplify/expr_binary_op.rs — Handle composite PKs and FKs in equality simplification. For composite keys, generate an AND of per-field comparisons.
- engine/simplify/expr_in_list.rs — Handle IN-list for composite PKs. Generate (col1, col2) IN ((v1, v2), (v3, v4)) or an equivalent AND/OR tree.
- engine/simplify/rewrite_root_path_expr.rs — Rewrite path expressions for composite PKs.
Approach: Where a single-field operation currently destructures let [field] = &fields[..], extend to iterate over all fields and combine with AND expressions.
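The approach can be sketched with a toy expression type (`Expr` here is a stand-in for the engine's real statement AST):

```rust
// Sketch of the described approach: instead of destructuring a single key
// field, iterate all key fields and fold per-field equalities into one AND.
#[derive(Debug, PartialEq)]
enum Expr {
    Eq(String, String), // field = value
    And(Vec<Expr>),
}

fn key_eq(fields: &[&str], values: &[&str]) -> Expr {
    let parts: Vec<Expr> = fields
        .iter()
        .zip(values)
        .map(|(f, v)| Expr::Eq(f.to_string(), v.to_string()))
        .collect();
    if parts.len() == 1 {
        parts.into_iter().next().unwrap() // single-field keys stay a plain equality
    } else {
        Expr::And(parts)
    }
}

fn main() {
    assert_eq!(key_eq(&["id"], &["7"]), Expr::Eq("id".into(), "7".into()));
    let composite = key_eq(&["order_id", "item_number"], &["7", "2"]);
    assert_eq!(
        composite,
        Expr::And(vec![
            Expr::Eq("order_id".into(), "7".into()),
            Expr::Eq("item_number".into(), "2".into()),
        ])
    );
}
```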
Phase 2: Subquery Lifting for Composite FKs
Extend the subquery lifting optimization to handle composite foreign keys in BelongsTo and HasOne relationships.
Files:
- engine/simplify/lift_in_subquery.rs — Remove the assert_eq!(len, 1) and handle multi-field FKs. For the optimization path, generate AND of per-field comparisons. For the fallback IN subquery path, generate tuple-based IN expressions or multiple correlated conditions.
Approach: The existing single-field logic maps fk_field.source -> fk_field.target. For composite keys, do the same for each field pair and combine with AND.
Phase 3: Engine Lowering — Composite FK Relationships
Fix insert and relationship lowering to handle composite FKs.
Files:
- engine/lower/insert.rs — When lowering BelongsTo in insert operations, set all FK fields from the related record’s PK fields, not just one.
- engine/lower.rs — Handle composite FKs in relationship lowering. Generate multi-column join conditions.
Phase 4: DynamoDB Driver — Batch Operations with Composite Keys
Files:
- driver-dynamodb/op/update_by_key.rs — Support batch updates with multiple keys (iterate and issue individual UpdateItem calls if needed).
- driver-dynamodb/op/delete_by_key.rs — Support batch deletes. Remove the single-key panic.
- driver-dynamodb/op/create_table.rs — Support composite unique indexes (Global Secondary Indexes with multiple key columns where DynamoDB allows it).
Phase 5: Test Coverage
Fill in the stubbed tests and add new ones covering all composite key combinations:
Existing stubs to implement:
- has_many_when_pk_is_composite — Parent has composite PK, child has single FK pointing to it
- has_many_when_fk_and_pk_are_composite — Both sides have composite keys
New tests to add:
| Test | Description |
|---|---|
| composite_pk_crud | Full CRUD (create, read, update, delete) on a model with 2+ key fields |
| composite_pk_three_fields | Composite PK with 3 fields to test beyond the 2-field case |
| composite_fk_belongs_to | BelongsTo where the FK is composite (references a composite PK) |
| composite_fk_has_one | HasOne with composite FK |
| composite_key_pagination | Cursor-based pagination with composite PK ordering |
| composite_key_scoped_queries | Scoped queries (e.g., user.todos().filter_by_id(...)) with composite keys |
| composite_key_update_non_key_fields | Update non-key fields on a composite-keyed model |
| composite_key_unique_constraint | Composite unique constraint (not PK) behavior |
| junction_table_pattern | Many-to-many junction table with composite PK and extra attributes |
| multi_tenant_pattern | Tenant-scoped models with (tenant_id, entity_id) composite PKs |
Design Decisions
Tuple-Based Identity
Following Diesel and SQLAlchemy’s lead, composite key identity should be represented as tuples. The current generated methods (get_by_field1_and_field2(val1, val2)) are a good API.
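In Rust terms, tuple-based identity means the composite key is literally a tuple, so it works directly as a map key and a lookup argument. A std-only illustration:

```rust
use std::collections::HashMap;

// Illustration of tuple-based identity: a composite key is naturally a
// tuple, usable as an identity-map key and as a find-by-PK argument.
fn main() {
    let mut identity_map: HashMap<(u64, u64), &str> = HashMap::new();
    identity_map.insert((7, 2), "order 7, item 2");

    // get_by_order_id_and_item_number(7, 2) conceptually resolves to this lookup
    assert_eq!(identity_map.get(&(7, 2)), Some(&"order 7, item 2"));
    assert_eq!(identity_map.get(&(7, 3)), None);
}
```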
AND Composition for Multi-Field Conditions
When a single-field operation like pk_field = value needs to become a composite operation, the standard approach is:
pk_field1 = value1 AND pk_field2 = value2
This maps cleanly to SQL WHERE clauses and DynamoDB key conditions. The engine’s stmt::ExprAnd already supports this.
IN-List with Composite Keys
For batch lookups, composite IN can be expressed as:
-- Row-value syntax (PostgreSQL, MySQL 8.0+, SQLite)
WHERE (col1, col2) IN ((v1a, v2a), (v1b, v2b))
-- Equivalent OR-of-ANDs (universal)
WHERE (col1 = v1a AND col2 = v2a) OR (col1 = v1b AND col2 = v2b)
The OR-of-ANDs form is more portable across databases. The engine should generate this form and let the SQL serializer optimize to row-value syntax where supported.
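The OR-of-ANDs expansion is mechanical; a sketch of the string form (the real engine would build AST nodes, not strings):

```rust
// Sketch of the portable OR-of-ANDs expansion for a composite IN-list:
// each row becomes a parenthesized AND of per-column equalities,
// and the rows are joined with OR.
fn composite_in(cols: &[&str], rows: &[Vec<&str>]) -> String {
    rows.iter()
        .map(|row| {
            let conj = cols
                .iter()
                .zip(row)
                .map(|(c, v)| format!("{} = {}", c, v))
                .collect::<Vec<_>>()
                .join(" AND ");
            format!("({})", conj)
        })
        .collect::<Vec<_>>()
        .join(" OR ")
}

fn main() {
    let sql = composite_in(&["col1", "col2"], &[vec!["v1a", "v2a"], vec!["v1b", "v2b"]]);
    assert_eq!(sql, "(col1 = v1a AND col2 = v2a) OR (col1 = v1b AND col2 = v2b)");
}
```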
Composite FK Optimization
The subquery lifting optimization (lift_in_subquery.rs) currently rewrites:
-- Before: subquery
user_id IN (SELECT id FROM users WHERE name = 'Alice')
-- After: direct comparison
user_id = <alice_id>
For composite FKs, the rewrite becomes:
-- Before: correlated subquery
(order_id, item_number) IN (SELECT order_id, item_number FROM order_items WHERE ...)
-- After: direct comparison
order_id = <val1> AND item_number = <val2>
The same optimization logic applies — just iterated over each FK field pair.
Testing Strategy
- All new tests go in the integration suite (toasty-driver-integration-suite) to run against all database backends
- Use the existing #[driver_test] macro for multi-database testing
- Use the matrix testing infrastructure (composite dimension) where appropriate
- Each phase should have passing tests before moving to the next phase
- No unit tests in source code per project convention
Query Ordering, Limits & Pagination
Overview
Toasty provides cursor-based (keyset) pagination, which offers consistent performance and works well across both SQL and NoSQL databases. The implementation converts pagination cursors into WHERE clauses rather than using OFFSET, avoiding the performance issues of traditional offset-based pagination.
Potential Future Work
Multi-column Ordering Convenience
Add .then_by() method for chaining multiple order clauses:
#![allow(unused)]
fn main() {
let users = User::all()
.order_by(User::FIELDS.status().asc())
.then_by(User::FIELDS.created_at().desc())
.paginate(10)
.collect(&db)
.await?;
}
Current workaround requires manual construction:
#![allow(unused)]
fn main() {
use toasty::stmt::OrderBy;
let order = OrderBy::from([
Post::FIELDS.status().asc(),
Post::FIELDS.created_at().desc(),
]);
let posts = Post::all()
.order_by(order)
.collect(&db)
.await?;
}
Implementation:
- File: toasty-macros/src/expand/query.rs
- Add .then_by() method to the query builder
- Complexity: Medium
Direct Limit Method
Expose .limit() for non-paginated queries:
```rust
let recent_posts: Vec<Post> = Post::all()
    .order_by(Post::FIELDS.created_at().desc())
    .limit(5)
    .collect(&db)
    .await?;
```
Implementation:
- File: `toasty-macros/src/expand/query.rs`
- Generate a `.limit()` method
- Complexity: Low
Last Convenience Method
Get the last matching record:
```rust
let last_user: Option<User> = User::all()
    .order_by(User::FIELDS.created_at().desc())
    .last(&db)
    .await?;
```
Implementation:
- File: `toasty-macros/src/expand/query.rs`
- Generate a `.last()` method
- Complexity: Low
Testing
Additional Test Coverage
Tests that could be added:
- Multi-column ordering
  - Verify correct ordering with multiple columns
  - Test tie-breaking behavior
- Direct `.limit()` method (when implemented)
  - Non-paginated queries with limit
  - Verify correct number of results
- `.last()` convenience method (when implemented)
  - Returns the last matching record
  - Returns `None` when there are no matches
- Edge cases
  - Empty results with pagination
  - Single-page results (no next/prev cursors)
  - Pagination beyond the last page
  - Large page sizes
  - Page size of 1
Database-Specific Considerations
SQL Databases
- MySQL: Uses `LIMIT n` for pagination (keyset approach via WHERE)
- PostgreSQL: Uses `LIMIT n` for pagination (keyset approach via WHERE)
- SQLite: Uses `LIMIT n` for pagination (keyset approach via WHERE)
All SQL databases use keyset pagination (WHERE clauses with cursors) rather than OFFSET for consistent performance.
NoSQL Databases
- DynamoDB:
- Limited ordering support (only on sort keys)
- Pagination via `LastEvaluatedKey`
- Cursor-based approach maps well to DynamoDB’s native pagination
- Needs validation and testing
How Keyset Pagination Works
Instead of using OFFSET, Toasty converts cursors to WHERE clauses:
```sql
-- Traditional OFFSET (slow for large offsets)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 10 OFFSET 10000;

-- Toasty's cursor approach (always fast)
SELECT * FROM posts
WHERE (created_at, id) < ('2024-01-15 10:30:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 10;
```
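For databases without row-value comparison support, the cursor comparison can be expanded into the portable OR-of-ANDs form. A string-based sketch of that lexicographic expansion, assuming all columns share the same sort direction (the engine builds typed expressions rather than SQL text):

```rust
/// Expand `(c1, c2, ...) < (v1, v2, ...)` into OR-of-ANDs:
/// c1 < v1 OR (c1 = v1 AND c2 < v2) OR ...
/// Illustrative helper; column/value inputs are plain strings here.
fn keyset_predicate(cols: &[&str], vals: &[&str]) -> String {
    assert_eq!(cols.len(), vals.len());
    let mut terms = Vec::new();
    for i in 0..cols.len() {
        // Equalities on all earlier columns, strict comparison on column i.
        let mut conj: Vec<String> = (0..i)
            .map(|j| format!("{} = {}", cols[j], vals[j]))
            .collect();
        conj.push(format!("{} < {}", cols[i], vals[i]));
        terms.push(format!("({})", conj.join(" AND ")));
    }
    terms.join(" OR ")
}
```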
This provides:
- Consistent Performance: O(log n) regardless of page number
- Stable Results: New records don’t shift pagination boundaries
- Database Agnostic: Works efficiently on NoSQL databases
- Real-time Friendly: Handles concurrent insertions gracefully
Notes
- Cursors (`stmt::Expr`) can be serialized at the application level if needed for web APIs
- Pagination requires an explicit ORDER BY clause to ensure consistent results
- Multi-column ordering works today via manual `OrderBy` construction
- The `.then_by()` convenience method would improve ergonomics but isn't essential
Query Constraints & Filtering
Overview
This document identifies gaps in Toasty’s query constraint support compared to mature ORMs, and outlines potential additions for building web applications.
Terminology
A “query constraint” refers to any predicate used in the WHERE clause of a query. In Toasty, constraints are built using:
- Generated filter methods (`Model::filter_by_<field>()`) for indexed/key fields
- The generic `.filter()` method, which accepts `Expr<bool>` for arbitrary conditions
- `Model::FIELDS.<field>()` paths combined with comparison methods (`.eq()`, `.gt()`, etc.)
Core AST Support Without User API
These expression types exist in toasty-core (crates/toasty-core/src/stmt/expr.rs) and have SQL serialization, but lack a typed user-facing API on Path<T> or Expr<T>:
| Expression | Core AST | SQL Serialized | User API | Notes |
|---|---|---|---|---|
| LIKE | `ExprPattern::Like` | Yes | None | SQL serialization exists |
| Begins With | `ExprPattern::BeginsWith` | Yes | None | Converted to `LIKE 'prefix%'` in SQL |
| EXISTS | `ExprExists` | Yes | None | Used internally by engine |
| COUNT | `ExprFunc::Count` | Yes | None | Internal use only |
ORM Comparison
The following table compares Toasty’s constraint support against seven mature ORMs, highlighting missing features:
| Feature | Toasty | Prisma | Drizzle | Django | SQLAlchemy | Diesel | SeaORM | Hibernate |
|---|---|---|---|---|---|---|---|---|
| Set Operations | | | | | | | | |
| NOT IN | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Range | | | | | | | | |
| BETWEEN | No | Via gt+lt | Yes | Yes | Yes | Yes | Yes | Yes |
| String Operations | | | | | | | | |
| LIKE | AST only | Via contains | Yes | Yes | Yes | Yes | Yes | Yes |
| Contains (substring) | No | Yes | Manual | Yes | Yes | Manual | Yes | Manual |
| Starts with | AST only | Yes | Manual | Yes | Yes | Manual | Yes | Manual |
| Ends with | No | Yes | Manual | Yes | Yes | Manual | Yes | Manual |
| Case-insensitive (ILIKE) | No | Yes | Yes | Yes | Yes | Pg only | No | Manual |
| Regex | No | No | No | Yes | Yes | No | No | No |
| Full-text search | No | Preview | No | Yes (Pg) | Dialect | Crate | No | Extension |
| Relation Filtering | | | | | | | | |
| Filter by related fields | No | Yes | Via join | Yes | Yes | Via join | Via join | Via join |
| Has related (some/none/every) | No | Yes | Via exists | Via exists | Yes | Via exists | Via join | Via exists |
| Aggregation | | | | | | | | |
| COUNT / SUM / AVG / etc. | No | Limited | Yes | Yes | Yes | Yes | Yes | Yes |
| GROUP BY | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| HAVING | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Advanced | | | | | | | | |
| Field-to-field comparison | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Arithmetic in queries | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Raw SQL escape hatch | No | Full query | Inline | Multiple | Inline | Inline | Inline | Native query |
| JSON field queries | No | Limited | Via raw | Yes | Yes | Pg | Via raw | No |
| CASE / WHEN | No | No | No | Yes | Yes | No | No | Yes |
| Dynamic/conditional filters | No | Spread undef | Pass undef | Chain | Chain | BoxableExpr | add_option | Build list |
Potential Future Work
Features with Existing Internal Support
These features have core AST and SQL serialization but need user-facing APIs:
String Pattern Matching
- Core AST: `ExprPattern::BeginsWith` and `ExprPattern::Like` exist with SQL serialization
- Needed:
  - Add `ExprPattern::EndsWith` and `ExprPattern::Contains` to the core AST
  - Add `.contains()`, `.starts_with()`, `.ends_with()` on `Path<String>`
  - Add `.like()` for direct pattern matching
  - Handle LIKE special character escaping (`%`, `_`)
- Files: `crates/toasty/src/stmt/path.rs`, `crates/toasty-core/src/stmt/expr.rs`
- Use case: Search functionality (e.g., search users by name fragment)
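The escaping requirement above is mechanical. A minimal sketch, assuming `\` as the escape character (which would then need `ESCAPE '\'` in the generated SQL); the function names are illustrative, not Toasty API:

```rust
/// Escape LIKE metacharacters so user input matches literally.
fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        // `%` and `_` are LIKE wildcards; `\` is our escape character.
        if ch == '%' || ch == '_' || ch == '\\' {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Build a `.contains()`-style pattern from raw user input.
fn contains_pattern(fragment: &str) -> String {
    format!("%{}%", escape_like(fragment))
}
```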
NOT IN
- Current: `IN` exists but has no negated form
- Needed: An `ExprNotInList` (or negation of the `InList` expression), plus a `.not_in_list()` user API
- Files: `crates/toasty/src/stmt/path.rs`, `crates/toasty-core/src/stmt/expr.rs`
- Use case: Exclusion lists (e.g., “exclude these IDs from results”)
Features Needing New Implementation
Case-Insensitive String Matching
- Current: No support at any layer
- Needed: ILIKE support in SQL serialization (PostgreSQL native, LOWER() wrapper for SQLite/MySQL), plus user API
- Design consideration: How to handle cross-database differences (ILIKE is Pg-only, LOWER()+LIKE is universal but slower)
- Reference: Prisma (`mode: 'insensitive'`), Django (`__iexact`, `__icontains`)
- Use case: User-facing search (e.g., email lookup, name search)
BETWEEN / Range Queries
- Current: Users must combine `.ge()` and `.le()` manually
- Needed: Syntactic sugar over `AND(ge, le)`, or a dedicated `ExprBetween`
- File: `crates/toasty/src/stmt/path.rs`
- Reference: Drizzle (`between()`), Django (`__range`), Diesel (`.between()`)
- Use case: Date ranges, price ranges, numeric filtering
Relation/Association Filtering
- Current: Scoped queries exist but no way to filter a top-level query by related model fields
- Needed: JOIN or EXISTS subquery generation in the engine, plus user API design
- Complexity: High - requires significant engine work
- Reference: Prisma (`some`/`none`/`every`), Django (`__` traversal), SQLAlchemy (`.any()`/`.has()`)
- Use case: Filtering parents by child attributes (e.g., “users who have at least one order over $100”)
Field-to-Field Comparison
- Current: `Path::eq()` requires `IntoExpr<T>`, which accepts values but should also accept paths
- Needed: Ensure `Path<T>` implements `IntoExpr<T>` and that codegen supports cross-field comparisons
- Reference: Django (`F()` expressions), SQLAlchemy (column comparison)
- Use case: Comparing two columns (e.g., `updated_at > created_at`, `balance > minimum_balance`)
Arithmetic Operations in Queries
- Current: No support; `BinaryOp` only includes comparison operators (`Eq`, `Ne`, `Gt`, `Ge`, `Lt`, `Le`)
- Needed:
  - Add arithmetic operators to the AST: `Add`, `Subtract`, `Multiply`, `Divide`, `Modulo`
  - SQL serialization for arithmetic expressions (standard across databases)
  - User API to build arithmetic expressions (e.g., `.add()`, `.multiply()`, operator overloading, or an expression builder)
  - Type handling for arithmetic results (ensure type safety)
- Files: `crates/toasty-core/src/stmt/op_binary.rs`, `crates/toasty-core/src/stmt/expr.rs`, `crates/toasty/src/stmt/path.rs`
- Reference:
  - Django: `F('price') * F('quantity') > 100`
  - SQLAlchemy: `column('price') * column('quantity') > 100`
  - Diesel: `price.eq(quantity * 2)`
  - Drizzle: ``sql`price * quantity > 100` ``
- Use cases:
  - Computed comparisons: `WHERE age <= 2 * years_in_school`
  - Price calculations: `WHERE price * quantity > 1000`
  - Time differences: `WHERE (end_time - start_time) > 3600`
  - Percentage calculations: `WHERE (actual / budget) * 100 > 110`
  - Complex business rules: `WHERE (base_price * (1 - discount_rate)) > minimum_price`
- Design considerations:
  - Should arithmetic create new expression types or extend `BinaryOp`?
  - How to handle type coercion (int vs. float, time arithmetic)?
  - Support for parentheses and operator precedence
  - Whether to support arithmetic on the SELECT side (computed columns) or just in WHERE clauses initially
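The operator-overloading option can be illustrated with a minimal stand-in expression type. This is a sketch only: Toasty's real `Expr` type and rendering are richer, and the names below are hypothetical.

```rust
use std::ops::Mul;

/// Minimal stand-in for a typed expression node, just enough to show
/// how `std::ops` overloading could surface arithmetic in a builder.
#[derive(Debug)]
enum Expr {
    Column(&'static str),
    Mul(Box<Expr>, Box<Expr>),
}

impl Mul for Expr {
    type Output = Expr;
    fn mul(self, rhs: Expr) -> Expr {
        Expr::Mul(Box::new(self), Box::new(rhs))
    }
}

/// Render to SQL-ish text, parenthesizing to preserve precedence.
fn render(e: &Expr) -> String {
    match e {
        Expr::Column(name) => name.to_string(),
        Expr::Mul(a, b) => format!("({} * {})", render(a), render(b)),
    }
}
```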
Aggregate Queries
- Current: `ExprFunc::Count` exists internally but is not user-facing
- Needed: User-facing API, return type handling, integration with GROUP BY
- Complexity: High - requires significant API design
- Reference: Django’s annotation system, SQLAlchemy’s `func`
- Use case: Dashboards, analytics, summary views, pagination metadata
GROUP BY / HAVING
- Current: No support at any layer
- Needed: AST additions, SQL generation, engine support, user API
- Complexity: High
- Use case: Aggregate queries, reports, analytics, dashboards
Raw SQL Escape Hatch
- Current: No support
- Needed: Safe API for parameterized raw SQL fragments within typed queries
- Design consideration: Full raw queries vs. raw fragments within typed queries vs. both
- Reference: Drizzle (``sql`...` `` templates), SQLAlchemy (`text()`), Diesel (`sql()`)
- Use case: Edge cases that the ORM can’t express
Dynamic / Conditional Query Building
- Current: Users can chain `.filter()` calls, but there is no ergonomic way to skip filters when parameters are `None`
- Needed: A pattern for optional filters
- Reference: SeaORM (`Condition::add_option()`), Prisma (spread `undefined`), Diesel (`BoxableExpression`)
- Use case: Search forms, filter UIs, API endpoints with optional parameters
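The pattern itself is simple to sketch. The version below uses plain strings to stay self-contained; a Toasty version would push typed `Expr<bool>` filters onto the query builder instead (function and parameter names here are illustrative):

```rust
/// Collect optional search-form parameters into WHERE fragments,
/// skipping any that are `None` -- the same ergonomics SeaORM's
/// `Condition::add_option()` provides for typed conditions.
fn build_filters(name: Option<&str>, min_age: Option<u32>) -> Vec<String> {
    let mut filters = Vec::new();
    if let Some(n) = name {
        filters.push(format!("name = '{n}'"));
    }
    if let Some(a) = min_age {
        filters.push(format!("age >= {a}"));
    }
    filters
}
```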
Full-Text Search
- Current: No support
- Complexity: High - database-specific implementations (PostgreSQL tsvector, MySQL FULLTEXT, SQLite FTS5)
- Design consideration: May be best as database-specific extensions rather than a unified API
- Use case: Content-heavy applications (blogs, e-commerce, documentation sites)
JSON Field Queries
- Current: No support
- Complexity: High - needs path traversal syntax, type handling, database-specific operators
- Dependency: Depends on JSON/JSONB data type support
- Reference: Django (`field__key__subkey`), SQLAlchemy (`column['key']`)
- Use case: Flexible/schemaless data within relational databases
Advanced / Niche Features
Regex Matching
- Use case: Power-user filtering, data validation queries
- Reference: Django (`__regex`, `__iregex`), SQLAlchemy (`regexp_match()`)
Array/Collection Operations
- Use case: PostgreSQL array columns, MongoDB array fields
- Dependency: Requires array type support first
- Reference: Prisma (`has`, `hasEvery`, `hasSome`), Django (`ArrayField` lookups)
CASE/WHEN Expressions
- Use case: Conditional logic within queries for complex business rules
- Reference: Django (`When()`/`Case()`), SQLAlchemy (`case()`)
Subquery Comparisons (ALL/ANY/SOME)
- Use case: Advanced filtering like “price > ALL(SELECT price FROM competitors)”
- Reference: Hibernate, SQLAlchemy (`all_()`, `any_()`)
IS DISTINCT FROM
- Use case: NULL-safe comparisons without special-casing IS NULL
- Reference: SQLAlchemy (only ORM with native support)
Implementation Considerations
Recommended Approach
Based on the analysis above, the following groupings maximize user value:
Group 1: Expose Existing Internals
Items with core AST and SQL serialization that only need user-facing methods:
- `.not_in_list()` on `Path<T>` (negate the existing `InList`)

Estimated scope: ~50 lines of user-facing API code + integration tests
Group 2: String Operations
Partial AST support that needs completion and exposure:
- Add `ExprPattern::EndsWith` and `ExprPattern::Contains` to the core AST
- Add SQL serialization for the new pattern variants
- Add `.contains()`, `.starts_with()`, `.ends_with()` to `Path<String>`
- Handle LIKE special character escaping

Estimated scope: ~200 lines across core + SQL + user API
Group 3: Ergonomic Improvements
- Case-insensitive matching (ILIKE / LOWER() wrapper)
- A `.between()` convenience method
- Direct `.like()` exposure
- Conditional/optional filter building helpers
Group 4: Structural Features
Requires deeper engine work:
- Relation filtering (JOIN/EXISTS generation)
- Aggregate functions (user-facing COUNT/SUM/etc.)
- GROUP BY / HAVING
- Raw SQL escape hatch
Reference Implementation Goals
A comprehensive query constraint system would allow users to:
- Search strings by substring, prefix, and suffix (case-sensitive and case-insensitive)
- Use NOT IN with literal lists and subqueries
- Filter by related model attributes
- Use at least basic aggregate queries (COUNT)
- Fall back to raw SQL for anything the ORM can’t express
This would put Toasty on par with the filtering capabilities of Diesel and SeaORM, and cover the vast majority of queries needed by typical web applications.
Query Engine Optimization Roadmap
Overview
The query engine currently performs simplification as a single VisitMut pass that
applies local rewrite rules bottom-up. This works well for straightforward
transformations (constant folding, tuple decomposition, association rewriting),
but it has structural limitations as the optimizer takes on more complex work.
This document tracks improvements to the query engine’s optimization infrastructure, focusing on predicate simplification and the compilation pipeline.
Current State
Simplification Pass
The simplifier (engine/simplify.rs) implements VisitMut and applies rules in
a single bottom-up traversal. Each node is visited once, simplified, and then
its parent is simplified with the updated children.
What works well:
- Local rewrites: constant folding, boolean identity, tuple decomposition
- Association rewriting and subquery lifting
- Match elimination (distributing binary ops over match arms)
Structural limitations:
- Rules fire during the walk, so ordering matters. A rule that produces expressions consumable by another rule only works if the consumer fires later in the same walk or the walk is re-run.
- Global analysis (e.g., detecting contradictions across an entire AND conjunction) must be done inline during the walk, mixing local and global concerns.
- Expensive analyses run on every AND node encountered, even when only a small fraction would benefit.
Contradicting Equality Detection
The simplifier currently detects a = c1 AND a = c2 (where c1 != c2) inline in
simplify_expr_and. This is O(n^2) in the number of equality predicates within a
single AND. While operand lists are typically small, the analysis runs on every
AND node during the walk, including intermediate nodes that are about to be
restructured by other rules.
Planned Improvements
Phase 1: Post-Lowering Optimization Pass
Move expensive predicate analysis out of the per-node simplifier and into a dedicated pass that runs once after lowering, against the HIR representation. At this point the statement is fully resolved to table-level expressions and the predicate tree is stable — no more association rewrites or field resolution changes will restructure it.
This pass would handle:
- Contradicting equality pruning
- Redundant predicate elimination
- Tautology detection
- `ExprLet` inlining (currently done at the end of `lower_returning`; should move here so all post-lowering expression rewrites live in one place)
Why after lowering: Before lowering, predicates reference model-level fields and contain relationship navigation that the lowering phase rewrites. Running global analysis before this rewriting is wasted work — the predicate tree will change. After lowering, the predicates are in their final structural form (column references, subqueries), so analysis results are stable.
Phase 2: Equivalence Classes
Build equivalence classes from equality predicates before running constraint
analysis. When the optimizer sees a = b AND b = c, it should know that a,
b, and c are all equivalent, enabling:
- Transitive contradiction detection: `a = b AND b = 5 AND a = 7` is a contradiction (`a` must be both 5 and 7), even though no single pair of predicates directly conflicts.
- Predicate implication: In `a = 5 AND a > 3`, the second predicate is implied and can be dropped.
- Join predicate inference: If `a = b` and a filter constrains `a`, the same constraint applies to `b`.
Equivalence classes are a standard technique in query optimizers. The idea is to union-find expressions that are constrained to be equal, then check each class for conflicting constant bindings or range constraints.
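A compact sketch of that union-find approach, tracking one constant binding per class (identifier names and the `i64` constant type are illustrative; the engine would key on expression nodes and `stmt::Value`):

```rust
use std::collections::HashMap;

/// Union-find over expression identifiers with one constant binding per
/// class. Detects transitive contradictions such as
/// `a = b AND b = 5 AND a = 7`.
struct EquivClasses {
    parent: HashMap<String, String>,
    constant: HashMap<String, i64>, // class root -> bound constant
}

impl EquivClasses {
    fn new() -> Self {
        Self { parent: HashMap::new(), constant: HashMap::new() }
    }

    /// Find the class root, compressing paths along the way.
    fn find(&mut self, x: &str) -> String {
        let p = self.parent.get(x).cloned().unwrap_or_else(|| x.to_string());
        if p == x {
            return p;
        }
        let root = self.find(&p);
        self.parent.insert(x.to_string(), root.clone());
        root
    }

    /// Record `a = b`. Returns false if merging exposes a contradiction.
    fn union(&mut self, a: &str, b: &str) -> bool {
        let ra = self.find(a);
        let rb = self.find(b);
        if ra == rb {
            return true;
        }
        if let Some(cb) = self.constant.remove(&rb) {
            if let Some(&ca) = self.constant.get(&ra) {
                if ca != cb {
                    return false; // classes bound to different constants
                }
            } else {
                self.constant.insert(ra.clone(), cb);
            }
        }
        self.parent.insert(rb, ra);
        true
    }

    /// Record `a = c`. Returns false if `a`'s class is already bound
    /// to a different constant.
    fn bind(&mut self, a: &str, c: i64) -> bool {
        let r = self.find(a);
        if let Some(&existing) = self.constant.get(&r) {
            existing == c
        } else {
            self.constant.insert(r, c);
            true
        }
    }
}
```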
Phase 3: Structured Constraint Analysis
Replace ad-hoc pairwise comparisons with a more structured representation of constraints. For each expression (or equivalence class), maintain:
- Constant binding: The expression must equal a specific value
- Range bounds: Upper/lower bounds from inequality predicates
- NOT-equal set: Values the expression cannot be (from `!=` predicates)
With this structure, contradiction detection becomes a property check rather than a search: an expression with two different constant bindings, or a constant binding outside its range bounds, is immediately contradictory.
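A minimal sketch of that property check, covering the constant-binding and range-bound cases over `i64` values (the NOT-equal set is omitted for brevity; the engine would use its own value type):

```rust
/// Per-expression constraint summary. `lower`/`upper` are exclusive
/// bounds accumulated from `> c` / `< c` predicates.
#[derive(Default)]
struct Constraints {
    constants: Vec<i64>, // all `= c` bindings seen
    lower: Option<i64>,
    upper: Option<i64>,
}

impl Constraints {
    fn eq(&mut self, c: i64) {
        self.constants.push(c);
    }
    fn gt(&mut self, c: i64) {
        // Keep the tightest (largest) lower bound.
        self.lower = Some(self.lower.map_or(c, |l| l.max(c)));
    }
    fn lt(&mut self, c: i64) {
        // Keep the tightest (smallest) upper bound.
        self.upper = Some(self.upper.map_or(c, |u| u.min(c)));
    }

    fn is_contradictory(&self) -> bool {
        // Two different constant bindings.
        if self.constants.windows(2).any(|w| w[0] != w[1]) {
            return true;
        }
        // A constant binding outside the range bounds.
        if let Some(&c) = self.constants.first() {
            if self.lower.map_or(false, |l| c <= l)
                || self.upper.map_or(false, |u| c >= u)
            {
                return true;
            }
        }
        // An empty open interval (lower bound meets or exceeds upper).
        matches!((self.lower, self.upper), (Some(l), Some(u)) if l >= u)
    }
}
```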
Predicate Normalization (Not Full DNF)
Full conversion to disjunctive normal form (DNF) — where the entire predicate becomes an OR of ANDs — risks exponential blowup. A predicate with N AND-connected clauses of M OR-options each expands to M^N terms. This makes full DNF impractical as a general-purpose transformation.
Instead, apply targeted normalization:
- Flatten associative operators: Merge nested `AND(AND(...), ...)` and `OR(OR(...), ...)` into flat lists (already done).
- Canonicalize comparison direction: Ensure constants are on the right side of comparisons (already done).
- Limited distribution: Distribute AND over OR only in specific cases where it enables index utilization or constraint extraction, with a size budget to prevent blowup.
- OR-of-equalities to IN-list: Convert `a = 1 OR a = 2 OR a = 3` to `a IN (1, 2, 3)` for more efficient execution.
The goal is to normalize enough for the constraint analysis to work without paying the exponential cost of full DNF.
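The OR-of-equalities rewrite can be sketched as follows, representing each disjunct as a (target, value) pair of strings for illustration (the real pass would pattern-match equality nodes in the expression tree):

```rust
/// Collapse `a = v1 OR a = v2 OR ...` into `a IN (v1, v2, ...)` when
/// every disjunct is an equality on the same target; otherwise leave
/// the OR alone by returning None.
fn or_to_in_list(disjuncts: &[(&str, &str)]) -> Option<String> {
    let first_target = disjuncts.first()?.0;
    if disjuncts.iter().any(|d| d.0 != first_target) {
        return None; // mixed targets: rewrite does not apply
    }
    let vals: Vec<&str> = disjuncts.iter().map(|d| d.1).collect();
    Some(format!("{} IN ({})", first_target, vals.join(", ")))
}
```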
Design Principles
- Run expensive analysis once, not per-node. The current simplifier intermixes cheap local rewrites with expensive global analysis. Separate them.
- Analyze after the predicate tree is stable. Post-lowering is the right point — predicates are resolved to columns and won’t be restructured.
- Build structure, then query it. Constructing equivalence classes and constraint summaries up front makes individual checks cheap.
- Budget-limited transformations. Any rewrite that can expand expression size (distribution, case expansion) must have a size limit.