
Toasty Architecture Overview

Project Structure

Toasty is an ORM for Rust that supports SQL and NoSQL databases. The codebase is a Cargo workspace with separate crates for each layer.

Crates

1. toasty

User-facing crate with query engine and runtime.

Key Components:

  • engine/: Multi-phase query compilation and execution pipeline
  • stmt/: Typed statement builders (wrappers around toasty_core::stmt types)
  • relation/: Relationship abstractions (HasMany, BelongsTo, HasOne)
  • model.rs: Model trait and ID generation

Query Execution Pipeline (high-level):

Statement AST → Simplify → Lower → Plan → Execute → Results

The engine compiles queries into a mini-program of actions executed by an interpreter. For details on HIR, MIR, and the full compilation pipeline, see Query Engine Architecture.

2. toasty-core

Shared types used by all other crates: schema representations, statement AST, and driver interface.

Key Components:

  • schema/: Model and database schema definitions
    • app/: Model-level definitions (fields, relations, constraints)
    • db/: Database-level table and column definitions
    • mapping/: Maps between models and database tables
    • builder/: Schema construction utilities
    • verify/: Schema validation
  • stmt/: Statement AST nodes for queries, inserts, updates, deletes
  • driver/: Driver interface, capabilities, and operations

3. toasty-codegen

Generates Rust code from the #[derive(Model)] macro.

Key Components:

  • schema/: Parses model attributes into schema representation
  • expand/: Generates implementations for models
    • model.rs: Model trait implementation
    • query.rs: Query builder methods
    • create.rs: Create/insert builders
    • update.rs: Update builders
    • relation.rs: Relationship methods
    • fields.rs: Field accessors
    • filters.rs: Filter method generation
    • schema.rs: Runtime schema generation

4. toasty-driver-*

Database-specific driver implementations.

Supported Databases:

  • toasty-driver-sqlite: SQLite implementation
  • toasty-driver-postgresql: PostgreSQL implementation
  • toasty-driver-mysql: MySQL implementation
  • toasty-driver-dynamodb: DynamoDB implementation

5. toasty-sql

Converts statement AST to SQL strings. Used by SQL-based drivers.

Key Components:

  • serializer/: SQL generation with dialect support
    • flavor.rs: Database-specific SQL dialects
    • statement.rs: Statement serialization
    • expr.rs: Expression serialization
    • ty.rs: Type serialization
  • stmt/: SQL-specific statement types

Further Reading

Toasty Query Engine

This document provides a high-level overview of the Toasty query execution engine for developers working on engine internals. It describes the multi-phase pipeline that transforms user queries into database operations.

Overview

The Toasty engine is a multi-database query compiler and runtime that executes ORM operations across SQL and NoSQL databases. It transforms a user’s query (represented as a Statement AST) into a sequence of executable actions through multiple compilation phases.

Execution Model

The final output is a mini program executed by an interpreter. Think of it like a small virtual machine or bytecode interpreter, though there is no control flow (yet):

  • Instructions (Actions): Operations like “execute this SQL”, “filter these results”, “merge child records into parents”
  • Variables: Storage slots, or registers, that hold intermediate results between instructions
  • Linear Execution: Instructions run in sequence, with no branches or loops yet. Eventually, the interpreter will be smart about concurrency and execute independent operations in parallel when possible.
  • Interpreter: The engine executor reads each instruction, fetches inputs from variables, performs the operation, and stores outputs back to variables

For example, loading users with their todos:

SELECT users.id, users.name, (
    SELECT todos.id, todos.title 
    FROM todos 
    WHERE todos.user_id = users.id
) FROM users WHERE ...

compiles to a program like:

$0 = ExecSQL("SELECT * FROM users WHERE ...")
$1 = ExecSQL("SELECT * FROM todos WHERE user_id IN ...")
$2 = NestedMerge($0, $1, by: user_id)
return $2

The compilation pipeline below transforms user queries into this instruction/variable representation. Each phase brings the query closer to this final executable form.
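A minimal, self-contained sketch of this instruction/variable model may help. The `Value`, `Action`, and `run` names below are illustrative stand-ins for the engine's real types, with canned rows in place of SQL execution:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Rows(Vec<&'static str>),
    Merged(Vec<(&'static str, Vec<&'static str>)>),
}

enum Action {
    // Stand-in for "execute this SQL and store the rows in a variable".
    Exec { out: usize, rows: Vec<&'static str> },
    // Stand-in for NestedMerge: attach the child rows to every parent row.
    Merge { parents: usize, children: usize, out: usize },
}

// Linear interpreter: read inputs from variables, store outputs back.
fn run(program: &[Action], slots: usize) -> Vec<Option<Value>> {
    let mut vars: Vec<Option<Value>> = vec![None; slots];
    for action in program {
        match action {
            Action::Exec { out, rows } => {
                vars[*out] = Some(Value::Rows(rows.clone()));
            }
            Action::Merge { parents, children, out } => {
                let (Some(Value::Rows(p)), Some(Value::Rows(c))) =
                    (vars[*parents].clone(), vars[*children].clone())
                else {
                    panic!("inputs not ready");
                };
                let merged = p.iter().map(|row| (*row, c.clone())).collect();
                vars[*out] = Some(Value::Merged(merged));
            }
        }
    }
    vars
}
```

Running the three-action program above ($0, $1, $2) leaves the merged result in the last variable, mirroring `return $2`.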

Compilation Pipeline

User Query (Statement AST)
    ↓
[Verification] - Validate statement structure (debug builds only)
    ↓
[Simplification] - Normalize and optimize the statement AST
    ↓
[Lowering] - Convert to HIR for dependency analysis
    ↓
[Planning] - Build MIR operation graph
    ↓
[Execution Planning] - Convert to action sequence with variables
    ↓
[Execution] - Run actions against database driver
    ↓
Result Stream

Phase 1: Simplification

Location: engine/simplify.rs

The simplification phase normalizes and optimizes the statement AST before planning.

Key Transformations

  • Association Rewriting: Converts relationship navigation (e.g., user.todos()) into explicit subqueries with foreign key filters
  • Subquery Lifting: Transforms IN (SELECT ...) expressions into more efficient join-like operations
  • Expression Normalization: Simplifies complex expressions (e.g., flattening nested ANDs/ORs, constant folding)
  • Path Expression Rewriting: Resolves field paths and relationship traversals into explicit column references
  • Empty Query Detection: Identifies queries that will return no results

Example: Association Simplification

// user.todos().delete() generates:

Delete {
    from: Todo,
    via: User::todos,  // relationship traversal
    ...
}

// After simplification:

Delete {
    from: Todo,
    filter: todo.user_id IN (SELECT id FROM users WHERE ...)
}

Converting relationship navigation into explicit filters early means downstream phases only need to handle standard query patterns with filters and subqueries - no special-case logic for each relationship type.

Phase 2: Lowering

Location: engine/lower.rs

Lowering converts a simplified statement into HIR (High-level Intermediate Representation) - a collection of related statements with tracked dependencies.

Toasty tries to maximize what the target database can handle natively, only decomposing queries when necessary. For example, a query like User::find_by_name("John").todos().all() contains a subquery. SQL databases can execute this as SELECT * FROM todos WHERE user_id IN (SELECT id FROM users WHERE name = 'John'). DynamoDB cannot handle subqueries, so lowering splits this into two statements: first fetch user IDs, then query todos with those IDs.

The HIR tracks a dependency graph between statements - which statements depend on results from others, and which columns flow between them. This graph can contain cycles when preloading associations. For example:

SELECT users.id, users.name, (
    SELECT todos.id, todos.title 
    FROM todos 
    WHERE todos.user_id = users.id
) FROM users WHERE ...

The users query must execute first to provide IDs for the todos subquery, but the todos results must be merged back into the user records. This creates a cycle: users → todos → users.

This lowering phase handles:

  • Statement Decomposition: Breaking queries into sub-statements when the database can’t handle them directly
  • Dependency Tracking: Which statements must execute before others
  • Argument Extraction: Identifying values passed between statements (e.g., a loaded model’s ID used in a child query’s filter)
  • Relationship Handling: Processing relationship loads and nested queries

Lowering Algorithm

Lowering transforms model-level statements to table-level statements through a visitor pattern that rewrites each part of the statement AST:

  1. Table Resolution: InsertTarget::Model, UpdateTarget::Model, etc. become their corresponding table references
  2. Returning Clause Transformation: Returning::Model is replaced with Returning::Expr containing the expanded column expressions
  3. Field Reference Resolution: Model field references are converted to table column references
  4. Include Expansion: Association includes become subqueries in the returning clause

The TableToModel mapping (built during schema construction) drives the transformation. It contains an expression for each model field that maps to its corresponding table column(s). This supports more than a 1-1 mapping—a model field can be derived from multiple columns or a column can map to multiple fields. Association fields are initialized to Null in this mapping.

When lowering encounters a Returning::Model { include } clause:

  1. Call table_to_model.lower_returning_model() to get the base column expressions
  2. For each path in the include list, call build_include_subquery() to generate a subquery that selects the associated records
  3. Replace the Null placeholder in the returning expression with the generated subquery

Lowering Examples

Example 1: Simple query

Given a model with a renamed column:

#[derive(Model)]
struct User {
    #[key] #[auto] id: u64,
    #[column(name = "first_and_last_name")]
    name: String,
    email: String,
}

// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
// Note: At model-level, no specific fields are selected

// After lowering
SELECT id, first_and_last_name, email FROM users WHERE id = ?

Example 2: Query with association

// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
  INCLUDE todos

// After lowering
SELECT id, first_and_last_name, email, (
    SELECT id, title, user_id FROM todos WHERE todos.user_id = users.id
) FROM users WHERE id = ?

Phase 3: Planning

Location: engine/plan.rs

Planning converts HIR into MIR (Middle-level Intermediate Representation) - a directed acyclic graph of operations, both database queries and in-memory transformations. Edges represent data dependencies: an operation cannot execute until all operations it depends on have completed and produced their results.

Since the HIR graph can contain cycles, planning must break them to produce a DAG. This is done by introducing intermediate operations that batch-load data and merge results (e.g., NestedMerge).
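The core of a NestedMerge-style operation can be sketched as follows. `User`, `Todo`, and `nested_merge` here are simplified stand-ins, not the engine's actual types: group the batch-loaded child rows by foreign key, then attach each group to its parent:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct User {
    id: u64,
    name: &'static str,
    todos: Vec<Todo>,
}

#[derive(Debug, Clone, PartialEq)]
struct Todo {
    id: u64,
    user_id: u64,
    title: &'static str,
}

fn nested_merge(mut parents: Vec<User>, children: Vec<Todo>) -> Vec<User> {
    // One pass over the children builds an index keyed by foreign key ...
    let mut by_fk: HashMap<u64, Vec<Todo>> = HashMap::new();
    for todo in children {
        by_fk.entry(todo.user_id).or_default().push(todo);
    }
    // ... and one pass over the parents attaches each group.
    for user in &mut parents {
        user.todos = by_fk.remove(&user.id).unwrap_or_default();
    }
    parents
}
```

Two linear passes replace what would otherwise be a per-parent subquery, which is exactly why breaking the cycle with a batch load plus merge is cheap.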

Operation Types

The MIR supports various operation types (see engine/mir.rs for details):

SQL operations:

  • ExecStatement - Execute a SQL query (SELECT, INSERT, UPDATE, DELETE)
  • ReadModifyWrite - Optimistic locking (read, modify, conditional write). Exists as a separate operation because the read result must be processed in-memory to compute the write, which ExecStatement cannot express.

Key-value operations (NoSQL):

  • GetByKey, DeleteByKey, UpdateByKey - Direct key access
  • QueryPk, FindPkByIndex - Key lookups via queries or indexes

In-memory operations:

  • Filter, Project - Transform and filter results
  • NestedMerge - Merge child records into parent records
  • Const - Constant values

Phase 4: Execution Planning

Location: engine/plan/execution.rs

Execution planning converts the MIR logical plan into a concrete sequence of actions that can be executed. This phase:

  • Assigns variable slots for storing intermediate results
  • Converts each MIR operation into an execution action
  • Maintains topological ordering to ensure dependencies execute first

Action Types

Actions mirror MIR operations but include concrete variable bindings:

SQL actions:

  • ExecStatement: Execute a SQL query (SELECT, INSERT, UPDATE, DELETE)
  • ReadModifyWrite: Optimistic locking (read, modify, conditional write)

Key-value actions (NoSQL):

  • GetByKey: Batch fetch by primary key
  • DeleteByKey: Delete records by primary key
  • UpdateByKey: Update records by primary key
  • QueryPk: Query primary keys
  • FindPkByIndex: Find primary keys via secondary index

In-memory actions:

  • Filter: Apply in-memory filter to a variable’s data
  • Project: Transform records
  • NestedMerge: Merge child records into parent records
  • SetVar: Set a variable to a constant value

Phase 5: Execution

Location: engine/exec.rs

The execution phase is the interpreter that runs the compiled program. It iterates through actions, reading inputs from variables, performing operations, and storing outputs back to variables.

Execution Loop

The interpreter follows a simple pattern:

  1. Initialize variable storage
  2. For each action in sequence:
    • Load input data from variables
    • Perform the operation (database query or in-memory transform)
    • Store the result in the output variable
  3. Return the result from the final variable (the last action’s output) to the user

Variable Lifetime

The engine tracks how many times each variable is referenced by downstream actions. A variable may be used by multiple actions (e.g., the same user records merged with both todos and comments). When the last action that needs a variable completes, the variable’s value is dropped to free memory.
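This lifetime tracking can be sketched as reference-counted variable slots. Each slot is created with the number of downstream reads it must serve; the final read takes ownership and frees the slot. The names here are hypothetical, not the engine's real types:

```rust
struct Slot<T> {
    value: Option<T>,
    remaining_reads: usize,
}

struct VarStore<T> {
    slots: Vec<Slot<T>>,
}

impl<T: Clone> VarStore<T> {
    // `reads[i]` is how many actions will read variable `i`.
    fn new(reads: &[usize]) -> Self {
        let slots = reads
            .iter()
            .map(|&n| Slot { value: None, remaining_reads: n })
            .collect();
        VarStore { slots }
    }

    fn store(&mut self, idx: usize, value: T) {
        self.slots[idx].value = Some(value);
    }

    // Clones for intermediate reads; moves the value out on the last read
    // so the memory is released as soon as no action needs it.
    fn load(&mut self, idx: usize) -> T {
        let slot = &mut self.slots[idx];
        slot.remaining_reads -= 1;
        if slot.remaining_reads == 0 {
            slot.value.take().expect("variable already freed")
        } else {
            slot.value.clone().expect("variable not yet set")
        }
    }

    fn is_freed(&self, idx: usize) -> bool {
        self.slots[idx].value.is_none()
    }
}
```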

Driver Interaction

The execution phase is the only part of the engine that communicates with database drivers. The driver interface is intentionally simple: a single exec() method that accepts an Operation enum. This enum includes variants for both SQL operations (QuerySql, Insert) and key-value operations (GetByKey, QueryPk, FindPkByIndex, DeleteByKey, UpdateByKey).

Each driver implements whichever operations it supports. SQL drivers handle QuerySql natively while key-value drivers handle GetByKey, QueryPk, etc. The planner uses driver.capability() to determine which operations to generate for each database type.

Toasty Type System Architecture

Overview

Toasty uses Rust’s type system in the public API with both concrete types and generics. The query engine tracks the type of value each statement evaluates to using stmt::Type. This document describes how types flow through the system and the key components involved.

Type System Boundaries

Toasty has two distinct type systems with different responsibilities:

1. Rust-Level Type System (Compile-Time Safety)

At the Rust level, each model is a distinct type:

#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    name: String,
    email: String,
}

#[derive(Model)]
struct Todo {
    #[key]
    #[auto]
    id: u64,
    user_id: u64,
    title: String,
}

// Toasty generates type-safe field access preventing type mismatches:
User::get_by_email(&db, "john@example.com").await?;  // ✓ String matches email field
User::filter_by_id(&user_id).filter(User::FIELDS.name().eq("John")).all(&db).await?;  // ✓ String matches name field

// Type system prevents field/model confusion:
// User::FIELDS.title()  // ← Compile error! User has no title field
// Todo::FIELDS.email()  // ← Compile error! Todo has no email field
// User::FIELDS.name().eq(&todo_id)  // ← Compile error! u64 doesn't match String

The query builder API maintains this type safety through generics and traits, preventing you from accidentally mixing model types or referencing non-existent fields. The API uses generic types (Statement<M>, Select<M>, etc.) that wrap toasty_core::stmt types.

2. Query Engine Type System (Runtime)

When db.exec(statement) is called, the generic <M> parameter is erased:

// Generated query builder returns a typed wrapper
let query: FindUserById = User::find_by_id(&id);

// .into() converts to Statement<User>
let statement: Statement<User> = query.into();

// At db.exec() - generic is erased, .untyped is extracted
pub async fn exec<M: Model>(&self, statement: Statement<M>) -> Result<ValueStream> {
    engine::exec(self, statement.untyped).await  // <- Only toasty_core::stmt::Statement
}

At this boundary, the statement becomes untyped (no Rust generic), but the engine tracks the type of value the statement evaluates to using stmt::Type. Initially, this remains at the model-level—a query for User evaluates to Type::List(Type::Model(user_model_id)). During lowering, these convert to structural record types for database execution.

Type Flow Through the System

Rust API → Query Builder → Engine Entry → Lowering/Planning → Execution
    ↓           ↓              ↓               ↓                  ↓
Distinct    Type-Safe      Type::Model     Type::Record       stmt::Value
Types       Generics       (no generics)                      (typed)
(compile)   (compile)      (runtime)       (runtime)          (runtime)

At lowering, statements that evaluate to Type::Model(model_id) are converted to evaluate to Type::Record([field_types...]). This conversion enables the engine to work with concrete field types for database operations.
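The shape of this conversion can be sketched with a simplified type enum. `Type` below mirrors the shape of stmt::Type but is illustrative, and `model_fields` is a stand-in for a schema lookup:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Id,
    String,
    Model(u32),        // model-level: opaque reference to a model
    Record(Vec<Type>), // table-level: concrete field types
    List(Box<Type>),
}

// Stand-in for the schema: field types of the User model (id, name, email).
fn model_fields(model_id: u32) -> Vec<Type> {
    match model_id {
        0 => vec![Type::Id, Type::String, Type::String],
        _ => vec![],
    }
}

// Lowering rewrites every Model type into a structural Record type,
// recursing through List so collections of models convert too.
fn lower_type(ty: Type) -> Type {
    match ty {
        Type::Model(id) => Type::Record(model_fields(id)),
        Type::List(inner) => Type::List(Box::new(lower_type(*inner))),
        other => other,
    }
}
```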

Detailed Architecture

Query Engine Entry Point

When the engine receives a toasty_core::stmt::Statement, it processes through verification, lowering, planning, and execution:

pub(crate) async fn exec(&self, stmt: Statement) -> Result<ValueStream> {
    if cfg!(debug_assertions) {
        self.verify(&stmt);
    }

    // Lower the statement to High-level intermediate representation
    let hir = self.lower_stmt(stmt)?;

    // Translate into a series of driver operations
    let plan = self.plan_hir_statement(hir)?;

    // Execute the plan
    self.exec_plan(plan).await
}

Lowering Phase (Model-to-Table Transformation)

The lowering phase transforms statements from model-level to table-level representations.

Example 1: Simple query

// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
// Evaluates to: Type::List(Type::Model(user_model_id))
// Note: At model-level, no specific fields are selected

// After lowering
SELECT id, name, email FROM users WHERE id = ?
// Evaluates to: Type::List(Type::Record([Type::Id, Type::String, Type::String]))

Example 2: Query with association

// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User INCLUDE todos WHERE id = ?
// Evaluates to: Type::List(Type::Model(user_model_id))
// where todos field is Type::List(Type::Model(todo_model_id))

// After lowering
SELECT id, name, email, (
    SELECT id, title, user_id FROM todos WHERE todos.user_id = users.id
) FROM users WHERE id = ?
// Evaluates to: Type::List(Type::Record([
//   Type::Id, Type::String, Type::String,
//   Type::List(Type::Record([Type::Id, Type::String, Type::Id]))
// ]))

Planning and Variable Types

During planning, the engine assigns variables to hold intermediate results (see Query Engine Architecture for details on the execution model). Each variable is registered with its type, which is always Type::List(...) or Type::Unit.

Execution

At execution time, the VarStore holds the type information from planning. When storing a value stream in a variable, the store associates the expected type with it. The value stream ensures each value it yields conforms to that type. This type information carries through to the final result returned to the user.

Type Inference

While statements entering the engine have known types, planning constructs new expressions—projections, filters, and merge qualifications—whose types aren’t explicitly declared. The engine must infer these types from the expression structure to register variables correctly.

Type inference is handled by ExprContext, which walks expression trees and determines their result types based on the schema. For example, a column reference’s type comes from the schema definition, and a record expression’s type is built from its field types.

// Create context for type inference
let cx = stmt::ExprContext::new_with_target(&*self.engine.schema, stmt);

// Infer type of an expression reference
let ty = cx.infer_expr_reference_ty(expr_reference);

// Infer type of a full expression with argument types
let ret = ExprContext::new_free().infer_expr_ty(expr.as_expr(), &args);

Design

Design documents for Toasty.

Batch Query Execution

Overview

Batch queries let users send multiple independent queries to the database in a single round-trip. The results come back as a typed tuple matching the input queries.

let (active_users, recent_posts) = toasty::batch((
    User::find_by_active(true),
    Post::find_recent(100),
)).exec(&db).await?;

// active_users: Vec<User>
// recent_posts: Vec<Post>

The batch composes all queries into a single Statement whose returning expression is a record of subqueries. This means batch execution flows through the existing exec path — no new executor methods, no new driver operations.

This design covers SQL databases only. DynamoDB support is out of scope.

New Trait: IntoStatement<T>

A single new trait bridges query builders to Statement<T>:

pub trait IntoStatement<T> {
    fn into_statement(self) -> Statement<T>;
}

Query builders implement this for their model type. For example, UserQuery implements IntoStatement<User>:

impl IntoStatement<User> for UserQuery {
    fn into_statement(self) -> Statement<User> {
        self.stmt.into()
    }
}

The codegen already produces IntoSelect impls for query builders. IntoStatement can be blanket-implemented for anything that implements IntoSelect:

impl<T: IntoSelect> IntoStatement<T::Model> for T {
    fn into_statement(self) -> Statement<T::Model> {
        self.into_select().into()
    }
}

Tuple implementations

Tuples of IntoStatement types implement IntoStatement by composing their inner statements into a single select whose returning expression is a record of subqueries:

impl<T1, T2, A, B> IntoStatement<(Vec<T1>, Vec<T2>)> for (A, B)
where
    A: IntoStatement<T1>,
    B: IntoStatement<T2>,
{
    fn into_statement(self) -> Statement<(Vec<T1>, Vec<T2>)> {
        let stmt_a = self.0.into_statement().untyped;
        let stmt_b = self.1.into_statement().untyped;

        // Build: SELECT (stmt_a), (stmt_b)
        let query = stmt::Query::values(stmt::Expr::record([
            stmt::Expr::subquery(stmt_a),
            stmt::Expr::subquery(stmt_b),
        ]));

        Statement::from_raw(query.into())
    }
}

The resulting statement is equivalent to SELECT (subquery_1), (subquery_2). At the Toasty AST level this is a Query whose returning body is a Record([Expr::Stmt, Expr::Stmt]). The engine handles each subquery independently during execution and packs the results into a single Value::Record.

Tuple impls for arities 2 through 8 are generated with a macro.
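The macro technique can be shown on a simplified trait. The real IntoStatement impls build composed statements; `Describe` and `impl_describe_tuple` below are hypothetical and only show how one macro rule covers several tuple arities by pairing each type parameter with its tuple index:

```rust
trait Describe {
    fn describe(&self) -> String;
}

impl Describe for i32 {
    fn describe(&self) -> String {
        format!("i32({self})")
    }
}

impl Describe for &str {
    fn describe(&self) -> String {
        format!("str({self})")
    }
}

// One rule generates an impl per arity: each invocation lists the element
// type parameters together with their tuple indices.
macro_rules! impl_describe_tuple {
    ($($name:ident : $idx:tt),+) => {
        impl<$($name: Describe),+> Describe for ($($name,)+) {
            fn describe(&self) -> String {
                let parts: Vec<String> = vec![$(self.$idx.describe()),+];
                format!("({})", parts.join(", "))
            }
        }
    };
}

impl_describe_tuple!(A: 0, B: 1);
impl_describe_tuple!(A: 0, B: 1, C: 2);
```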

Load for Tuples and Vec<T>

To deserialize the composed result, Load is implemented for Vec<T> and for tuples:

impl<T: Load> Load for Vec<T> {
    fn load(value: stmt::Value) -> Result<Self> {
        match value {
            Value::List(items) => items
                .into_iter()
                .map(T::load)
                .collect(),
            _ => Err(Error::type_conversion(value, "Vec<T>")),
        }
    }
}

impl<A: Load, B: Load> Load for (A, B) {
    fn load(value: stmt::Value) -> Result<Self> {
        match value {
            Value::Record(mut record) => Ok((
                A::load(record[0].take())?,
                B::load(record[1].take())?,
            )),
            _ => Err(Error::type_conversion(value, "(A, B)")),
        }
    }
}

With these impls, Load for (Vec<User>, Vec<Post>) works automatically: the outer tuple impl splits the record, then each Vec<T> impl iterates the list and loads individual model instances.
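The recursion can be demonstrated end to end with a self-contained version of the pattern. The `Value` enum and `Load` trait below are simplified stand-ins, not Toasty's actual definitions:

```rust
#[derive(Debug, Clone)]
enum Value {
    I64(i64),
    List(Vec<Value>),
    Record(Vec<Value>),
}

trait Load: Sized {
    fn load(value: Value) -> Result<Self, String>;
}

impl Load for i64 {
    fn load(value: Value) -> Result<Self, String> {
        match value {
            Value::I64(n) => Ok(n),
            other => Err(format!("expected i64, got {other:?}")),
        }
    }
}

impl<T: Load> Load for Vec<T> {
    fn load(value: Value) -> Result<Self, String> {
        match value {
            Value::List(items) => items.into_iter().map(T::load).collect(),
            other => Err(format!("expected list, got {other:?}")),
        }
    }
}

impl<A: Load, B: Load> Load for (A, B) {
    fn load(value: Value) -> Result<Self, String> {
        match value {
            Value::Record(mut fields) if fields.len() == 2 => {
                let b = fields.pop().unwrap();
                let a = fields.pop().unwrap();
                Ok((A::load(a)?, B::load(b)?))
            }
            other => Err(format!("expected 2-field record, got {other:?}")),
        }
    }
}
```

Loading a `Record` of two `List`s into `(Vec<i64>, Vec<i64>)` exercises all three impls: the tuple impl splits the record, and each `Vec` impl maps the element impl over its list.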

User-Facing API

pub fn batch<T, Q: IntoStatement<T>>(queries: Q) -> Batch<T>
where
    T: Load,
{
    Batch {
        stmt: queries.into_statement(),
    }
}

pub struct Batch<T> {
    stmt: Statement<T>,
}

impl<T: Load> Batch<T> {
    pub async fn exec(self, executor: &mut dyn Executor) -> Result<T> {
        use ExecutorExt;
        let stream = executor.exec(self.stmt).await?;
        let value = stream.next().await
            .ok_or_else(|| Error::record_not_found("batch returned no results"))??;
        T::load(value)
    }
}

Batch::exec calls the regular ExecutorExt::exec method. The composed statement flows through the standard engine pipeline. The result is a single value (a record of lists) that T::load deserializes into the typed tuple.

Execution Flow

User code:
    toasty::batch((UserQuery, PostQuery)).exec(&db)

IntoStatement for (A, B):
    SELECT (SELECT ... FROM users WHERE ...), (SELECT ... FROM posts ...)

Engine pipeline (standard exec path):
    lower → plan → exec

    The engine recognizes Expr::Stmt subqueries in the returning
    expression and executes each independently.

Result:
    Value::Record([
        Value::List([user1, user2, ...]),
        Value::List([post1, post2, ...]),
    ])

Load for (Vec<User>, Vec<Post>):
    (A::load(record[0]), B::load(record[1]))
    → (Vec<User>::load(list), Vec<Post>::load(list))
    → (vec![User::load(v1), ...], vec![Post::load(v1), ...])

Statement Changes

Statement<M> needs a way to construct from a raw stmt::Statement without requiring M: Model:

impl<M> Statement<M> {
    /// Build a statement from a raw untyped statement.
    ///
    /// Used by batch composition where M may be a tuple, not a model.
    pub(crate) fn from_raw(untyped: stmt::Statement) -> Self {
        Self {
            untyped,
            _p: PhantomData,
        }
    }
}

The existing Statement::from_untyped requires M: Model (via IntoSelect). from_raw has no bound on M and is pub(crate) so only internal code uses it.

Engine Support

The engine needs to handle a Query whose returning expression is a record of Expr::Stmt subqueries where each subquery returns multiple rows.

The lowerer already handles Expr::Stmt for association preloading (INCLUDE), where subqueries get added to the dependency graph and executed as part of the plan. Batch queries follow the same pattern: each Expr::Stmt in the returning record becomes an independent subquery in the plan, and the exec phase collects results into a Value::Record of Value::Lists.

If the existing lowerer does not handle bare subqueries in a returning record (outside of an INCLUDE context), a small extension is needed to recognize this pattern and plan it the same way.

Implementation Plan

Phase 1: IntoStatement trait and Load impls

  1. Add IntoStatement<T> trait to crates/toasty/src/stmt/
  2. Add blanket impl IntoStatement<T::Model> for T: IntoSelect
  3. Add Load for Vec<T> and Load for (A, B) (and higher arities via macro)
  4. Add Statement::from_raw
  5. Export IntoStatement from lib.rs and codegen_support

Phase 2: Batch API

  1. Add toasty::batch() function and Batch<T> struct
  2. Add tuple impls of IntoStatement<(Vec<T1>, Vec<T2>, ...)> (via macro)
  3. Wire Batch::exec through the standard ExecutorExt::exec path

Phase 3: Engine support

  1. Verify that the lowerer handles Expr::Stmt subqueries in a returning record correctly (it may already work via the INCLUDE path)
  2. If not, extend the lowerer to plan bare record-of-subqueries statements
  3. Verify the exec phase packs subquery results into Value::Record of Value::Lists

Phase 4: Integration tests

  1. Batch two selects on different models
  2. Batch a select that returns rows with a select that returns empty
  3. Batch with filters, ordering, and limits
  4. Batch inside a transaction
  5. Batch of a single query (degenerates to normal execution)

Files Modified

File                                       Change
crates/toasty/src/stmt/into_statement.rs   New: IntoStatement<T> trait, blanket impl
crates/toasty/src/stmt.rs                  Add Statement::from_raw, re-export IntoStatement
crates/toasty/src/load.rs                  Add Load impls for Vec<T> and tuples
crates/toasty/src/batch.rs                 Add batch(), Batch<T>, tuple IntoStatement impls
crates/toasty/src/lib.rs                   Re-export batch, Batch, IntoStatement
crates/toasty/src/engine/lower.rs          Handle record-of-subqueries in returning (if needed)

Compile-Time Required Field Verification for create!

Problem

When a user omits a required field from a create! invocation, the error only surfaces at runtime as a database NOT NULL constraint violation. We want a compile-time error that names the missing field.

#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: Id<User>,

    name: String,              // required
    email: String,             // required
    bio: Option<String>,       // optional (nullable)
    #[default(0)]
    login_count: i64,          // optional (has default)
}

// Should produce a compile error naming `email`
toasty::create!(User, { name: "Carl" }).exec(&db).await;

Design

Generate a hidden ZST verification chain alongside each model. The create! macro expands to call the verifier in addition to the real builder. The verifier uses typestate to track which required fields have been set and #[diagnostic::on_unimplemented] to produce per-field error messages. The real builder is unchanged.

What makes a field “required”

A field requires explicit user input on create unless ANY of these hold:

  • The type is Option<T> (nullable)
  • The field has #[auto]
  • The field has #[default(...)]
  • The field has #[update(...)] (applied as default on create)
  • The field is a HasMany or HasOne relation (populated separately)

BelongsTo fields are required if their target is non-nullable (e.g., BelongsTo<User> is required, BelongsTo<Option<User>> is not). This matches the existing nullable detection via <T as Relation>::nullable().

Generated code

For a model with required fields name and email:

// ---- Marker types (defined once in toasty crate) ----

pub struct Set;
pub struct NotSet;

// ---- Generated by #[derive(Model)] on User ----

// One trait per required field with a custom diagnostic
#[doc(hidden)]
#[diagnostic::on_unimplemented(
    message = "cannot create `User`: required field `name` is not set",
    label = "call `.name(...)` before `.exec()`"
)]
pub trait __user_create_has_name {}
impl __user_create_has_name for Set {}

#[doc(hidden)]
#[diagnostic::on_unimplemented(
    message = "cannot create `User`: required field `email` is not set",
    label = "call `.email(...)` before `.exec()`"
)]
pub trait __user_create_has_email {}
impl __user_create_has_email for Set {}

// Verifier: all ZSTs, optimized away entirely
#[doc(hidden)]
pub struct __UserCreateVerify<Name = NotSet, Email = NotSet>(
    ::std::marker::PhantomData<(Name, Email)>,
);

impl __UserCreateVerify {
    pub fn new() -> Self {
        __UserCreateVerify(::std::marker::PhantomData)
    }
}

impl<Name, Email> __UserCreateVerify<Name, Email> {
    // Required field: transitions type param to Set
    pub fn name(self) -> __UserCreateVerify<Set, Email> {
        __UserCreateVerify(::std::marker::PhantomData)
    }

    pub fn email(self) -> __UserCreateVerify<Name, Set> {
        __UserCreateVerify(::std::marker::PhantomData)
    }

    // Optional fields: no type transition
    pub fn bio(self) -> Self { self }
    pub fn login_count(self) -> Self { self }

    // Relation fields (with_ variants used by create! for closures)
    pub fn todos(self) -> Self { self }
    pub fn with_todos(self) -> Self { self }
}

// check() only compiles when all required traits are satisfied
impl<Name, Email> __UserCreateVerify<Name, Email>
where
    Name: __user_create_has_name,
    Email: __user_create_has_email,
{
    pub fn check(self) {}
}

// Entry point on the model type — resolves through aliases
impl User {
    #[doc(hidden)]
    pub fn __verify_create() -> __UserCreateVerify {
        __UserCreateVerify::new()
    }
}

create! macro expansion

The create! macro emits the verification chain before the real builder. The verifier methods mirror the builder methods but take no arguments.

#![allow(unused)]
fn main() {
// Input:
toasty::create!(User, { name: "Carl", bio: "hello" })

// Expands to:
{
    // Compile-time verification (all ZST, erased entirely)
    User::__verify_create().name().bio().check();

    // Real builder (unchanged)
    User::create().name("Carl").bio("hello")
}
}

For type aliases (type Foo = User), Foo::__verify_create() resolves through the type system to User::__verify_create() — no naming conventions needed.
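The alias behavior follows directly from the typestate pattern. A self-contained sketch (shortened names, not the generated code; requires stable Rust 1.78+ for the diagnostic attribute):

```rust
use std::marker::PhantomData;

pub struct Set;
pub struct NotSet;

#[diagnostic::on_unimplemented(
    message = "cannot create `User`: required field `name` is not set",
    label = "call `.name(...)` before `.check()`"
)]
pub trait HasName {}
impl HasName for Set {}

pub struct Verify<Name = NotSet>(PhantomData<Name>);

impl Verify {
    pub fn new() -> Self {
        Verify(PhantomData)
    }
}

impl<Name> Verify<Name> {
    // Required field: transitions the type parameter to `Set`.
    pub fn name(self) -> Verify<Set> {
        Verify(PhantomData)
    }
    // Optional field: identity, no type transition.
    pub fn bio(self) -> Self {
        self
    }
}

impl<Name: HasName> Verify<Name> {
    // Only callable once `name()` has appeared in the chain.
    pub fn check(self) {}
}

pub struct User;
impl User {
    pub fn verify_create() -> Verify {
        Verify::new()
    }
}

// An alias resolves through the type system; no naming convention needed.
type Person = User;

fn main() {
    Person::verify_create().name().bio().check(); // compiles
    // Person::verify_create().bio().check();  // error: `NotSet: HasName`
}
```

Because `Person` is just another name for `User`, the associated function lookup lands on the same `verify_create` with no macro-side string manipulation.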

Error messages

Missing one field:

error[E0277]: cannot create `User`: required field `email` is not set
  --> src/main.rs:5:42
   |
5  |     create!(User, { name: "Carl" }).exec(&db).await;
   |                                     ^^^^ call `.email(...)` before `.exec()`

Missing multiple fields (Rust reports all unsatisfied bounds):

error[E0277]: cannot create `User`: required field `name` is not set
  --> src/main.rs:5:24
   |
5  |     create!(User, {}).exec(&db).await;
   |                       ^^^^ call `.name(...)` before `.exec()`

error[E0277]: cannot create `User`: required field `email` is not set
  --> src/main.rs:5:24
   |
5  |     create!(User, {}).exec(&db).await;
   |                       ^^^^ call `.email(...)` before `.exec()`

Scoped and batch create

For scoped creation (create!(user.todos(), { ... })), the create! macro cannot call __verify_create() on the scope expression. Verification only applies to the type-target form. This is acceptable: scoped creation already implies certain fields are set by the relation.

For batch creation (create!(User, [{ ... }, { ... }])), each item in the list gets its own verification chain.

#![allow(unused)]
fn main() {
// Input:
toasty::create!(User, [{ name: "Carl", email: "a@b.com" }, { name: "Bob", email: "b@c.com" }])

// Expands to:
{
    User::__verify_create().name().email().check();
    User::__verify_create().name().email().check();
    User::create_many()
        .with_item(|b| { let b = b.name("Carl").email("a@b.com"); b })
        .with_item(|b| { let b = b.name("Bob").email("b@c.com"); b })
}
}

Nested creation (closures)

The create! macro generates .with_field(|b| { ... }) for nested struct bodies. The verifier mirrors this with a no-arg .with_field() method that returns Self (identity for relation fields).

#![allow(unused)]
fn main() {
// Input:
toasty::create!(User, { name: "Carl", email: "a@b.com", todos: [{ title: "buy milk" }] })

// Verification chain:
User::__verify_create().name().email().with_todos().check();
}

Nested model verification (e.g., verifying Todo’s required fields within the closure) is not covered in this design. The nested model’s builder will catch missing fields at the database level as it does today.

Implementation Plan

Step 1: Add marker types to toasty crate

Add Set and NotSet ZSTs to toasty::codegen_support (the module re-exported for generated code).

File: crates/toasty/src/codegen_support.rs (or equivalent)

Step 2: Add is_required_on_create helper to codegen field

Add a method to Field in toasty-codegen that returns whether a field is required for creation. This centralizes the logic:

#![allow(unused)]
fn main() {
impl Field {
    pub fn is_required_on_create(&self) -> bool {
        // Relations: only BelongsTo can be required
        match &self.ty {
            FieldTy::HasMany(_) | FieldTy::HasOne(_) => return false,
            FieldTy::BelongsTo(rel) => return !rel.nullable,
            FieldTy::Primitive(_) => {}
        }

        // Skip auto, default, update fields
        if self.attrs.auto.is_some() {
            return false;
        }
        if self.attrs.default_expr.is_some() || self.attrs.update_expr.is_some() {
            return false;
        }

        // Check if the Rust type is Option<T>
        // (For non-serialized fields, Primitive::NULLABLE handles this at
        // runtime, but we need a syntactic check at codegen time.)
        if let FieldTy::Primitive(ty) = &self.ty {
            if is_option_type(ty) {
                return false;
            }
        }

        true
    }
}
}

The is_option_type helper already exists in the codebase (used by serialize field codegen). Extract it to a shared location if not already shared.

Step 3: Generate verifier in expand/create.rs

Add a new method expand_create_verifier to Expand that generates:

  1. One __model_create_has_{field} trait per required field with #[diagnostic::on_unimplemented]
  2. The __ModelCreateVerify struct with type params for required fields
  3. new(), field methods (required → type transition, optional → identity), and check() with trait bounds
  4. The __verify_create() associated function on the model impl

Call expand_create_verifier() from the model’s root expansion alongside expand_create_builder().

Step 4: Update create! macro expansion

In crates/toasty-macros/src/create/expand.rs, modify the expand function to emit the verification chain before the builder chain.

For Target::Type with CreateItem::Single:

#![allow(unused)]
fn main() {
// Verification chain: Type::__verify_create().field1().field2().check();
// Builder chain:      Type::create().field1(val1).field2(val2)
}

For Target::Type with CreateItem::List, emit one verification chain per item.

For Target::Scope, emit only the builder chain (no verification).

The verification field calls mirror the builder field calls but drop the arguments. For CreateItem::Single, each field becomes .field_name(). For CreateItem::List and nested structs, the with_* closure is replaced by a simple .with_field_name() call.

Step 5: Tests

Add compile-fail tests that verify:

  • Missing a single required field → error naming the field
  • Missing multiple required fields → errors naming each field
  • Optional fields can be omitted without error
  • #[auto] fields can be omitted without error
  • #[default] fields can be omitted without error
  • #[update] fields can be omitted without error
  • All fields provided → compiles successfully
  • Type aliases work (type Foo = User; create!(Foo, { ... }))

Limitations

  • Scope targets: create!(user.todos(), { ... }) does not get verification. The scope expression is not a type path, so we cannot call __verify_create() on it.

  • Nested models: Required fields on nested models (inside closures) are not verified by this mechanism. They continue to rely on database constraint errors.

  • Direct builder API: Users who call User::create().name("Carl").exec() without the create! macro do not get verification. The public builder is unchanged. This is intentional — the macro is the recommended API, and changing the builder’s type signature would be a larger change.

  • diagnostic::on_unimplemented support: This attribute is stable since Rust 1.78. The custom message and label fields are respected by rustc. Third-party tools (rust-analyzer, older compilers) may show a generic trait bound error instead of the custom message.

Files Modified

File                                         Change
crates/toasty/src/codegen_support.rs         Add Set, NotSet marker types
crates/toasty-codegen/src/schema/field.rs    Add is_required_on_create() method
crates/toasty-codegen/src/expand/create.rs   Add expand_create_verifier()
crates/toasty-codegen/src/expand/mod.rs      Call expand_create_verifier() from root expansion
crates/toasty-macros/src/create/expand.rs    Emit verification chain in expand()

create! Macro v2

Redesign of the create! macro syntax to support mixed-type batch creation, better disambiguation between type targets and scope targets, and compile-time required field verification.

Syntax

Single creation (struct-literal form)

#![allow(unused)]
fn main() {
toasty::create!(User { name: "Carl", email: "carl@example.com" })
}

No comma between the type path and {. This is visually identical to Rust’s struct literal syntax, making it immediately recognizable.

Scoped creation (in keyword)

#![allow(unused)]
fn main() {
toasty::create!(in user.todos() { title: "buy milk" })
}

The in keyword prefixes the scope expression, unambiguously marking it as a scope target. No comma is needed — in is not a valid start of a type path or expression in this position, so it cleanly disambiguates.

The scope expression after in is parsed with Expr::parse_without_eager_brace (from syn). This prevents the parser from consuming the { fields } body as part of the expression — the same technique Rust uses for for pat in expr { body }. A bare { can only start an expression as a block or struct literal; parse_without_eager_brace suppresses struct literal parsing, and a block would require ; or a trailing expression, so the field body { name: "Carl" } is never ambiguous with the scope expression.

Batch creation (same type shorthand)

#![allow(unused)]
fn main() {
toasty::create!(User::[
    { name: "Carl", email: "carl@example.com" },
    { name: "Alice", email: "alice@example.com" },
])
}

Type::[items] creates multiple records of the same type. The :: makes this syntactically distinct from both the struct-literal form and array indexing.

Batch creation (mixed types)

#![allow(unused)]
fn main() {
toasty::create!([
    User { name: "Carl", email: "carl@example.com" },
    Article { title: "Hello World", author: &carl },
])
}

A bare [items] where each item is a struct-literal form or a scoped in creation. This leverages the batch infrastructure (IntoStatement tuple/vec) to compose multiple inserts of different types into a single batch operation.

Scoped items can be mixed into the batch:

#![allow(unused)]
fn main() {
toasty::create!([
    User { name: "Carl", email: "carl@example.com" },
    in user.friends() { name: "Bob" },
])
}

Parsing Strategy

The macro input starts with one of four forms, distinguished by the first tokens:

First tokens   Form               Target
Path {         Single creation    Type
in             Scoped creation    Scope
Path :: [      Same-type batch    Type
[              Mixed-type batch   Multiple types

Parsing steps:

  1. If input starts with [ → mixed-type batch
  2. If input starts with in → scoped creation: call Expr::parse_without_eager_brace for the scope expression, then parse { fields }
  3. Otherwise, parse as syn::Path:
    • If followed by { → single creation (struct-literal form)
    • If followed by :: [ → same-type batch

Inside a [ batch list, each item is parsed with the same disambiguation: in prefix → scoped item, Path { → type-target item.

Expansion

Single creation

#![allow(unused)]
fn main() {
// Input:
toasty::create!(User { name: "Carl", email: "carl@example.com" })

// Expands to:
{
    User::__verify_create().name().email().check();
    User::create().name("Carl").email("carl@example.com")
}
}

Returns a UserCreate builder. The caller chains .exec(&db) to execute.

Scoped creation

#![allow(unused)]
fn main() {
// Input:
toasty::create!(in user.todos() { title: "buy milk" })

// Expands to:
user.todos().create().title("buy milk")
}

No verification chain — the scope expression is not a type path, and the relation context already implies certain fields.

Same-type batch

#![allow(unused)]
fn main() {
// Input:
toasty::create!(User::[
    { name: "Carl", email: "carl@example.com" },
    { name: "Alice", email: "alice@example.com" },
])

// Expands to:
{
    User::__verify_create().name().email().check();
    User::__verify_create().name().email().check();
    (
        User::create().name("Carl").email("carl@example.com"),
        User::create().name("Alice").email("alice@example.com"),
    )
}
}

Returns a tuple of create builders. Each item gets its own verification chain. All batch forms expand to tuples of builders, which compose with toasty::batch() for execution. CreateMany / create_many() are deprecated and not used in new expansions.

Mixed-type batch

#![allow(unused)]
fn main() {
// Input:
toasty::create!([
    User { name: "Carl", email: "carl@example.com" },
    Article { title: "Hello World" },
])

// Expands to:
{
    User::__verify_create().name().email().check();
    Article::__verify_create().title().check();
    (
        User::create().name("Carl").email("carl@example.com"),
        Article::create().title("Hello World"),
    )
}
}

Returns a tuple of create builders (UserCreate, ArticleCreate). The caller passes the tuple to toasty::batch() for combined execution:

#![allow(unused)]
fn main() {
let (user, article) = toasty::batch(
    toasty::create!([
        User { name: "Carl", email: "carl@example.com" },
        Article { title: "Hello World" },
    ])
).exec(&mut db).await?;
}

Mixed batch with scoped items

#![allow(unused)]
fn main() {
// Input:
toasty::create!([
    User { name: "Carl", email: "carl@example.com" },
    in carl.todos() { title: "buy milk" },
])

// Expands to:
{
    User::__verify_create().name().email().check();
    (
        User::create().name("Carl").email("carl@example.com"),
        carl.todos().create().title("buy milk"),
    )
}
}

Scoped items in a batch do not get verification chains (same as standalone scoped creation). Type-target items get verification as usual.

All batch forms (same-type and mixed-type) produce tuples of builders. This composes naturally with toasty::batch(), which already accepts tuples via IntoStatement. CreateMany / create_many() are not used — all batching goes through toasty::batch().

Compile-Time Required Field Verification

See create-macro-required-field-verification.md for the full design. Summary:

  • #[derive(Model)] generates a hidden __verify_create() method on each model that returns a ZST verifier with typestate tracking
  • Required field methods transition type params from NotSet to Set
  • Optional field methods return Self (identity)
  • check() is only available when all required-field traits are satisfied
  • #[diagnostic::on_unimplemented] gives per-field error messages
  • The create! macro emits verification chains before the builder chains
  • Verification is only emitted for type-target forms (single, same-type batch, mixed-type batch), not scoped creation

Nested Creation

Nested struct bodies and relation lists work the same as today within each item:

#![allow(unused)]
fn main() {
toasty::create!(User {
    name: "Carl",
    email: "carl@example.com",
    todos: [
        { title: "buy milk" },
        { title: "write code" },
    ],
})
}

The verification chain for nested bodies calls the relation method as a no-op:

#![allow(unused)]
fn main() {
User::__verify_create().name().email().with_todos().check();
}

Nested model verification (e.g., Todo’s required fields) is not covered by the verification chain. The nested model’s builder catches missing fields at the database level.

Migration from v1

Breaking changes

v1 syntax                            v2 syntax
create!(User, { name: "Carl" })      create!(User { name: "Carl" })
create!(user.todos(), { ... })       create!(in user.todos() { ... })
create!(User, [{ ... }, { ... }])    create!(User::[ { ... }, { ... } ])

The v1 type-target forms (create!(User, { ... }) and create!(User, [...])) are removed. The scope form now uses the in keyword prefix instead of a comma separator.

Implementation Plan

Phase 1: Macro v2 syntax

Step 1: Update create! macro parser

Rewrite crates/toasty-macros/src/create/parse.rs to handle the four forms:

  1. [ → mixed-type batch
  2. in expr { ... } → scoped creation
  3. Path { → single creation
  4. Path :: [ → same-type batch

Update Target enum and CreateInput to represent the new forms.

Step 2: Update create! macro expansion

Rewrite crates/toasty-macros/src/create/expand.rs to generate:

  • Builder chains as today
  • Tuple output for batch forms

No verification chains yet — those are added in phase 2.

Step 3: Update existing tests and examples

All existing create! usages need to be updated to the new syntax. This includes:

  • Integration tests in crates/toasty-driver-integration-suite/src/tests/
  • Examples in examples/
  • Benchmarks

Step 4: Add syntax tests

  • Tests for each syntax form (single, scoped, same-type batch, mixed-type batch)
  • Type alias tests (type Foo = User; create!(Foo { ... }))

Phase 2: Compile-time required field verification

(From create-macro-required-field-verification.md)

Step 5: Implement verification codegen

  • Add Set/NotSet markers to toasty::codegen_support
  • Add is_required_on_create() to codegen Field
  • Generate verifier struct, traits, and __verify_create() in expand/create.rs

Step 6: Wire verification into create! expansion

Update macro expansion to emit __verify_create() chains before builder chains for type-target forms (single, same-type batch, mixed-type batch). Scoped creation is unchanged.

Step 7: Add verification tests

  • Compile-fail tests for missing required fields
  • Tests verifying optional fields can be omitted without error

DynamoDB: OR Predicates in Index Key Conditions

Problem

DynamoDB’s KeyConditionExpression does not support OR — neither for partition keys nor sort keys. This means queries like WHERE user_id = 1 OR user_id = 2 on an indexed field are currently broken for DynamoDB.

The engine must detect OR in index key conditions and fan them out into N individual DynamoDB Query calls — one per OR branch — then concatenate the results.

A secondary motivation: the batch-load mechanism used for nested association preloads (rewrite_stmt_query_for_batch_load_nosql) produces ANY(MAP(arg[input], pred)), which at exec time expands to OR via simplify_expr_any. This hits the same DynamoDB restriction and is addressed by the same fix.

Where OR Can Reach a Key Condition

Only two engine actions use KeyConditionExpression:

  • QueryPk — queries the primary table when exact PK keys cannot be extracted
  • FindPkByIndex — queries a GSI to retrieve primary keys

GetByKey uses BatchGetItem (explicit key values, no expression), so OR is never relevant there. pk = v1 OR pk = v2 on the primary key produces IndexPlan.key_values = Some([v1, v2]), routing to GetByKey directly — no issue.

QueryPk

OR reaches QueryPk.pk_filter when IndexPlan.key_values is None:

  • User-specified OR on sort key: WHERE pk = v AND (sk >= s1 OR sk >= s2) — range predicates have no extractable key values.
  • Batch-load (e.g. a HasMany where the FK is the partition key of the child’s composite primary key): rewrite_stmt_query_for_batch_load_nosql produces ANY(MAP(arg[input], fk = arg[0])). The list is a runtime input, so key_values is None. At exec time simplify_expr_any expands it to OR.

FindPkByIndex

FindPkByIndex.filter is the output of partition_filter, which isolates index key conditions from non-key conditions. partition_filter on AND distributes cleanly: status = active AND (user_id = 1 OR user_id = 2) produces index_filter = user_id = 1 OR user_id = 2 and result_filter = status = active.

OR reaches it in the same two ways as QueryPk:

  • User-specified OR: WHERE user_id = 1 OR user_id = 2 on a GSI partition key.
  • Batch-load: same ANY(MAP(arg[input], pred)) expansion path as above.

Mixed OR Operands

partition_filter currently has a todo!() for OR operands that contain both index and non-index parts — e.g. (pk = 1 AND status = a) OR pk = 2.

This is in scope. Strategy:

  • Extract key conditions from each OR branch to build the fan-out: ANY(MAP([1, 2], pk = arg[0]))
  • Apply the full original predicate as an in-memory post-filter: (pk = 1 AND status = a) OR pk = 2

This is conservative but correct, and consistent with how post_filter is already used.
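The strategy can be illustrated on toy data (hypothetical `Row` type and `mixed_or_query` function, not engine code): fan out one query per extracted key value, concatenate, then re-apply the full original predicate as an in-memory post-filter.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Row {
    pub pk: i64,
    pub status: &'static str,
}

// Predicate under consideration: (pk = 1 AND status = "a") OR pk = 2.
// Key conditions extracted per OR branch give the fan-out list [1, 2].
pub fn mixed_or_query(table: &[Row]) -> Vec<Row> {
    let key_list = [1_i64, 2];

    // One "Query" call per key; results concatenated.
    let fanned_out: Vec<Row> = key_list
        .iter()
        .flat_map(|&k| table.iter().filter(move |r| r.pk == k).cloned())
        .collect();

    // Full original predicate applied in memory after all calls return.
    fanned_out
        .into_iter()
        .filter(|r| (r.pk == 1 && r.status == "a") || r.pk == 2)
        .collect()
}

fn main() {
    let table = vec![
        Row { pk: 1, status: "a" },
        Row { pk: 1, status: "b" }, // fetched by the fan-out, dropped by post-filter
        Row { pk: 2, status: "b" },
        Row { pk: 3, status: "a" }, // never fetched
    ];
    let rows = mixed_or_query(&table);
    assert_eq!(rows, vec![Row { pk: 1, status: "a" }, Row { pk: 2, status: "b" }]);
}
```

The fan-out over-fetches (it retrieves `pk = 1 AND status = "b"`), and the post-filter trims the result back to exactly what the original predicate allows.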

Canonical Form: ANY(MAP(key_list, per_call_pred))

All OR cases are represented uniformly as ANY(MAP(key_list, per_call_pred)):

  • key_list — one entry per required Query call; each entry has one value per key column (scalar for partition-key-only, Value::Record for partition + sort key)
  • per_call_pred — the key condition for one call, referencing element fields as arg[0], arg[1], …

Single key column: user_id = 1 OR user_id = 2 becomes

ANY(MAP([1, 2], user_id = arg[0]))

Composite key: (todo_id = t1 AND step_id >= s1) OR (todo_id = t2 AND step_id >= s2) becomes

ANY(MAP([(t1, s1), (t2, s2)], todo_id = arg[0] AND step_id >= arg[1]))

Batch-load: ANY(MAP(arg[input], todo_id = arg[0])) is already in canonical form; no structural change is needed, only the exec fan-out behavior changes.

Design

1. Capability Flag

#![allow(unused)]
fn main() {
/// Whether OR is supported in index key conditions (e.g. DynamoDB KeyConditionExpression).
pub index_or_predicate: bool,
}

DynamoDB: false. All other backends: true (SQL backends never use these actions).

2. IndexPlan Output Contract

#![allow(unused)]
fn main() {
pub(crate) struct IndexPlan<'a> {
    pub(crate) index: &'a Index,

    /// Filter to push to the index. Guaranteed form:
    ///
    /// | Condition                          | Form                                             |
    /// |------------------------------------|--------------------------------------------------|
    /// | No OR                              | plain expr — `user_id = 1`                       |
    /// | OR, `index_or_predicate = true`    | `Expr::Or([branch1, branch2, ...])`              |
    /// | OR, `index_or_predicate = false`   | `ANY(MAP(Value::List([v1, ...]), per_call_pred))` |
    /// | Batch-load (any capability)        | `ANY(MAP(arg[input], per_call_pred))`            |
    pub(crate) index_filter: stmt::Expr,

    /// Non-index conditions applied in-memory after results return from each call.
    pub(crate) result_filter: Option<stmt::Expr>,

    /// Full original predicate applied after all fan-out results are collected.
    /// Set for mixed OR operands — see §"Mixed OR Operands".
    pub(crate) post_filter: Option<stmt::Expr>,

    /// Literal key values for direct lookup: a `Value::List` of `Value::Record` entries,
    /// one per lookup. Set by `partition_filter` when all key columns have literal equality
    /// matches. When `Some`, the planner routes to `GetByKey` and ignores `index_filter`.
    /// May coexist with a canonical `ANY(MAP(...))` `index_filter` — both are produced
    /// simultaneously by `partition_filter`; the planner always prefers `GetByKey`.
    pub(crate) key_values: Option<stmt::Value>,
}
}

Planner routing (primary key path):

key_values.is_some()          → GetByKey (BatchGetItem)
index_filter = ANY(MAP(...))  → fan-out via QueryPk × N
otherwise                     → single QueryPk call

3. Key Value Extraction in index_match

partition_filter extracts literal key values during filter partitioning, setting key_values when all key columns have literal equality matches. This replaces the current try_build_key_filter (kv.rs) post-hoc re-analysis of index_filter.

What moves into index_match: walking each OR branch, reading the RHS of each key column’s equality predicate, assembling Value::List([Value::Record([v0, ...]), ...]).

What stays in the planner: constructing eval::Func from key_values to drive the GetByKey operation — a mechanical wrap requiring no further expression analysis.

Why this matters for ordering: if partition_filter produced the canonical ANY(MAP([1,2], pk=arg[0])) form first, the downstream try_build_key_filter Or arm would never fire, silently breaking the GetByKey path for primary key OR queries. Extracting key values inside partition_filter eliminates this conflict — both outputs are produced together.

4. Planner Invariant

When !capability.index_or_predicate, neither FindPkByIndex.filter nor QueryPk.pk_filter contains Expr::Or. OR is always restructured into ANY(MAP(arg[i], per_call_pred)) by partition_filter before reaching the exec layer.

Batch-load path: ANY(MAP(...)) is already produced upstream; the invariant holds. Only the exec fan-out needs fixing.

User-specified OR path: partition_filter produces canonical form directly. The planner consumes IndexPlan.index_filter as-is; no rewrite in plan_secondary_index_execution or plan_primary_key_execution. For mixed OR operands, partition_filter additionally sets IndexPlan.post_filter to the full original predicate.

5. Exec Fan-out

Both action_find_pk_by_index and action_query_pk receive the same treatment.

After substituting inputs into the filter, check for ANY(MAP(arg[i], per_call_pred)):

  • If present: iterate over input[i] element by element; substitute each into per_call_pred and issue one driver call; concatenate results. Do not call simplify_expr_any — it would re-expand to OR.
  • Otherwise: unchanged single-call path.
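The fan-out loop itself is mechanical. A sketch with toy types (the `fan_out` helper and closure-based driver are illustrative, not engine code): iterate the resolved key list, substitute each element into the per-call predicate, issue one driver call per element, and concatenate results in list order.

```rust
pub fn fan_out<T, F>(key_list: &[i64], driver_query: F) -> (usize, Vec<T>)
where
    F: Fn(i64) -> Vec<T>,
{
    let mut results = Vec::new();
    let mut calls = 0;
    for &key in key_list {
        calls += 1; // one driver Query per list element
        results.extend(driver_query(key));
    }
    (calls, results)
}

fn main() {
    // Stand-in for "substitute the element into per_call_pred and query the
    // GSI": returns rows matching `user_id = key`.
    let gsi = [(1_i64, "carl"), (2, "alice"), (3, "bob")];
    let query = |key: i64| -> Vec<(i64, &'static str)> {
        gsi.iter().filter(|(id, _)| *id == key).cloned().collect()
    };

    let (calls, rows) = fan_out(&[1, 2], query);
    assert_eq!(calls, 2); // N keys → N Query calls
    assert_eq!(rows, vec![(1, "carl"), (2, "alice")]);
}
```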

6. DynamoDB Driver

Revert the temporary OR-splitting workaround in exec_find_pk_by_index. The driver is a dumb executor of a single valid key condition.

Summary of Changes

Location                          Change
Capability                        Add index_or_predicate: bool; false for DynamoDB
IndexPlan                         Add key_values: Option<stmt::Value> field
index_match / partition_filter    Or arm: produce canonical ANY(MAP(...)) when !index_or_predicate; extract key_values; fix mixed OR todo!()
plan_primary_key_execution        Route on key_values / ANY(MAP(...)) instead of calling try_build_key_filter
plan_secondary_index_execution    No rewrite needed; consumes IndexPlan.index_filter as-is
kv.rs / try_build_key_filter      Remove (literal case now handled by index_match)
action_find_pk_by_index           Fan out over ANY(MAP(...)) — one driver call per element; skip simplify_expr_any
action_query_pk                   Same fan-out treatment
DynamoDB exec_find_pk_by_index    Revert OR-splitting workaround

Data-Carrying Enum Implementation Design

Builds on unit enum support (#355). See docs/design/enums-and-embedded-structs.md for the user-facing design.

Value Stream Encoding

Unit and data variants are encoded differently in the value stream:

  • Unit variant: Value::I64(discriminant) — unchanged from unit enum encoding
  • Data variant: Value::Record([I64(discriminant), ...active_field_values])

Only the active variant’s fields appear in the record; inactive variant columns (NULL in the DB) are not included. Primitive::load dispatches on the value type:

I64(d)      => unit variant with discriminant d
Record(r)   => data variant; r[0] is the discriminant, r[1..] are fields
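The dispatch can be sketched with a toy `Value` type and a hypothetical two-variant enum (not the actual generated `Primitive::load`):

```rust
#[derive(Debug, PartialEq)]
enum Value {
    I64(i64),
    Record(Vec<Value>),
    String(String),
}

#[derive(Debug, PartialEq)]
enum ContactInfo {
    None,                      // discriminant 0, unit variant
    Email { address: String }, // discriminant 1, data variant
}

fn load(value: Value) -> Result<ContactInfo, String> {
    match value {
        // Unit variants arrive as a bare discriminant.
        Value::I64(0) => Ok(ContactInfo::None),
        // Data variants arrive as Record([discriminant, ...fields]).
        Value::Record(mut r) => match r.remove(0) {
            Value::I64(1) => match r.remove(0) {
                Value::String(address) => Ok(ContactInfo::Email { address }),
                other => Err(format!("expected string, got {other:?}")),
            },
            d => Err(format!("unknown discriminant {d:?}")),
        },
        other => Err(format!("unexpected value {other:?}")),
    }
}

fn main() {
    assert_eq!(load(Value::I64(0)), Ok(ContactInfo::None));
    assert_eq!(
        load(Value::Record(vec![Value::I64(1), Value::String("a@b.com".into())])),
        Ok(ContactInfo::Email { address: "a@b.com".into() })
    );
}
```

Matching on the value shape first means the unit and data encodings never need a shared wrapper: a bare `I64` is always a unit variant, and a `Record` always carries its discriminant in slot 0.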

Schema Changes

EnumVariant gains a fields: Vec<Field> — the same Field type used by EmbeddedStruct. Field indices are assigned globally across all variants within the enum, keeping FieldId { model: enum_id, index } as a unique identifier consistent with how EmbeddedStruct works. The primary_key, auto, and constraints members of Field are always false/None/[] for variant fields.

Primitive::ty() changes based on variant content:

  • Unit-only enum → Type::I64 (unchanged)
  • Any data variant present → Type::Model(Self::id()), same as embedded structs

Codegen Changes

Parsing: toasty-codegen/src/schema/ parses variant fields and includes them in EmbeddedEnum registration so the runtime schema is complete.

Primitive::load: generated arms dispatch on value type first (I64 vs Record), then on the discriminant within each branch. Data variant arms load each field from its positional index in the record.

IntoExpr: unit variants emit Value::I64(disc) as today; data variants emit Value::Record([I64(disc), field_exprs...]).

{Enum}Fields struct: all enums (unit-only and data-carrying) generate a {Enum}Fields struct with is_{variant}() methods for discriminant-only filtering. For data-carrying enums, is_{variant}() uses project(path, [0]) to extract the discriminant from the record representation. For unit-only enums, it compares the path directly. The struct also delegates comparison methods (eq, ne, etc.) to Path<Self>.

Engine: Expr::Match

Both table_to_model and model_to_table are expressed using:

Match { subject: Expr, arms: [(pattern: Value, expr: Expr)], else_expr: Expr }

Expr::Match is never serialized to SQL — it is either evaluated in the engine (for writes) or eliminated by the simplifier before the plan stage (for reads/queries).

table_to_model

For an enum field, table_to_model emits a Match on the discriminator column. Each arm produces the value shape Primitive::load expects: unit arms emit I64(disc), data arms emit Record([I64(disc), ...field_col_refs]).

else branch: Expr::Error

The else branch of an enum Match represents the case where the discriminant column holds an unrecognized value — semantically unreachable for well-formed data.

For data-carrying enums, the else branch is Record([disc_col, Error, ...Error]) — the same Record shape as data arms, but with Expr::Error in every field slot. This design is critical for the simplifier: projections distribute uniformly into the else branch, and field-slot projections yield Expr::Error (correct: accessing a field on an unknown variant is an error), while discriminant projections ([0]) yield disc_col (the same as every arm). This enables the uniform-arms optimization to fire after projection.

For unit-only enums, the else branch is Expr::Error directly.

model_to_table

Runs the inverse: the incoming value (I64 or Record) is matched on its discriminant, and each arm emits a flat record of all enum columns in DB order — setting the discriminator and active variant fields, and NULLing every inactive variant column. This NULL-out is mandatory: because writes may not have a loaded model, the engine has no knowledge of the prior variant and must clear all non-active columns unconditionally.
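The NULL-out can be sketched with a toy `Value` type and a hypothetical column layout (illustrative, not the generated arms): every arm emits all enum columns in DB order, writing the active variant's fields and NULLing the rest.

```rust
#[derive(Debug, PartialEq, Clone)]
enum Value {
    Null,
    I64(i64),
    String(String),
}

// Columns in DB order: [discriminator, email_address, phone_number]
fn model_to_table(disc: i64, fields: Vec<Value>) -> Vec<Value> {
    match disc {
        // Email { address }: active column email_address; NULL phone_number.
        1 => vec![Value::I64(1), fields[0].clone(), Value::Null],
        // Phone { number }: active column phone_number; NULL email_address.
        2 => vec![Value::I64(2), Value::Null, fields[0].clone()],
        // Unknown discriminant is unreachable for well-formed input
        // (the engine represents this case with Expr::Error).
        _ => panic!("unknown discriminant {disc}"),
    }
}

fn main() {
    let row = model_to_table(1, vec![Value::String("a@b.com".into())]);
    // Switching variants clears the other variant's column unconditionally.
    assert_eq!(
        row,
        vec![Value::I64(1), Value::String("a@b.com".into()), Value::Null]
    );
}
```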

Simplifier Rules

Project into Match (expr_project.rs)

Distributes a projection into each Match arm AND the else branch:

project(Match(subj, [p => e, ...], else), [i])
  → Match(subj, [p => project(e, [i]), ...], else: project(else, [i]))

Projection is pushed into the else branch unconditionally — Expr::Error inside a Record is handled naturally (projecting [0] out of Record([disc, Error]) yields disc; projecting [1] yields Error).
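The rule can be demonstrated on a toy `Expr` type (subject omitted for brevity; not engine code): a projection distributes into every arm and the else branch, and projecting out of a Record picks the indexed element, Error included.

```rust
#[derive(Debug, PartialEq, Clone)]
enum Expr {
    Column(&'static str),
    Record(Vec<Expr>),
    Error,
    Match { arms: Vec<Expr>, els: Box<Expr> },
}

fn project(expr: Expr, i: usize) -> Expr {
    match expr {
        // project(Record([...]), [i]) → element i (Error projects to Error).
        Expr::Record(mut elems) => elems.remove(i),
        // Distribute into all arms and the else branch.
        Expr::Match { arms, els } => Expr::Match {
            arms: arms.into_iter().map(|a| project(a, i)).collect(),
            els: Box::new(project(*els, i)),
        },
        other => other,
    }
}

fn main() {
    let disc = || Expr::Column("disc");
    let m = Expr::Match {
        arms: vec![
            Expr::Record(vec![disc(), Expr::Column("addr")]),
            Expr::Record(vec![disc(), Expr::Column("num")]),
        ],
        els: Box::new(Expr::Record(vec![disc(), Expr::Error])),
    };
    // Projecting [0] leaves every branch (including else) as `disc`,
    // setting up the uniform-arms fold.
    let projected = project(m, 0);
    assert_eq!(
        projected,
        Expr::Match { arms: vec![disc(), disc()], els: Box::new(disc()) }
    );
    // Projecting [1] out of the else Record yields Error.
    assert_eq!(project(Expr::Record(vec![disc(), Expr::Error]), 1), Expr::Error);
}
```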

Uniform arms (expr_match.rs)

When all arms AND the else branch produce the same expression, the Match is redundant:

Match(subj, [1 => disc, 2 => disc], else: disc)  →  disc

The else branch MUST equal the common arm expression for this rule to fire. This makes the transformation provably correct — no branch is dropped that could produce a different value.

Match elimination in binary ops (expr_binary_op.rs)

Distributes a binary op over match arms, producing an OR of guarded comparisons. The else branch is included with a negated guard:

Match(subj, [p1 => e1, p2 => e2], else: e3) == rhs
  → OR(subj == p1 AND e1 == rhs,
       subj == p2 AND e2 == rhs,
       subj != p1 AND subj != p2 AND e3 == rhs)

Each term is fully simplified inline. Terms that fold to false/null are pruned. No special handling is needed for the else branch — it is always included and existing simplification rules handle Expr::Error naturally (see below).

Expr::Error semantics

Expr::Error is treated as “unreachable” — not as a poison value that propagates. No special Error propagation rules exist. Instead, existing rules eliminate Error through the surrounding context:

  • Data-carrying enum else: Record([disc, Error, ...]). After tuple decomposition, the guard disc != p1 AND disc != p2 contradicts the decomposed disc == c from the comparison target. The contradicting equality rule (a == c AND a != c → false) folds the AND to false.

  • false AND (Error == x): The false short-circuit in AND eliminates the term without needing to simplify Error == x.

  • Record([1, Error]) == Record([0, "alice"]): Tuple decomposition produces 1 == 0 AND Error == "alice". The 1 == 0 → false folds the AND to false.

In all well-formed cases, the guard constraints around Error cause the branch to be pruned without requiring Error-specific rules.

Type inference for Expr::Error

Expr::Error infers as Type::Unknown. TypeUnion::insert skips Unknown, so an Error branch in a Match doesn’t widen the inferred type union.

Variant-only filter flow

is_email() generates eq(project(path, [0]), I64(1)). After lowering:

eq(project(Match(disc, [1 => Record([disc, addr]), 2 => Record([disc, num])],
                 else: Record([disc, Error])), [0]),
   I64(1))
  1. Project-into-Match distributes [0] into all branches including else
  2. project(Record([disc, addr]), [0]) → disc (for each arm)
  3. project(Record([disc, Error]), [0]) → disc (for else)
  4. Uniform-arms fires: all arms AND else produce disc → folds to disc
  5. Result: eq(disc, I64(1)) — a clean disc_col = 1 predicate
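The steps above can be sketched on a toy AST. The names and shapes here are illustrative, not Toasty's actual Expr type; the sketch implements only the two rules this flow needs, project-into-Match distribution and uniform-arms folding:

```rust
#[derive(Clone, PartialEq, Debug)]
enum Expr {
    Col(&'static str),
    Record(Vec<Expr>),
    Match { subj: Box<Expr>, arms: Vec<(i64, Expr)>, els: Box<Expr> },
    Project(Box<Expr>, usize),
    Error,
}

fn simplify(e: Expr) -> Expr {
    match e {
        Expr::Project(inner, idx) => match simplify(*inner) {
            // Project-into-Match: push the projection into every branch,
            // including the else branch.
            Expr::Match { subj, arms, els } => {
                let arms: Vec<(i64, Expr)> = arms
                    .into_iter()
                    .map(|(p, b)| (p, simplify(Expr::Project(Box::new(b), idx))))
                    .collect();
                let els = Box::new(simplify(Expr::Project(els, idx)));
                // Uniform arms: all branches (and else) produce the same
                // expression, so the Match folds away.
                if arms.iter().all(|(_, b)| *b == *els) {
                    return *els;
                }
                Expr::Match { subj, arms, els }
            }
            // project(Record([...]), i) extracts the i-th field.
            Expr::Record(fields) => fields[idx].clone(),
            other => Expr::Project(Box::new(other), idx),
        },
        other => other,
    }
}

fn main() {
    let disc = Expr::Col("disc");
    let lowered = Expr::Match {
        subj: Box::new(disc.clone()),
        arms: vec![
            (1, Expr::Record(vec![disc.clone(), Expr::Col("addr")])),
            (2, Expr::Record(vec![disc.clone(), Expr::Col("num")])),
        ],
        els: Box::new(Expr::Record(vec![disc.clone(), Expr::Error])),
    };
    // project(Match(...), [0]) folds to the bare discriminant column.
    assert_eq!(simplify(Expr::Project(Box::new(lowered), 0)), disc);
}
```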

Full-value equality filter flow

contact().eq(ContactInfo::Email { address: "alice@example.com" }) generates eq(path, Record([I64(1), "alice@example.com"])). After lowering:

eq(Match(disc, [1 => Record([disc, addr]), 2 => Record([disc, num])],
         else: Record([disc, Error])),
   Record([I64(1), "alice@example.com"]))
  1. Match elimination distributes eq into each arm AND else as OR
  2. disc == 1 AND Record([disc, addr]) == Record([I64(1), "alice"]) → simplifies
  3. disc == 2 AND Record([disc, num]) == Record([I64(1), "alice"]) → false (pruned)
  4. Else: disc != 1 AND disc != 2 AND Record([disc, Error]) == Record([I64(1), "alice"]) → tuple decomposition: disc != 1 AND disc != 2 AND disc == 1 AND Error == "alice" → contradicting equality (disc == 1 AND disc != 1) → false (pruned)
  5. Result: disc_col = 1 AND addr_col = 'alice@example.com'

Correctness Sharp Edges

Whole-variant replacement must NULL all inactive columns. The engine has no knowledge of the prior variant for query-based updates, so the model_to_table arms unconditionally NULL every column they do not own.

NULL discriminators are disallowed. The discriminator column carries NOT NULL, consistent with unit enums today. Option<Enum> support is a future concern.

Unknown discriminants fail at load time. An unrecognized discriminant (e.g. from a newer schema version) produces a runtime error via Expr::Error. Removing a variant requires a data migration.

No DB-level integrity for active variant fields. All variant columns are nullable (to accommodate inactive variants), so a NULL in a required active field is caught only at load time by Primitive::load, not at write time.

DynamoDB

An equivalent encoding will be determined during the DynamoDB driver implementation phase.

Implementation Status

Completed

  1. Schema: fields: Vec<Field> on EnumVariant; codegen parsing; Primitive::ty() returns Type::Model for data-carrying enums.

  2. Value encoding: Primitive::load() dispatches on I64 vs Record; IntoExpr emits Record for data variants.

  3. Expr::Match + Expr::Error: Match/MatchArm AST nodes with visitors, eval, and simplifier integration. Expr::Error for unreachable branches. build_table_to_model_field_enum uses Record([disc, Error, ...]) for the else branch.

  4. Simplifier: project-into-Match distribution; uniform-arms folding (with else-branch check); Match-to-OR elimination in binary ops; case distribution for binary ops with Match operands.

  5. {Enum}Fields codegen: all enums generate a fields struct with is_{variant}() methods and delegated comparison methods.

  6. Integration tests: CRUD for data-carrying enums; full-value equality filter; variant-only filter (is_email()); unit enum variant filter (is_pending()).

  7. Variant+field filter (contact().email().matches(|e| e.address().eq("x"))): per-variant field accessors with closure-based .matches() API.

  8. OR tautology elimination: is_variant(x, 0) or is_variant(x, 1) covering all variants of an enum folds to true in the OR simplifier.

Remaining

  • Partial updates: within-variant partial update builder.

  • DynamoDB: equivalent encoding in the DynamoDB driver.

Open Questions

  • SparseRecord / reload: once within-variant partial updates land, SparseRecord and reload must support enum variant fields. Determine how reload should handle a SparseRecord scoped to a specific variant’s fields — the in-memory model must update only the changed fields without disturbing the discriminant or other variant columns.

  • Shared columns: variants sharing a column via #[column("name")] is in the user-facing design. Schema parsing should record shared columns in Phase 1; full query support is a follow-on.

Enum and Embedded Struct Support

Addresses Issue #280.

Scope

Add support for:

  1. Enum types as model fields (unit, tuple, struct variants)
  2. Embedded structs (no separate table, stored inline)

Both use #[derive(toasty::Embed)].

Storage Strategy

Flattened storage:

  • Enums: Discriminator column + nullable columns per variant field
    • INTEGER discriminator with required #[column(variant = N)] on each variant
    • Works uniformly across all databases (PostgreSQL, MySQL, SQLite, DynamoDB)
  • Embedded structs: No discriminator, just flattened fields

Unit-only enums: no per-variant columns; the field is stored as a single INTEGER value.

Post-MVP: Native ENUM types for PostgreSQL/MySQL discriminators (optimization).

Column Naming

Pattern: {field}_{variant}_{name}

#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    critter: Creature,  // field name
}

#[derive(toasty::Embed)]
enum Creature {
    #[column(variant = 1)]
    Human { profession: String },      // variant, field
    #[column(variant = 2)]
    Lizard { habitat: String },
}

// Columns:
// - critter (discriminator)
// - critter_human_profession
// - critter_lizard_habitat
}

Customization

Rename field (at enum definition):

#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Creature {
    #[column(variant = 1)]
    Human { profession: String },
    #[column(variant = 2)]
    Lizard {
        #[column("lizard_env")]  // Must include variant scope
        habitat: String,
    },
}
// → critter_lizard_env (field prefix "critter" is prepended)
}

Custom column names for enum variant fields must include the variant scope. The pattern becomes {field}_{custom_name} where custom_name should include the variant portion.

Rename field prefix (per use):

#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    #[column("creature_type")]
    critter: Creature,
}
// → creature_type (discriminator)
// → creature_type_human_profession (field prefix replaced for all columns)
// → creature_type_lizard_habitat
}

The #[column("name")] attribute on the parent struct’s field replaces the field prefix for all generated columns.

Customize discriminator type (on enum definition):

#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
#[column(type = "bigint")]
enum Creature { ... }
}

The #[column(type = "...")] attribute on the enum type customizes the database type for the discriminator column (e.g., “bigint”, “smallint”, “tinyint”).

Tuple Variants

Numeric field naming: {field}_{variant}_{index}

#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Contact {
    #[column(variant = 1)]
    Phone(String, String),
}
// Columns: contact, contact_phone_0, contact_phone_1
}

Customize with #[column("...")]:

#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Contact {
    #[column(variant = 1)]
    Phone(
        #[column("phone_country")]
        String,
        #[column("phone_number")]
        String,
    ),
}
// Columns: contact, contact_phone_country, contact_phone_number
}

Nested Types

Path flattened with underscores:

#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactInfo {
    #[column(variant = 1)]
    Mail { address: Address },
}

#[derive(toasty::Embed)]
struct Address {
    street: String,
    city: String,
}

// → contact_mail_address_city
// → contact_mail_address_street
}
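The flattening can be sketched as a simple join (an illustrative helper, not Toasty's actual naming code): each path segment contributes one underscore-separated, lowercased component.

```rust
// Flatten a nested path (field, variant, nested fields...) into a column
// name, per the {field}_{variant}_{name} pattern.
fn column_name(path: &[&str]) -> String {
    path.join("_").to_lowercase()
}

fn main() {
    assert_eq!(
        column_name(&["contact", "Mail", "address", "city"]),
        "contact_mail_address_city"
    );
}
```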

Shared Columns Across Variants

Multiple variants can share the same column by specifying the same #[column("name")]:

#![allow(unused)]
fn main() {
#[derive(Model)]
struct Character {
    #[key]
    #[auto]
    id: u64,
    creature: Creature,
}

#[derive(toasty::Embed)]
enum Creature {
    #[column(variant = 1)]
    Human {
        #[column("name")]
        name: String,
        profession: String,
    },
    #[column(variant = 2)]
    Animal {
        #[column("name")]
        name: String,
        species: String,
    },
}

// Columns:
// - creature (discriminator)
// - creature_name (shared between Human and Animal)
// - creature_human_profession
// - creature_animal_species
}

Requirements:

  • Fields sharing a column must have compatible types (validated at schema build time)
  • The shared column name must be identical across variants
  • Compatible types: the same primitive type, or types with a defined lossless conversion between them
  • Shared columns are still nullable at the database level (NULL when variant doesn’t use that field)

Discriminator Types

MVP: INTEGER discriminator for all databases

#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Creature {
    #[column(variant = 1)]
    Human { profession: String },
    #[column(variant = 2)]
    Lizard { habitat: String },
}
}

All variants require #[column(variant = N)] with unique integer values. Compile error if missing.

Customize discriminator type:

#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
#[column(type = "bigint")]  // Or "smallint", "tinyint", etc.
enum Creature {
    #[column(variant = 1)]
    Human { profession: String },
    #[column(variant = 2)]
    Lizard { habitat: String },
}
}

The #[column(type = "...")] attribute on the enum customizes the database type for the discriminator column.

Post-MVP: Native ENUM types for PostgreSQL/MySQL

CREATE TYPE creature AS ENUM ('Human', 'Lizard');

Can customize with #[column(variant = "name")] on variants.

NULL Handling

Inactive variant fields are NULL.

-- When critter = 1 (Human):
critter_human_profession = 'Knight'
critter_lizard_habitat = NULL

For Option<T> fields: Check discriminator first, then interpret NULL.
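The load-time order can be sketched as a hypothetical decoder (not Toasty's actual loader) for the Creature enum used earlier: inspect the discriminator first, then interpret per-variant NULLs. This also shows the two failure modes from the sharp edges above: an unknown discriminant and a NULL in a required active field.

```rust
#[derive(Debug, PartialEq)]
enum Creature {
    Human { profession: String },
    Lizard { habitat: String },
}

// Decode one row: discriminator plus the nullable per-variant columns.
fn decode(
    disc: i64,
    human_profession: Option<String>,
    lizard_habitat: Option<String>,
) -> Result<Creature, String> {
    match disc {
        1 => Ok(Creature::Human {
            profession: human_profession.ok_or("NULL in required active field")?,
        }),
        2 => Ok(Creature::Lizard {
            habitat: lizard_habitat.ok_or("NULL in required active field")?,
        }),
        // Corresponds to the Expr::Error branch: unreachable for valid data.
        other => Err(format!("unknown discriminant {other}")),
    }
}

fn main() {
    let ok = decode(1, Some("Knight".into()), None);
    assert_eq!(ok, Ok(Creature::Human { profession: "Knight".into() }));
    assert!(decode(1, None, Some("swamp".into())).is_err()); // NULL active field
    assert!(decode(9, None, None).is_err()); // unknown discriminant
}
```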

Usage

#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    address: Address,  // embedded struct
    status: Status,    // embedded enum
}

#[derive(toasty::Embed)]
struct Address {
    street: String,
    city: String,
}

#[derive(toasty::Embed)]
enum Status {
    #[column(variant = 1)]
    Pending,
    #[column(variant = 2)]
    Active { since: DateTime },
}
}

Registration: Automatic. db.register::<User>() transitively registers all nested embedded types.

Relations: Forbidden in embedded types (compile error).

Examples

Basic Enum

#![allow(unused)]
fn main() {
#[derive(Model)]
struct Task {
    #[key]
    #[auto]
    id: u64,
    status: Status,
}

#[derive(toasty::Embed)]
enum Status {
    #[column(variant = 1)]
    Pending,
    #[column(variant = 2)]
    Active,
    #[column(variant = 3)]
    Done,
}
}

Schema:

CREATE TABLE task (
    id INTEGER PRIMARY KEY,
    status INTEGER NOT NULL
);
-- 1=Pending, 2=Active, 3=Done (requires #[column(variant = N)])

Data-Carrying Enum

#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    contact: ContactMethod,
}

#[derive(toasty::Embed)]
enum ContactMethod {
    #[column(variant = 1)]
    Email { address: String },
    #[column(variant = 2)]
    Phone { country: String, number: String },
}
}

Schema:

CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    contact INTEGER NOT NULL,
    contact_email_address TEXT,
    contact_phone_country TEXT,
    contact_phone_number TEXT
);

Embedded Struct

#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    address: Address,
}

#[derive(toasty::Embed)]
struct Address {
    street: String,
    city: String,
    zip: String,
}
}

Schema:

CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    address_street TEXT NOT NULL,
    address_city TEXT NOT NULL,
    address_zip TEXT NOT NULL
);

Nested Enum + Embedded

#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactInfo {
    #[column(variant = 1)]
    Email { address: String },
    #[column(variant = 2)]
    Mail { address: Address },
}

#[derive(toasty::Embed)]
struct Address {
    street: String,
    city: String,
}
}

Schema:

-- contact: ContactInfo
contact INTEGER NOT NULL,
contact_email_address TEXT,
contact_mail_address_street TEXT,
contact_mail_address_city TEXT

Querying

Basic variant checks

#![allow(unused)]
fn main() {
#[derive(Model)]
struct Task {
    #[key]
    #[auto]
    id: u64,
    status: Status,
}

#[derive(toasty::Embed)]
enum Status {
    #[column(variant = 1)]
    Pending,
    #[column(variant = 2)]
    Active,
    #[column(variant = 3)]
    Done,
}

// Query by variant (shorthand)
Task::all().filter(Task::FIELDS.status().is_pending())
Task::all().filter(Task::FIELDS.status().is_active())

// Equivalent using .matches() without field conditions
Task::all().filter(
    Task::FIELDS.status().matches(Status::VARIANTS.pending())
)
}

Field access on variant fields

#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    contact: ContactMethod,
}

#[derive(toasty::Embed)]
enum ContactMethod {
    #[column(variant = 1)]
    Email { address: String },
    #[column(variant = 2)]
    Phone { country: String, number: String },
}

// Match specific variants and access their fields
User::all().filter(
    User::FIELDS.contact().matches(
        ContactMethod::VARIANTS.email().address().contains("@gmail")
    )
)

User::all().filter(
    User::FIELDS.contact().matches(
        ContactMethod::VARIANTS.phone().country().eq("US")
    )
)

// Shorthand for variant-only checks (no field conditions)
User::all().filter(User::FIELDS.contact().is_email())
User::all().filter(User::FIELDS.contact().is_phone())

// Equivalent using .matches()
User::all().filter(
    User::FIELDS.contact().matches(ContactMethod::VARIANTS.email())
)
}

Embedded struct field constraints

Embedded struct fields can be accessed directly for filtering, ordering, and other query operations:

#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    address: Address,
}

#[derive(toasty::Embed)]
struct Address {
    street: String,
    city: String,
    zip: String,
}

// Filter by embedded struct fields
User::all().filter(User::FIELDS.address().city().eq("Seattle"))
User::all().filter(User::FIELDS.address().zip().like("98%"))

// Multiple constraints on embedded struct
User::all().filter(
    User::FIELDS.address().city().eq("Seattle")
        .and(User::FIELDS.address().zip().like("98%"))
)

// Order by embedded struct fields
User::all().order_by(User::FIELDS.address().city().asc())

// Select embedded struct fields (projection)
User::all()
    .select(User::FIELDS.id())
    .select(User::FIELDS.address().city())
}

Nested embedded structs

For nested embedded types, continue chaining field accessors:

#![allow(unused)]
fn main() {
#[derive(Model)]
struct Company {
    #[key]
    #[auto]
    id: u64,
    headquarters: Office,
}

#[derive(toasty::Embed)]
struct Office {
    name: String,
    location: Address,
}

#[derive(toasty::Embed)]
struct Address {
    street: String,
    city: String,
    zip: String,
}

// Access nested embedded struct fields
Company::all().filter(
    Company::FIELDS.headquarters().location().city().eq("Seattle")
)

Company::all().filter(
    Company::FIELDS.headquarters().name().eq("Main Office")
        .and(Company::FIELDS.headquarters().location().zip().like("98%"))
)
}

Combining enum and embedded struct constraints

When an enum variant contains an embedded struct, use .matches() to specify the variant, then access the embedded struct’s fields:

#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    contact: ContactInfo,
}

#[derive(toasty::Embed)]
enum ContactInfo {
    #[column(variant = 1)]
    Email { address: String },
    #[column(variant = 2)]
    Mail { address: Address },
}

#[derive(toasty::Embed)]
struct Address {
    street: String,
    city: String,
}

// Filter by embedded struct fields within enum variant
User::all().filter(
    User::FIELDS.contact().matches(
        ContactInfo::VARIANTS.mail().address().city().eq("Seattle")
    )
)

// Multiple constraints on embedded struct within variant
User::all().filter(
    User::FIELDS.contact().matches(
        ContactInfo::VARIANTS.mail()
            .address().city().eq("Seattle")
            .address().street().contains("Main")
    )
)
}

Constraints with shared columns

When enum variants share columns, constraints apply based on the variant being matched:

#![allow(unused)]
fn main() {
#[derive(Model)]
struct Character {
    #[key]
    #[auto]
    id: u64,
    creature: Creature,
}

#[derive(toasty::Embed)]
enum Creature {
    #[column(variant = 1)]
    Human {
        #[column("name")]
        name: String,
        profession: String,
    },
    #[column(variant = 2)]
    Animal {
        #[column("name")]
        name: String,
        species: String,
    },
}

// Query the shared "name" field for a specific variant
Character::all().filter(
    Character::FIELDS.creature().matches(
        Creature::VARIANTS.human().name().eq("Alice")
    )
)

// Query across variants using the shared column
// (finds any creature with this name, regardless of variant)
Character::all().filter(
    Character::FIELDS.creature().name().eq("Bob")
)

// Variant-specific field
Character::all().filter(
    Character::FIELDS.creature().matches(
        Creature::VARIANTS.human().profession().eq("Knight")
    )
)
}

Updating

Update builders provide two methods per field:

  • .field(value) - Direct value assignment
  • .with_field(|f| ...) - Closure-based update

The .with_* methods provide a uniform API across all field types and enable:

  • Embedded types: Partial updates (only set specific nested fields)
  • Primitives: Future type-specific operations (e.g., NumericUpdate::increment())
  • Enums: Update variant fields without changing the discriminator

Whole replacement

Setting an embedded struct field on an update replaces all of its columns:

#![allow(unused)]
fn main() {
// Loaded model update — sets address_street, address_city, address_zip
user.update()
    .address(Address { street: "123 Main", city: "Seattle", zip: "98101" })
    .exec(&db).await?;

// Query-based update — same behavior, no model loaded
User::filter_by_id(id).update()
    .address(Address { street: "123 Main", city: "Seattle", zip: "98101" })
    .exec(&db).await?;
}

Partial updates

Each field (primitive or embedded) generates a companion {Type}Update<'a> type that provides a view into the update statement’s assignments. These update types hold a reference to the statement and a projection path, allowing them to directly mutate the statement as fields are set. This enables efficient nested updates without intermediate allocations.

#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
struct Address {
    street: String,
    city: String,
    zip: String,
}

// AddressUpdate<'a> is generated automatically by `#[derive(toasty::Embed)]`
// StringUpdate<'a> is generated for primitive String fields
}
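The shape of these view types can be sketched with illustrative types (not the generated code): the update view borrows the statement's assignment list and carries a projection path, so setting a sub-field mutates the statement directly. The field indices here are assumptions for the sketch.

```rust
// Stand-in for the update statement's assignment list: (path, value) pairs.
#[derive(Debug, Default)]
struct Assignments(Vec<(Vec<usize>, String)>);

// View into the statement, scoped to one embedded field via `path`.
struct AddressUpdate<'a> {
    stmt: &'a mut Assignments,
    path: Vec<usize>,
}

impl<'a> AddressUpdate<'a> {
    // `city` as field index 1 is an assumption for this sketch.
    fn set_city(&mut self, v: &str) {
        let mut p = self.path.clone();
        p.push(1);
        self.stmt.0.push((p, v.to_string()));
    }
}

fn main() {
    let mut stmt = Assignments::default();
    {
        // `address` assumed to be field 3 on the model.
        let mut a = AddressUpdate { stmt: &mut stmt, path: vec![3] };
        a.set_city("Seattle");
    }
    // Exactly one assignment recorded, at path [3, 1] (address.city).
    assert_eq!(stmt.0, vec![(vec![3usize, 1], "Seattle".to_string())]);
}
```

Because the view holds only a borrow and a path, nested `.with_*` calls just extend the path; no intermediate value is built and later merged.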

Embedded types:

#![allow(unused)]
fn main() {
// Whole replacement — sets all address columns
user.update()
    .address(Address { street: "123 Main", city: "Seattle", zip: "98101" })
    .exec(&db).await?;

// Partial update — only address_city is SET
user.update()
    .with_address(|a| {
        a.set_city("Seattle");
    })
    .exec(&db).await?;

// Multiple sub-fields — only address_city and address_zip are SET
user.update()
    .with_address(|a| {
        a.set_city("Seattle");
        a.set_zip("98101");
    })
    .exec(&db).await?;

// Query-based partial update
User::filter_by_id(id).update()
    .with_address(|a| a.set_city("Seattle"))
    .exec(&db).await?;
}

Primitive types:

#![allow(unused)]
fn main() {
// Direct value
user.update()
    .name("Alice")
    .exec(&db).await?;

// Via closure (enables future type-specific operations)
user.update()
    .with_name(|n| {
        n.set("Alice");
    })
    .exec(&db).await?;
}

For now, primitive update builders only provide .set(). Future enhancements could add type-specific operations like NumericUpdate::increment(), StringUpdate::append(), etc.

Partial updates with nested embedded structs

Nested embedded structs also generate {Type}Update<'a> types. The .with_* methods can be nested naturally:

#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
struct Office {
    name: String,
    location: Address,
}

// Update only headquarters_location_city
company.update()
    .with_headquarters(|h| {
        h.with_location(|a| {
            a.set_city("Seattle");
        });
    })
    .exec(&db).await?;

// Update headquarters_name and headquarters_location_zip
company.update()
    .with_headquarters(|h| {
        h.with_name(|n| n.set("West Coast HQ"));
        h.with_location(|a| {
            a.set_zip("98101");
        });
    })
    .exec(&db).await?;
}

Enum updates

Enums use whole-variant replacement. Setting an enum field replaces the discriminator and all variant columns:

#![allow(unused)]
fn main() {
// Replace the entire enum value — sets discriminator + variant fields,
// NULLs out fields from the previous variant
user.update()
    .contact(ContactMethod::Email { address: "new@example.com".into() })
    .exec(&db).await?;
}

For data-carrying variants, use .with_contact() to update fields within the current variant without changing the discriminator:

#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactMethod {
    #[column(variant = 1)]
    Email { address: String },
    #[column(variant = 2)]
    Phone { country: String, number: String },
}

// Update only the phone number, leave country and discriminator unchanged
user.update()
    .with_contact(|c| {
        c.phone(|p| {
            p.with_number(|n| n.set("555-1234"));
        });
    })
    .exec(&db).await?;

// Update email variant
User::filter_by_id(id).update()
    .with_contact(|c| {
        c.email(|e| {
            e.with_address(|a| a.set("new@example.com"));
        });
    })
    .exec(&db).await?;
}

ContactMethodUpdate<'a> has one method per variant (e.g., .phone(), .email()). Each method accepts a closure that receives a builder scoped to that variant’s fields. The discriminator is not changed by partial updates.

Mapping Layer Formalization

Problem

Toasty’s mapping layer connects model-level fields to database-level columns. A model field’s type may differ from its storage type (e.g., Timestamp stored as i64 or text). The mapping must be a bijection — every model value encodes to exactly one stored value and decodes back losslessly. The bijection operates at the record level, not per-field: n model fields may map to m database columns (e.g., multiple fields JSON-encoded into a single column).

The bijection alone is not sufficient. When lowering expressions (filters, ORDER BY, arithmetic) to the database, we need to know whether a given operator can be pushed through the encoding. This is the question of whether the encoding is a homomorphism with respect to that operator:

  • For arithmetic: encode(a ⊕ b) = encode(a) ⊕' encode(b)
  • For comparisons: a < b ⟺ encode(a) <' encode(b)

If yes, the operator can be evaluated in storage space (efficient, index-friendly). If no, the database must first decode to the model type (SQL CAST) or the operation must be evaluated application-side.

These are two decoupled concerns:

  1. Bijection — can we round-trip values? (required for correctness)
  2. Operator homomorphism — which operators preserve semantics through the encoding? (determines what can be pushed to the DB)

A mapping with no homomorphic operators is still valid — you can store and retrieve. You just can’t push any filters or ordering to the database.

Examples

Timestamp as i64 (epoch seconds)

encode(ts) = ts.epoch_seconds()
decode(n)  = Timestamp::from_epoch_seconds(n)

Bijection: ✓ — lossless round-trip.

== homomorphic: ✓ — ts1 == ts2 ⟺ encode(ts1) == encode(ts2)

< homomorphic: ✓ — ts1 < ts2 ⟺ encode(ts1) < encode(ts2)

Epoch seconds preserve temporal ordering under integer comparison, so range queries (<, >, BETWEEN) can operate directly on the raw column.

+ homomorphic: ✓ — encode(ts + 234s) = encode(ts) + 234

Integer addition over epoch seconds preserves timestamp arithmetic.

Timestamp as text (ISO 8601)

encode(ts) = ts.to_iso8601()
decode(s)  = Timestamp::parse_iso8601(s)

Bijection: ✓ — lossless round-trip (assuming canonical formatting).

== homomorphic: ✓ — injective encoding preserves equality.

< homomorphic: fragile — lexicographic order matches temporal order only for fixed-width UTC formats. Not generally safe.

+ homomorphic: ✗ — text + 234 is meaningless.

String with case inversion

encode(s) = s.invert_case()    // "Hello" → "hELLO"
decode(s) = s.invert_case()    // "hELLO" → "Hello"

Bijection: ✓ — case inversion is its own inverse.

== homomorphic: ✓ — injective, so equality is preserved. Encode the search term the same way and compare.

< homomorphic: ✗ — ordering is reversed between cases:

"ABC" < "abc"                   (A=65 < a=97)
encode("ABC") = "abc"
encode("abc") = "ABC"
"abc" > "ABC"                   — ordering reversed

A valid mapping, but useless for range queries in storage space.
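This example is small enough to check directly in code: case inversion is a self-inverse bijection that preserves equality but reverses ordering.

```rust
// The case-inversion encoding from the text: ASCII upper and lower case
// swap; everything else passes through unchanged.
fn invert_case(s: &str) -> String {
    s.chars()
        .map(|c| {
            if c.is_ascii_uppercase() {
                c.to_ascii_lowercase()
            } else if c.is_ascii_lowercase() {
                c.to_ascii_uppercase()
            } else {
                c
            }
        })
        .collect()
}

fn main() {
    // Bijection: its own inverse.
    assert_eq!(invert_case(&invert_case("Hello")), "Hello");
    // == preserved: encode the search term, compare in storage space.
    assert_eq!("Hello" == "World", invert_case("Hello") == invert_case("World"));
    // < reversed: the model-space ordering flips in storage space.
    assert!("ABC" < "abc");
    assert!(invert_case("ABC") > invert_case("abc"));
}
```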

Bijection by Construction

For arbitrary functions, bijectivity is undecidable. Instead of detecting it, we construct mappings from known-bijective primitives and composition rules that preserve bijectivity. If a mapping is built entirely from these, it is guaranteed valid.

Composition rules

  • Sequential: f ∘ g is a bijection if both f and g are.
  • Parallel/product: (f(a), g(b)) is a bijection if both f and g are.

These compose freely — complex mappings built from simple bijective pieces are automatically valid. Homomorphism properties, however, may be lost at each composition step and must be tracked separately.
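Both rules can be demonstrated with toy bijections (f and g here are arbitrary stand-ins, not Toasty primitives):

```rust
// f(x) = x + 7 with inverse x - 7; g(x) = x * 3 with inverse x / 3
// (valid on g's image, i.e. multiples of 3).
fn f(x: i64) -> i64 { x + 7 }
fn f_inv(x: i64) -> i64 { x - 7 }
fn g(x: i64) -> i64 { x * 3 }
fn g_inv(x: i64) -> i64 { x / 3 }

fn main() {
    // Sequential: encode = g ∘ f, decode = f⁻¹ ∘ g⁻¹, round-trips.
    let v = 42;
    assert_eq!(f_inv(g_inv(g(f(v)))), v);

    // Parallel/product: components encode and decode independently.
    let (a, b) = (10, 20);
    let encoded = (f(a), g(b));
    assert_eq!((f_inv(encoded.0), g_inv(encoded.1)), (a, b));
}
```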

Dimensionality: multiple fields → one column

Two fields may map to the same column if and only if the model constrains them to always hold the same value (an equivalence class). In this case no information is lost and the mapping remains a bijection — but only over the restricted domain where the constraint holds. Without such a constraint, collapsing two independent fields into one column destroys injectivity.

This gives us computed fields as a natural consequence. Two fields can reference the same column through different bijective transformations:

regular:  String → column              (identity)
inverted: String → invert_case(column) (bijection)

Because the transformations are bijections, both fields are readable AND writable. Writing regular = "Hello" stores "Hello" in the column; inverted automatically becomes "hELLO". Writing inverted = "hELLO" applies the inverse to store "Hello"; regular is automatically "Hello". Data flow in both directions is fully determined by the bijection — no special computed-field machinery needed.

Computed Fields

Storage is the source of truth. Each field is a view of the underlying column(s) through its bijection. Computed fields are a direct consequence: when multiple fields reference the same column through different bijections, each field is a different view of the same stored data.

Schema representation

Each field stores a bijection pair:

  • field_to_column: encode — compute column value from field value (inverse)
  • column_to_field: decode — compute field value from column value (forward)

A reverse index maps each column to the set of fields that reference it.

Write propagation

When a field is set, the column value is determined, which determines all sibling fields:

  1. Compute column value: col = field_a.field_to_column(new_value)
  2. For each sibling field on the same column: field_b = field_b.column_to_field(col)

The composed transform between two fields sharing a column is: field_b.column_to_field(field_a.field_to_column(value))

Conflict detection

If the user sets two fields that share a column in the same operation, the resulting column values must agree. If field_a.field_to_column(val_a) ≠ field_b.field_to_column(val_b), the write is invalid and must be rejected.
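Propagation and the conflict check can be sketched with the case-inversion view from the earlier example, where `regular` is the identity view of the column and `inverted` applies invert_case in both directions:

```rust
// Self-inverse bijection used as the second field's view of the column.
fn invert_case(s: &str) -> String {
    s.chars()
        .map(|c| {
            if c.is_ascii_uppercase() {
                c.to_ascii_lowercase()
            } else if c.is_ascii_lowercase() {
                c.to_ascii_uppercase()
            } else {
                c
            }
        })
        .collect()
}

fn main() {
    // Setting regular = "Hello" determines the column, which determines
    // the sibling field: inverted = column_to_field(col).
    let col = "Hello".to_string(); // regular's field_to_column is identity
    assert_eq!(invert_case(&col), "hELLO"); // inverted's view of the column

    // Conflict detection: both fields set in one operation must agree.
    let col_a = "Hello".to_string();  // from regular = "Hello"
    let col_b = invert_case("hELLO"); // from inverted = "hELLO"
    assert_eq!(col_a, col_b); // consistent: write accepted

    let col_c = invert_case("WORLD"); // from inverted = "WORLD"
    assert_ne!(col_a, col_c); // disagreement: write rejected
}
```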

Bijective Primitives

Three categories of bijective primitives, each with encode/decode halves:

Type reinterpretation

Converts a single value between two types with the same information content. Implemented as Expr::Cast in both directions.

Current pairs:

  • Timestamp ↔ String (ISO 8601)
  • Uuid ↔ String
  • Uuid ↔ Bytes
  • Date ↔ String
  • Time ↔ String
  • DateTime ↔ String
  • Zoned ↔ String
  • Timestamp ↔ DateTime
  • Timestamp ↔ Zoned
  • Zoned ↔ DateTime
  • Decimal ↔ String
  • BigDecimal ↔ String
  • Integer widening/narrowing (i8 ↔ i16 ↔ i32 ↔ i64, etc.)

Affine transformations

Arithmetic transformations by a constant. Each is a bijection with a known inverse.

  • x + k — inverse: x - k
  • x * k (k ≠ 0) — inverse: x / k
  • x * k + c (k ≠ 0) — inverse: (x - c) / k

Homomorphism properties (for x + k as representative):

  • == homomorphic: ✓ — a == b ⟺ (a+k) == (b+k)
  • < homomorphic: ✓ — a < b ⟺ (a+k) < (b+k)
  • + homomorphic: ✗ — encode(a+b) = a+b+k ≠ encode(a)+encode(b) = a+b+2k

Note: x * k for negative k reverses ordering (< not homomorphic).
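These properties are easy to verify numerically:

```rust
// Representative affine encoding x + k with k = 5, and a negative scale.
fn enc(x: i64) -> i64 { x + 5 }
fn neg(x: i64) -> i64 { x * -2 }

fn main() {
    let (a, b) = (3i64, 9i64);
    assert_eq!(a == b, enc(a) == enc(b)); // == preserved
    assert_eq!(a < b, enc(a) < enc(b));   // <  preserved
    // + not preserved: encode(a+b) = a+b+k, but enc(a)+enc(b) = a+b+2k.
    assert_ne!(enc(a + b), enc(a) + enc(b));
    // Negative scale factor reverses ordering.
    assert!(a < b && neg(a) > neg(b));
}
```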

Product (record)

Packs/unpacks multiple independent values into a fixed-size tuple.

  • Encode: Expr::Record — combine values into a tuple
  • Decode: Expr::Project — extract by index

Bijective because each component is independent and individually recoverable. Used for embedded structs (fields flattened into columns).

Coproduct (tagged union)

Encodes/decodes a discriminated union (enum) where the discriminant partitions the domain into disjoint subsets.

  • Encode: Expr::Project — extract discriminant and per-variant fields
  • Decode: Expr::Match — branch on discriminant, reconstruct variant via Expr::Record

Bijective if and only if:

  • Arms are exhaustive (cover all discriminant values)
  • Arms are disjoint (no overlapping discriminants)
  • Each arm’s body is individually a bijection

This is a coproduct of bijections: if f_i: A_i → B_i is a bijection for each variant i, the combined mapping on the tagged union Σ_i A_i → Σ_i B_i is also a bijection.

Operator Homomorphism

Operator inventory

Current Toasty binary operators (BinaryOp): ==, !=, <, <=, >, >=.

Arithmetic operators (+, -) are not yet in the AST but are needed for computed fields and interval arithmetic.

For homomorphism analysis, != is the negation of ==, and >=/<= are derivable from </>. So the independent set is: ==, <, +.

Per-primitive homomorphism

Type reinterpretation:

Encoding               ==    <        +
Timestamp ↔ String     ✓     ✓ (¹)    ✗
Uuid ↔ String          ✓     n/a      n/a
Uuid ↔ Bytes           ✓     n/a      n/a
Date ↔ String          ✓     ✓ (¹)    ✗
Time ↔ String          ✓     ✓ (¹)    ✗
DateTime ↔ String      ✓     ✓ (¹)    ✗
Zoned ↔ String         ✓     ✗        ✗
Timestamp ↔ DateTime   ✓     ✓        ✓
Timestamp ↔ Zoned      ✓     ✓        ✓
Zoned ↔ DateTime       ✓     ✓        ✓
Decimal ↔ String       ✓     ✗        ✗
BigDecimal ↔ String    ✓     ✗        ✗
Integer widening       ✓     ✓        ✓

(¹) Requires canonical fixed-width serialization format. Lexicographic ordering matches semantic ordering only if Toasty guarantees consistent formatting (no variable-length subsecond digits, no timezone offset variations, etc.).

All type reinterpretations are injective, so == is always preserved. < and + depend on whether the target type’s native operations align with the source type’s semantics.

Affine transformations:

Encoding        ==    <               +
x + k           ✓     ✓               ✗
x * k (k>0)     ✓     ✓               ✓
x * k (k<0)     ✓     ✗ (reversed)    ✓
x * k + c       ✓     ✓ if k>0        ✗

Product (record):

Operator    Homomorphic?
==          ✓ — if each component preserves ==
<           conditional — requires lexicographic comparison and each component preserving <
+           ✓ — if each component preserves + (component-wise)

Coproduct (tagged union):

Operator    Homomorphic?
==          ✓ — if discriminant and each arm preserve ==
<           generally ✗ — cross-variant comparison is usually meaningless
+           ✗ — arithmetic across variants undefined

Homomorphism under composition

Sequential (g ∘ f): if both f and g are homomorphic for an operator, so is the composition. Proof: a op b ⟺ f(a) op f(b) ⟺ g(f(a)) op g(f(b)).

Parallel/product ((f(a), g(b))): preserves == if both f and g do. Preserves < only if tuple comparison is lexicographic and both preserve <.

Coproduct: preserves == if each arm does. Does not generally preserve <.

Cross-encoding comparisons

When two operands use different encodings (e.g., field₁ uses Timestamp→i64, field₂ uses Timestamp→i64+offset), can_distribute does not directly apply. The comparison encode₁(a) op encode₂(b) mixes two encodings and may not preserve semantics.

Fallback: decode both to model space and compare there.

decode₁(col₁) op decode₂(col₂)

This always produces correct results but may require SQL CAST or application-side evaluation.
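A toy illustration of why mixed encodings need the fallback — suppose field₁ stores seconds and field₂ stores milliseconds (hypothetical encodings, not Toasty API):

```rust
// Two different storage encodings of the same model-level timestamp.
fn encode_secs(secs: i64) -> i64 { secs }          // identity
fn encode_millis(secs: i64) -> i64 { secs * 1000 } // seconds → millis
fn decode_secs(col: i64) -> i64 { col }
fn decode_millis(col: i64) -> i64 { col / 1000 }

fn main() {
    let (a, b) = (10i64, 10i64); // equal model-level timestamps
    let (col1, col2) = (encode_secs(a), encode_millis(b));

    // Comparing raw columns mixes encodings and gives the wrong answer:
    assert_ne!(col1, col2); // 10 vs 10_000

    // Decoding both back to model space restores correctness:
    assert_eq!(decode_secs(col1), decode_millis(col2));
}
```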

Database independence

can_distribute does not take a database parameter. Database capabilities determine which bijection is selected (e.g., PostgreSQL has native timestamps → identity mapping; SQLite does not → Timestamp↔i64). Once the bijection is chosen, can_distribute is purely a property of that bijection and the operator.

The only edge case is if two databases use the same types but their operators behave differently (e.g., string collation affecting <). This can be handled by treating such behavioral differences as part of the encoding rather than adding a database parameter.

Precision / Domain Restriction

Lossy encodings like #[column(type = timestamp(2))] involve two distinct steps:

  1. Domain restriction (lossy, write-time): the user’s full-precision value is truncated to the representable domain. This is many-to-one — multiple inputs collapse to the same output. It is not part of the mapping.

  2. Encoding (bijective): over the restricted domain (values with ≤2 fractional digits), the mapping is a perfect bijection — lossless round-trip.

The mapping framework only governs step 2. Step 1 is a write-time concern: when the user assigns a value, it gets projected into the representable domain. Analogous to integer narrowing (i64 → i32): the mapping between i32 values and the stored column is bijective; the loss happens if you store a value outside i32 range.
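The integer-narrowing analogy, sketched directly: step 1 is the fallible projection into the representable domain; step 2 is a lossless bijection on that domain.

```rust
// Step 1 — domain restriction (lossy, write-time): project i64 into i32.
fn restrict(x: i64) -> Option<i32> {
    i32::try_from(x).ok()
}

// Step 2 — encoding over the restricted domain is a perfect bijection.
fn encode(x: i32) -> i64 {
    i64::from(x)
}

fn decode(x: i64) -> i32 {
    i32::try_from(x).expect("storage value is always in the restricted domain")
}

fn main() {
    // Lossless round-trip for any value that survives step 1.
    let v = restrict(123_456).unwrap();
    assert_eq!(decode(encode(v)), v);

    // The loss happens only in step 1, before the mapping is involved.
    assert!(restrict(i64::MAX).is_none());
}
```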

Nullability

Option<T> with None → NULL is a coproduct bijection:

  • Domain partition: Option<T> = None | Some(T) — two disjoint cases.
  • Encoding: None → NULL, Some(v) → encode(v) — each arm is individually bijective (unit↔NULL is trivially so; Some delegates to T’s encoding).
  • Decoding: NULL → None, non-NULL → Some(decode(v)).

This satisfies the coproduct conditions (exhaustive, disjoint, per-arm bijective).
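A minimal sketch of this coproduct (toy SqlValue type, not Toasty's real Value): each arm is individually bijective, so the whole mapping round-trips.

```rust
// Toy storage value — stands in for a nullable TEXT column.
#[derive(Debug, Clone, PartialEq)]
enum SqlValue {
    Null,
    Text(String),
}

// Encode: None → NULL, Some(v) → encode(v) (inner encoding is identity here).
fn encode(v: Option<String>) -> SqlValue {
    match v {
        None => SqlValue::Null,
        Some(s) => SqlValue::Text(s),
    }
}

// Decode: NULL → None, non-NULL → Some(decode(v)).
fn decode(v: SqlValue) -> Option<String> {
    match v {
        SqlValue::Null => None,
        SqlValue::Text(s) => Some(s),
    }
}

fn main() {
    for v in [None, Some("hello".to_string())] {
        // Exhaustive, disjoint, per-arm bijective → lossless round-trip.
        assert_eq!(decode(encode(v.clone())), v);
    }
}
```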

NULL breaks standard ==

SQL uses three-valued logic: NULL = NULL evaluates to NULL (falsy), not TRUE. This means the standard == operator is not homomorphic over the nullable encoding — the model-level None == None is true, but NULL = NULL is not.
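The mismatch can be modeled directly: SQL's = is a three-valued operator where any NULL operand yields unknown, while model-level equality on Option is two-valued (a sketch, with unknown represented as None):

```rust
// SQL's three-valued `=`: any NULL operand makes the result unknown.
fn sql_eq(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None, // unknown — falsy in a WHERE clause
    }
}

fn main() {
    // Model-level equality: None == None is simply true.
    assert!(None::<i64> == None::<i64>);
    // SQL `=`: NULL = NULL is unknown, not TRUE — the homomorphism breaks.
    assert_eq!(sql_eq(None, None), None);
    // Both operands present behaves as expected.
    assert_eq!(sql_eq(Some(1), Some(1)), Some(true));
}
```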

NULL-safe operators

All Toasty target databases provide a NULL-safe equality operator:

| Database | Operator |
|----------|----------|
| PostgreSQL | IS NOT DISTINCT FROM |
| MySQL | <=> |
| SQLite | IS |

Using the NULL-safe operator restores == homomorphism: a == b ⟺ encode(a) IS NOT DISTINCT FROM encode(b).

Operator mapping

This means homomorphism is not just a property of (encoding, operator) — it is a property of the triple (encoding, model_op, storage_op). The lowerer may need to emit a different SQL operator than the one the user wrote:

  • Non-nullable field: model == → SQL =
  • Nullable field: model == → SQL IS NOT DISTINCT FROM (or <=>, IS)

can_distribute should return the storage-level operator to use, not just a boolean. Signature sketch:

can_distribute(encoding, model_op) -> Option<storage_op>

None means the operator cannot be pushed to the DB. Some(op) means it can, using the specified storage operator.

Ordering

NULL ordering is also database-specific (NULLS FIRST vs NULLS LAST). The lowerer must ensure consistent behavior across backends, potentially by emitting explicit NULLS FIRST/NULLS LAST clauses.

Lowering Algorithm

The lowerer transforms a model-level expression tree into a storage-level expression tree. The input contains field references and model-level literals. The output contains column references and storage-level values.

Core: lowering a binary operator

lower_binary_op(op, lhs, rhs):
    // 1. Identify field references and look up their encodings
    //    from the schema/mapping.
    lhs_encoding = lookup_encoding(lhs) if lhs is FieldRef, else None
    rhs_encoding = lookup_encoding(rhs) if rhs is FieldRef, else None

    // 2. Determine if the operator can distribute through the encoding.
    //    For single-column primitive encodings:
    if both are FieldRefs with same encoding:
        match can_distribute(encoding, op):
            Some(storage_op):
                // Both fields share the encoding — compare columns directly.
                emit: column_lhs storage_op column_rhs
            None:
                // Decode both to model space.
                emit: decode(column_lhs) op decode(column_rhs)

    if one is FieldRef, other is Literal:
        match can_distribute(field_encoding, op):
            Some(storage_op):
                // Encode the literal, compare in storage space.
                emit: column storage_op encode(literal)
            None:
                // Decode the column to model space.
                emit: decode(column) op literal

    if both are Literals:
        // Const-evaluate in model space.
        emit: literal_lhs op literal_rhs

Encoding the literal

encode(literal) applies the field’s field_to_column bijection to the model-level value, producing a storage-level value. For a UUID↔text encoding: encode(UUID("abc-123")) → "abc-123".

Decoding the column

decode(column_ref) applies the field’s column_to_field bijection to the column reference, wrapping it in the appropriate SQL expression. For UUID↔text: decode(uuid_col) → CAST(uuid_col AS UUID).

If the database lacks the model type (e.g., SQLite has no UUID), decode is not expressible in SQL. The operation must be evaluated application-side or the query rejected.

Multi-column encodings (product / coproduct)

For fields that span multiple columns, == expands structurally:

lower_binary_op(==, coproduct_field, literal):
    encoded = encode(literal)
    // encoded is a tuple: (disc_val, col1_val, col2_val, ...)

    // Expand into per-column comparisons:
    result = TRUE
    for each (column, encoded_value) in zip(field.columns, encoded):
        col_encoding = encoding_for(column)  // e.g., nullable text
        match can_distribute(col_encoding, ==):
            Some(storage_op):
                result = result AND (column storage_op encoded_value)
            None:
                result = result AND (decode(column) == encoded_value)
    emit: result

ORDER BY

lower_order_by(field):
    encoding = lookup_encoding(field)
    match can_distribute(encoding, <):
        Some(_):
            // Ordering is preserved in storage space.
            emit: ORDER BY column
        None:
            // Must decode to model space for correct ordering.
            emit: ORDER BY decode(column)

SELECT returning

Always decode — application needs model-level values:

lower_select_returning(field):
    emit: decode(column)  // column_to_field bijection

INSERT / UPDATE

Always encode — database needs storage-level values:

lower_insert_value(field, value):
    emit: encode(value)  // field_to_column bijection

Examples

WHERE uuid_col == UUID("abc-123"), UUID stored as text:

  1. LHS is FieldRef → encoding: UUID↔text, column: uuid_col
  2. RHS is literal: UUID("abc-123")
  3. can_distribute(UUID↔text, ==) → Some(=)
  4. Encode literal: "abc-123"
  5. Output: uuid_col = 'abc-123'

WHERE uuid_col < UUID("abc-123"), UUID stored as text:

  1. LHS is FieldRef → encoding: UUID↔text, column: uuid_col
  2. RHS is literal: UUID("abc-123")
  3. can_distribute(UUID↔text, <) → None
  4. Decode column: CAST(uuid_col AS UUID)
  5. Output: CAST(uuid_col AS UUID) < UUID('abc-123')
  6. (If DB lacks UUID type → application-side evaluation or reject)

WHERE contact == Contact::Phone { number: "123" }, coproduct encoding:

  1. LHS is FieldRef → coproduct encoding, columns: disc, phone_number, email_address
  2. RHS is literal → encode: (0, "123", NULL)
  3. Expand per-column:
    • disc = 0 (can_distribute(i64, ==) → Some(=))
    • phone_number = '123' (can_distribute(nullable text, ==) → Some(=))
    • email_address IS NULL (can_distribute(nullable text, ==) → Some(IS))
  4. Output: disc = 0 AND phone_number = '123' AND email_address IS NULL

Schema Representation

Each field’s mapping is stored as a structured Bijection tree. This is the single source of truth — encode/decode expressions are derived from it.

Bijection enum

#![allow(unused)]
fn main() {
enum Bijection {
    /// No transformation — field type == column type.
    Identity,

    /// Lossless cast between two types with the same information content.
    /// e.g., UUID↔text, Timestamp↔i64, integer widening.
    Cast { from: Type, to: Type },

    /// x*k + c (k ≠ 0). Inverse: (x - c) / k.
    Affine { k: Value, c: Value },

    /// Option<T> → nullable column.
    /// Wraps an inner bijection with None↔NULL.
    Nullable(Box<Bijection>),

    /// Embedded struct → multiple columns.
    /// Each component is an independent bijection on one field↔column pair.
    Product(Vec<Bijection>),

    /// Enum → discriminant column + per-variant columns.
    Coproduct {
        discriminant: Box<Bijection>,
        variants: Vec<CoproductArm>,
    },

    /// Composition: apply `inner` first, then `outer`.
    /// encode = outer.encode(inner.encode(x))
    /// decode = inner.decode(outer.decode(x))
    Compose {
        inner: Box<Bijection>,
        outer: Box<Bijection>,
    },
}

struct CoproductArm {
    discriminant_value: Value,
    body: Bijection, // typically Product for data-carrying variants
}
}

Methods on Bijection

#![allow(unused)]
fn main() {
impl Bijection {
    /// Encode a model-level value to a storage-level value.
    fn encode(&self, value: Value) -> Value;

    /// Produce a decode expression: given a column reference (or tuple of
    /// column references), return a model-level expression.
    fn decode(&self, column_expr: Expr) -> Expr;

    /// Query whether `model_op` can be pushed through this encoding.
    /// Returns the storage-level operator to use, or None if the
    /// operation must fall back to model space.
    fn can_distribute(&self, model_op: BinaryOp) -> Option<StorageOp>;

    /// Number of columns this bijection spans.
    fn column_count(&self) -> usize;
}
}

can_distribute is defined recursively:

  • Identity: always Some(model_op) — no transformation.
  • Cast: lookup in the per-pair homomorphism table.
  • Affine: == → Some(=). < → Some(<) if k > 0, None if k < 0.
  • Nullable: delegates to inner, may change op (e.g., == → IS NOT DISTINCT FROM).
  • Product: == → Some(=) if all components return Some. < → only if lexicographic and all components support <.
  • Coproduct: == → Some if discriminant + each arm returns Some. < → generally None.
  • Compose: Some only if both inner and outer return Some.
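These rules condense into a recursive sketch over a cut-down Bijection (toy Op/StorageOp types for illustration; the real implementation covers all variants and operators):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Op { Eq, Lt }

#[derive(Clone, Copy, PartialEq, Debug)]
enum StorageOp { Eq, Lt, IsNotDistinctFrom }

enum Bijection {
    Identity,
    Affine { k: i64 },
    Nullable(Box<Bijection>),
    Compose { inner: Box<Bijection>, outer: Box<Bijection> },
}

fn can_distribute(b: &Bijection, op: Op) -> Option<StorageOp> {
    use Bijection::*;
    match (b, op) {
        (Identity, Op::Eq) => Some(StorageOp::Eq),
        (Identity, Op::Lt) => Some(StorageOp::Lt),
        (Affine { .. }, Op::Eq) => Some(StorageOp::Eq),
        (Affine { k }, Op::Lt) if *k > 0 => Some(StorageOp::Lt),
        (Affine { .. }, Op::Lt) => None, // k < 0 reverses ordering
        // Nullable delegates, but rewrites == to the NULL-safe operator.
        (Nullable(inner), Op::Eq) => {
            can_distribute(inner, Op::Eq).map(|_| StorageOp::IsNotDistinctFrom)
        }
        (Nullable(inner), op) => can_distribute(inner, op),
        // Compose distributes only if both halves do.
        (Compose { inner, outer }, op) => {
            can_distribute(inner, op).and_then(|_| can_distribute(outer, op))
        }
    }
}

fn main() {
    let nullable_neg = Bijection::Nullable(Box::new(Bijection::Affine { k: -2 }));
    // == is pushable via the NULL-safe operator; < is not (k < 0).
    assert_eq!(can_distribute(&nullable_neg, Op::Eq), Some(StorageOp::IsNotDistinctFrom));
    assert_eq!(can_distribute(&nullable_neg, Op::Lt), None);
}
```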

Per-field mapping

#![allow(unused)]
fn main() {
struct FieldMapping {
    bijection: Bijection,
    columns: Vec<ColumnId>, // columns this field maps to (1 for primitive, N for product/coproduct)
}
}

The model-level mapping::Model holds a FieldMapping per field, plus a reverse index from columns to fields (for computed field propagation).

Verification

The framework should be formally verified using Lean 4 + Mathlib. Mathlib already provides the algebraic vocabulary (bijections, homomorphisms, products, coproducts). The plan:

  1. Define the primitives and composition rules in Lean
  2. Prove the general theorems once (composition preserves bijection, coproduct conditions, etc.)
  3. For each concrete primitive, state and prove its homomorphism properties
  4. Lean checks everything mechanically
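For step 2, Mathlib already carries the relevant lemmas; a sketch of what the composition theorems look like (Lean 4, assuming a full Mathlib import):

```lean
import Mathlib

-- Composition of bijections is a bijection (step 2 of the plan);
-- Mathlib provides this as `Function.Bijective.comp`.
example {α β γ : Type} {f : α → β} {g : β → γ}
    (hg : Function.Bijective g) (hf : Function.Bijective f) :
    Function.Bijective (g ∘ f) :=
  hg.comp hf

-- The `<` case of the composition rule: strictly monotone maps compose.
example {f g : ℤ → ℤ} (hg : StrictMono g) (hf : StrictMono f) :
    StrictMono (g ∘ f) :=
  hg.comp hf
```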

Engine-Level Pagination Design

Overview

This document describes the implementation of engine-level pagination in Toasty. The key principle is that pagination logic (limit+1 strategy, cursor extraction, etc.) should be handled by the engine, not in application-level code. This allows the engine to leverage database-specific capabilities (e.g., DynamoDB’s native cursor support) while providing compatibility for databases that don’t have native support (e.g., SQL databases).

Architecture Context

Statement System

  • toasty_core::stmt::Statement represents a superset of SQL - “Toasty-flavored SQL”
  • Contains both SQL concepts AND Toasty application-level concepts (models, paths, pagination)
  • Limit::PaginateForward is a Toasty-level concept that must be transformed by the engine before reaching SQL generation
  • By the time statements reach toasty-sql, they must contain ONLY valid SQL

Engine Pipeline

  1. Planner: Transforms Toasty statements into a pipeline of actions
  2. Actions: Executed by the engine, store results in VarStore
  3. VarStore: Stores intermediate results between pipeline steps
  4. ExecResponse: Final result containing values and optional metadata

Existing Patterns

  • eval::Func: Pre-computed transformations that execute during pipeline execution
  • partition_returning: Separates database-handled expressions from in-memory evaluations
  • Output::project: Transforms raw database results before storing in VarStore

Design

Core Types

#![allow(unused)]
fn main() {
// In engine.rs
pub struct ExecResponse {
    pub values: ValueStream,
    pub metadata: Option<Metadata>,
}

pub struct Metadata {
    pub next_cursor: Option<Expr>,
    pub prev_cursor: Option<Expr>,
    pub query: Query,
}

// In engine/plan/exec_statement.rs
pub struct ExecStatement {
    pub input: Option<Input>,
    pub output: Option<Output>,
    pub stmt: stmt::Statement,
    pub conditional_update_with_no_returning: bool,
    
    /// Pagination configuration for this query
    pub pagination: Option<Pagination>,
}

pub struct Pagination {
    /// Original limit before +1 transformation
    pub limit: u64,
    
    /// Function to extract cursor from a row
    /// Takes row as arg[0], returns cursor value(s)
    pub extract_cursor: eval::Func,
}
}

VarStore Changes

The VarStore needs to be updated to store ExecResponse instead of ValueStream:

#![allow(unused)]
fn main() {
pub(crate) struct VarStore {
    slots: Vec<Option<ExecResponse>>,
}
}

This allows pagination metadata to flow through the pipeline and be returned from engine::exec.

Implementation Plan

Phase 1: Update VarStore to ExecResponse [Mechanical Change]

This phase is a purely mechanical change to update the VarStore infrastructure. No pagination logic yet.

  1. Update VarStore (engine/exec/var_store.rs):

    • Change storage type from ValueStream to ExecResponse
    • Update load() to return ExecResponse
    • Update store() to accept ExecResponse
    • Update dup() to clone entire ExecResponse (including metadata)
  2. Update all action executors to wrap their results in ExecResponse:

    • For now, all actions will use metadata: None
    • Each action’s result becomes: ExecResponse { values, metadata: None }
    • Actions to update:
      • action_associate
      • action_batch_write
      • action_delete_by_key
      • action_exec_statement
      • action_find_pk_by_index
      • action_get_by_key
      • action_insert
      • action_query_pk
      • action_update_by_key
      • action_set_var
  3. Update pipeline execution (engine/exec.rs):

    • exec_pipeline returns ExecResponse
    • Handle VarStore returning ExecResponse
  4. Update main engine (engine.rs):

    • exec::exec now returns ExecResponse directly
    • Remove the temporary wrapping logic

This phase establishes the infrastructure without any behavioral changes. All existing tests should continue to pass.

Phase 2: Add Pagination to ExecStatement [Task 2]

  1. Add Pagination struct to engine/plan/exec_statement.rs
  2. Add pagination: Option<Pagination> field to ExecStatement
  3. No execution changes yet - just the structure

Phase 3: Planner Support for SQL Pagination [Task 3]

In planner/select.rs, add pagination planning logic:

#![allow(unused)]
fn main() {
impl Planner<'_> {
    fn plan_select_sql(...) {
        // ... existing logic ...
        
        // Handle pagination
        let pagination = if let Some(Limit::PaginateForward { limit, after }) = &stmt.limit {
            Some(self.plan_pagination(&mut stmt, &mut project, limit)?)
        } else {
            None
        };
        
        self.push_action(plan::ExecStatement {
            input,
            output: Some(plan::Output { var: output, project }),
            stmt: stmt.into(),
            conditional_update_with_no_returning: false,
            pagination,
        });
    }
    
    fn plan_pagination(
        &mut self,
        stmt: &mut stmt::Query,
        project: &mut eval::Func,
        limit_expr: &stmt::Expr,
    ) -> Result<Pagination> {
        let original_limit = self.extract_limit_value(limit_expr)?;
        
        // Get ORDER BY clause (required for pagination)
        let order_by = stmt.order_by.as_ref()
            .ok_or_else(|| anyhow!("Pagination requires ORDER BY"))?;
        
        // Check if ORDER BY is unique
        let is_unique = self.is_order_by_unique(order_by, stmt);
        
        // If not unique, append primary key as tie-breaker
        if !is_unique {
            self.append_pk_to_order_by(stmt)?;
        }
        
        // Ensure ORDER BY fields are in returning clause
        let (added_indices, original_field_count) = 
            self.ensure_order_by_in_returning(stmt)?;
        
        // Build cursor extraction function
        let extract_cursor = self.build_cursor_extraction_func(
            stmt,
            &added_indices,
        )?;
        
        // Modify project function if we added fields
        if !added_indices.is_empty() {
            self.adjust_project_for_pagination(
                project,
                original_field_count,
                added_indices.len(),
            );
        }
        
        // Transform limit to +1 for next page detection
        *stmt.limit.as_mut().unwrap() = Limit::Offset {
            limit: (original_limit + 1).into(),
            offset: None,
        };
        
        Ok(Pagination {
            limit: original_limit,
            extract_cursor,
        })
    }
}
}

Key helper methods:

  1. is_order_by_unique: Checks if ORDER BY fields form a unique constraint
  2. append_pk_to_order_by: Adds primary key as tie-breaker
  3. ensure_order_by_in_returning: Adds ORDER BY fields to SELECT if missing
  4. build_cursor_extraction_func: Creates eval::Func to extract cursor
  5. adjust_project_for_pagination: Modifies project to filter out added fields

Phase 4: Executor Implementation [Task 4]

In engine/exec/exec_statement.rs:

#![allow(unused)]
fn main() {
impl Exec<'_> {
    pub(super) async fn action_exec_statement(
        &mut self,
        action: &plan::ExecStatement,
    ) -> Result<()> {
        // ... existing logic to execute statement ...
        
        let res = if let Some(pagination) = &action.pagination {
            self.handle_paginated_query(res, pagination, &action.stmt).await?
        } else {
            ExecResponse {
                values: /* normal value stream */,
                metadata: None,
            }
        };
        
        self.vars.store(out.var, res);
        Ok(())
    }
    
    async fn handle_paginated_query(
        &mut self,
        rows: Rows,
        pagination: &Pagination,
        stmt: &Statement,
    ) -> Result<ExecResponse> {
        // Collect limit+1 rows
        let mut buffer = Vec::new();
        let mut count = 0;
        
        match rows {
            Rows::Values(mut stream) => {
                // Pull at most limit + 1 rows; the extra row only signals
                // that a next page exists.
                while let Some(value) = stream.next().await {
                    buffer.push(value?);
                    count += 1;
                    if count > pagination.limit {
                        break;
                    }
                }
            }
            _ => return Err(anyhow!("Pagination requires row results")),
        }
        
        // Check if there's a next page
        let has_next = buffer.len() > pagination.limit as usize;
        
        // Extract cursor if there's a next page
        let next_cursor = if has_next {
            // Get cursor from the LAST item we're keeping
            let last_kept = &buffer[pagination.limit as usize - 1];
            let cursor_value = pagination.extract_cursor.eval(&[last_kept.clone()])?;
            
            // Truncate buffer to requested limit
            buffer.truncate(pagination.limit as usize);
            
            Some(stmt::Expr::Value(cursor_value))
        } else {
            None
        };
        
        Ok(ExecResponse {
            values: ValueStream::from_vec(buffer),
            metadata: Some(Metadata {
                next_cursor,
                prev_cursor: None, // TODO: implement in future
                query: stmt.as_query().cloned().unwrap_or_default(),
            }),
        })
    }
}
}

Phase 5: Clean Up Application Layer [Task 5]

Remove the limit+1 logic from Paginate::collect:

#![allow(unused)]
fn main() {
pub async fn collect(self, db: &Db) -> Result<Page<M>> {
    // Simply delegate to db.paginate - engine handles pagination
    db.paginate(self.query).await
}
}

Update Db::paginate to use the metadata from ExecResponse:

#![allow(unused)]
fn main() {
pub async fn paginate<M: Model>(&self, statement: stmt::Select<M>) -> Result<Page<M>> {
    let exec_response = engine::exec(self, statement.untyped.clone().into()).await?;
    
    // Convert value stream to models
    let mut cursor = Cursor::new(self.schema.clone(), exec_response.values);
    let mut items = Vec::new();
    while let Some(item) = cursor.next().await {
        items.push(item?);
    }
    
    // Extract pagination metadata
    let (next_cursor, prev_cursor) = match exec_response.metadata {
        Some(metadata) => (metadata.next_cursor, metadata.prev_cursor),
        None => (None, None),
    };
    
    Ok(Page::new(items, statement, next_cursor, prev_cursor))
}
}

Key Design Decisions

  1. Single Source of Truth: The extract_cursor function is the only place that knows how to extract cursors. No redundant order_by_indices.

  2. Type Safety: Cursor extraction function uses actual inferred types from the schema, not Type::Any.

  3. Automatic Tie-Breaking: The planner automatically appends primary key to ORDER BY when needed for uniqueness.

  4. Transparent Field Addition: ORDER BY fields are added to returning clause transparently, and filtered out via the project function.

  5. Metadata Threading: ExecResponse flows through VarStore, preserving metadata through the pipeline.

Testing Strategy

  1. Unit Tests: Test cursor extraction function generation
  2. Integration Tests: Test pagination with various ORDER BY configurations
  3. Database Tests: Ensure SQL generation is correct (no PaginateForward in SQL)
  4. End-to-End Tests: Verify pagination works across different databases

Future Enhancements

  1. Previous Page Support: Implement prev_cursor extraction and PaginateBackward
  2. DynamoDB Native Pagination: Leverage LastEvaluatedKey instead of limit+1
  3. Complex ORDER BY: Support expressions beyond simple column references
  4. Optimization: Cache cursor extraction functions for common patterns

Serialized Field Implementation Design

Builds on the #[serialize] bookkeeping already in place (attribute parsing, SerializeFormat enum, FieldPrimitive.serialize field). This document covers the runtime serialization/deserialization codegen.

User-Facing API

#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: uuid::Uuid,

    name: String,

    #[serialize(json)]
    tags: Vec<String>,

    // nullable: the column may be NULL. The Rust type must be Option<T>.
    // None maps to NULL; Some(v) is serialized as JSON.
    #[serialize(json, nullable)]
    metadata: Option<HashMap<String, String>>,

    // Non-nullable Option: the entire Option value is serialized as JSON.
    // Some(v) → `v` as JSON, None → `null` as JSON text (column is NOT NULL).
    #[serialize(json)]
    extra: Option<String>,
}
}

Fields annotated with #[serialize(json)] are stored as JSON text in a single database column. The field’s Rust type must implement serde::Serialize and serde::DeserializeOwned. The database column type defaults to String/TEXT.

Nullability

By default, serialized fields are not nullable. The entire Rust value — including Option<T> — is serialized as-is into JSON text stored in a NOT NULL column. This means None becomes the JSON text null, and Some(v) becomes the JSON serialization of v.

To make the database column nullable, add nullable to the attribute: #[serialize(json, nullable)]. When nullable is set:

  • The Rust type must be Option<T>.
  • None maps to a SQL NULL (no value stored).
  • Some(v) serializes v as JSON text.

This is an explicit opt-in because the two behaviors are meaningfully different: a user may legitimately want to serialize None as JSON null text in a NOT NULL column (e.g., for a JSON API field where null is a valid value distinct from “no row”).

Value Encoding

A serialized field stores a JSON string in the database. The value stream uses Value::String for serialized fields, not the field’s logical Rust type.

Rust value ──serde_json::to_string──► Value::String(json) ──► DB column (TEXT)
DB column (TEXT) ──► Value::String(json) ──serde_json::from_str──► Rust value

Schema Changes

For serialized fields, field_ty bypasses <T as Primitive>::field_ty() and constructs FieldPrimitive directly with ty: Type::String. The user’s Rust type T does not need to implement Primitive — it only needs Serialize + DeserializeOwned.

Nullability is determined by the nullable flag in the attribute, not by inspecting the Rust type.

Remove serialize from Primitive::field_ty

Today Primitive::field_ty accepts a serialize argument so it can thread SerializeFormat into the FieldPrimitive it builds. With this design, serialized fields never go through Primitive::field_ty — codegen constructs the FieldPrimitive directly. That means the serialize parameter is dead for all callers and should be removed.

#![allow(unused)]
fn main() {
// Primitive trait (before):
fn field_ty(
    storage_ty: Option<db::Type>,
    serialize: Option<SerializeFormat>,
) -> FieldTy;

// Primitive trait (after):
fn field_ty(storage_ty: Option<db::Type>) -> FieldTy;
}

The default implementation drops the serialize field from the constructed FieldPrimitive (it is always None when going through the trait). Embedded type overrides (Embed, enum) already ignore both parameters.

Codegen changes:

#![allow(unused)]
fn main() {
// Non-serialized field (calls through the trait):
field_ty = quote!(<#ty as Primitive>::field_ty(#storage_ty));
nullable = quote!(<#ty as Primitive>::NULLABLE);

// Serialized field (constructed directly):
field_ty = quote!(FieldTy::Primitive(FieldPrimitive {
    ty: Type::String,
    storage_ty: #storage_ty,
    serialize: Some(SerializeFormat::Json),
}));
nullable = #serialize_nullable; // literal bool from attribute
}

No type-level hack is needed — the nullable flag is parsed from the attribute at macro expansion time and threaded through to schema registration as a literal bool.

Codegen Changes

Primitive::load / Model::load

For serialized fields, the generated load code reads a String from the record and deserializes it. The behavior depends on whether nullable is set:

#![allow(unused)]
fn main() {
// Non-nullable (default) — works for any T including Option<T>:
field_name: {
    let json_str = <String as Primitive>::load(record[i].take())?;
    serde_json::from_str(&json_str)
        .map_err(|e| Error::from_args(
            format_args!("failed to deserialize field '{}': {}", "field_name", e)
        ))?
},

// Nullable (#[serialize(json, nullable)]) — T must be Option<U>:
field_name: {
    let value = record[i].take();
    if value.is_null() {
        None
    } else {
        let json_str = <String as Primitive>::load(value)?;
        Some(serde_json::from_str(&json_str)
            .map_err(|e| Error::from_args(
                format_args!("failed to deserialize field '{}': {}", "field_name", e)
            ))?)
    }
},
}

Non-serialized fields are unchanged: <T as Primitive>::load(record[i].take())?.

Reload (root model and embedded)

Reload match arms follow the same pattern: load as String, then deserialize. For nullable fields, check null first.

Create builder setters

Serialized field setters accept the concrete Rust type (not impl IntoExpr<T>, since T does not implement IntoExpr) and serialize to a String expression:

#![allow(unused)]
fn main() {
// Non-nullable (default) — accepts T directly (including Option<T>):
pub fn field_name(mut self, field_name: FieldType) -> Self {
    let json = serde_json::to_string(&field_name).expect("failed to serialize");
    self.stmt.set(index, <String as IntoExpr<String>>::into_expr(json));
    self
}

// Nullable (#[serialize(json, nullable)]) — accepts Option<InnerType>:
pub fn field_name(mut self, field_name: Option<InnerType>) -> Self {
    match &field_name {
        Some(v) => {
            let json = serde_json::to_string(v).expect("failed to serialize");
            self.stmt.set(index, <String as IntoExpr<String>>::into_expr(json));
        }
        None => {
            self.stmt.set(index, Expr::<String>::from_value(Value::Null));
        }
    }
    self
}
}

Update builder setters

Same pattern as create: accept the concrete type, serialize to JSON, store as String expression.

Dependencies

serde_json is added as an optional dependency of the toasty crate, gated behind the existing serde feature:

# crates/toasty/Cargo.toml
[features]
serde = ["dep:serde_core", "dep:serde_json"]

[dependencies]
serde_json = { workspace = true, optional = true }

Generated code references serde_json through the codegen support module:

#![allow(unused)]
fn main() {
// crates/toasty/src/lib.rs, in codegen_support
#[cfg(feature = "serde")]
pub use serde_json;
}

If a user writes #[serialize(json)] without enabling the serde feature, the generated code fails to compile because codegen_support::serde_json does not exist. The compiler error points at the generated serde_json::from_str call.

Files Modified

| File | Change |
|------|--------|
| crates/toasty/Cargo.toml | Add serde_json optional dep, update serde feature |
| crates/toasty/src/lib.rs | Re-export serde_json in codegen_support |
| crates/toasty/src/stmt/primitive.rs | Remove serialize param from Primitive::field_ty |
| crates/toasty-codegen/src/schema/field.rs | Parse nullable flag from #[serialize(...)] attribute |
| crates/toasty-codegen/src/expand.rs | Update Embed/enum field_ty overrides to drop serialize param |
| crates/toasty-codegen/src/expand/schema.rs | Construct FieldPrimitive directly for serialized fields; remove serialize arg from non-serialized field_ty call |
| crates/toasty-codegen/src/expand/embedded_enum.rs | Drop serialize arg from field_ty call |
| crates/toasty-codegen/src/expand/model.rs | Deserialize in expand_load_body() and expand_embedded_reload_body() |
| crates/toasty-codegen/src/expand/create.rs | Serialize in create setter for serialized fields |
| crates/toasty-codegen/src/expand/update.rs | Serialize in update setter, deserialize in reload arms |
| crates/toasty-driver-integration-suite/Cargo.toml | Add serde, serde_json deps, enable serde feature |
| crates/toasty-driver-integration-suite/src/tests/serialize.rs | Integration tests |

Integration Tests

New file serialize.rs in the driver integration suite. Test cases:

  • Round-trip a Vec<String> field through create and read-back
  • Round-trip a nullable Option<T> field with Some and None (SQL NULL) values
  • Non-nullable Option<T> field: None round-trips as JSON null text (not SQL NULL)
  • Update a serialized field and verify the new value persists
  • Round-trip a custom struct with serde::Serialize + DeserializeOwned

Toasty ORM - Development Roadmap

This roadmap outlines potential enhancements and missing features for the Toasty ORM.

Overview

Toasty is an easy-to-use ORM for Rust that supports both SQL and NoSQL databases. This roadmap documents potential future work and feature gaps.

Feature Areas

Composite Keys

Composite Key Support (partial implementation)

  • Composite foreign key optimization in query simplification
  • Composite PK handling in expression rewriting and IN-list operations
  • HasMany/BelongsTo relationships with composite foreign keys referencing composite primary keys
  • Junction table / many-to-many patterns with composite keys
  • DynamoDB driver: batch delete/update with composite keys, composite unique indexes
  • Comprehensive test coverage for all composite key combinations

Query Capabilities

Query Ordering, Limits & Pagination

  • Multi-column ordering convenience method (.then_by())
  • Direct .limit() method for non-paginated queries
  • .last() convenience method

Query Constraints & Filtering

  • String operations: contains, starts with, ends with, LIKE (partial AST support)
  • NOT IN
  • Case-insensitive matching
  • BETWEEN / range queries
  • Relation filtering (filter by associated model fields)
  • Field-to-field comparison
  • Arithmetic operations in queries (add, subtract, multiply, divide, modulo)
  • Aggregate queries and GROUP BY / HAVING

Data Types

Extended Data Types

  • Embedded struct & enum support (partial implementation)
  • Serde-serialized types (JSON/JSONB columns for arbitrary Rust types)
  • Embedded collections (arrays, maps, sets, etc.)

Relationships & Loading

Partial Model Loading

  • Allow models to have fields that are not loaded by default (e.g. a large body column on an Article model)
  • Fields opt-in via a #[deferred] attribute and must be wrapped in a Deferred<T> type
  • By default, queries skip deferred fields; callers opt-in with .include(Article::body) (same API as relation preloading)
  • Accessing a Deferred<T> that was not loaded either returns an error or panics with a clear message
  • Works with primitive types, embedded structs, and embedded enums — just a subset of columns in the same table
    #![allow(unused)]
    fn main() {
    #[toasty::model]
    struct Article {
        #[key]
        #[auto]
        id: u64,
        title: String,
        author: BelongsTo<User>,
        #[deferred]
        body: Deferred<String>,   // not loaded unless explicitly included
    }
    
    // Load metadata only (no body column fetched)
    let articles = Article::all().collect(&db).await?;
    
    // Load with body
    let articles = Article::all().include(Article::body).collect(&db).await?;
    }

Relationships

  • Many-to-many relationships
  • Polymorphic associations
  • Nested preloading (multi-level .include() support)

Query Building

Query Features

  • Subquery improvements
  • Better conditional/dynamic query building ergonomics

Database Function Expressions

  • Allow database-side functions (e.g. NOW(), CURRENT_TIMESTAMP) as expressions in create and update operations
  • User API: field setters accept toasty::stmt helpers like toasty::stmt::now() that resolve to core::stmt::ExprFunc variants
    #![allow(unused)]
    fn main() {
    // Set updated_at to the database's current time instead of a Rust-side value
    user.update()
        .updated_at(toasty::stmt::now())
        .exec(&db)
        .await?;
    
    // Also usable in create operations
    User::create()
        .name("Alice")
        .created_at(toasty::stmt::now())
        .exec(&db)
        .await?;
    }
  • Extend ExprFunc enum in toasty-core with new function variants (e.g. Now)
  • SQL serialization for each function across supported databases (NOW() for PostgreSQL/MySQL, datetime('now') for SQLite)
  • Codegen: update field setter generation to accept both value types and function expressions
  • Future: support additional scalar functions (e.g. COALESCE, LOWER, UPPER, LENGTH)
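The per-database serialization could look like the following sketch. `SqlDialect` and this `ExprFunc` are illustrative stand-ins, not Toasty's actual types; the point is only that each new function variant needs one spelling per supported dialect.

```rust
// Hypothetical sketch of dialect-specific serialization for a `Now` variant.
#[derive(Clone, Copy)]
enum SqlDialect {
    Postgres,
    Mysql,
    Sqlite,
}

// Illustrative subset of a function-expression enum.
enum ExprFunc {
    Now,
}

fn serialize_func(func: &ExprFunc, dialect: SqlDialect) -> String {
    match (func, dialect) {
        // PostgreSQL and MySQL share the NOW() spelling.
        (ExprFunc::Now, SqlDialect::Postgres) | (ExprFunc::Now, SqlDialect::Mysql) => {
            "NOW()".to_string()
        }
        // SQLite has no NOW(); datetime('now') is the idiom.
        (ExprFunc::Now, SqlDialect::Sqlite) => "datetime('now')".to_string(),
    }
}

fn main() {
    assert_eq!(serialize_func(&ExprFunc::Now, SqlDialect::Sqlite), "datetime('now')");
    assert_eq!(serialize_func(&ExprFunc::Now, SqlDialect::Postgres), "NOW()");
}
```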

Raw SQL Support

  • Execute arbitrary SQL statements directly
  • Parameterized queries with type-safe bindings
  • Raw SQL fragments within typed queries (escape hatch for complex expressions)

Data Modification

Upsert

  • Insert-or-update: atomic INSERT ... ON CONFLICT DO UPDATE (PostgreSQL/SQLite), ON DUPLICATE KEY UPDATE (MySQL), MERGE (SQL Server/Oracle)
  • Insert-or-ignore (DO NOTHING / INSERT IGNORE)
  • Conflict target: by column(s), by constraint name, partial indexes (PostgreSQL)
  • Column update control: update all non-key columns, named subset, or raw SQL expression
  • Access to the proposed row via EXCLUDED pseudo-table in the update expression
  • Bulk upsert (multi-row VALUES)
  • DynamoDB: PutItem (unconditional replace) vs. UpdateItem with condition expression

Mutation Result Information

  • Return affected row counts from update operations (how many records were updated)
  • Return affected row counts from delete operations (how many records were deleted)
  • Better result types that provide operation metadata
  • Distinguish between “no rows matched” vs “rows matched but no changes needed”
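One way to model the matched-vs-modified distinction is sketched below. `UpdateResult` is a hypothetical result type, not Toasty's API; it mirrors how MySQL reports matched and changed row counts separately.

```rust
// Illustrative sketch only — `UpdateResult` is a hypothetical result type.
#[derive(Debug, PartialEq)]
struct UpdateResult {
    /// Rows the WHERE clause matched.
    matched: u64,
    /// Rows actually modified.
    modified: u64,
}

impl UpdateResult {
    fn outcome(&self) -> &'static str {
        match (self.matched, self.modified) {
            (0, _) => "no rows matched",
            (_, 0) => "rows matched but no changes needed",
            _ => "rows updated",
        }
    }
}

fn main() {
    let res = UpdateResult { matched: 3, modified: 0 };
    assert_eq!(res.outcome(), "rows matched but no changes needed");
}
```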

Transactions

Atomic Batch Operations

  • Cross-database atomic batch API
  • Supported across SQL and NoSQL databases
  • Type-safe operation batching
  • All-or-nothing semantics

SQL Transaction API

  • Manual transaction control for SQL databases
  • BEGIN/COMMIT/ROLLBACK support
  • Savepoints and nested transactions
  • Isolation level configuration

Schema Management

Migrations

  • Schema migration system
  • Migration generation
  • Rollback support
  • Schema versioning
  • CLI tools for schema management

Toasty Runtime Improvements

Concurrent Task Execution

  • Replace the current ad-hoc background task with a proper in-flight task manager
  • Execute independent parts of an execution plan concurrently
  • Track and coordinate multiple in-flight tasks within a single query execution

Cancellation & Cleanup

  • Detect when the caller drops the future representing query completion
  • Perform clean cancellation on drop (rollback any incomplete transactions)
  • Ensure no resource leaks or orphaned database state on cancellation

Internal Instrumentation & Metrics

  • Instrument time spent in each execution phase (planning, simplification, execution, serialization)
  • Track CPU time consumed by query planning to detect expensive plans
  • Provide internal metrics for diagnosing performance bottlenecks

Performance

Query Engine Optimization

  • Dedicated post-lowering optimization pass for expensive predicate analysis (run once, not per-node)
  • Equivalence classes for transitive constraint reasoning (a = b AND b = 5 implies a = 5)
  • Structured constraint representation (constant bindings, range bounds, exclusion sets)
  • Targeted predicate normalization without full DNF conversion
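The equivalence-class idea can be sketched with a small union-find plus a constant-binding map: merging `a = b` into one class and binding `b = 5` to that class lets the optimizer conclude `a = 5`. The types here are illustrative, not engine code.

```rust
// Sketch: transitive constraint reasoning via union-find.
use std::collections::HashMap;

struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }
    // Find the class representative, with path compression.
    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let root = self.find(self.parent[x]);
            self.parent[x] = root;
        }
        self.parent[x]
    }
    // Merge two classes (records an equality like `a = b`).
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        self.parent[ra] = rb;
    }
}

fn main() {
    // Columns: 0 = a, 1 = b.
    let mut uf = UnionFind::new(2);
    uf.union(0, 1); // a = b

    // Constant bindings keyed by class representative.
    let mut constants: HashMap<usize, i64> = HashMap::new();
    let b_root = uf.find(1);
    constants.insert(b_root, 5); // b = 5

    // Transitively, a = 5.
    assert_eq!(constants[&uf.find(0)], 5);
}
```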

Stored Procedures (Pre-Compiled Query Plans)

  • Compile query plans once and execute them many times with different parameter values
  • Skip the full compilation pipeline (simplification, lowering, HIR/MIR planning) on repeated calls
  • Parameterized statement AST with Param slots for value substitution at execution time
  • Pairs with database-level prepared statements for end-to-end optimization
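The `Param`-slot idea reduces to substituting values into an already-compiled expression tree. The `Expr` enum below is a toy stand-in for the statement AST, shown only to illustrate compile-once/execute-many.

```rust
// Minimal sketch of a parameterized plan: `Param` slots are filled at
// execution time, so the compilation pipeline runs once. Names are illustrative.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(&'static str),
    Value(i64),
    Param(usize),
    Eq(Box<Expr>, Box<Expr>),
}

fn substitute(expr: &Expr, args: &[i64]) -> Expr {
    match expr {
        Expr::Param(i) => Expr::Value(args[*i]),
        Expr::Eq(l, r) => Expr::Eq(Box::new(substitute(l, args)), Box::new(substitute(r, args))),
        other => other.clone(),
    }
}

fn main() {
    // Compiled once: `id = $0`.
    let plan = Expr::Eq(Box::new(Expr::Column("id")), Box::new(Expr::Param(0)));
    // Executed many times with different argument values.
    assert_eq!(
        substitute(&plan, &[42]),
        Expr::Eq(Box::new(Expr::Column("id")), Box::new(Expr::Value(42)))
    );
}
```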

Optimization Features

  • Bulk inserts/updates
  • Query caching
  • Connection pooling improvements

Developer Experience

Ergonomic Macros

  • toasty::query!() - Succinct query syntax that translates to builder DSL
    #![allow(unused)]
    fn main() {
    // Instead of: User::all().filter(...).order_by(...).collect(&db).await
    toasty::query!(User, filter: ..., order_by: ...).collect(&db).await
    }
  • toasty::create!() - Concise record creation syntax
    #![allow(unused)]
    fn main() {
    // Instead of: User::create().name("Alice").age(30).exec(&db).await
    toasty::create!(User, name: "Alice", age: 30).exec(&db).await
    }
  • toasty::update!() - Simplified update syntax
    #![allow(unused)]
    fn main() {
    // Instead of: user.update().name("Bob").age(31).exec(&db).await
    toasty::update!(user, name: "Bob", age: 31).exec(&db).await
    }

Tooling & Debugging

  • Query logging

Safety & Security

Sensitive Value Flagging

  • Flag sensitive fields (e.g. passwords, tokens, secrets) so they are automatically redacted in logs and debug output
  • Attribute-based opt-in: #[sensitive] on model fields marks values that must never appear in plaintext outside the database
  • All logging, query tracing, and error messages strip or mask flagged values
  • Prevents accidental credential leakage in application logs, query dumps, and diagnostics

Trusted vs Untrusted Input

  • Distinguish between values originating from untrusted user input and values produced internally by the query engine (e.g. literal numbers, generated keys)
  • Engine-produced values can skip escaping/parameterization since they are known-safe, reducing unnecessary overhead
  • Untrusted input continues to be parameterized or escaped to prevent SQL injection
  • Enables more efficient SQL generation without weakening safety guarantees for external data

Notes

The roadmap documents describe potential enhancements and missing features. For information about what’s currently implemented, refer to the user guide or test the API directly.

Composite Key Support

Overview

Toasty has partial composite key support. Basic CRUD operations work for models with composite primary keys (both field-level #[key] and model-level #[key(partition = ..., local = ...)]), but several engine optimizations, relationship patterns, and driver operations panic or fall back when encountering composite keys.

This document catalogs the gaps, surveys how other ORMs handle composite keys, identifies common SQL patterns that require composite key support, and proposes a phased implementation plan.

Current State

What Works

Schema definition — Two syntaxes for composite keys:

#![allow(unused)]
fn main() {
// Field-level: multiple #[key] attributes
#[derive(Debug, toasty::Model)]
struct Foo {
    #[key]
    one: String,
    #[key]
    two: String,
}

// Model-level: partition/local keys (designed for DynamoDB compatibility)
#[derive(Debug, toasty::Model)]
#[key(partition = user_id, local = id)]
struct Todo {
    #[auto]
    id: uuid::Uuid,
    user_id: uuid::Uuid,
    title: String,
}
}

Generated query methods for composite keys:

  • filter_by_<field1>_and_<field2>() — filter by both key fields
  • get_by_<field1>_and_<field2>() — get a single record by both keys
  • filter_by_<field1>_and_<field2>_batch() — batch get by key tuples
  • filter_by_<partition_field>() — filter by partition key alone
  • Comparison operators on local keys: gt(), ge(), lt(), le(), ne(), eq()

Database support:

  • SQL databases (SQLite, PostgreSQL, MySQL): composite primary keys via field-level #[key]
  • DynamoDB: partition/local key syntax (max 2 keys: 1 partition + 1 local)

Test coverage:

  • one_model_composite_key::batch_get_by_key — basic CRUD with field-level composite keys
  • one_model_query — partition/local key queries with range operators
  • has_many_crud_basic::has_many_when_fk_is_composite — HasMany with composite FK (working)
  • embedded — composite keys with embedded struct fields
  • examples/composite-key/ — end-to-end example application

What Does Not Work

The following locations contain todo!(), assert!(), or panic!() that block composite key usage:

Engine Simplification (5 locations)

| File | Line | Issue |
|---|---|---|
| engine/simplify/expr_binary_op.rs | 23-25 | todo!("handle composite keys") when simplifying equality on model references with composite PKs |
| engine/simplify/expr_binary_op.rs | 43-45 | todo!("handle composite keys") when simplifying binary ops on composite FK fields |
| engine/simplify/expr_in_list.rs | 30-32 | todo!() when optimizing IN-list expressions for models with composite PKs |
| engine/simplify/lift_in_subquery.rs | 92-96 | assert_eq!(len, 1, "TODO: composite keys") — subquery lifting restricted to single-field FKs |
| engine/simplify/lift_in_subquery.rs | 109-111, 145-148, 154-157 | Three more todo!("composite keys") in BelongsTo and HasOne subquery lifting |
| engine/simplify/rewrite_root_path_expr.rs | 18-19 | todo!("composite primary keys") when rewriting path expressions with key constraints |

Engine Lowering (2 locations)

| File | Line | Issue |
|---|---|---|
| engine/lower/insert.rs | 90-92 | todo!() when lowering inserts with BelongsTo relations that have composite FKs |
| engine/lower.rs | 893-896 | Unhandled else branch when lowering relationships with composite FKs |

DynamoDB Driver (4 locations)

| File | Line | Issue |
|---|---|---|
| driver-dynamodb/op/update_by_key.rs | 197 | assert!(op.keys.len() == 1) — batch update limited to single key |
| driver-dynamodb/op/delete_by_key.rs | 119-121 | panic!("only 1 key supported so far") — batch delete limited to single key |
| driver-dynamodb/op/delete_by_key.rs | 33 | panic!("TODO: support more than 1 unique index") |
| driver-dynamodb/op/create_table.rs | 113 | assert_eq!(1, index.columns.len()) — composite unique indexes unsupported |

Stubbed Tests (2 tests)

| File | Test | Status |
|---|---|---|
| has_many_crud_basic.rs | has_many_when_pk_is_composite | Empty — not implemented |
| has_many_crud_basic.rs | has_many_when_fk_and_pk_are_composite | Empty — not implemented |

Design Constraints

  • Auto-increment is intentionally forbidden with composite keys. The schema verifier rejects #[auto(increment)] on composite PK tables. UUID auto-generation is the supported alternative.
  • DynamoDB limits composite keys to 2 columns (1 partition + 1 local). This is a DynamoDB limitation, not a Toasty limitation.

How Other ORMs Handle Composite Keys

Rust ORMs

Diesel — First-class composite key support. #[diesel(primary_key(col1, col2))] on the struct; find() accepts a tuple (val1, val2); Identifiable returns a tuple reference. BelongsTo works with composite keys via explicit foreign_key attribute. Compile-time type checking through generated code.

SeaORM — Supports composite keys via multiple #[sea_orm(primary_key)] field attributes. PrimaryKeyTrait::ValueType is a tuple. find_by_id() and delete_by_id() accept tuples. DAO pattern works fully. Composite foreign keys are less ergonomic but functional.

Python ORMs

SQLAlchemy — Gold standard for composite key support. Multiple primary_key=True columns define a composite PK. session.get(Model, (a, b)) for lookups. ForeignKeyConstraint at the table level handles composite FKs cleanly. Identity map uses tuples. All features (eager/lazy loading, cascades, relationships) work uniformly with composite keys.

Django — Added CompositePrimaryKey in Django 5.2 (2025) after years of surrogate-key-only design. pk returns a tuple. Model.objects.get(pk=(1, 2)) works. Composite FK support is still limited. Ecosystem (admin, REST frameworks, third-party packages) is catching up.

Tortoise ORM — No composite PK support. Surrogate key + unique constraint is the only option.

JavaScript/TypeScript ORMs

Prisma — @@id([field1, field2]) defines composite PKs. Auto-generates compound field names (field1_field2) for findUnique/update/delete. Multi-field @relation(fields: [...], references: [...]) for composite FKs. Fully type-safe generated client.

TypeORM — Multiple @PrimaryColumn() decorators. All operations use object-based where clauses ({ field1: val1, field2: val2 }). @JoinColumn accepts an array for composite FKs. save() does upsert based on all PK fields.

Sequelize — Supports composite PK definition but findByPk() does not work with composite keys (must use findOne({ where })). Composite FK support requires workarounds or raw SQL.

Drizzle — primaryKey({ columns: [col1, col2] }) in the table config callback. foreignKey({ columns: [...], foreignColumns: [...] }) for composite FKs. No special find-by-PK method; all queries use explicit where + and(). SQL-first philosophy.

Java/Kotlin

Hibernate/JPA — Two approaches: @IdClass (flat fields + separate ID class) and @EmbeddedId (nested object). PK class must implement Serializable, equals(), hashCode(). @JoinColumns (plural) for composite FKs. @MapsId connects relationship fields to embedded ID fields. Full relationship support.

Exposed (Kotlin) — PrimaryKey(col1, col2) in the table object. Only the DSL (SQL-like) API supports composite keys; the DAO (EntityClass) API does not. Relationships require manual joins.

Go ORMs

GORM — Multiple gorm:"primaryKey" tags. Composite FKs via foreignKey:Col1,Col2;references:Col1,Col2. Zero-value problem: PK column with value 0 is treated as “not set.”

Ent — No composite PK support by design (graph semantics, every node has a single ID). Unique composite indexes are the workaround.

Ruby

ActiveRecord (Rails 7.1+) — primary_key: [:col1, :col2] in migrations, self.primary_key = [:col1, :col2] in model. find([a, b]) for lookups. query_constraints: [:col1, :col2] for composite FK associations. Pre-7.1 required the composite_primary_keys gem.

Cross-ORM Summary

| ORM | Composite PK | Composite FK | Find by PK | Relationship Support |
|---|---|---|---|---|
| Diesel (Rust) | Yes | Yes | Tuple | Full |
| SeaORM (Rust) | Yes | Partial | Tuple | Full |
| SQLAlchemy (Python) | Yes | Yes | Tuple | Full |
| Django (Python) | 5.2+ | Limited | Tuple | Partial |
| Prisma (TS) | Yes | Yes | Generated compound | Full |
| TypeORM (TS) | Yes | Yes | Object | Full |
| Sequelize (JS) | Yes | Partial | Broken | Partial |
| Drizzle (TS) | Yes | Yes | Manual where | Manual |
| Hibernate/JPA | Yes | Yes | ID class | Full |
| GORM (Go) | Yes | Yes | Where clause | Full |
| ActiveRecord (Ruby) | 7.1+ | 7.1+ | Array | Partial |

Key takeaway: Mature ORMs (Diesel, SQLAlchemy, Hibernate) treat composite keys as first-class citizens where all operations work uniformly. The most common API pattern is tuple-based identity (find((a, b))). Composite foreign keys are universally harder than composite PKs — even established ORMs have rougher edges there.

Common SQL Patterns Requiring Composite Keys

1. Junction Tables (Many-to-Many)

The most common use case. The junction table’s PK is the combination of FKs to both related tables.

CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(id),
    course_id INTEGER NOT NULL REFERENCES course(id),
    enrolled_at TIMESTAMP DEFAULT NOW(),
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id)
);

Junction tables often accumulate extra attributes (grade, enrolled_at, role) that make them first-class entities requiring full CRUD support, not just a hidden link table.

Toasty gap: Many-to-many relationships are listed as a separate roadmap item. Composite key support is a prerequisite — junction tables are inherently composite-keyed.

2. Multi-Tenant Data Isolation

Tenant ID appears as the first column in every composite PK, enabling partition-level isolation and efficient tenant-scoped queries.

CREATE TABLE tenant_document (
    tenant_id UUID NOT NULL REFERENCES tenant(id),
    document_id UUID NOT NULL DEFAULT gen_random_uuid(),
    title TEXT NOT NULL,
    PRIMARY KEY (tenant_id, document_id)
);

-- All queries are scoped: WHERE tenant_id = $1 AND ...

Why composite PKs: Enforces isolation at the database level. PK index prefix enables efficient tenant-scoped queries. Maps directly to DynamoDB’s partition/local key model.

Toasty gap: The #[key(partition = ..., local = ...)] syntax already models this. The gaps are in relationship handling when both sides use composite keys.

3. Time-Series Data

CREATE TABLE sensor_reading (
    sensor_id INTEGER NOT NULL,
    recorded_at TIMESTAMP NOT NULL,
    value DOUBLE PRECISION NOT NULL,
    PRIMARY KEY (sensor_id, recorded_at)
);

Why composite PKs: Natural ordering by sensor then time. Range scans on recorded_at within a sensor are efficient. Supports table partitioning by time ranges.

4. Hierarchical Data (Closure Table)

CREATE TABLE category_closure (
    ancestor_id INTEGER NOT NULL REFERENCES category(id),
    descendant_id INTEGER NOT NULL REFERENCES category(id),
    depth INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (ancestor_id, descendant_id)
);

5. Composite Foreign Keys Referencing Composite PKs

A child table references a parent with a composite PK — all parent PK columns appear in the child as FK columns.

CREATE TABLE order_item (
    order_id INTEGER NOT NULL REFERENCES "order"(id),
    item_number INTEGER NOT NULL,
    PRIMARY KEY (order_id, item_number)
);

CREATE TABLE order_item_shipment (
    id SERIAL PRIMARY KEY,
    order_id INTEGER NOT NULL,
    item_number INTEGER NOT NULL,
    shipment_id INTEGER NOT NULL REFERENCES shipment(id),
    FOREIGN KEY (order_id, item_number)
        REFERENCES order_item(order_id, item_number)
);

Toasty gap: This is the hardest pattern. The engine simplification and lowering layers assume single-field FKs in multiple places. Fixing this is the core of the composite key work.

6. Versioned Records

CREATE TABLE document_version (
    document_id INTEGER NOT NULL REFERENCES document(id),
    version INTEGER NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (document_id, version)
);

7. Composite Unique Constraints vs Composite Primary Keys

Some applications prefer a surrogate PK with a composite unique constraint:

-- Surrogate PK + composite unique
CREATE TABLE enrollment (
    id SERIAL PRIMARY KEY,
    student_id INTEGER NOT NULL,
    course_id INTEGER NOT NULL,
    UNIQUE (student_id, course_id)
);

Trade-offs: surrogate PKs simplify FKs (single column) and URL design, but composite PKs are more storage-efficient and semantically meaningful. ORMs that don’t support composite PKs (Django pre-5.2, Tortoise, Ent) force the surrogate pattern.

Toasty should support both patterns — composite PKs for direct use and composite unique constraints for the surrogate approach.

Implementation Plan

Phase 1: Engine Simplification — Composite PK/FK Handling

Fix the todo!() panics in the engine simplification layer so that queries involving composite keys pass through without crashing, even if not fully optimized.

Files:

  • engine/simplify/expr_binary_op.rs — Handle composite PKs and FKs in equality simplification. For composite keys, generate an AND of per-field comparisons.
  • engine/simplify/expr_in_list.rs — Handle IN-list for composite PKs. Generate (col1, col2) IN ((v1, v2), (v3, v4)) or equivalent AND/OR tree.
  • engine/simplify/rewrite_root_path_expr.rs — Rewrite path expressions for composite PKs.

Approach: Where a single-field operation currently destructures let [field] = &fields[..], extend to iterate over all fields and combine with AND expressions.
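The rewrite described above can be sketched as follows. The `Expr` enum is illustrative, not the engine's actual statement types; it only shows the iterate-and-AND shape that replaces the single-field destructuring.

```rust
// Sketch: equality on a composite key becomes an AND of per-field comparisons.
#[derive(Debug, PartialEq)]
enum Expr {
    Eq(String, i64),
    And(Vec<Expr>),
}

// Instead of destructuring `let [field] = &fields[..]`, iterate all key fields.
fn composite_eq(fields: &[&str], values: &[i64]) -> Expr {
    let parts: Vec<Expr> = fields
        .iter()
        .zip(values)
        .map(|(f, v)| Expr::Eq(f.to_string(), *v))
        .collect();
    if parts.len() == 1 {
        // Single-field keys keep the existing flat form.
        parts.into_iter().next().unwrap()
    } else {
        Expr::And(parts)
    }
}

fn main() {
    let expr = composite_eq(&["order_id", "item_number"], &[7, 2]);
    assert_eq!(
        expr,
        Expr::And(vec![
            Expr::Eq("order_id".into(), 7),
            Expr::Eq("item_number".into(), 2)
        ])
    );
}
```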

Phase 2: Subquery Lifting for Composite FKs

Extend the subquery lifting optimization to handle composite foreign keys in BelongsTo and HasOne relationships.

Files:

  • engine/simplify/lift_in_subquery.rs — Remove the assert_eq!(len, 1) and handle multi-field FKs. For the optimization path, generate AND of per-field comparisons. For the fallback IN subquery path, generate tuple-based IN expressions or multiple correlated conditions.

Approach: The existing single-field logic maps fk_field.source -> fk_field.target. For composite keys, do the same for each field pair and combine with AND.

Phase 3: Engine Lowering — Composite FK Relationships

Fix insert and relationship lowering to handle composite FKs.

Files:

  • engine/lower/insert.rs — When lowering BelongsTo in insert operations, set all FK fields from the related record’s PK fields, not just one.
  • engine/lower.rs — Handle composite FKs in relationship lowering. Generate multi-column join conditions.

Phase 4: DynamoDB Driver — Batch Operations with Composite Keys

Files:

  • driver-dynamodb/op/update_by_key.rs — Support batch updates with multiple keys (iterate and issue individual UpdateItem calls if needed).
  • driver-dynamodb/op/delete_by_key.rs — Support batch deletes. Remove the single-key panic.
  • driver-dynamodb/op/create_table.rs — Support composite unique indexes (Global Secondary Indexes with multiple key columns where DynamoDB allows it).

Phase 5: Test Coverage

Fill in the stubbed tests and add new ones covering all composite key combinations:

Existing stubs to implement:

  • has_many_when_pk_is_composite — Parent has composite PK, child has single FK pointing to it
  • has_many_when_fk_and_pk_are_composite — Both sides have composite keys

New tests to add:

| Test | Description |
|---|---|
| composite_pk_crud | Full CRUD (create, read, update, delete) on a model with 2+ key fields |
| composite_pk_three_fields | Composite PK with 3 fields to test beyond the 2-field case |
| composite_fk_belongs_to | BelongsTo where the FK is composite (references a composite PK) |
| composite_fk_has_one | HasOne with composite FK |
| composite_key_pagination | Cursor-based pagination with composite PK ordering |
| composite_key_batch_operations | Batch get/update/delete with composite keys |
| composite_key_scoped_queries | Scoped queries (e.g., user.todos().filter_by_id(...)) with composite keys |
| composite_key_update_non_key_fields | Update non-key fields on a composite-keyed model |
| composite_key_unique_constraint | Composite unique constraint (not PK) behavior |
| junction_table_pattern | Many-to-many junction table with composite PK and extra attributes |
| multi_tenant_pattern | Tenant-scoped models with (tenant_id, entity_id) composite PKs |

Design Decisions

Tuple-Based Identity

Following Diesel and SQLAlchemy’s lead, composite key identity should be represented as tuples. The current generated methods (get_by_field1_and_field2(val1, val2)) are a good API. For batch operations, the tuple-of-references pattern (filter_by_field1_and_field2_batch([(&a, &b), ...])) is also solid.

AND Composition for Multi-Field Conditions

When a single-field operation like pk_field = value needs to become a composite operation, the standard approach is:

pk_field1 = value1 AND pk_field2 = value2

This maps cleanly to SQL WHERE clauses and DynamoDB key conditions. The engine’s stmt::ExprAnd already supports this.

IN-List with Composite Keys

For batch lookups, composite IN can be expressed as:

-- Row-value syntax (PostgreSQL, MySQL 8.0+, SQLite)
WHERE (col1, col2) IN ((v1a, v2a), (v1b, v2b))

-- Equivalent OR-of-ANDs (universal)
WHERE (col1 = v1a AND col2 = v2a) OR (col1 = v1b AND col2 = v2b)

The OR-of-ANDs form is more portable across databases. The engine should generate this form and let the SQL serializer optimize to row-value syntax where supported.
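The portable expansion is mechanical, as this sketch shows. It generates SQL text directly for illustration only; the engine would build the equivalent AST instead.

```rust
// Sketch: expand a composite IN-list into the portable OR-of-ANDs form.
fn composite_in(cols: &[&str], rows: &[Vec<&str>]) -> String {
    let alts: Vec<String> = rows
        .iter()
        .map(|row| {
            // One `col = value` per key field, joined with AND.
            let eqs: Vec<String> = cols
                .iter()
                .zip(row)
                .map(|(c, v)| format!("{} = {}", c, v))
                .collect();
            format!("({})", eqs.join(" AND "))
        })
        .collect();
    // One alternative per tuple in the IN-list, joined with OR.
    alts.join(" OR ")
}

fn main() {
    let sql = composite_in(&["col1", "col2"], &[vec!["v1a", "v2a"], vec!["v1b", "v2b"]]);
    assert_eq!(sql, "(col1 = v1a AND col2 = v2a) OR (col1 = v1b AND col2 = v2b)");
}
```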

Composite FK Optimization

The subquery lifting optimization (lift_in_subquery.rs) currently rewrites:

-- Before: subquery
user_id IN (SELECT id FROM users WHERE name = 'Alice')
-- After: direct comparison
user_id = <alice_id>

For composite FKs, the rewrite becomes:

-- Before: correlated subquery
(order_id, item_number) IN (SELECT order_id, item_number FROM order_items WHERE ...)
-- After: direct comparison
order_id = <val1> AND item_number = <val2>

The same optimization logic applies — just iterated over each FK field pair.

Testing Strategy

  • All new tests go in the integration suite (toasty-driver-integration-suite) to run against all database backends
  • Use the existing #[driver_test] macro for multi-database testing
  • Use the matrix testing infrastructure (composite dimension) where appropriate
  • Each phase should have passing tests before moving to the next phase
  • No unit tests in source code per project convention

Query Ordering, Limits & Pagination

Overview

Toasty provides cursor-based pagination using keyset pagination, which offers consistent performance and works well across both SQL and NoSQL databases. The implementation converts pagination cursors into WHERE clauses rather than using OFFSET, avoiding the performance issues of traditional offset-based pagination.

Potential Future Work

Multi-column Ordering Convenience

Add .then_by() method for chaining multiple order clauses:

#![allow(unused)]
fn main() {
let users = User::all()
    .order_by(User::FIELDS.status().asc())
    .then_by(User::FIELDS.created_at().desc())
    .paginate(10)
    .collect(&db)
    .await?;
}

Current workaround requires manual construction:

#![allow(unused)]
fn main() {
use toasty::stmt::OrderBy;

let order = OrderBy::from([
    Post::FIELDS.status().asc(),
    Post::FIELDS.created_at().desc(),
]);

let posts = Post::all()
    .order_by(order)
    .collect(&db)
    .await?;
}

Implementation:

  • File: toasty-codegen/src/expand/query.rs
  • Add .then_by() method to query builder
  • Complexity: Medium

Direct Limit Method

Expose .limit() for non-paginated queries:

#![allow(unused)]
fn main() {
let recent_posts: Vec<Post> = Post::all()
    .order_by(Post::FIELDS.created_at().desc())
    .limit(5)
    .collect(&db)
    .await?;
}

Implementation:

  • File: toasty-codegen/src/expand/query.rs
  • Generate .limit() method
  • Complexity: Low

Last Convenience Method

Get the last matching record:

#![allow(unused)]
fn main() {
let last_user: Option<User> = User::all()
    .order_by(User::FIELDS.created_at().desc())
    .last(&db)
    .await?;
}

Implementation:

  • File: toasty-codegen/src/expand/query.rs
  • Generate .last() method
  • Complexity: Low

Testing

Additional Test Coverage

Tests that could be added:

  • Multi-column ordering

    • Verify correct ordering with multiple columns
    • Test tie-breaking behavior
  • Direct .limit() method (when implemented)

    • Non-paginated queries with limit
    • Verify correct number of results
  • .last() convenience method (when implemented)

    • Returns last matching record
    • Returns None when no matches
  • Edge cases

    • Empty results with pagination
    • Single page results (no next/prev cursors)
    • Pagination beyond last page
    • Large page sizes
    • Page size of 1

Database-Specific Considerations

SQL Databases

  • MySQL: Uses LIMIT n for pagination (keyset approach via WHERE)
  • PostgreSQL: Uses LIMIT n for pagination (keyset approach via WHERE)
  • SQLite: Uses LIMIT n for pagination (keyset approach via WHERE)

All SQL databases use keyset pagination (WHERE clauses with cursors) rather than OFFSET for consistent performance.

NoSQL Databases

  • DynamoDB:
    • Limited ordering support (only on sort keys)
    • Pagination via LastEvaluatedKey
    • Cursor-based approach maps well to DynamoDB’s native pagination
    • Needs validation and testing

How Keyset Pagination Works

Instead of using OFFSET, Toasty converts cursors to WHERE clauses:

-- Traditional OFFSET (slow for large offsets)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 10 OFFSET 10000;

-- Toasty's cursor approach (always fast)
SELECT * FROM posts
WHERE (created_at, id) < ('2024-01-15 10:30:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 10;

This provides:

  • Consistent Performance: O(log n) regardless of page number
  • Stable Results: New records don’t shift pagination boundaries
  • Database Agnostic: Works efficiently on NoSQL databases
  • Real-time Friendly: Handles concurrent insertions gracefully
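For databases without row-value comparison support, the cursor predicate `(c1, c2) < (v1, v2)` expands to an OR-of-ANDs over lexicographic prefixes. This sketch generates SQL text for illustration only; an engine would build the equivalent expression tree.

```rust
// Sketch: expand `(c1, c2, ...) < (v1, v2, ...)` into portable OR-of-ANDs.
// Each alternative fixes a prefix of columns with `=` and compares the next with `<`.
fn keyset_predicate(cols: &[&str], cursor: &[&str]) -> String {
    let mut alts = Vec::new();
    for i in 0..cols.len() {
        let mut terms: Vec<String> = (0..i)
            .map(|j| format!("{} = {}", cols[j], cursor[j]))
            .collect();
        terms.push(format!("{} < {}", cols[i], cursor[i]));
        alts.push(format!("({})", terms.join(" AND ")));
    }
    alts.join(" OR ")
}

fn main() {
    let sql = keyset_predicate(&["created_at", "id"], &["'2024-01-15'", "12345"]);
    assert_eq!(
        sql,
        "(created_at < '2024-01-15') OR (created_at = '2024-01-15' AND id < 12345)"
    );
}
```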

Notes

  • Cursors (stmt::Expr) can be serialized at the application level if needed for web APIs
  • Pagination requires an explicit ORDER BY clause to ensure consistent results
  • Multi-column ordering works today via manual OrderBy construction
  • The .then_by() convenience method would improve ergonomics but isn’t essential

Query Constraints & Filtering

Overview

This document identifies gaps in Toasty’s query constraint support compared to mature ORMs, and outlines potential additions for building web applications.

Terminology

A “query constraint” refers to any predicate used in the WHERE clause of a query. In Toasty, constraints are built using:

  • Generated filter methods (Model::filter_by_<field>()) for indexed/key fields
  • Generic .filter() method accepting Expr<bool> for arbitrary conditions
  • Model::FIELDS.<field>() paths combined with comparison methods (.eq(), .gt(), etc.)

Core AST Support Without User API

These expression types exist in toasty-core (crates/toasty-core/src/stmt/expr.rs) and have SQL serialization, but lack a typed user-facing API on Path<T> or Expr<T>:

| Expression | Core AST | SQL Serialized | User API | Notes |
|---|---|---|---|---|
| LIKE | ExprPattern::Like | Yes | None | SQL serialization exists |
| Begins With | ExprPattern::BeginsWith | Yes | None | Converted to LIKE 'prefix%' in SQL |
| EXISTS | ExprExists | Yes | None | Used internally by engine |
| COUNT | ExprFunc::Count | Yes | None | Internal use only |
ORM Comparison

The following table compares Toasty’s constraint support against 8 mature ORMs, highlighting missing features:

| Feature | Toasty | Prisma | Drizzle | Django | SQLAlchemy | Diesel | SeaORM | Hibernate |
|---|---|---|---|---|---|---|---|---|
| **Set Operations** | | | | | | | | |
| NOT IN | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| **Range** | | | | | | | | |
| BETWEEN | No | Via gt+lt | Yes | Yes | Yes | Yes | Yes | Yes |
| **String Operations** | | | | | | | | |
| LIKE | AST only | Via contains | Yes | Yes | Yes | Yes | Yes | Yes |
| Contains (substring) | No | Yes | Manual | Yes | Yes | Manual | Yes | Manual |
| Starts with | AST only | Yes | Manual | Yes | Yes | Manual | Yes | Manual |
| Ends with | No | Yes | Manual | Yes | Yes | Manual | Yes | Manual |
| Case-insensitive (ILIKE) | No | Yes | Yes | Yes | Yes | Pg only | No | Manual |
| Regex | No | No | No | Yes | Yes | No | No | No |
| Full-text search | No | Preview | No | Yes (Pg) | Dialect | Crate | No | Extension |
| **Relation Filtering** | | | | | | | | |
| Filter by related fields | No | Yes | Via join | Yes | Yes | Via join | Via join | Via join |
| Has related (some/none/every) | No | Yes | Via exists | Via exists | Yes | Via exists | Via join | Via exists |
| **Aggregation** | | | | | | | | |
| COUNT / SUM / AVG / etc. | No | Limited | Yes | Yes | Yes | Yes | Yes | Yes |
| GROUP BY | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| HAVING | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| **Advanced** | | | | | | | | |
| Field-to-field comparison | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Arithmetic in queries | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Raw SQL escape hatch | No | Full query | Inline | Multiple | Inline | Inline | Inline | Native query |
| JSON field queries | No | Limited | Via raw | Yes | Yes | Pg | Via raw | No |
| CASE / WHEN | No | No | No | Yes | Yes | No | No | Yes |
| Dynamic/conditional filters | No | Spread undef | Pass undef | Chain | Chain | BoxableExpr | add_option | Build list |

Potential Future Work

Features with Existing Internal Support

These features have core AST and SQL serialization but need user-facing APIs:

String Pattern Matching

  • Core AST: ExprPattern::BeginsWith and ExprPattern::Like exist with SQL serialization
  • Needed:
    • Add ExprPattern::EndsWith and ExprPattern::Contains to core AST
    • Add .contains(), .starts_with(), .ends_with() on Path<String>
    • Add .like() for direct pattern matching
    • Handle LIKE special character escaping (%, _)
  • Files: crates/toasty/src/stmt/path.rs, crates/toasty-core/src/stmt/expr.rs
  • Use case: Search functionality (e.g., search users by name fragment)
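
The escaping concern can be sketched with a small standalone helper (names are illustrative, not Toasty's actual API): user input must have `%`, `_`, and the escape character itself escaped before being embedded in a LIKE pattern, otherwise the input is interpreted as wildcards.

```rust
/// Escape LIKE metacharacters so user input matches literally.
/// `\` is assumed as the ESCAPE character here (illustrative; a real
/// implementation would also emit a matching `ESCAPE '\'` clause).
fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if matches!(ch, '%' | '_' | '\\') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// What a `.contains()` lowering might produce: `%fragment%`.
fn contains_pattern(fragment: &str) -> String {
    format!("%{}%", escape_like(fragment))
}

fn main() {
    // "50%_off" must match the literal string, not act as wildcards.
    assert_eq!(escape_like("50%_off"), "50\\%\\_off");
    assert_eq!(contains_pattern("ab"), "%ab%");
}
```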

NOT IN

  • Current: IN exists but no negated form
  • Needed: ExprNotInList or negate the InList expression, plus .not_in_list() user API
  • Files: crates/toasty/src/stmt/path.rs, crates/toasty-core/src/stmt/expr.rs
  • Use case: Exclusion lists (e.g., “exclude these IDs from results”)
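
The "negate the InList expression" option can be illustrated with a minimal stand-in AST (not Toasty's actual types): rather than adding a dedicated `ExprNotInList` variant, the `.not_in_list()` API wraps the existing IN node in a NOT.

```rust
// Minimal illustrative AST, not Toasty's core types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(&'static str),
    InList { expr: Box<Expr>, list: Vec<i64> },
    Not(Box<Expr>),
}

fn in_list(col: &'static str, list: Vec<i64>) -> Expr {
    Expr::InList { expr: Box::new(Expr::Column(col)), list }
}

/// A hypothetical `.not_in_list()` lowers to negation of InList.
fn not_in_list(col: &'static str, list: Vec<i64>) -> Expr {
    Expr::Not(Box::new(in_list(col, list)))
}

fn to_sql(e: &Expr) -> String {
    match e {
        Expr::Column(c) => c.to_string(),
        Expr::InList { expr, list } => {
            let items: Vec<String> = list.iter().map(|v| v.to_string()).collect();
            format!("{} IN ({})", to_sql(expr), items.join(", "))
        }
        Expr::Not(inner) => format!("NOT ({})", to_sql(inner)),
    }
}

fn main() {
    assert_eq!(
        to_sql(&not_in_list("id", vec![1, 2, 3])),
        "NOT (id IN (1, 2, 3))"
    );
}
```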

Features Needing New Implementation

Case-Insensitive String Matching

  • Current: No support at any layer
  • Needed: ILIKE support in SQL serialization (PostgreSQL native, LOWER() wrapper for SQLite/MySQL), plus user API
  • Design consideration: How to handle cross-database differences (ILIKE is Pg-only, LOWER()+LIKE is universal but slower)
  • Reference: Prisma (mode: 'insensitive'), Django (__iexact, __icontains)
  • Use case: User-facing search (e.g., email lookup, name search)
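
The cross-database difference can be sketched as a dialect dispatch in the serializer (illustrative names, not Toasty's serialization code): PostgreSQL gets native ILIKE, while other databases fall back to folding both sides with LOWER().

```rust
// Illustrative dialect-dependent serialization sketch.
enum Dialect { Postgres, Sqlite, Mysql }

fn ilike_sql(dialect: &Dialect, column: &str, placeholder: &str) -> String {
    match dialect {
        // PostgreSQL supports ILIKE natively.
        Dialect::Postgres => format!("{column} ILIKE {placeholder}"),
        // Portable fallback. Note: this defeats a plain B-tree index
        // on the column unless a LOWER() expression index exists.
        Dialect::Sqlite | Dialect::Mysql => {
            format!("LOWER({column}) LIKE LOWER({placeholder})")
        }
    }
}

fn main() {
    assert_eq!(ilike_sql(&Dialect::Postgres, "email", "$1"), "email ILIKE $1");
    assert_eq!(
        ilike_sql(&Dialect::Sqlite, "email", "?"),
        "LOWER(email) LIKE LOWER(?)"
    );
}
```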

BETWEEN / Range Queries

  • Current: Users must combine .ge() and .le() manually
  • Needed: Syntactic sugar over AND(ge, le), or a dedicated ExprBetween
  • File: crates/toasty/src/stmt/path.rs
  • Reference: Drizzle (between()), Django (__range), Diesel (.between())
  • Use case: Date ranges, price ranges, numeric filtering
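
The syntactic-sugar option is straightforward to sketch (illustrative AST, not Toasty's types): a `between()` helper desugars to the conjunction of the two bounds, so no new `ExprBetween` node is required.

```rust
// Minimal illustrative AST for the desugaring.
#[derive(Debug, PartialEq)]
enum Expr {
    Ge(&'static str, i64),
    Le(&'static str, i64),
    And(Vec<Expr>),
}

/// `between(col, lo, hi)` desugars to `col >= lo AND col <= hi`.
fn between(col: &'static str, lo: i64, hi: i64) -> Expr {
    Expr::And(vec![Expr::Ge(col, lo), Expr::Le(col, hi)])
}

fn main() {
    // price BETWEEN 10 AND 20
    assert_eq!(
        between("price", 10, 20),
        Expr::And(vec![Expr::Ge("price", 10), Expr::Le("price", 20)])
    );
}
```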

Relation/Association Filtering

  • Current: Scoped queries exist but no way to filter a top-level query by related model fields
  • Needed: JOIN or EXISTS subquery generation in the engine, plus user API design
  • Complexity: High - requires significant engine work
  • Reference: Prisma (some/none/every), Django (__ traversal), SQLAlchemy (.any()/.has())
  • Use case: Filtering parents by child attributes (e.g., “users who have at least one order over $100”)

Field-to-Field Comparison

  • Current: Path::eq() takes any IntoExpr<T>; values work today, but other paths should be accepted as well
  • Needed: Ensure Path<T> implements IntoExpr<T> and codegen supports cross-field comparisons
  • Reference: Django (F() expressions), SQLAlchemy (column comparison)
  • Use case: Comparing two columns (e.g., “updated_at > created_at”, “balance > minimum_balance”)

Arithmetic Operations in Queries

  • Current: No support - BinaryOp only includes comparison operators (Eq, Ne, Gt, Ge, Lt, Le)
  • Needed:
    • Add arithmetic operators to AST: Add, Subtract, Multiply, Divide, Modulo
    • SQL serialization for arithmetic expressions (standard across databases)
    • User API to build arithmetic expressions (e.g., .add(), .multiply(), operator overloading, or expression builder)
    • Type handling for arithmetic results (ensure type safety)
  • Files: crates/toasty-core/src/stmt/op_binary.rs, crates/toasty-core/src/stmt/expr.rs, crates/toasty/src/stmt/path.rs
  • Reference:
    • Django: F('price') * F('quantity') > 100
    • SQLAlchemy: column('price') * column('quantity') > 100
    • Diesel: price.eq(quantity * 2)
    • Drizzle: sql`price * quantity > 100`
  • Use cases:
    • Computed comparisons: WHERE age <= 2 * years_in_school
    • Price calculations: WHERE price * quantity > 1000
    • Time differences: WHERE (end_time - start_time) > 3600
    • Percentage calculations: WHERE (actual / budget) * 100 > 110
    • Complex business rules: WHERE (base_price * (1 - discount_rate)) > minimum_price
  • Design considerations:
    • Should arithmetic create new expression types or extend BinaryOp?
    • How to handle type coercion (int vs float, time arithmetic)?
    • Support for parentheses and operator precedence
    • Whether to support on SELECT side (computed columns) or just WHERE clauses initially
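
The "extend BinaryOp" option from the design considerations can be sketched as follows (illustrative type; Toasty's actual BinaryOp only has the comparison variants today). Arithmetic symbols serialize identically across the supported databases, so the SQL spelling is a simple lookup.

```rust
// Sketch: BinaryOp extended with arithmetic variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BinaryOp {
    // Existing comparison operators.
    Eq, Ne, Gt, Ge, Lt, Le,
    // Proposed arithmetic operators.
    Add, Subtract, Multiply, Divide, Modulo,
}

impl BinaryOp {
    fn is_arithmetic(self) -> bool {
        matches!(
            self,
            BinaryOp::Add | BinaryOp::Subtract | BinaryOp::Multiply
                | BinaryOp::Divide | BinaryOp::Modulo
        )
    }

    /// SQL spelling; arithmetic symbols are standard across databases.
    fn sql(self) -> &'static str {
        match self {
            BinaryOp::Eq => "=",
            BinaryOp::Ne => "<>",
            BinaryOp::Gt => ">",
            BinaryOp::Ge => ">=",
            BinaryOp::Lt => "<",
            BinaryOp::Le => "<=",
            BinaryOp::Add => "+",
            BinaryOp::Subtract => "-",
            BinaryOp::Multiply => "*",
            BinaryOp::Divide => "/",
            BinaryOp::Modulo => "%",
        }
    }
}

fn main() {
    assert!(BinaryOp::Multiply.is_arithmetic());
    assert!(!BinaryOp::Ge.is_arithmetic());
    // WHERE price * quantity > 1000
    let clause = format!(
        "price {} quantity {} 1000",
        BinaryOp::Multiply.sql(),
        BinaryOp::Gt.sql()
    );
    assert_eq!(clause, "price * quantity > 1000");
}
```

One point in favor of extending the existing enum: the serializer already dispatches on BinaryOp, so arithmetic would flow through the same code path as comparisons.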

Aggregate Queries

  • Current: ExprFunc::Count exists internally but is not user-facing
  • Needed: User-facing API, return type handling, integration with GROUP BY
  • Complexity: High - requires significant API design
  • Reference: Django’s annotation system, SQLAlchemy’s func
  • Use case: Dashboards, analytics, summary views, pagination metadata

GROUP BY / HAVING

  • Current: No support at any layer
  • Needed: AST additions, SQL generation, engine support, user API
  • Complexity: High
  • Use case: Aggregate queries, reports, analytics, dashboards

Raw SQL Escape Hatch

  • Current: No support
  • Needed: Safe API for parameterized raw SQL fragments within typed queries
  • Design consideration: Full raw queries vs. raw fragments within typed queries vs. both
  • Reference: Drizzle (sql`...` templates), SQLAlchemy (text()), Diesel (sql())
  • Use case: Edge cases that the ORM can’t express

Dynamic / Conditional Query Building

  • Current: Users can chain .filter() calls, but no ergonomic way to skip filters when parameters are None
  • Needed: Pattern for optional filters
  • Reference: SeaORM (Condition::add_option()), Prisma (spread undefined), Diesel (BoxableExpression)
  • Use case: Search forms, filter UIs, API endpoints with optional parameters
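
The pattern SeaORM exposes as Condition::add_option() can be sketched with a self-contained builder (illustrative; not Toasty's API): each clause is added only when its optional parameter is present, so search-form handlers avoid branching on every field.

```rust
// Illustrative optional-filter builder.
#[derive(Default)]
struct Filters {
    clauses: Vec<String>,
}

impl Filters {
    /// Adds the clause only when the optional parameter is present.
    fn add_option<T: std::fmt::Display>(mut self, column: &str, value: Option<T>) -> Self {
        if let Some(v) = value {
            self.clauses.push(format!("{column} = {v}"));
        }
        self
    }

    fn to_sql(&self) -> String {
        self.clauses.join(" AND ")
    }
}

fn main() {
    // Typical search form: `name` was provided, `age` was not.
    let sql = Filters::default()
        .add_option("name", Some("'alice'"))
        .add_option("age", None::<i64>)
        .to_sql();
    assert_eq!(sql, "name = 'alice'");
}
```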

Full-Text Search

  • Current: No support
  • Complexity: High - database-specific implementations (PostgreSQL tsvector, MySQL FULLTEXT, SQLite FTS5)
  • Design consideration: May be best as database-specific extensions rather than a unified API
  • Use case: Content-heavy applications (blogs, e-commerce, documentation sites)

JSON Field Queries

  • Current: No support
  • Complexity: High - needs path traversal syntax, type handling, database-specific operators
  • Dependency: Depends on JSON/JSONB data type support
  • Reference: Django (field__key__subkey), SQLAlchemy (column['key'])
  • Use case: Flexible/schemaless data within relational databases

Advanced / Niche Features

Regex Matching

  • Use case: Power-user filtering, data validation queries
  • Reference: Django (__regex, __iregex), SQLAlchemy (regexp_match())

Array/Collection Operations

  • Use case: PostgreSQL array columns, MongoDB array fields
  • Dependency: Requires array type support first
  • Reference: Prisma (has, hasEvery, hasSome), Django (ArrayField lookups)

CASE/WHEN Expressions

  • Use case: Conditional logic within queries for complex business rules
  • Reference: Django (When()/Case()), SQLAlchemy (case())

Subquery Comparisons (ALL/ANY/SOME)

  • Use case: Advanced filtering like “price > ALL(SELECT price FROM competitors)”
  • Reference: Hibernate, SQLAlchemy (all_(), any_())

IS DISTINCT FROM

  • Use case: NULL-safe comparisons without special-casing IS NULL
  • Reference: SQLAlchemy (the only ORM surveyed with native support)

Implementation Considerations

Based on the analysis above, the following groupings maximize user value:

Group 1: Expose Existing Internals

Items with core AST and SQL serialization that only need user-facing methods:

  • .not_in_list() on Path<T> (negate existing InList)

Estimated scope: ~50 lines of user-facing API code + integration tests

Group 2: String Operations

Partial AST support that needs completion and exposure:

  • Add ExprPattern::EndsWith and ExprPattern::Contains to core AST
  • Add SQL serialization for new pattern variants
  • Add .contains(), .starts_with(), .ends_with() to Path<String>
  • Handle LIKE special character escaping

Estimated scope: ~200 lines across core + SQL + user API

Group 3: Ergonomic Improvements

  • Case-insensitive matching (ILIKE / LOWER() wrapper)
  • .between() convenience method
  • .like() direct exposure
  • Conditional/optional filter building helpers

Group 4: Structural Features

Requires deeper engine work:

  • Relation filtering (JOIN/EXISTS generation)
  • Aggregate functions (user-facing COUNT/SUM/etc.)
  • GROUP BY / HAVING
  • Raw SQL escape hatch

Reference Implementation Goals

A comprehensive query constraint system would allow users to:

  1. Search strings by substring, prefix, and suffix (case-sensitive and case-insensitive)
  2. Use NOT IN with literal lists and subqueries
  3. Filter by related model attributes
  4. Use at least basic aggregate queries (COUNT)
  5. Fall back to raw SQL for anything the ORM can’t express

This would put Toasty on par with the filtering capabilities of Diesel and SeaORM, and cover the vast majority of queries needed by typical web applications.

Query Engine Optimization Roadmap

Overview

The query engine currently performs simplification as a single VisitMut pass that applies local rewrite rules bottom-up. This works well for straightforward transformations (constant folding, tuple decomposition, association rewriting), but it has structural limitations as the optimizer takes on more complex work.

This document tracks improvements to the query engine’s optimization infrastructure, focusing on predicate simplification and the compilation pipeline.

Current State

Simplification Pass

The simplifier (engine/simplify.rs) implements VisitMut and applies rules in a single bottom-up traversal. Each node is visited once, simplified, and then its parent is simplified with the updated children.

What works well:

  • Local rewrites: constant folding, boolean identity, tuple decomposition
  • Association rewriting and subquery lifting
  • Match elimination (distributing binary ops over match arms)

Structural limitations:

  • Rules fire during the walk, so ordering matters. A rule that produces expressions consumable by another rule only works if the consumer fires later in the same walk or the walk is re-run.
  • Global analysis (e.g., detecting contradictions across an entire AND conjunction) must be done inline during the walk, mixing local and global concerns.
  • Expensive analyses run on every AND node encountered, even when only a small fraction would benefit.

Contradicting Equality Detection

The simplifier currently detects a = c1 AND a = c2 (where c1 != c2) inline in simplify_expr_and. This is O(n^2) in the number of equality predicates within a single AND. While operand lists are typically small, the analysis runs on every AND node during the walk, including intermediate nodes that are about to be restructured by other rules.
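
For comparison, the same detection can be done in one pass by mapping each column to its first constant binding (illustrative types, not Toasty's AST), turning the O(n^2) pairwise scan into O(n):

```rust
use std::collections::HashMap;

/// Detect `a = c1 AND a = c2` with c1 != c2 among the equality
/// operands of a single AND. One pass; column -> constant map.
fn has_contradicting_eq(eq_predicates: &[(&str, i64)]) -> bool {
    let mut bound: HashMap<&str, i64> = HashMap::new();
    for &(column, constant) in eq_predicates {
        match bound.get(column) {
            // Same column already bound to a different constant.
            Some(&prev) if prev != constant => return true,
            _ => {
                bound.insert(column, constant);
            }
        }
    }
    false
}

fn main() {
    assert!(has_contradicting_eq(&[("a", 1), ("b", 2), ("a", 3)]));
    assert!(!has_contradicting_eq(&[("a", 1), ("a", 1), ("b", 2)]));
}
```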

Planned Improvements

Phase 1: Post-Lowering Optimization Pass

Move expensive predicate analysis out of the per-node simplifier and into a dedicated pass that runs once after lowering, against the HIR representation. At this point the statement is fully resolved to table-level expressions and the predicate tree is stable — no more association rewrites or field resolution changes will restructure it.

This pass would handle:

  • Contradicting equality pruning
  • Redundant predicate elimination
  • Tautology detection
  • ExprLet inlining (currently done at the end of lower_returning; should move here so all post-lowering expression rewrites live in one place)

Why after lowering: Before lowering, predicates reference model-level fields and contain relationship navigation that the lowering phase rewrites. Running global analysis before this rewriting is wasted work — the predicate tree will change. After lowering, the predicates are in their final structural form (column references, subqueries), so analysis results are stable.

Phase 2: Equivalence Classes

Build equivalence classes from equality predicates before running constraint analysis. When the optimizer sees a = b AND b = c, it should know that a, b, and c are all equivalent, enabling:

  • Transitive contradiction detection: a = b AND b = 5 AND a = 7 is a contradiction (a must be both 5 and 7), even though no single pair of predicates directly conflicts.
  • Predicate implication: a = 5 AND a > 3 — the second predicate is implied and can be dropped.
  • Join predicate inference: If a = b and a filter constrains a, the same constraint applies to b.

Equivalence classes are a standard technique in query optimizers. The idea is to union-find expressions that are constrained to be equal, then check each class for conflicting constant bindings or range constraints.
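
A minimal sketch of the union-find approach (columns identified by index; illustrative, not the planned implementation): union columns joined by column-to-column equalities, then bind constants per class root and flag conflicts. This catches the transitive case `a = b AND b = 5 AND a = 7` directly.

```rust
// Union-find with path compression over column indices.
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect() }
    }
    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let root = self.find(self.parent[x]);
            self.parent[x] = root; // path compression
        }
        self.parent[x]
    }
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        self.parent[ra] = rb;
    }
}

/// `col_eqs`: `a = b` predicates; `const_eqs`: `a = constant` predicates.
fn contradicts(n_cols: usize, col_eqs: &[(usize, usize)], const_eqs: &[(usize, i64)]) -> bool {
    let mut uf = UnionFind::new(n_cols);
    for &(a, b) in col_eqs {
        uf.union(a, b);
    }
    let mut binding: Vec<Option<i64>> = vec![None; n_cols];
    for &(col, value) in const_eqs {
        let root = uf.find(col);
        match binding[root] {
            // The equivalence class is bound to two different constants.
            Some(prev) if prev != value => return true,
            _ => binding[root] = Some(value),
        }
    }
    false
}

fn main() {
    // a = b AND b = 5 AND a = 7 (columns: a = 0, b = 1)
    assert!(contradicts(2, &[(0, 1)], &[(1, 5), (0, 7)]));
    // a = b AND b = 5 AND a = 5 is consistent.
    assert!(!contradicts(2, &[(0, 1)], &[(1, 5), (0, 5)]));
}
```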

Phase 3: Structured Constraint Analysis

Replace ad-hoc pairwise comparisons with a more structured representation of constraints. For each expression (or equivalence class), maintain:

  • Constant binding: The expression must equal a specific value
  • Range bounds: Upper/lower bounds from inequality predicates
  • NOT-equal set: Values the expression cannot be (from != predicates)

With this structure, contradiction detection becomes a property check rather than a search: an expression with two different constant bindings, or a constant binding outside its range bounds, is immediately contradictory.
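
The property check can be sketched as a per-class summary type (illustrative; bounds simplified to inclusive, and the two-constant case is assumed to be caught when bindings merge):

```rust
// Illustrative constraint summary for one expression / equivalence class.
#[derive(Default)]
struct Constraints {
    constant: Option<i64>, // from `= c`
    lower: Option<i64>,    // from `>= c` (strict bounds simplified away)
    upper: Option<i64>,    // from `<= c`
    excluded: Vec<i64>,    // from `!= c`
}

impl Constraints {
    fn contradictory(&self) -> bool {
        if let Some(c) = self.constant {
            // Constant binding outside the range bounds, or in the
            // NOT-equal set, is immediately contradictory.
            if self.lower.map_or(false, |lo| c < lo) { return true; }
            if self.upper.map_or(false, |hi| c > hi) { return true; }
            if self.excluded.contains(&c) { return true; }
        }
        // Empty range: lower bound above upper bound.
        matches!((self.lower, self.upper), (Some(lo), Some(hi)) if lo > hi)
    }
}

fn main() {
    // a = 5 AND a >= 3: consistent (the range predicate is implied).
    let ok = Constraints { constant: Some(5), lower: Some(3), ..Default::default() };
    assert!(!ok.contradictory());
    // a = 2 AND a >= 3: constant below the lower bound.
    let bad = Constraints { constant: Some(2), lower: Some(3), ..Default::default() };
    assert!(bad.contradictory());
}
```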

Predicate Normalization (Not Full DNF)

Full conversion to disjunctive normal form (DNF) — where the entire predicate becomes an OR of ANDs — risks exponential blowup. A predicate with N AND-connected clauses of M OR-options each expands to M^N terms. This makes full DNF impractical as a general-purpose transformation.

Instead, apply targeted normalization:

  • Flatten associative operators: Merge nested AND(AND(...), ...) and OR(OR(...), ...) into flat lists (already done).
  • Canonicalize comparison direction: Ensure constants are on the right side of comparisons (already done).
  • Limited distribution: Distribute AND over OR only in specific cases where it enables index utilization or constraint extraction, with a size budget to prevent blowup.
  • OR-of-equalities to IN-list: Convert a = 1 OR a = 2 OR a = 3 to a IN (1, 2, 3) for more efficient execution.

The goal is to normalize enough for the constraint analysis to work without paying the exponential cost of full DNF.
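
The OR-of-equalities rewrite is the simplest of these to sketch (illustrative AST, not Toasty's types): if every operand of an OR is an equality on the same column, collapse the node into a single IN list; otherwise leave it untouched.

```rust
// Minimal illustrative AST for the rewrite.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Eq(&'static str, i64),
    InList(&'static str, Vec<i64>),
    Or(Vec<Expr>),
}

/// Rewrite `a = 1 OR a = 2 OR a = 3` into `a IN (1, 2, 3)`.
fn or_to_in_list(expr: Expr) -> Expr {
    let ops = match expr {
        Expr::Or(ops) => ops,
        other => return other, // not an OR: nothing to do
    };
    let mut column = None;
    let mut values = Vec::with_capacity(ops.len());
    let mut uniform = true;
    for op in &ops {
        match op {
            Expr::Eq(col, v) if column.is_none() || column == Some(*col) => {
                column = Some(*col);
                values.push(*v);
            }
            // Mixed columns or a non-equality operand: abort the rewrite.
            _ => {
                uniform = false;
                break;
            }
        }
    }
    match (uniform, column) {
        (true, Some(col)) => Expr::InList(col, values),
        _ => Expr::Or(ops),
    }
}

fn main() {
    let or = Expr::Or(vec![Expr::Eq("a", 1), Expr::Eq("a", 2), Expr::Eq("a", 3)]);
    assert_eq!(or_to_in_list(or), Expr::InList("a", vec![1, 2, 3]));
    // Mixed columns are left as-is.
    let mixed = Expr::Or(vec![Expr::Eq("a", 1), Expr::Eq("b", 2)]);
    assert_eq!(or_to_in_list(mixed.clone()), mixed);
}
```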

Design Principles

  • Run expensive analysis once, not per-node. The current simplifier intermixes cheap local rewrites with expensive global analysis. Separate them.
  • Analyze after the predicate tree is stable. Post-lowering is the right point — predicates are resolved to columns and won’t be restructured.
  • Build structure, then query it. Constructing equivalence classes and constraint summaries up front makes individual checks cheap.
  • Budget-limited transformations. Any rewrite that can expand expression size (distribution, case expansion) must have a size limit.