Toasty Architecture Overview
Project Structure
Toasty is an ORM for Rust that supports SQL and NoSQL databases. The codebase is a Cargo workspace with separate crates for each layer.
Crates
1. toasty
User-facing crate with query engine and runtime.
Key Components:
- engine/: Multi-phase query compilation and execution pipeline. See Query Engine Architecture for detailed documentation.
- stmt/: Typed statement builders (wrappers around toasty_core::stmt types)
- relation/: Relationship abstractions (HasMany, BelongsTo, HasOne)
- model.rs: Model trait and ID generation
Query Execution Pipeline (high-level):
Statement AST → Simplify → Lower → Plan → Execute → Results
The engine compiles queries into a mini-program of actions executed by an interpreter. For details on HIR, MIR, and the full compilation pipeline, see Query Engine Architecture.
2. toasty-core
Shared types used by all other crates: schema representations, statement AST, and driver interface.
Key Components:
- schema/: Model and database schema definitions
  - app/: Model-level definitions (fields, relations, constraints)
  - db/: Database-level table and column definitions
  - mapping/: Maps between models and database tables
  - builder/: Schema construction utilities
  - verify/: Schema validation
- stmt/: Statement AST nodes for queries, inserts, updates, deletes
- driver/: Driver interface, capabilities, and operations
3. toasty-codegen
Generates Rust code from the #[derive(Model)] macro.
Key Components:
- schema/: Parses model attributes into schema representation
- expand/: Generates implementations for models
  - model.rs: Model trait implementation
  - query.rs: Query builder methods
  - create.rs: Create/insert builders
  - update.rs: Update builders
  - relation.rs: Relationship methods
  - fields.rs: Field accessors
  - filters.rs: Filter method generation
  - schema.rs: Runtime schema generation
4. toasty-driver-*
Database-specific driver implementations.
Supported Databases:
- toasty-driver-sqlite: SQLite implementation
- toasty-driver-postgresql: PostgreSQL implementation
- toasty-driver-mysql: MySQL implementation
- toasty-driver-dynamodb: DynamoDB implementation
5. toasty-sql
Converts statement AST to SQL strings. Used by SQL-based drivers.
Key Components:
- serializer/: SQL generation with dialect support
  - flavor.rs: Database-specific SQL dialects
  - statement.rs: Statement serialization
  - expr.rs: Expression serialization
  - ty.rs: Type serialization
- stmt/: SQL-specific statement types
Further Reading
- Query Engine Architecture - Query compilation and execution pipeline
- Type System - Type system design and conversions
Toasty Query Engine
This document provides a high-level overview of the Toasty query execution engine for developers working on engine internals. It describes the multi-phase pipeline that transforms user queries into database operations.
Overview
The Toasty engine is a multi-database query compiler and runtime that executes ORM operations across SQL and NoSQL databases. It transforms a user’s query (represented as a Statement AST) into a sequence of executable actions through multiple compilation phases.
Execution Model
The final output is a mini program executed by an interpreter. Think of it like a small virtual machine or bytecode interpreter, though there is no control flow (yet):
- Instructions (Actions): Operations like “execute this SQL”, “filter these results”, “merge child records into parents”
- Variables: Storage slots, or registers, that hold intermediate results between instructions
- Linear Execution: Instructions run in sequence (no control flow - no branches or loops, yet). Eventually, the interpreter will be smart about concurrency and execute independent operations in parallel when possible.
- Interpreter: The engine executor reads each instruction, fetches inputs from variables, performs the operation, and stores outputs back to variables
For example, loading users with their todos:
SELECT users.id, users.name, (
SELECT todos.id, todos.title
FROM todos
WHERE todos.user_id = users.id
) FROM users WHERE ...
compiles to a program like:
$0 = ExecSQL("SELECT * FROM users WHERE ...")
$1 = ExecSQL("SELECT * FROM todos WHERE user_id IN ...")
$2 = NestedMerge($0, $1, by: user_id)
return $2
The compilation pipeline below transforms user queries into this instruction/variable representation. Each phase brings the query closer to this final executable form.
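The instruction/variable model above can be sketched as a toy interpreter. All of the names below (Rows, Action, run) are invented stand-ins, not Toasty's real types; the point is only the shape: a linear program of actions reading and writing numbered variable slots.

```rust
#[derive(Debug, Clone)]
struct Rows(Vec<String>); // stand-in for a buffered result set

enum Action {
    // Stand-in for executing a statement against the database driver.
    ExecSql { sql: &'static str, out: usize },
    // Stand-in for merging child rows into their parents by key.
    NestedMerge { parent: usize, child: usize, out: usize },
}

fn run(program: &[Action], num_vars: usize) -> Vec<Option<Rows>> {
    // Variable storage: one slot per intermediate result.
    let mut vars: Vec<Option<Rows>> = vec![None; num_vars];
    // Linear execution: no branches, no loops.
    for action in program {
        match action {
            Action::ExecSql { sql, out } => {
                // A real interpreter would await the driver here.
                vars[*out] = Some(Rows(vec![format!("result of `{sql}`")]));
            }
            Action::NestedMerge { parent, child, out } => {
                let mut merged = vars[*parent].clone().expect("parent not set").0;
                merged.extend(vars[*child].clone().expect("child not set").0);
                vars[*out] = Some(Rows(merged));
            }
        }
    }
    vars
}

fn main() {
    // $0 = users, $1 = todos, $2 = merged result
    let program = [
        Action::ExecSql { sql: "SELECT * FROM users", out: 0 },
        Action::ExecSql { sql: "SELECT * FROM todos WHERE user_id IN ...", out: 1 },
        Action::NestedMerge { parent: 0, child: 1, out: 2 },
    ];
    let vars = run(&program, 3);
    println!("{:?}", vars[2]);
}
```

The real engine differs in the obvious ways (async driver calls, streamed values, typed variables), but the control structure is this simple.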
Compilation Pipeline
User Query (Statement AST)
↓
[Verification] - Validate statement structure (debug builds only)
↓
[Simplification] - Normalize and optimize the statement AST
↓
[Lowering] - Convert to HIR for dependency analysis
↓
[Planning] - Build MIR operation graph
↓
[Execution Planning] - Convert to action sequence with variables
↓
[Execution] - Run actions against database driver
↓
Result Stream
Phase 1: Simplification
Location: engine/simplify.rs
The simplification phase normalizes and optimizes the statement AST before planning.
Key Transformations
- Association Rewriting: Converts relationship navigation (e.g., user.todos()) into explicit subqueries with foreign key filters
- Subquery Lifting: Transforms IN (SELECT ...) expressions into more efficient join-like operations
- Expression Normalization: Simplifies complex expressions (e.g., flattening nested ANDs/ORs, constant folding)
- Path Expression Rewriting: Resolves field paths and relationship traversals into explicit column references
- Empty Query Detection: Identifies queries that will return no results
Example: Association Simplification
// user.todos().delete() generates:
Delete {
    from: Todo,
    via: User::todos, // relationship traversal
    ...
}

// After simplification:
Delete {
    from: Todo,
    filter: todo.user_id IN (SELECT id FROM users WHERE ...)
}
Converting relationship navigation into explicit filters early means downstream phases only need to handle standard query patterns with filters and subqueries - no special-case logic for each relationship type.
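As a concrete illustration of the expression-normalization step, here is a toy flattening/constant-folding pass over an invented Expr type (this is not toasty_core::stmt::Expr; it is a minimal sketch of the technique):

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Const(bool),
    Column(&'static str), // stand-in for a column predicate
    And(Vec<Expr>),
}

fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::And(operands) => {
            let mut flat = Vec::new();
            for op in operands {
                match simplify(op) {
                    // AND(true, x) => x: drop the neutral element.
                    Expr::Const(true) => {}
                    // AND(false, x) => false: the whole conjunction folds.
                    Expr::Const(false) => return Expr::Const(false),
                    // AND(AND(a, b), c) => AND(a, b, c): flatten nesting.
                    Expr::And(inner) => flat.extend(inner),
                    other => flat.push(other),
                }
            }
            match flat.len() {
                0 => Expr::Const(true),
                1 => flat.pop().unwrap(),
                _ => Expr::And(flat),
            }
        }
        other => other,
    }
}

fn main() {
    let filter = Expr::And(vec![
        Expr::Const(true),
        Expr::And(vec![Expr::Column("user_id = ?"), Expr::Column("done = false")]),
    ]);
    println!("{:?}", simplify(filter)); // → And([Column("user_id = ?"), Column("done = false")])
}
```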
Phase 2: Lowering
Location: engine/lower.rs
Lowering converts a simplified statement into HIR (High-level Intermediate Representation) - a collection of related statements with tracked dependencies.
Toasty tries to maximize what the target database can handle natively, only decomposing queries when necessary. For example, a query like User::find_by_name("John").todos().all() contains a subquery. SQL databases can execute this as SELECT * FROM todos WHERE user_id IN (SELECT id FROM users WHERE name = 'John'). DynamoDB cannot handle subqueries, so lowering splits this into two statements: first fetch user IDs, then query todos with those IDs.
The HIR tracks a dependency graph between statements - which statements depend on results from others, and which columns flow between them. This graph can contain cycles when preloading associations. For example:
SELECT users.id, users.name, (
SELECT todos.id, todos.title
FROM todos
WHERE todos.user_id = users.id
) FROM users WHERE ...
The users query must execute first to provide IDs for the todos subquery, but the todos results must be merged back into the user records. This creates a cycle: users → todos → users.
This lowering phase handles:
- Statement Decomposition: Breaking queries into sub-statements when the database can’t handle them directly
- Dependency Tracking: Which statements must execute before others
- Argument Extraction: Identifying values passed between statements (e.g., a loaded model’s ID used in a child query’s filter)
- Relationship Handling: Processing relationship loads and nested queries
Lowering Algorithm
Lowering transforms model-level statements to table-level statements through a visitor pattern that rewrites each part of the statement AST:
- Table Resolution: InsertTarget::Model, UpdateTarget::Model, etc. become their corresponding table references
- Returning Clause Transformation: Returning::Model is replaced with Returning::Expr containing the expanded column expressions
- Field Reference Resolution: Model field references are converted to table column references
- Include Expansion: Association includes become subqueries in the returning clause
The TableToModel mapping (built during schema construction) drives the transformation. It contains an expression for each model field that maps to its corresponding table column(s). This supports more than a 1-1 mapping—a model field can be derived from multiple columns or a column can map to multiple fields. Association fields are initialized to Null in this mapping.
When lowering encounters a Returning::Model { include } clause:
- Call table_to_model.lower_returning_model() to get the base column expressions
- For each path in the include list, call build_include_subquery() to generate a subquery that selects the associated records
- Replace the Null placeholder in the returning expression with the generated subquery
Lowering Examples
Example 1: Simple query
Given a model with a renamed column:
#[derive(Model)]
struct User {
    #[key] #[auto] id: u64,
    #[column(name = "first_and_last_name")]
    name: String,
    email: String,
}

// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
// Note: At model-level, no specific fields are selected

// After lowering
SELECT id, first_and_last_name, email FROM users WHERE id = ?
Example 2: Query with association
// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
  INCLUDE todos

// After lowering
SELECT id, first_and_last_name, email, (
    SELECT id, title, user_id FROM todos WHERE todos.user_id = users.id
) FROM users WHERE id = ?
Phase 3: Planning
Location: engine/plan.rs
Planning converts HIR into MIR (Middle-level Intermediate Representation) - a directed acyclic graph of operations, both database queries and in-memory transformations. Edges represent data dependencies: an operation cannot execute until all operations it depends on have completed and produced their results.
Since the HIR graph can contain cycles, planning must break them to produce a DAG. This is done by introducing intermediate operations that batch-load data and merge results (e.g., NestedMerge).
Operation Types
The MIR supports various operation types (see engine/mir.rs for details):
SQL operations:
- ExecStatement: Execute a SQL query (SELECT, INSERT, UPDATE, DELETE)
- ReadModifyWrite: Optimistic locking (read, modify, conditional write). Exists as a separate operation because the read result must be processed in-memory to compute the write, which ExecStatement cannot express.
Key-value operations (NoSQL):
- GetByKey, DeleteByKey, UpdateByKey: Direct key access
- QueryPk, FindPkByIndex: Key lookups via queries or indexes
In-memory operations:
- Filter, Project: Transform and filter results
- NestedMerge: Merge child records into parent records
- Const: Constant values
Phase 4: Execution Planning
Location: engine/plan/execution.rs
Execution planning converts the MIR logical plan into a concrete sequence of actions that can be executed. This phase:
- Assigns variable slots for storing intermediate results
- Converts each MIR operation into an execution action
- Maintains topological ordering to ensure dependencies execute first
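A topological ordering over a small dependency graph can be computed with Kahn's algorithm, which is the standard way to schedule a DAG of operations so every node runs after its inputs. The node IDs and edges below are invented; this is a sketch of the scheduling idea, not Toasty's planner code.

```rust
use std::collections::VecDeque;

/// `deps[i]` lists the operations that operation `i` depends on.
/// Returns a valid execution order (inputs always before dependents).
fn topo_order(deps: &[Vec<usize>]) -> Vec<usize> {
    let n = deps.len();
    let mut in_degree = vec![0usize; n];
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (op, inputs) in deps.iter().enumerate() {
        in_degree[op] = inputs.len();
        for &input in inputs {
            dependents[input].push(op);
        }
    }
    // Start with operations that have no inputs.
    let mut ready: VecDeque<usize> = (0..n).filter(|&op| in_degree[op] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(op) = ready.pop_front() {
        order.push(op);
        // Completing `op` may unblock its dependents.
        for &next in &dependents[op] {
            in_degree[next] -= 1;
            if in_degree[next] == 0 {
                ready.push_back(next);
            }
        }
    }
    order
}

fn main() {
    // 0 = ExecStatement(users), 1 = ExecStatement(todos), 2 = NestedMerge(0, 1)
    let deps = vec![vec![], vec![0], vec![0, 1]];
    println!("{:?}", topo_order(&deps)); // → [0, 1, 2]
}
```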
Action Types
Actions mirror MIR operations but include concrete variable bindings:
SQL actions:
- ExecStatement: Execute a SQL query (SELECT, INSERT, UPDATE, DELETE)
- ReadModifyWrite: Optimistic locking (read, modify, conditional write)
Key-value actions (NoSQL):
- GetByKey: Batch fetch by primary key
- DeleteByKey: Delete records by primary key
- UpdateByKey: Update records by primary key
- QueryPk: Query primary keys
- FindPkByIndex: Find primary keys via secondary index
In-memory actions:
- Filter: Apply in-memory filter to a variable’s data
- Project: Transform records
- NestedMerge: Merge child records into parent records
- SetVar: Set a variable to a constant value
Phase 5: Execution
Location: engine/exec.rs
The execution phase is the interpreter that runs the compiled program. It iterates through actions, reading inputs from variables, performing operations, and storing outputs back to variables.
Execution Loop
The interpreter follows a simple pattern:
- Initialize variable storage
- For each action in sequence:
- Load input data from variables
- Perform the operation (database query or in-memory transform)
- Store the result in the output variable
- Return the result from the final variable (the last action’s output) to the user
Variable Lifetime
The engine tracks how many times each variable is referenced by downstream actions. A variable may be used by multiple actions (e.g., the same user records merged with both todos and comments). When the last action that needs a variable completes, the variable’s value is dropped to free memory.
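The reference-counted lifetime scheme can be sketched as follows. The types here (Slot, VarStore) are invented stand-ins for illustration: each slot records how many downstream reads remain, and the final read takes ownership so the buffered value is freed.

```rust
struct Slot {
    value: Option<String>, // stand-in for a buffered result set
    remaining_reads: usize,
}

struct VarStore {
    slots: Vec<Slot>,
}

impl VarStore {
    /// `read_counts[i]` = how many downstream actions read variable `i`,
    /// as computed during planning.
    fn new(read_counts: &[usize]) -> Self {
        VarStore {
            slots: read_counts
                .iter()
                .map(|&n| Slot { value: None, remaining_reads: n })
                .collect(),
        }
    }

    fn store(&mut self, var: usize, value: String) {
        self.slots[var].value = Some(value);
    }

    /// Read a variable; the final read takes ownership and frees the slot.
    fn load(&mut self, var: usize) -> String {
        let slot = &mut self.slots[var];
        slot.remaining_reads -= 1;
        if slot.remaining_reads == 0 {
            slot.value.take().expect("variable not set") // dropped from store
        } else {
            slot.value.clone().expect("variable not set")
        }
    }
}

fn main() {
    // The users result ($0) is read by two downstream merges.
    let mut store = VarStore::new(&[2]);
    store.store(0, "user rows".to_string());
    let first = store.load(0);  // clones; one read remains
    let second = store.load(0); // last read takes ownership; slot freed
    assert_eq!(first, second);
}
```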
Driver Interaction
The execution phase is the only part of the engine that communicates with database drivers. The driver interface is intentionally simple: a single exec() method that accepts an Operation enum. This enum includes variants for both SQL operations (QuerySql, Insert) and key-value operations (GetByKey, QueryPk, FindPkByIndex, DeleteByKey, UpdateByKey).
Each driver implements whichever operations it supports. SQL drivers handle QuerySql natively while key-value drivers handle GetByKey, QueryPk, etc. The planner uses driver.capability() to determine which operations to generate for each database type.
Toasty Type System Architecture
Overview
Toasty uses Rust’s type system in the public API with both concrete types and generics. The query engine tracks the type of value each statement evaluates to using stmt::Type. This document describes how types flow through the system and the key components involved.
Type System Boundaries
Toasty has two distinct type systems with different responsibilities:
1. Rust-Level Type System (Compile-Time Safety)
At the Rust level, each model is a distinct type:
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    name: String,
    email: String,
}

#[derive(Model)]
struct Todo {
    #[key]
    #[auto]
    id: u64,
    user_id: u64,
    title: String,
}

// Toasty generates type-safe field access preventing type mismatches:
User::get_by_email(&db, "john@example.com").await?; // ✓ String matches email field
User::filter_by_id(&user_id).filter(User::FIELDS.name().eq("John")).all(&db).await?; // ✓ String matches name field

// Type system prevents field/model confusion:
// User::FIELDS.title()             // ← Compile error! User has no title field
// Todo::FIELDS.email()             // ← Compile error! Todo has no email field
// User::FIELDS.name().eq(&todo_id) // ← Compile error! u64 doesn't match String
The query builder API maintains this type safety through generics and traits, preventing you from accidentally mixing model types or referencing non-existent fields. The API uses generic types (Statement<M>, Select<M>, etc.) that wrap toasty_core::stmt types.
2. Query Engine Type System (Runtime)
When db.exec(statement) is called, the generic <M> parameter is erased:
// Generated query builder returns a typed wrapper
let query: FindUserById = User::find_by_id(&id);

// .into() converts to Statement<User>
let statement: Statement<User> = query.into();

// At db.exec() - generic is erased, .untyped is extracted
pub async fn exec<M: Model>(&self, statement: Statement<M>) -> Result<ValueStream> {
    engine::exec(self, statement.untyped).await // <- Only toasty_core::stmt::Statement
}
At this boundary, the statement becomes untyped (no Rust generic), but the engine tracks the type of value the statement evaluates to using stmt::Type. Initially, this remains at the model-level—a query for User evaluates to Type::List(Type::Model(user_model_id)). During lowering, these convert to structural record types for database execution.
Type Flow Through the System
Rust API → Query Builder → Engine Entry → Lowering/Planning → Execution
↓ ↓ ↓ ↓ ↓
Distinct Type-Safe Type::Model Type::Record stmt::Value
Types Generics (no generics) (typed)
(compile) (compile) (runtime) (runtime) (runtime)
At lowering, statements that evaluate to Type::Model(model_id) are converted to evaluate to Type::Record([field_types...]). This conversion enables the engine to work with concrete field types for database operations.
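The model-to-record conversion can be sketched with a simplified Type enum (invented here; the real type lives in toasty_core::stmt) and a schema lookup that maps a model id to its field types:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Id,
    String,
    Model(usize),      // model-level: opaque reference to a model
    Record(Vec<Type>), // table-level: concrete field types
    List(Box<Type>),
}

/// Recursively replace `Type::Model` with the record of its field types,
/// mirroring what lowering does when moving statements to table level.
fn lower_ty(ty: Type, fields_of: &dyn Fn(usize) -> Vec<Type>) -> Type {
    match ty {
        Type::Model(model_id) => Type::Record(
            fields_of(model_id)
                .into_iter()
                .map(|f| lower_ty(f, fields_of))
                .collect(),
        ),
        Type::List(inner) => Type::List(Box::new(lower_ty(*inner, fields_of))),
        other => other,
    }
}

fn main() {
    // Pretend model 0 is User with fields (id, name, email).
    let user_fields = |_: usize| vec![Type::Id, Type::String, Type::String];
    let ty = lower_ty(Type::List(Box::new(Type::Model(0))), &user_fields);
    println!("{ty:?}");
}
```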
Detailed Architecture
Query Engine Entry Point
When the engine receives a toasty_core::stmt::Statement, it processes through verification, lowering, planning, and execution:
pub(crate) async fn exec(&self, stmt: Statement) -> Result<ValueStream> {
    if cfg!(debug_assertions) {
        self.verify(&stmt);
    }

    // Lower the statement to High-level intermediate representation
    let hir = self.lower_stmt(stmt)?;

    // Translate into a series of driver operations
    let plan = self.plan_hir_statement(hir)?;

    // Execute the plan
    self.exec_plan(plan).await
}
Lowering Phase (Model-to-Table Transformation)
The lowering phase transforms statements from model-level to table-level representations.
Example 1: Simple query
// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
// Evaluates to: Type::List(Type::Model(user_model_id))
// Note: At model-level, no specific fields are selected

// After lowering
SELECT id, name, email FROM users WHERE id = ?
// Evaluates to: Type::List(Type::Record([Type::Id, Type::String, Type::String]))
Example 2: Query with association
// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User INCLUDE todos WHERE id = ?
// Evaluates to: Type::List(Type::Model(user_model_id))
// where todos field is Type::List(Type::Model(todo_model_id))

// After lowering
SELECT id, name, email, (
    SELECT id, title, user_id FROM todos WHERE todos.user_id = users.id
) FROM users WHERE id = ?
// Evaluates to: Type::List(Type::Record([
//     Type::Id, Type::String, Type::String,
//     Type::List(Type::Record([Type::Id, Type::String, Type::Id]))
// ]))
Planning and Variable Types
During planning, the engine assigns variables to hold intermediate results (see Query Engine Architecture for details on the execution model). Each variable is registered with its type, which is always Type::List(...) or Type::Unit.
Execution
At execution time, the VarStore holds the type information from planning. When storing a value stream in a variable, the store associates the expected type with it. The value stream ensures each value it yields conforms to that type. This type information carries through to the final result returned to the user.
Type Inference
While statements entering the engine have known types, planning constructs new expressions—projections, filters, and merge qualifications—whose types aren’t explicitly declared. The engine must infer these types from the expression structure to register variables correctly.
Type inference is handled by ExprContext, which walks expression trees and determines their result types based on the schema. For example, a column reference’s type comes from the schema definition, and a record expression’s type is built from its field types.
// Create context for type inference
let cx = stmt::ExprContext::new_with_target(&*self.engine.schema, stmt);

// Infer type of an expression reference
let ty = cx.infer_expr_reference_ty(expr_reference);

// Infer type of a full expression with argument types
let ret = ExprContext::new_free().infer_expr_ty(expr.as_expr(), &args);
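A minimal, hypothetical version of that bottom-up walk is shown below (the Type and Expr enums are invented stand-ins, not Toasty's ExprContext): a column reference's type comes from the schema, and a record expression's type is assembled from its parts.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Id,
    String,
    Record(Vec<Type>),
}

enum Expr {
    Column(usize),      // index into the schema's column list
    Record(Vec<Expr>),  // record of sub-expressions
}

/// Infer an expression's result type from its structure and the schema.
fn infer_ty(expr: &Expr, schema_column_tys: &[Type]) -> Type {
    match expr {
        // A column reference's type comes from the schema definition.
        Expr::Column(idx) => schema_column_tys[*idx].clone(),
        // A record expression's type is built from its field types.
        Expr::Record(fields) => Type::Record(
            fields.iter().map(|f| infer_ty(f, schema_column_tys)).collect(),
        ),
    }
}

fn main() {
    let schema = [Type::Id, Type::String];
    let returning = Expr::Record(vec![Expr::Column(0), Expr::Column(1)]);
    println!("{:?}", infer_ty(&returning, &schema)); // → Record([Id, String])
}
```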
Design
Design documents for Toasty.
Batch Query Execution
Overview
Batch queries let users send multiple independent queries to the database in a single round-trip. The results come back as a typed tuple matching the input queries.
let (active_users, recent_posts) = toasty::batch((
    User::find_by_active(true),
    Post::find_recent(100),
)).exec(&db).await?;

// active_users: Vec<User>
// recent_posts: Vec<Post>
The batch composes all queries into a single Statement whose returning
expression is a record of subqueries. This means batch execution flows through
the existing exec path — no new executor methods, no new driver operations.
This design covers SQL databases only. DynamoDB support is out of scope.
New Trait: IntoStatement<T>
A single new trait bridges query builders to Statement<T>:
#![allow(unused)]
fn main() {
pub trait IntoStatement<T> {
fn into_statement(self) -> Statement<T>;
}
}
Query builders implement this for their model type. For example, UserQuery
implements IntoStatement<User>:
impl IntoStatement<User> for UserQuery {
    fn into_statement(self) -> Statement<User> {
        self.stmt.into()
    }
}
The codegen already produces IntoSelect impls for query builders.
IntoStatement can be blanket-implemented for anything that implements
IntoSelect:
impl<T: IntoSelect> IntoStatement<T::Model> for T {
    fn into_statement(self) -> Statement<T::Model> {
        self.into_select().into()
    }
}
Tuple implementations
Tuples of IntoStatement types implement IntoStatement by composing their
inner statements into a single select whose returning expression is a record of
subqueries:
impl<T1, T2, A, B> IntoStatement<(Vec<T1>, Vec<T2>)> for (A, B)
where
    A: IntoStatement<T1>,
    B: IntoStatement<T2>,
{
    fn into_statement(self) -> Statement<(Vec<T1>, Vec<T2>)> {
        let stmt_a = self.0.into_statement().untyped;
        let stmt_b = self.1.into_statement().untyped;

        // Build: SELECT (stmt_a), (stmt_b)
        let query = stmt::Query::values(stmt::Expr::record([
            stmt::Expr::subquery(stmt_a),
            stmt::Expr::subquery(stmt_b),
        ]));

        Statement::from_raw(query.into())
    }
}
The resulting statement is equivalent to SELECT (subquery_1), (subquery_2).
At the Toasty AST level this is a Query whose returning body is a
Record([Expr::Stmt, Expr::Stmt]). The engine handles each subquery
independently during execution and packs the results into a single
Value::Record.
Tuple impls for arities 2 through 8 are generated with a macro.
Load for Tuples and Vec<T>
To deserialize the composed result, Load is implemented for Vec<T> and
for tuples:
impl<T: Load> Load for Vec<T> {
    fn load(value: stmt::Value) -> Result<Self> {
        match value {
            Value::List(items) => items
                .into_iter()
                .map(T::load)
                .collect(),
            _ => Err(Error::type_conversion(value, "Vec<T>")),
        }
    }
}

impl<A: Load, B: Load> Load for (A, B) {
    fn load(value: stmt::Value) -> Result<Self> {
        match value {
            Value::Record(mut record) => Ok((
                A::load(record[0].take())?,
                B::load(record[1].take())?,
            )),
            _ => Err(Error::type_conversion(value, "(A, B)")),
        }
    }
}
With these impls, Load for (Vec<User>, Vec<Post>) works automatically:
the outer tuple impl splits the record, then each Vec<T> impl iterates
the list and loads individual model instances.
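The two-layer dispatch can be demonstrated with a runnable miniature. The Value enum, error type, and i64 base impl below are simplified stand-ins for the real toasty types; the layering (tuple impl splits the record, Vec impl walks the list) is the point.

```rust
#[derive(Debug, Clone)]
enum Value {
    I64(i64),
    List(Vec<Value>),
    Record(Vec<Value>),
}

trait Load: Sized {
    fn load(value: Value) -> Result<Self, String>;
}

// Base case: a scalar.
impl Load for i64 {
    fn load(value: Value) -> Result<Self, String> {
        match value {
            Value::I64(n) => Ok(n),
            other => Err(format!("expected i64, got {other:?}")),
        }
    }
}

// A list loads element-wise; any element error aborts the whole load.
impl<T: Load> Load for Vec<T> {
    fn load(value: Value) -> Result<Self, String> {
        match value {
            Value::List(items) => items.into_iter().map(T::load).collect(),
            other => Err(format!("expected list, got {other:?}")),
        }
    }
}

// A 2-tuple loads from a 2-element record.
impl<A: Load, B: Load> Load for (A, B) {
    fn load(value: Value) -> Result<Self, String> {
        match value {
            Value::Record(mut record) if record.len() == 2 => {
                let b = record.pop().unwrap();
                let a = record.pop().unwrap();
                Ok((A::load(a)?, B::load(b)?))
            }
            other => Err(format!("expected 2-record, got {other:?}")),
        }
    }
}

fn main() {
    // A batch result: a Record of two Lists, as the engine would produce.
    let value = Value::Record(vec![
        Value::List(vec![Value::I64(1), Value::I64(2)]),
        Value::List(vec![Value::I64(3)]),
    ]);
    let (users, posts): (Vec<i64>, Vec<i64>) = Load::load(value).unwrap();
    println!("{users:?} {posts:?}"); // → [1, 2] [3]
}
```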
User-Facing API
pub fn batch<T, Q: IntoStatement<T>>(queries: Q) -> Batch<T>
where
    T: Load,
{
    Batch {
        stmt: queries.into_statement(),
    }
}

pub struct Batch<T> {
    stmt: Statement<T>,
}

impl<T: Load> Batch<T> {
    pub async fn exec(self, executor: &mut dyn Executor) -> Result<T> {
        use ExecutorExt;
        let stream = executor.exec(self.stmt).await?;
        let value = stream.next().await
            .ok_or_else(|| Error::record_not_found("batch returned no results"))??;
        T::load(value)
    }
}
Batch::exec calls the regular ExecutorExt::exec method. The composed
statement flows through the standard engine pipeline. The result is a single
value (a record of lists) that T::load deserializes into the typed tuple.
Execution Flow
User code:
toasty::batch((UserQuery, PostQuery)).exec(&db)
IntoStatement for (A, B):
SELECT (SELECT ... FROM users WHERE ...), (SELECT ... FROM posts ...)
Engine pipeline (standard exec path):
lower → plan → exec
The engine recognizes Expr::Stmt subqueries in the returning
expression and executes each independently.
Result:
Value::Record([
Value::List([user1, user2, ...]),
Value::List([post1, post2, ...]),
])
Load for (Vec<User>, Vec<Post>):
(A::load(record[0]), B::load(record[1]))
→ (Vec<User>::load(list), Vec<Post>::load(list))
→ (vec![User::load(v1), ...], vec![Post::load(v1), ...])
Statement Changes
Statement<M> needs a way to construct from a raw stmt::Statement without
requiring M: Model:
impl<M> Statement<M> {
    /// Build a statement from a raw untyped statement.
    ///
    /// Used by batch composition where M may be a tuple, not a model.
    pub(crate) fn from_raw(untyped: stmt::Statement) -> Self {
        Self {
            untyped,
            _p: PhantomData,
        }
    }
}
The existing Statement::from_untyped requires M: Model (via IntoSelect).
from_raw has no bound on M and is pub(crate) so only internal code uses
it.
Engine Support
The engine needs to handle a Query whose returning expression is a record
of Expr::Stmt subqueries where each subquery returns multiple rows.
The lowerer already handles Expr::Stmt for association preloading (INCLUDE),
where subqueries get added to the dependency graph and executed as part of the
plan. Batch queries follow the same pattern: each Expr::Stmt in the returning
record becomes an independent subquery in the plan, and the exec phase collects
results into a Value::Record of Value::Lists.
If the existing lowerer does not handle bare subqueries in a returning record
(outside of an INCLUDE context), a small extension is needed to recognize this
pattern and plan it the same way.
Implementation Plan
Phase 1: IntoStatement trait and Load impls
- Add IntoStatement<T> trait to crates/toasty/src/stmt/
- Add blanket impl IntoStatement<T::Model> for T: IntoSelect
- Add Load for Vec<T> and Load for (A, B) (and higher arities via macro)
- Add Statement::from_raw
- Export IntoStatement from lib.rs and codegen_support
Phase 2: Batch API
- Add toasty::batch() function and Batch<T> struct
- Add tuple impls of IntoStatement<(Vec<T1>, Vec<T2>, ...)> (via macro)
- Wire Batch::exec through the standard ExecutorExt::exec path
Phase 3: Engine support
- Verify that the lowerer handles Expr::Stmt subqueries in a returning record correctly (it may already work via the INCLUDE path)
- If not, extend the lowerer to plan bare record-of-subqueries statements
- Verify the exec phase packs subquery results into Value::Record of Value::Lists
Phase 4: Integration tests
- Batch two selects on different models
- Batch a select that returns rows with a select that returns empty
- Batch with filters, ordering, and limits
- Batch inside a transaction
- Batch of a single query (degenerates to normal execution)
Files Modified
| File | Change |
|---|---|
| crates/toasty/src/stmt/into_statement.rs | New: IntoStatement<T> trait, blanket impl |
| crates/toasty/src/stmt.rs | Add Statement::from_raw, re-export IntoStatement |
| crates/toasty/src/load.rs | Add Load impls for Vec<T> and tuples |
| crates/toasty/src/batch.rs | Add batch(), Batch<T>, tuple IntoStatement impls |
| crates/toasty/src/lib.rs | Re-export batch, Batch, IntoStatement |
| crates/toasty/src/engine/lower.rs | Handle record-of-subqueries in returning (if needed) |
Compile-Time Required Field Verification for create!
Problem
When a user omits a required field from a create! invocation, the error only
surfaces at runtime as a database NOT NULL constraint violation. We want a
compile-time error that names the missing field.
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: Id<User>,
    name: String,         // required
    email: String,        // required
    bio: Option<String>,  // optional (nullable)
    #[default(0)]
    login_count: i64,     // optional (has default)
}

// Should produce a compile error naming `email`
toasty::create!(User, { name: "Carl" }).exec(&db).await;
Design
Generate a hidden ZST verification chain alongside each model. The create!
macro expands to call the verifier in addition to the real builder. The verifier
uses typestate to track which required fields have been set and
#[diagnostic::on_unimplemented] to produce per-field error messages. The real
builder is unchanged.
What makes a field “required”
A field requires explicit user input on create unless ANY of these hold:
- The type is Option<T> (nullable)
- The field has #[auto]
- The field has #[default(...)]
- The field has #[update(...)] (applied as default on create)
- The field is a HasMany or HasOne relation (populated separately)

BelongsTo fields are required if their target is non-nullable (e.g., BelongsTo<User> is required, BelongsTo<Option<User>> is not). This matches the existing nullable detection via <T as Relation>::nullable().
Generated code
For a model with required fields name and email:
// ---- Marker types (defined once in toasty crate) ----
pub struct Set;
pub struct NotSet;

// ---- Generated by #[derive(Model)] on User ----

// One trait per required field with a custom diagnostic
#[doc(hidden)]
#[diagnostic::on_unimplemented(
    message = "cannot create `User`: required field `name` is not set",
    label = "call `.name(...)` before `.exec()`"
)]
pub trait __user_create_has_name {}
impl __user_create_has_name for Set {}

#[doc(hidden)]
#[diagnostic::on_unimplemented(
    message = "cannot create `User`: required field `email` is not set",
    label = "call `.email(...)` before `.exec()`"
)]
pub trait __user_create_has_email {}
impl __user_create_has_email for Set {}

// Verifier: all ZSTs, optimized away entirely
#[doc(hidden)]
pub struct __UserCreateVerify<Name = NotSet, Email = NotSet>(
    ::std::marker::PhantomData<(Name, Email)>,
);

impl __UserCreateVerify {
    pub fn new() -> Self {
        __UserCreateVerify(::std::marker::PhantomData)
    }
}

impl<Name, Email> __UserCreateVerify<Name, Email> {
    // Required field: transitions type param to Set
    pub fn name(self) -> __UserCreateVerify<Set, Email> {
        __UserCreateVerify(::std::marker::PhantomData)
    }
    pub fn email(self) -> __UserCreateVerify<Name, Set> {
        __UserCreateVerify(::std::marker::PhantomData)
    }

    // Optional fields: no type transition
    pub fn bio(self) -> Self { self }
    pub fn login_count(self) -> Self { self }

    // Relation fields (with_ variants used by create! for closures)
    pub fn todos(self) -> Self { self }
    pub fn with_todos(self) -> Self { self }
}

// check() only compiles when all required traits are satisfied
impl<Name, Email> __UserCreateVerify<Name, Email>
where
    Name: __user_create_has_name,
    Email: __user_create_has_email,
{
    pub fn check(self) {}
}

// Entry point on the model type — resolves through aliases
impl User {
    #[doc(hidden)]
    pub fn __verify_create() -> __UserCreateVerify {
        __UserCreateVerify::new()
    }
}
create! macro expansion
The create! macro emits the verification chain before the real builder. The
verifier methods mirror the builder methods but take no arguments.
// Input:
toasty::create!(User, { name: "Carl", bio: "hello" })

// Expands to:
{
    // Compile-time verification (all ZST, erased entirely)
    User::__verify_create().name().bio().check();

    // Real builder (unchanged)
    User::create().name("Carl").bio("hello")
}
For type aliases (type Foo = User), Foo::__verify_create() resolves through
the type system to User::__verify_create() — no naming conventions needed.
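The typestate mechanism can be tried in isolation with a self-contained toy. Note one simplification versus the design above: instead of per-field traits with #[diagnostic::on_unimplemented], this sketch defines check() directly on the fully-set state, which gives the same compile-time gating but generic error messages. All names here (Verify, bio, etc.) are illustrative.

```rust
use std::marker::PhantomData;

pub struct Set;
pub struct NotSet;

// Two required "fields" tracked as type parameters; all ZSTs.
pub struct Verify<Name = NotSet, Email = NotSet>(PhantomData<(Name, Email)>);

impl Verify {
    pub fn new() -> Self {
        Verify(PhantomData)
    }
}

impl<Name, Email> Verify<Name, Email> {
    // Required fields: transition the corresponding type parameter to Set.
    pub fn name(self) -> Verify<Set, Email> {
        Verify(PhantomData)
    }
    pub fn email(self) -> Verify<Name, Set> {
        Verify(PhantomData)
    }
    // Optional field: no state transition.
    pub fn bio(self) -> Self {
        self
    }
}

// check() is only defined once every required field is Set.
impl Verify<Set, Set> {
    pub fn check(self) {}
}

fn main() {
    Verify::new().name().bio().email().check(); // compiles: both fields set
    // Verify::new().name().check(); // compile error: `check` not found for Verify<Set, NotSet>
}
```

Because every type involved is zero-sized, the whole chain is erased at compile time; nothing survives to the generated binary.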
Error messages
Missing one field:
error[E0277]: cannot create `User`: required field `email` is not set
--> src/main.rs:5:42
|
5 | create!(User, { name: "Carl" }).exec(&db).await;
| ^^^^ call `.email(...)` before `.exec()`
Missing multiple fields (Rust reports all unsatisfied bounds):
error[E0277]: cannot create `User`: required field `name` is not set
--> src/main.rs:5:24
|
5 | create!(User, {}).exec(&db).await;
| ^^^^ call `.name(...)` before `.exec()`
error[E0277]: cannot create `User`: required field `email` is not set
--> src/main.rs:5:24
|
5 | create!(User, {}).exec(&db).await;
| ^^^^ call `.email(...)` before `.exec()`
Scoped and batch create
For scoped creation (create!(user.todos(), { ... })), the create! macro
cannot call __verify_create() on the scope expression. Verification only
applies to the type-target form. This is acceptable: scoped creation already
implies certain fields are set by the relation.
For batch creation (create!(User, [{ ... }, { ... }])), each item in the list
gets its own verification chain.
// Input:
toasty::create!(User, [{ name: "Carl", email: "a@b.com" }, { name: "Bob", email: "b@c.com" }])

// Expands to:
{
    User::__verify_create().name().email().check();
    User::__verify_create().name().email().check();

    User::create_many()
        .with_item(|b| { let b = b.name("Carl").email("a@b.com"); b })
        .with_item(|b| { let b = b.name("Bob").email("b@c.com"); b })
}
Nested creation (closures)
The create! macro generates .with_field(|b| { ... }) for nested struct
bodies. The verifier mirrors this with a no-arg .with_field() method that
returns Self (identity for relation fields).
#![allow(unused)]
fn main() {
// Input:
toasty::create!(User, { name: "Carl", email: "a@b.com", todos: [{ title: "buy milk" }] })
// Verification chain:
User::__verify_create().name().email().with_todos().check();
}
Nested model verification (e.g., verifying Todo’s required fields within the
closure) is not covered in this design. The nested model’s builder will catch
missing fields at the database level as it does today.
Implementation Plan
Step 1: Add marker types to toasty crate
Add Set and NotSet ZSTs to toasty::codegen_support (the module re-exported
for generated code).
File: crates/toasty/src/codegen_support.rs (or equivalent)
Step 2: Add is_required_on_create helper to codegen field
Add a method to Field in toasty-codegen that returns whether a field is
required for creation. This centralizes the logic:
#![allow(unused)]
fn main() {
impl Field {
pub fn is_required_on_create(&self) -> bool {
// Relations: only BelongsTo can be required
match &self.ty {
FieldTy::HasMany(_) | FieldTy::HasOne(_) => return false,
FieldTy::BelongsTo(rel) => return !rel.nullable,
FieldTy::Primitive(_) => {}
}
// Skip auto, default, update fields
if self.attrs.auto.is_some() {
return false;
}
if self.attrs.default_expr.is_some() || self.attrs.update_expr.is_some() {
return false;
}
// Check if the Rust type is Option<T>
// (For non-serialized fields, Primitive::NULLABLE handles this at
// runtime, but we need a syntactic check at codegen time.)
if let FieldTy::Primitive(ty) = &self.ty {
if is_option_type(ty) {
return false;
}
}
true
}
}
}
The is_option_type helper already exists in the codebase (used by serialize
field codegen). Extract it to a shared location if not already shared.
Step 3: Generate verifier in expand/create.rs
Add a new method expand_create_verifier to Expand that generates:
- One __model_create_has_{field} trait per required field with #[diagnostic::on_unimplemented]
- The __ModelCreateVerify struct with type params for required fields
- new(), field methods (required → type transition, optional → identity), and check() with trait bounds
- The __verify_create() associated function on the model impl
Call expand_create_verifier() from the model’s root expansion alongside
expand_create_builder().
Step 4: Update create! macro expansion
In crates/toasty-macros/src/create/expand.rs, modify the expand function to
emit the verification chain before the builder chain.
For Target::Type with CreateItem::Single:
#![allow(unused)]
fn main() {
// Verification chain: Type::__verify_create().field1().field2().check();
// Builder chain: Type::create().field1(val1).field2(val2)
}
For Target::Type with CreateItem::List, emit one verification chain per
item.
For Target::Scope, emit only the builder chain (no verification).
The verification field calls mirror the builder field calls but drop the
arguments. For CreateItem::Single, each field becomes .field_name(). For
CreateItem::List and nested structs, the with_* closure is replaced by a
simple .with_field_name() call.
Step 5: Tests
Add compile-fail tests that verify:
- Missing a single required field → error naming the field
- Missing multiple required fields → errors naming each field
- Optional fields can be omitted without error
- #[auto] fields can be omitted without error
- #[default] fields can be omitted without error
- #[update] fields can be omitted without error
- All fields provided → compiles successfully
- Type aliases work (type Foo = User; create!(Foo, { ... }))
Limitations
- Scope targets: create!(user.todos(), { ... }) does not get verification. The scope expression is not a type path, so we cannot call __verify_create() on it.
- Nested models: Required fields on nested models (inside closures) are not verified by this mechanism. They continue to rely on database constraint errors.
- Direct builder API: Users who call User::create().name("Carl").exec() without the create! macro do not get verification. The public builder is unchanged. This is intentional — the macro is the recommended API, and changing the builder’s type signature would be a larger change.
- diagnostic::on_unimplemented support: This attribute is stable since Rust 1.78. The custom message and label fields are respected by rustc. Third-party tools (rust-analyzer, older compilers) may show a generic trait bound error instead of the custom message.
Files Modified
| File | Change |
|---|---|
| crates/toasty/src/codegen_support.rs | Add Set, NotSet marker types |
| crates/toasty-codegen/src/schema/field.rs | Add is_required_on_create() method |
| crates/toasty-codegen/src/expand/create.rs | Add expand_create_verifier() |
| crates/toasty-codegen/src/expand/mod.rs | Call expand_create_verifier() from root expansion |
| crates/toasty-macros/src/create/expand.rs | Emit verification chain in expand() |
create! Macro v2
Redesign of the create! macro syntax to support mixed-type batch creation,
better disambiguation between type targets and scope targets, and compile-time
required field verification.
Syntax
Single creation (struct-literal form)
#![allow(unused)]
fn main() {
toasty::create!(User { name: "Carl", email: "carl@example.com" })
}
No comma between the type path and {. This is visually identical to Rust’s
struct literal syntax, making it immediately recognizable.
Scoped creation (in keyword)
#![allow(unused)]
fn main() {
toasty::create!(in user.todos() { title: "buy milk" })
}
The in keyword prefixes the scope expression, unambiguously marking it as a
scope target. No comma is needed — in is not a valid start of a type path or
expression in this position, so it cleanly disambiguates.
The scope expression after in is parsed with Expr::parse_without_eager_brace
(from syn). This prevents the parser from consuming the { fields } body as
part of the expression — the same technique Rust uses to parse
for pat in expr { body }. A bare { can only start an expression as a block or
struct literal; parse_without_eager_brace suppresses struct-literal parsing,
and a block would require a ; or trailing expression, so the field body
{ title: "buy milk" } is never ambiguous with the scope expression.
Batch creation (same type shorthand)
#![allow(unused)]
fn main() {
toasty::create!(User::[
{ name: "Carl", email: "carl@example.com" },
{ name: "Alice", email: "alice@example.com" },
])
}
Type::[items] creates multiple records of the same type. The :: makes this
syntactically distinct from both the struct-literal form and array indexing.
Batch creation (mixed types)
#![allow(unused)]
fn main() {
toasty::create!([
User { name: "Carl", email: "carl@example.com" },
Article { title: "Hello World", author: &carl },
])
}
A bare [items] where each item is a struct-literal form or a scoped in
creation. This leverages the batch infrastructure (IntoStatement tuple/vec) to
compose multiple inserts of different types into a single batch operation.
Scoped items can be mixed into the batch:
#![allow(unused)]
fn main() {
toasty::create!([
User { name: "Carl", email: "carl@example.com" },
in user.friends() { name: "Bob" },
])
}
Parsing Strategy
The macro input starts with one of four forms, distinguished by the first tokens:
| First tokens | Form | Target |
|---|---|---|
| Path { | Single creation | Type |
| in | Scoped creation | Scope |
| Path :: [ | Same-type batch | Type |
| [ | Mixed-type batch | Multiple types |
Parsing steps:
- If input starts with [ → mixed-type batch
- If input starts with in → scoped creation: call Expr::parse_without_eager_brace for the scope expression, then parse { fields }
- Otherwise, parse as syn::Path:
  - If followed by { → single creation (struct-literal form)
  - If followed by :: [ → same-type batch

Inside a [ batch list, each item is parsed with the same disambiguation: in
prefix → scoped item, Path { → type-target item.
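A crude, stdlib-only illustration of the four-way dispatch. The real parser uses syn token-tree lookahead; the classify function and its string matching below are purely hypothetical and only mimic the first-token logic described above.

```rust
// Hypothetical string-level stand-in for the macro parser's dispatch.
#[derive(Debug, PartialEq)]
enum Form {
    MixedBatch,    // starts with `[`
    Scoped,        // starts with `in`
    Single,        // `Path {`
    SameTypeBatch, // `Path :: [`
}

fn classify(input: &str) -> Form {
    let s = input.trim_start();
    if s.starts_with('[') {
        return Form::MixedBatch;
    }
    if s == "in" || s.starts_with("in ") {
        return Form::Scoped;
    }
    // Consume the leading type path (idents and `::`), then peek at the
    // next token to distinguish `Path {` from `Path::[`.
    let rest = s
        .trim_start_matches(|c: char| c.is_alphanumeric() || c == '_' || c == ':')
        .trim_start();
    if rest.starts_with('[') {
        Form::SameTypeBatch
    } else {
        Form::Single
    }
}

fn main() {
    assert_eq!(classify(r#"User { name: "Carl" }"#), Form::Single);
    assert_eq!(classify(r#"in user.todos() { title: "buy milk" }"#), Form::Scoped);
    assert_eq!(classify(r#"User::[{ name: "Carl" }, { name: "Alice" }]"#), Form::SameTypeBatch);
    assert_eq!(classify(r#"[User { name: "Carl" }]"#), Form::MixedBatch);
}
```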
Expansion
Single creation
#![allow(unused)]
fn main() {
// Input:
toasty::create!(User { name: "Carl", email: "carl@example.com" })
// Expands to:
{
User::__verify_create().name().email().check();
User::create().name("Carl").email("carl@example.com")
}
}
Returns a UserCreate builder. The caller chains .exec(&db) to execute.
Scoped creation
#![allow(unused)]
fn main() {
// Input:
toasty::create!(in user.todos() { title: "buy milk" })
// Expands to:
user.todos().create().title("buy milk")
}
No verification chain — the scope expression is not a type path, and the relation context already implies certain fields.
Same-type batch
#![allow(unused)]
fn main() {
// Input:
toasty::create!(User::[
{ name: "Carl", email: "carl@example.com" },
{ name: "Alice", email: "alice@example.com" },
])
// Expands to:
{
User::__verify_create().name().email().check();
User::__verify_create().name().email().check();
(
User::create().name("Carl").email("carl@example.com"),
User::create().name("Alice").email("alice@example.com"),
)
}
}
Returns a tuple of create builders. Each item gets its own verification chain.
All batch forms expand to tuples of builders, which compose with
toasty::batch() for execution. CreateMany / create_many() are deprecated
and not used in new expansions.
Mixed-type batch
#![allow(unused)]
fn main() {
// Input:
toasty::create!([
User { name: "Carl", email: "carl@example.com" },
Article { title: "Hello World" },
])
// Expands to:
{
User::__verify_create().name().email().check();
Article::__verify_create().title().check();
(
User::create().name("Carl").email("carl@example.com"),
Article::create().title("Hello World"),
)
}
}
Returns a tuple of create builders (UserCreate, ArticleCreate). The caller
passes the tuple to toasty::batch() for combined execution:
#![allow(unused)]
fn main() {
let (user, article) = toasty::batch(
toasty::create!([
User { name: "Carl", email: "carl@example.com" },
Article { title: "Hello World" },
])
).exec(&mut db).await?;
}
Mixed batch with scoped items
#![allow(unused)]
fn main() {
// Input:
toasty::create!([
User { name: "Carl", email: "carl@example.com" },
in carl.todos() { title: "buy milk" },
])
// Expands to:
{
User::__verify_create().name().email().check();
(
User::create().name("Carl").email("carl@example.com"),
carl.todos().create().title("buy milk"),
)
}
}
Scoped items in a batch do not get verification chains (same as standalone scoped creation). Type-target items get verification as usual.
All batch forms (same-type and mixed-type) produce tuples of builders. This
composes naturally with toasty::batch(), which already accepts tuples via
IntoStatement. CreateMany / create_many() are not used — all batching
goes through toasty::batch().
Compile-Time Required Field Verification
See create-macro-required-field-verification.md for the full design. Summary:
- #[derive(Model)] generates a hidden __verify_create() method on each model that returns a ZST verifier with typestate tracking
- Required field methods transition type params from NotSet to Set
- Optional field methods return Self (identity)
- check() is only available when all required-field traits are satisfied
- #[diagnostic::on_unimplemented] gives per-field error messages
- The create! macro emits verification chains before the builder chains
- Verification is only emitted for type-target forms (single, same-type batch, mixed-type batch), not scoped creation
Nested Creation
Nested struct bodies and relation lists work the same as today within each item:
#![allow(unused)]
fn main() {
toasty::create!(User {
name: "Carl",
email: "carl@example.com",
todos: [
{ title: "buy milk" },
{ title: "write code" },
],
})
}
The verification chain for nested bodies calls the relation method as a no-op:
#![allow(unused)]
fn main() {
User::__verify_create().name().email().with_todos().check();
}
Nested model verification (e.g., Todo’s required fields) is not covered by
the verification chain. The nested model’s builder catches missing fields at
the database level.
Migration from v1
Breaking changes
| v1 syntax | v2 syntax |
|---|---|
| create!(User, { name: "Carl" }) | create!(User { name: "Carl" }) |
| create!(user.todos(), { ... }) | create!(in user.todos() { ... }) |
| create!(User, [{ ... }, { ... }]) | create!(User::[ { ... }, { ... } ]) |
The v1 type-target forms (create!(User, { ... }) and create!(User, [...]))
are removed. The scope form now uses the in keyword prefix instead of a
comma separator.
Implementation Plan
Phase 1: Macro v2 syntax
Step 1: Update create! macro parser
Rewrite crates/toasty-macros/src/create/parse.rs to handle the four forms:
- [ → mixed-type batch
- in expr { ... } → scoped creation
- Path { → single creation
- Path :: [ → same-type batch
Update Target enum and CreateInput to represent the new forms.
Step 2: Update create! macro expansion
Rewrite crates/toasty-macros/src/create/expand.rs to generate:
- Builder chains as today
- Tuple output for batch forms
No verification chains yet — those are added in phase 2.
Step 3: Update existing tests and examples
All existing create! usages need to be updated to the new syntax. This
includes:
- Integration tests in crates/toasty-driver-integration-suite/src/tests/
- Examples in examples/
- Benchmarks
Step 4: Add syntax tests
- Tests for each syntax form (single, scoped, same-type batch, mixed-type batch)
- Type alias tests (type Foo = User; create!(Foo { ... }))
Phase 2: Compile-time required field verification
(From create-macro-required-field-verification.md)
Step 5: Implement verification codegen
- Add Set / NotSet markers to toasty::codegen_support
- Add is_required_on_create() to the codegen Field
- Generate verifier struct, traits, and __verify_create() in expand/create.rs
Step 6: Wire verification into create! expansion
Update macro expansion to emit __verify_create() chains before builder chains
for type-target forms (single, same-type batch, mixed-type batch). Scoped
creation is unchanged.
Step 7: Add verification tests
- Compile-fail tests for missing required fields
- Tests verifying optional fields can be omitted without error
DynamoDB: OR Predicates in Index Key Conditions
Problem
DynamoDB’s KeyConditionExpression does not support OR — neither for partition keys nor
sort keys. This means queries like WHERE user_id = 1 OR user_id = 2 on an indexed field
are currently broken for DynamoDB.
The engine must detect OR in index key conditions and fan them out into N individual
DynamoDB Query calls — one per OR branch — then concatenate the results.
A secondary motivation: the batch-load mechanism used for nested association preloads
(rewrite_stmt_query_for_batch_load_nosql) produces ANY(MAP(arg[input], pred)), which
at exec time expands to OR via simplify_expr_any. This hits the same DynamoDB
restriction and is addressed by the same fix.
Where OR Can Reach a Key Condition
Only two engine actions use KeyConditionExpression:
- QueryPk — queries the primary table when exact PK keys cannot be extracted
- FindPkByIndex — queries a GSI to retrieve primary keys
GetByKey uses BatchGetItem (explicit key values, no expression), so OR is never
relevant there. pk = v1 OR pk = v2 on the primary key produces
IndexPlan.key_values = Some([v1, v2]), routing to GetByKey directly — no issue.
QueryPk
OR reaches QueryPk.pk_filter when IndexPlan.key_values is None:
- User-specified OR on sort key: WHERE pk = v AND (sk >= s1 OR sk >= s2) — range predicates have no extractable key values.
- Batch-load (e.g. a HasMany where the FK is the partition key of the child’s composite primary key): rewrite_stmt_query_for_batch_load_nosql produces ANY(MAP(arg[input], fk = arg[0])). The list is a runtime input, so key_values is None. At exec time simplify_expr_any expands it to OR.
FindPkByIndex
FindPkByIndex.filter is the output of partition_filter, which isolates index key
conditions from non-key conditions. partition_filter on AND distributes cleanly:
status = active AND (user_id = 1 OR user_id = 2) produces
index_filter = user_id = 1 OR user_id = 2 and result_filter = status = active.
OR reaches it in the same two ways as QueryPk:
- User-specified OR: WHERE user_id = 1 OR user_id = 2 on a GSI partition key.
- Batch-load: same ANY(MAP(arg[input], pred)) expansion path as above.
Mixed OR Operands
partition_filter currently has a todo!() for OR operands that contain both index and
non-index parts — e.g. (pk = 1 AND status = a) OR pk = 2.
This is in scope. Strategy:
- Extract key conditions from each OR branch to build the fan-out: ANY(MAP([1, 2], pk = arg[0]))
- Apply the full original predicate as an in-memory post-filter: (pk = 1 AND status = a) OR pk = 2
This is conservative but correct, and consistent with how post_filter is already used.
Canonical Form: ANY(MAP(key_list, per_call_pred))
All OR cases are represented uniformly as ANY(MAP(key_list, per_call_pred)):
- key_list — one entry per required Query call; each entry has one value per key column (scalar for partition-key-only, Value::Record for partition + sort key)
- per_call_pred — the key condition for one call, referencing element fields as arg[0], arg[1], …
Single key column — user_id = 1 OR user_id = 2:
ANY(MAP([1, 2], user_id = arg[0]))
Composite key — (todo_id = t1 AND step_id >= s1) OR (todo_id = t2 AND step_id >= s2):
ANY(MAP([(t1, s1), (t2, s2)], todo_id = arg[0] AND step_id >= arg[1]))
Batch-load — ANY(MAP(arg[input], todo_id = arg[0])) — already in canonical form;
no structural change needed, only the exec fan-out behavior changes.
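The canonicalization step can be sketched over a toy expression type. Expr, canonicalize, and the string-valued per_call_pred below are illustrative stand-ins, not toasty_core::stmt types; the real partition_filter works on the full AST.

```rust
// Toy canonicalization of a single-key-column OR into
// ANY(MAP(key_list, per_call_pred)).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Eq(String, i64), // column = literal
    Or(Vec<Expr>),
    AnyMap { key_list: Vec<i64>, per_call_pred: String },
}

fn canonicalize(e: Expr) -> Expr {
    if let Expr::Or(branches) = &e {
        let mut col: Option<&str> = None;
        let mut keys = Vec::new();
        for b in branches {
            match b {
                // Every branch must be an equality on the same key column.
                Expr::Eq(c, v) if col.is_none() || col == Some(c.as_str()) => {
                    col = Some(c.as_str());
                    keys.push(*v);
                }
                // Mixed shapes fall back to the post-filter strategy
                // (see "Mixed OR Operands" above).
                _ => return e.clone(),
            }
        }
        if let Some(c) = col {
            return Expr::AnyMap {
                key_list: keys,
                per_call_pred: format!("{c} = arg[0]"),
            };
        }
    }
    e
}

fn main() {
    let or = Expr::Or(vec![
        Expr::Eq("user_id".into(), 1),
        Expr::Eq("user_id".into(), 2),
    ]);
    assert_eq!(
        canonicalize(or),
        Expr::AnyMap { key_list: vec![1, 2], per_call_pred: "user_id = arg[0]".into() }
    );
}
```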
Design
1. Capability Flag
#![allow(unused)]
fn main() {
/// Whether OR is supported in index key conditions (e.g. DynamoDB KeyConditionExpression).
pub index_or_predicate: bool,
}
DynamoDB: false. All other backends: true (SQL backends never use these actions).
2. IndexPlan Output Contract
#![allow(unused)]
fn main() {
pub(crate) struct IndexPlan<'a> {
pub(crate) index: &'a Index,
/// Filter to push to the index. Guaranteed form:
///
/// | Condition | Form |
/// |------------------------------------|--------------------------------------------------|
/// | No OR | plain expr — `user_id = 1` |
/// | OR, `index_or_predicate = true` | `Expr::Or([branch1, branch2, ...])` |
/// | OR, `index_or_predicate = false` | `ANY(MAP(Value::List([v1, ...]), per_call_pred))` |
/// | Batch-load (any capability) | `ANY(MAP(arg[input], per_call_pred))` |
pub(crate) index_filter: stmt::Expr,
/// Non-index conditions applied in-memory after results return from each call.
pub(crate) result_filter: Option<stmt::Expr>,
/// Full original predicate applied after all fan-out results are collected.
/// Set for mixed OR operands — see §"Mixed OR Operands".
pub(crate) post_filter: Option<stmt::Expr>,
/// Literal key values for direct lookup: a `Value::List` of `Value::Record` entries,
/// one per lookup. Set by `partition_filter` when all key columns have literal equality
/// matches. When `Some`, the planner routes to `GetByKey` and ignores `index_filter`.
/// May coexist with a canonical `ANY(MAP(...))` `index_filter` — both are produced
/// simultaneously by `partition_filter`; the planner always prefers `GetByKey`.
pub(crate) key_values: Option<stmt::Value>,
}
}
Planner routing (primary key path):
- key_values.is_some() → GetByKey (BatchGetItem)
- index_filter = ANY(MAP(...)) → fan-out via QueryPk × N
- otherwise → single QueryPk call
3. Key Value Extraction in index_match
partition_filter extracts literal key values during filter partitioning, setting
key_values when all key columns have literal equality matches. This replaces the
current try_build_key_filter (kv.rs) post-hoc re-analysis of index_filter.
What moves into index_match: walking each OR branch, reading the RHS of each key
column’s equality predicate, assembling Value::List([Value::Record([v0, ...]), ...]).
What stays in the planner: constructing eval::Func from key_values to drive the
GetByKey operation — a mechanical wrap requiring no further expression analysis.
Why this matters for ordering: if partition_filter produced the canonical
ANY(MAP([1,2], pk=arg[0])) form first, the downstream try_build_key_filter Or arm
would never fire, silently breaking the GetByKey path for primary key OR queries.
Extracting key values inside partition_filter eliminates this conflict — both outputs
are produced together.
4. Planner Invariant
When !capability.index_or_predicate, neither FindPkByIndex.filter nor
QueryPk.pk_filter contains Expr::Or. OR is always restructured into
ANY(MAP(arg[i], per_call_pred)) by partition_filter before reaching the exec layer.
Batch-load path — ANY(MAP(...)) is already produced upstream; the invariant holds.
Only the exec fan-out needs fixing.
User-specified OR path — partition_filter produces canonical form directly. The
planner consumes IndexPlan.index_filter as-is; no rewrite in plan_secondary_index_execution
or plan_primary_key_execution. For mixed OR operands, partition_filter additionally
sets IndexPlan.post_filter to the full original predicate.
5. Exec Fan-out
Both action_find_pk_by_index and action_query_pk receive the same treatment.
After substituting inputs into the filter, check for ANY(MAP(arg[i], per_call_pred)):
- If present: iterate over input[i] element by element; substitute each into per_call_pred and issue one driver call; concatenate results. Do not call simplify_expr_any — it would re-expand to OR.
- Otherwise: unchanged single-call path.
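A minimal sketch of this fan-out loop, using toy stand-ins for the value type, the predicate substitution, and the driver call (none of these are Toasty's real exec types):

```rust
// Toy fan-out for ANY(MAP(key_list, per_call_pred)): one driver call per
// element, results concatenated.
#[derive(Debug)]
enum Value {
    I64(i64),
    Record(Vec<Value>), // composite keys: one entry per key column
}

// Substitute the element into the per-call predicate template (arg[0] hole).
fn substitute(per_call_pred: &str, v: &Value) -> String {
    per_call_pred.replace("arg[0]", &format!("{v:?}"))
}

// Stand-in for one DynamoDB Query call with a single valid key condition.
fn driver_query(key_condition: &str) -> Vec<String> {
    vec![format!("row for {key_condition}")]
}

fn fan_out(key_list: &[Value], per_call_pred: &str) -> Vec<String> {
    let mut rows = Vec::new();
    for key in key_list {
        // Note: no simplify_expr_any here — that would re-expand to OR.
        rows.extend(driver_query(&substitute(per_call_pred, key)));
    }
    rows
}

fn main() {
    let rows = fan_out(&[Value::I64(1), Value::I64(2)], "user_id = arg[0]");
    assert_eq!(rows.len(), 2);
    assert_eq!(rows[0], "row for user_id = I64(1)");
}
```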
6. DynamoDB Driver
Revert the temporary OR-splitting workaround in exec_find_pk_by_index. The driver
is a dumb executor of a single valid key condition.
Summary of Changes
| Location | Change |
|---|---|
| Capability | Add index_or_predicate: bool; false for DynamoDB |
| IndexPlan | Add key_values: Option<stmt::Value> field |
| index_match / partition_filter | Or arm: produce canonical ANY(MAP(...)) when !index_or_predicate; extract key_values; fix mixed OR todo!() |
| plan_primary_key_execution | Route on key_values / ANY(MAP(...)) instead of calling try_build_key_filter |
| plan_secondary_index_execution | No rewrite needed; consumes IndexPlan.index_filter as-is |
| kv.rs / try_build_key_filter | Remove (literal case now handled by index_match) |
| action_find_pk_by_index | Fan out over ANY(MAP(...)) — one driver call per element; skip simplify_expr_any |
| action_query_pk | Same fan-out treatment |
| DynamoDB exec_find_pk_by_index | Revert OR-splitting workaround |
Data-Carrying Enum Implementation Design
Builds on unit enum support (#355). See docs/design/enums-and-embedded-structs.md
for the user-facing design.
Value Stream Encoding
Unit and data variants are encoded differently in the value stream:
- Unit variant: Value::I64(discriminant) — unchanged from unit enum encoding
- Data variant: Value::Record([I64(discriminant), ...active_field_values])
Only the active variant’s fields appear in the record; inactive variant columns (NULL
in the DB) are not included. Primitive::load dispatches on the value type:
I64(d) => unit variant with discriminant d
Record(r) => data variant; r[0] is the discriminant, r[1..] are fields
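As a sketch, the dispatch might look like this for a hypothetical ContactInfo enum. The real Primitive::load is generated per model by the derive macro; the names, discriminants, and toy Value type here are invented for illustration.

```rust
#[derive(Debug, PartialEq)]
enum Value {
    I64(i64),
    String(String),
    Record(Vec<Value>),
}

#[derive(Debug, PartialEq)]
enum ContactInfo {
    NoContact,                 // unit variant, discriminant 0
    Email { address: String }, // data variant, discriminant 1
}

fn load(v: Value) -> Result<ContactInfo, String> {
    match v {
        // I64(d) => unit variant with discriminant d
        Value::I64(0) => Ok(ContactInfo::NoContact),
        // Record(r) => data variant; r[0] is the discriminant, r[1..] the fields
        Value::Record(r) => match r.as_slice() {
            [Value::I64(1), Value::String(address)] => {
                Ok(ContactInfo::Email { address: address.clone() })
            }
            _ => Err("unrecognized discriminant or malformed record".into()),
        },
        _ => Err("unrecognized discriminant".into()),
    }
}

fn main() {
    assert_eq!(load(Value::I64(0)), Ok(ContactInfo::NoContact));
    let rec = Value::Record(vec![Value::I64(1), Value::String("a@b.com".into())]);
    assert_eq!(load(rec), Ok(ContactInfo::Email { address: "a@b.com".into() }));
}
```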
Schema Changes
EnumVariant gains a fields: Vec<Field> — the same Field type used by
EmbeddedStruct. Field indices are assigned globally across all variants within the
enum, keeping FieldId { model: enum_id, index } as a unique identifier consistent
with how EmbeddedStruct works. The primary_key, auto, and constraints
members of Field are always false/None/[] for variant fields.
Primitive::ty() changes based on variant content:
- Unit-only enum → Type::I64 (unchanged)
- Any data variant present → Type::Model(Self::id()), same as embedded structs
Codegen Changes
Parsing: toasty-codegen/src/schema/ parses variant fields and includes them
in EmbeddedEnum registration so the runtime schema is complete.
Primitive::load: generated arms dispatch on value type first (I64 vs Record),
then on the discriminant within each branch. Data variant arms load each field from
its positional index in the record.
IntoExpr: unit variants emit Value::I64(disc) as today; data variants emit
Value::Record([I64(disc), field_exprs...]).
{Enum}Fields struct: all enums (unit-only and data-carrying) generate a
{Enum}Fields struct with is_{variant}() methods for discriminant-only filtering.
For data-carrying enums, is_{variant}() uses project(path, [0]) to extract the
discriminant from the record representation. For unit-only enums, it compares the
path directly. The struct also delegates comparison methods (eq, ne, etc.) to
Path<Self>.
Engine: Expr::Match
Both table_to_model and model_to_table are expressed using:
Match { subject: Expr, arms: [(pattern: Value, expr: Expr)], else_expr: Expr }
Expr::Match is never serialized to SQL — it is either evaluated in the engine
(for writes) or eliminated by the simplifier before the plan stage (for reads/queries).
table_to_model
For an enum field, table_to_model emits a Match on the discriminator column.
Each arm produces the value shape Primitive::load expects: unit arms emit
I64(disc), data arms emit Record([I64(disc), ...field_col_refs]).
else branch: Expr::Error
The else branch of an enum Match represents the case where the discriminant column
holds an unrecognized value — semantically unreachable for well-formed data.
For data-carrying enums, the else branch is Record([disc_col, Error, ...Error]) —
the same Record shape as data arms, but with Expr::Error in every field slot. This
design is critical for the simplifier: projections distribute uniformly into the else
branch, and field-slot projections yield Expr::Error (correct: accessing a field
on an unknown variant is an error), while discriminant projections ([0]) yield
disc_col (the same as every arm). This enables the uniform-arms optimization to
fire after projection.
For unit-only enums (no data variants), else is Expr::Error directly.
model_to_table
Runs the inverse: the incoming value (I64 or Record) is matched on its
discriminant, and each arm emits a flat record of all enum columns in DB order —
setting the discriminator and active variant fields, and NULLing every inactive
variant column. This NULL-out is mandatory: because writes may not have a loaded
model, the engine has no knowledge of the prior variant and must clear all
non-active columns unconditionally.
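A sketch of these arms for a hypothetical two-variant enum, with table column order [disc, address, number]. The toy types below stand in for the generated lowering; the point is that each arm writes the full column set.

```rust
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    I64(i64),
    String(String),
}

enum Contact {
    Email { address: String }, // discriminant 1, owns `address`
    Phone { number: String },  // discriminant 2, owns `number`
}

// Each arm emits a flat record of ALL enum columns in DB order, setting the
// discriminator and active fields and unconditionally NULLing inactive ones,
// since the prior variant may be unknown at write time.
fn model_to_table(c: Contact) -> Vec<Value> {
    match c {
        Contact::Email { address } => {
            vec![Value::I64(1), Value::String(address), Value::Null]
        }
        Contact::Phone { number } => {
            vec![Value::I64(2), Value::Null, Value::String(number)]
        }
    }
}

fn main() {
    let row = model_to_table(Contact::Email { address: "a@b.com".into() });
    assert_eq!(
        row,
        vec![Value::I64(1), Value::String("a@b.com".into()), Value::Null]
    );
}
```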
Simplifier Rules
Project into Match (expr_project.rs)
Distributes a projection into each Match arm AND the else branch:
project(Match(subj, [p => e, ...], else), [i])
→ Match(subj, [p => project(e, [i]), ...], else: project(else, [i]))
Projection is pushed into the else branch unconditionally — Expr::Error inside
a Record is handled naturally (projecting [0] out of Record([disc, Error])
yields disc; projecting [1] yields Error).
Uniform arms (expr_match.rs)
When all arms AND the else branch produce the same expression, the Match is redundant:
Match(subj, [1 => disc, 2 => disc], else: disc) → disc
The else branch MUST equal the common arm expression for this rule to fire. This makes the transformation provably correct — no branch is dropped that could produce a different value.
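Both rules can be sketched over a toy AST (names are illustrative, not Toasty's real Expr), showing how projection into a Match followed by uniform-arms folding collapses the whole construct to the discriminant column:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(&'static str),
    Record(Vec<Expr>),
    Match {
        subject: Box<Expr>,
        arms: Vec<(i64, Expr)>, // (pattern value, arm expression)
        else_: Box<Expr>,
    },
    Error, // unreachable-branch marker
}

// project(e, [i]): distribute into Match branches (including else),
// fold directly on a literal Record.
fn project(e: Expr, i: usize) -> Expr {
    match e {
        Expr::Match { subject, arms, else_ } => Expr::Match {
            subject,
            arms: arms.into_iter().map(|(p, a)| (p, project(a, i))).collect(),
            else_: Box::new(project(*else_, i)),
        },
        Expr::Record(mut fields) => fields.swap_remove(i),
        other => other,
    }
}

// Uniform arms: the Match is redundant only if every arm AND else agree.
fn fold_uniform(e: Expr) -> Expr {
    if let Expr::Match { arms, else_, .. } = &e {
        if arms.iter().all(|(_, a)| a == else_.as_ref()) {
            return (**else_).clone();
        }
    }
    e
}

fn main() {
    let disc = Expr::Column("disc");
    let lowered = Expr::Match {
        subject: Box::new(disc.clone()),
        arms: vec![
            (1, Expr::Record(vec![disc.clone(), Expr::Column("addr")])),
            (2, Expr::Record(vec![disc.clone(), Expr::Column("num")])),
        ],
        // Data-carrying else: same Record shape, Error in every field slot.
        else_: Box::new(Expr::Record(vec![disc.clone(), Expr::Error])),
    };
    // Projecting [0] yields `disc` in every branch, so the Match collapses.
    assert_eq!(fold_uniform(project(lowered, 0)), disc);
}
```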
Match elimination in binary ops (expr_binary_op.rs)
Distributes a binary op over match arms, producing an OR of guarded comparisons. The else branch is included with a negated guard:
Match(subj, [p1 => e1, p2 => e2], else: e3) == rhs
→ OR(subj == p1 AND e1 == rhs,
subj == p2 AND e2 == rhs,
subj != p1 AND subj != p2 AND e3 == rhs)
Each term is fully simplified inline. Terms that fold to false/null are pruned.
No special handling is needed for the else branch — it is always included and
existing simplification rules handle Expr::Error naturally (see below).
Expr::Error semantics
Expr::Error is treated as “unreachable” — not as a poison value that propagates.
No special Error propagation rules exist. Instead, existing rules eliminate Error
through the surrounding context:
- Data-carrying enum else: Record([disc, Error, ...]). After tuple decomposition, the guard disc != p1 AND disc != p2 contradicts the decomposed disc == c from the comparison target. The contradicting equality rule (a == c AND a != c → false) folds the AND to false.
- false AND (Error == x): The false short-circuit in AND eliminates the term without needing to simplify Error == x.
- Record([1, Error]) == Record([0, "alice"]): Tuple decomposition produces 1 == 0 AND Error == "alice". The 1 == 0 → false folds the AND to false.
In all well-formed cases, the guard constraints around Error cause the branch to be pruned without requiring Error-specific rules.
Type inference for Expr::Error
Expr::Error infers as Type::Unknown. TypeUnion::insert skips Unknown, so
an Error branch in a Match doesn’t widen the inferred type union.
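A toy sketch of this rule. TypeUnion here is a stand-in for the engine's type-inference structure, not its real API:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Type {
    Unknown, // inferred for Expr::Error
    I64,
    String,
}

#[derive(Default, Debug)]
struct TypeUnion(Vec<Type>);

impl TypeUnion {
    fn insert(&mut self, t: Type) {
        if t == Type::Unknown {
            return; // Error branches contribute nothing to the union
        }
        if !self.0.contains(&t) {
            self.0.push(t);
        }
    }
}

fn main() {
    let mut u = TypeUnion::default();
    u.insert(Type::I64);     // arm type
    u.insert(Type::Unknown); // else branch (Expr::Error)
    u.insert(Type::I64);     // another arm, deduplicated
    assert_eq!(u.0, vec![Type::I64]); // union not widened by Unknown
}
```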
Variant-only filter flow
is_email() generates eq(project(path, [0]), I64(1)). After lowering:
eq(project(Match(disc, [1 => Record([disc, addr]), 2 => Record([disc, num])],
else: Record([disc, Error])), [0]),
I64(1))
- Project-into-Match distributes [0] into all branches including else
- project(Record([disc, addr]), [0]) → disc (for each arm)
- project(Record([disc, Error]), [0]) → disc (for else)
- Uniform-arms fires: all arms AND else produce disc → folds to disc
- Result: eq(disc, I64(1)) — a clean disc_col = 1 predicate
Full-value equality filter flow
contact().eq(ContactInfo::Email { address: "alice@example.com" }) generates
eq(path, Record([I64(1), "alice@example.com"])). After lowering:
eq(Match(disc, [1 => Record([disc, addr]), 2 => Record([disc, num])],
else: Record([disc, Error])),
Record([I64(1), "alice@example.com"]))
- Match elimination distributes eq into each arm AND else as an OR
- disc == 1 AND Record([disc, addr]) == Record([I64(1), "alice"]) → simplifies
- disc == 2 AND Record([disc, num]) == Record([I64(1), "alice"]) → false (pruned)
- Else: disc != 1 AND disc != 2 AND Record([disc, Error]) == Record([I64(1), "alice"]) → tuple decomposition: disc != 1 AND disc != 2 AND disc == 1 AND Error == "alice" → contradicting equality (disc == 1 AND disc != 1) → false (pruned)
- Result: disc_col = 1 AND addr_col = 'alice@example.com'
Correctness Sharp Edges
Whole-variant replacement must NULL all inactive columns. The engine has no
knowledge of the prior variant for query-based updates, so the model_to_table arms
unconditionally NULL every column they do not own.
NULL discriminators are disallowed. The discriminator column carries NOT NULL,
consistent with unit enums today. Option<Enum> support is a future concern.
Unknown discriminants fail at load time. An unrecognized discriminant (e.g. from
a newer schema version) produces a runtime error via Expr::Error. Removing a
variant requires a data migration.
No DB-level integrity for active variant fields. All variant columns are nullable
(to accommodate inactive variants), so a NULL in a required active field is caught
only at load time by Primitive::load, not at write time.
DynamoDB
Equivalent encoding to be determined when implementing the DynamoDB driver phase.
Implementation Status
Completed
- Schema: fields: Vec<Field> on EnumVariant; codegen parsing; Primitive::ty() returns Type::Model for data-carrying enums.
- Value encoding: Primitive::load() dispatches on I64 vs Record; IntoExpr emits Record for data variants.
- Expr::Match + Expr::Error: Match/MatchArm AST nodes with visitors, eval, and simplifier integration. Expr::Error for unreachable branches. build_table_to_model_field_enum uses Record([disc, Error, ...]) for the else branch.
- Simplifier: project-into-Match distribution; uniform-arms folding (with else-branch check); Match-to-OR elimination in binary ops; case distribution for binary ops with Match operands.
- {Enum}Fields codegen: all enums generate a fields struct with is_{variant}() methods and delegated comparison methods.
- Integration tests: CRUD for data-carrying enums; full-value equality filter; variant-only filter (is_email()); unit enum variant filter (is_pending()).
- Variant+field filter (contact().email().matches(|e| e.address().eq("x"))): per-variant field accessors with closure-based .matches() API.
- OR tautology elimination: is_variant(x, 0) or is_variant(x, 1) covering all variants of an enum folds to true in the OR simplifier.
Remaining
- Partial updates: within-variant partial update builder.
- DynamoDB: equivalent encoding in the DynamoDB driver.
Open Questions
- SparseRecord/reload: within-variant partial updates are supported, so SparseRecord and reload are needed for enum variant fields. Determine how reload should handle a SparseRecord scoped to a specific variant’s fields — the in-memory model must update only the changed fields without disturbing the discriminant or other variant columns.
- Shared columns: variants sharing a column via #[column("name")] is in the user-facing design. Schema parsing should record shared columns in Phase 1; full query support is a follow-on.
Enum and Embedded Struct Support
Addresses Issue #280.
Scope
Add support for:
- Enum types as model fields (unit, tuple, struct variants)
- Embedded structs (no separate table, stored inline)
Both use #[derive(toasty::Embed)].
Storage Strategy
Flattened storage:
- Enums: Discriminator column + nullable columns per variant field
  - INTEGER discriminator with required #[column(variant = N)] on each variant
  - Works uniformly across all databases (PostgreSQL, MySQL, SQLite, DynamoDB)
- Embedded structs: No discriminator, just flattened fields
Unit-only enums: no per-variant columns; the enum is stored directly as the INTEGER discriminator value.
Post-MVP: Native ENUM types for PostgreSQL/MySQL discriminators (optimization).
Column Naming
Pattern: {field}_{variant}_{name}
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
critter: Creature, // field name
}
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human { profession: String }, // variant, field
#[column(variant = 2)]
Lizard { habitat: String },
}
// Columns:
// - critter (discriminator)
// - critter_human_profession
// - critter_lizard_habitat
}
Customization
Rename field (at enum definition):
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human { profession: String },
#[column(variant = 2)]
Lizard {
#[column("lizard_env")] // Must include variant scope
habitat: String,
},
}
// → critter_lizard_env (field prefix "critter" is prepended)
}
Custom column names for enum variant fields must include the variant scope. The pattern becomes {field}_{custom_name} where custom_name should include the variant portion.
Rename field prefix (per use):
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
#[column("creature_type")]
critter: Creature,
}
// → creature_type (discriminator)
// → creature_type_human_profession (field prefix replaced for all columns)
// → creature_type_lizard_habitat
}
The #[column("name")] attribute on the parent struct’s field replaces the field prefix for all generated columns.
Customize discriminator type (on enum definition):
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
#[column(type = "bigint")]
enum Creature { ... }
}
The #[column(type = "...")] attribute on the enum type customizes the database type for the discriminator column (e.g., “bigint”, “smallint”, “tinyint”).
Tuple Variants
Numeric field naming: {field}_{variant}_{index}
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Contact {
#[column(variant = 1)]
Phone(String, String),
}
// Columns: contact, contact_phone_0, contact_phone_1
}
Customize with #[column("...")]:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Contact {
#[column(variant = 1)]
Phone(
#[column("phone_country")]
String,
#[column("phone_number")]
String,
),
}
// Columns: contact, contact_phone_country, contact_phone_number
}
Nested Types
Path flattened with underscores:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactInfo {
#[column(variant = 1)]
Mail { address: Address },
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
}
// → contact_mail_address_city
// → contact_mail_address_street
}
Shared Columns Across Variants
Multiple variants can share the same column by specifying the same #[column("name")]:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Character {
#[key]
#[auto]
id: u64,
creature: Creature,
}
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human {
#[column("name")]
name: String,
profession: String,
},
#[column(variant = 2)]
Animal {
#[column("name")]
name: String,
species: String,
},
}
// Columns:
// - creature (discriminator)
// - creature_name (shared between Human and Animal)
// - creature_human_profession
// - creature_animal_species
}
Requirements:
- Fields sharing a column must have compatible types (validated at schema build time)
- The shared column name must be identical across variants
- Compatible types: same primitive type, or compatible type conversions
- Shared columns are still nullable at the database level (NULL when variant doesn’t use that field)
Discriminator Types
MVP: INTEGER discriminator for all databases
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human { profession: String },
#[column(variant = 2)]
Lizard { habitat: String },
}
}
All variants require #[column(variant = N)] with unique integer values. Compile error if missing.
Customize discriminator type:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
#[column(type = "bigint")] // Or "smallint", "tinyint", etc.
enum Creature {
#[column(variant = 1)]
Human { profession: String },
#[column(variant = 2)]
Lizard { habitat: String },
}
}
The #[column(type = "...")] attribute on the enum customizes the database type for the discriminator column.
Post-MVP: Native ENUM types for PostgreSQL/MySQL
CREATE TYPE creature AS ENUM ('Human', 'Lizard');
Can customize with #[column(variant = "name")] on variants.
NULL Handling
Inactive variant fields are NULL.
-- When critter = 'Human':
critter_human_profession = 'Knight'
critter_lizard_habitat = NULL
For Option<T> fields: Check discriminator first, then interpret NULL.
Usage
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
address: Address, // embedded struct
status: Status, // embedded enum
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
}
#[derive(toasty::Embed)]
enum Status {
#[column(variant = 1)]
Pending,
#[column(variant = 2)]
Active { since: DateTime },
}
}
Registration: Automatic. db.register::<User>() transitively registers all nested embedded types.
Relations: Forbidden in embedded types (compile error).
Examples
Basic Enum
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Task {
#[key]
#[auto]
id: u64,
status: Status,
}
#[derive(toasty::Embed)]
enum Status {
#[column(variant = 1)]
Pending,
#[column(variant = 2)]
Active,
#[column(variant = 3)]
Done,
}
}
Schema:
CREATE TABLE task (
id INTEGER PRIMARY KEY,
status INTEGER NOT NULL
);
-- 1=Pending, 2=Active, 3=Done (requires #[column(variant = N)])
Data-Carrying Enum
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
contact: ContactMethod,
}
#[derive(toasty::Embed)]
enum ContactMethod {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Phone { country: String, number: String },
}
}
Schema:
CREATE TABLE user (
id INTEGER PRIMARY KEY,
contact INTEGER NOT NULL,
contact_email_address TEXT,
contact_phone_country TEXT,
contact_phone_number TEXT
);
Embedded Struct
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
address: Address,
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
zip: String,
}
}
Schema:
CREATE TABLE user (
id INTEGER PRIMARY KEY,
address_street TEXT NOT NULL,
address_city TEXT NOT NULL,
address_zip TEXT NOT NULL
);
Nested Enum + Embedded
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactInfo {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Mail { address: Address },
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
}
}
Schema:
-- contact: ContactInfo
contact INTEGER NOT NULL,
contact_email_address TEXT,
contact_mail_address_street TEXT,
contact_mail_address_city TEXT
Querying
Basic variant checks
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Task {
#[key]
#[auto]
id: u64,
status: Status,
}
#[derive(toasty::Embed)]
enum Status {
#[column(variant = 1)]
Pending,
#[column(variant = 2)]
Active,
#[column(variant = 3)]
Done,
}
// Query by variant (shorthand)
Task::all().filter(Task::FIELDS.status().is_pending())
Task::all().filter(Task::FIELDS.status().is_active())
// Equivalent using .matches() without field conditions
Task::all().filter(
Task::FIELDS.status().matches(Status::VARIANTS.pending())
)
}
Field access on variant fields
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
contact: ContactMethod,
}
#[derive(toasty::Embed)]
enum ContactMethod {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Phone { country: String, number: String },
}
// Match specific variants and access their fields
User::all().filter(
User::FIELDS.contact().matches(
ContactMethod::VARIANTS.email().address().contains("@gmail")
)
)
User::all().filter(
User::FIELDS.contact().matches(
ContactMethod::VARIANTS.phone().country().eq("US")
)
)
// Shorthand for variant-only checks (no field conditions)
User::all().filter(User::FIELDS.contact().is_email())
User::all().filter(User::FIELDS.contact().is_phone())
// Equivalent using .matches()
User::all().filter(
User::FIELDS.contact().matches(ContactMethod::VARIANTS.email())
)
}
Embedded struct field constraints
Embedded struct fields can be accessed directly for filtering, ordering, and other query operations:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
address: Address,
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
zip: String,
}
// Filter by embedded struct fields
User::all().filter(User::FIELDS.address().city().eq("Seattle"))
User::all().filter(User::FIELDS.address().zip().like("98%"))
// Multiple constraints on embedded struct
User::all().filter(
User::FIELDS.address().city().eq("Seattle")
.and(User::FIELDS.address().zip().like("98%"))
)
// Order by embedded struct fields
User::all().order_by(User::FIELDS.address().city().asc())
// Select embedded struct fields (projection)
User::all()
.select(User::FIELDS.id())
.select(User::FIELDS.address().city())
}
Nested embedded structs
For nested embedded types, continue chaining field accessors:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Company {
#[key]
#[auto]
id: u64,
headquarters: Office,
}
#[derive(toasty::Embed)]
struct Office {
name: String,
location: Address,
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
zip: String,
}
// Access nested embedded struct fields
Company::all().filter(
Company::FIELDS.headquarters().location().city().eq("Seattle")
)
Company::all().filter(
Company::FIELDS.headquarters().name().eq("Main Office")
.and(Company::FIELDS.headquarters().location().zip().like("98%"))
)
}
Combining enum and embedded struct constraints
When an enum variant contains an embedded struct, use .matches() to specify the variant, then access the embedded struct’s fields:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: u64,
contact: ContactInfo,
}
#[derive(toasty::Embed)]
enum ContactInfo {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Mail { address: Address },
}
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
}
// Filter by embedded struct fields within enum variant
User::all().filter(
User::FIELDS.contact().matches(
ContactInfo::VARIANTS.mail().address().city().eq("Seattle")
)
)
// Multiple constraints on embedded struct within variant
User::all().filter(
User::FIELDS.contact().matches(
ContactInfo::VARIANTS.mail()
.address().city().eq("Seattle")
.address().street().contains("Main")
)
)
}
Constraints with shared columns
When enum variants share columns, constraints apply based on the variant being matched:
#![allow(unused)]
fn main() {
#[derive(Model)]
struct Character {
#[key]
#[auto]
id: u64,
creature: Creature,
}
#[derive(toasty::Embed)]
enum Creature {
#[column(variant = 1)]
Human {
#[column("name")]
name: String,
profession: String,
},
#[column(variant = 2)]
Animal {
#[column("name")]
name: String,
species: String,
},
}
// Query the shared "name" field for a specific variant
Character::all().filter(
Character::FIELDS.creature().matches(
Creature::VARIANTS.human().name().eq("Alice")
)
)
// Query across variants using the shared column
// (finds any creature with this name, regardless of variant)
Character::all().filter(
Character::FIELDS.creature().name().eq("Bob")
)
// Variant-specific field
Character::all().filter(
Character::FIELDS.creature().matches(
Creature::VARIANTS.human().profession().eq("Knight")
)
)
}
Updating
Update builders provide two methods per field:
- .field(value) - direct value assignment
- .with_field(|f| ...) - closure-based update
The .with_* methods provide a uniform API across all field types and enable:
- Embedded types: Partial updates (only set specific nested fields)
- Primitives: Future type-specific operations (e.g., NumericUpdate::increment())
- Enums: Update variant fields without changing the discriminator
Whole replacement
Setting an embedded struct field on an update replaces all of its columns:
#![allow(unused)]
fn main() {
// Loaded model update — sets address_street, address_city, address_zip
user.update()
.address(Address { street: "123 Main", city: "Seattle", zip: "98101" })
.exec(&db).await?;
// Query-based update — same behavior, no model loaded
User::filter_by_id(id).update()
.address(Address { street: "123 Main", city: "Seattle", zip: "98101" })
.exec(&db).await?;
}
Partial updates
Each field (primitive or embedded) generates a companion {Type}Update<'a> type that
provides a view into the update statement’s assignments. These update types hold a
reference to the statement and a projection path, allowing them to directly mutate
the statement as fields are set. This enables efficient nested updates without intermediate
allocations.
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
struct Address {
street: String,
city: String,
zip: String,
}
// AddressUpdate<'a> is generated automatically by `#[derive(toasty::Embed)]`
// StringUpdate<'a> is generated for primitive String fields
}
Embedded types:
#![allow(unused)]
fn main() {
// Whole replacement — sets all address columns
user.update()
.address(Address { street: "123 Main", city: "Seattle", zip: "98101" })
.exec(&db).await?;
// Partial update — only address_city is SET
user.update()
.with_address(|a| {
a.set_city("Seattle");
})
.exec(&db).await?;
// Multiple sub-fields — only address_city and address_zip are SET
user.update()
.with_address(|a| {
a.set_city("Seattle");
a.set_zip("98101");
})
.exec(&db).await?;
// Query-based partial update
User::filter_by_id(id).update()
.with_address(|a| a.set_city("Seattle"))
.exec(&db).await?;
}
Primitive types:
#![allow(unused)]
fn main() {
// Direct value
user.update()
.name("Alice")
.exec(&db).await?;
// Via closure (enables future type-specific operations)
user.update()
.with_name(|n| {
n.set("Alice");
})
.exec(&db).await?;
}
For now, primitive update builders only provide .set(). Future enhancements could add
type-specific operations like NumericUpdate::increment(), StringUpdate::append(), etc.
Partial updates with nested embedded structs
Nested embedded structs also generate {Type}Update<'a> types. The .with_* methods
can be nested naturally:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
struct Office {
name: String,
location: Address,
}
// Update only headquarters_location_city
company.update()
.with_headquarters(|h| {
h.with_location(|a| {
a.set_city("Seattle");
});
})
.exec(&db).await?;
// Update headquarters_name and headquarters_location_zip
company.update()
.with_headquarters(|h| {
h.with_name(|n| n.set("West Coast HQ"));
h.with_location(|a| {
a.set_zip("98101");
});
})
.exec(&db).await?;
}
Enum updates
Enums use whole-variant replacement. Setting an enum field replaces the discriminator and all variant columns:
#![allow(unused)]
fn main() {
// Replace the entire enum value — sets discriminator + variant fields,
// NULLs out fields from the previous variant
user.update()
.contact(ContactMethod::Email { address: "new@example.com".into() })
.exec(&db).await?;
}
For data-carrying variants, use .with_contact() to update fields within the current
variant without changing the discriminator:
#![allow(unused)]
fn main() {
#[derive(toasty::Embed)]
enum ContactMethod {
#[column(variant = 1)]
Email { address: String },
#[column(variant = 2)]
Phone { country: String, number: String },
}
// Update only the phone number, leave country and discriminator unchanged
user.update()
.with_contact(|c| {
c.phone(|p| {
p.with_number(|n| n.set("555-1234"));
});
})
.exec(&db).await?;
// Update email variant
User::filter_by_id(id).update()
.with_contact(|c| {
c.email(|e| {
e.with_address(|a| a.set("new@example.com"));
});
})
.exec(&db).await?;
}
ContactMethodUpdate<'a> has one method per variant (e.g., .phone(), .email()). Each
method accepts a closure that receives a builder scoped to that variant’s fields. The
discriminator is not changed by partial updates.
Mapping Layer Formalization
Problem
Toasty’s mapping layer connects model-level fields to database-level columns.
A model field’s type may differ from its storage type (e.g., Timestamp stored
as i64 or text). The mapping must be a bijection — every model value
encodes to exactly one stored value and decodes back losslessly. The bijection
operates at the record level, not per-field: n model fields may map to m
database columns (e.g., multiple fields JSON-encoded into a single column).
The bijection alone is not sufficient. When lowering expressions (filters, ORDER BY, arithmetic) to the database, we need to know whether a given operator can be pushed through the encoding. This is the question of whether the encoding is a homomorphism with respect to that operator:
- For arithmetic: encode(a ⊕ b) = encode(a) ⊕' encode(b)
- For comparisons: a < b ⟺ encode(a) <' encode(b)
If yes, the operator can be evaluated in storage space (efficient, index-friendly). If no, the database must first decode to the model type (SQL CAST) or the operation must be evaluated application-side.
These are two decoupled concerns:
- Bijection — can we round-trip values? (required for correctness)
- Operator homomorphism — which operators preserve semantics through the encoding? (determines what can be pushed to the DB)
A mapping with no homomorphic operators is still valid — you can store and retrieve. You just can’t push any filters or ordering to the database.
Examples
Timestamp as i64 (epoch seconds)
encode(ts) = ts.epoch_seconds()
decode(n) = Timestamp::from_epoch_seconds(n)
Bijection: ✓ — lossless round-trip.
== homomorphic: ✓ — ts1 == ts2 ⟺ encode(ts1) == encode(ts2)
< homomorphic: ✓ — ts1 < ts2 ⟺ encode(ts1) < encode(ts2)
Epoch seconds preserve temporal ordering under integer comparison, so range
queries (<, >, BETWEEN) can operate directly on the raw column.
+ homomorphic: ✓ — encode(ts + 234s) = encode(ts) + 234
Integer addition over epoch seconds preserves timestamp arithmetic.
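These three properties can be checked mechanically. A minimal Rust sketch, using a stand-in `Timestamp` type (not Toasty's actual type):

```rust
// Stand-in Timestamp type for illustrating the epoch-seconds encoding.
#[derive(Clone, Copy, PartialEq, PartialOrd, Debug)]
struct Timestamp {
    epoch_seconds: i64,
}

// field_to_column half of the bijection.
fn encode(ts: Timestamp) -> i64 {
    ts.epoch_seconds
}

// column_to_field half of the bijection.
fn decode(n: i64) -> Timestamp {
    Timestamp { epoch_seconds: n }
}

fn main() {
    let a = Timestamp { epoch_seconds: 1_000 };
    let b = Timestamp { epoch_seconds: 1_234 };

    // Bijection: lossless round-trip.
    assert_eq!(decode(encode(a)), a);

    // == and < are homomorphic: model-space and storage-space comparisons agree.
    assert_eq!(a == b, encode(a) == encode(b));
    assert_eq!(a < b, encode(a) < encode(b));

    // + is homomorphic: adding 234 seconds commutes with encoding.
    let shifted = Timestamp { epoch_seconds: a.epoch_seconds + 234 };
    assert_eq!(encode(shifted), encode(a) + 234);
}
```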
Timestamp as text (ISO 8601)
encode(ts) = ts.to_iso8601()
decode(s) = Timestamp::parse_iso8601(s)
Bijection: ✓ — lossless round-trip (assuming canonical formatting).
== homomorphic: ✓ — injective encoding preserves equality.
< homomorphic: fragile — lexicographic order matches temporal order only
for fixed-width UTC formats. Not generally safe.
+ homomorphic: ✗ — text + 234 is meaningless.
String with case inversion
encode(s) = s.invert_case() // "Hello" → "hELLO"
decode(s) = s.invert_case() // "hELLO" → "Hello"
Bijection: ✓ — case inversion is its own inverse.
== homomorphic: ✓ — injective, so equality is preserved. Encode the
search term the same way and compare.
< homomorphic: ✗ — ordering is reversed between cases:
"ABC" < "abc" (A=65 < a=97)
encode("ABC") = "abc"
encode("abc") = "ABC"
"abc" > "ABC" — ordering reversed
A valid mapping, but useless for range queries in storage space.
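The same checks, sketched in Rust for the case-inversion example (ASCII-only for simplicity):

```rust
// Case inversion: its own inverse, so trivially a bijection (over ASCII).
fn invert_case(s: &str) -> String {
    s.chars()
        .map(|c| {
            if c.is_ascii_uppercase() {
                c.to_ascii_lowercase()
            } else if c.is_ascii_lowercase() {
                c.to_ascii_uppercase()
            } else {
                c
            }
        })
        .collect()
}

fn main() {
    // Bijection: the encoding is its own inverse.
    assert_eq!(invert_case(&invert_case("Hello")), "Hello");

    // == homomorphic: encode the search term the same way, then compare.
    assert_eq!(invert_case("Hello") == invert_case("Hello"), "Hello" == "Hello");

    // < NOT homomorphic: "ABC" < "abc" in model space (A=65 < a=97),
    // but the encoded values compare the other way.
    assert!("ABC" < "abc");
    assert!(invert_case("ABC") > invert_case("abc")); // "abc" > "ABC"
}
```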
Bijection by Construction
For arbitrary functions, bijectivity is undecidable. Instead of detecting it, we construct mappings from known-bijective primitives and composition rules that preserve bijectivity. If a mapping is built entirely from these, it is guaranteed valid.
Composition rules
- Sequential: f ∘ g is a bijection if both f and g are.
- Parallel/product: (f(a), g(b)) is a bijection if both f and g are.
These compose freely — complex mappings built from simple bijective pieces are automatically valid. Homomorphism properties, however, may be lost at each composition step and must be tracked separately.
Dimensionality: multiple fields → one column
Two fields may map to the same column if and only if the model constrains them to always hold the same value (an equivalence class). In this case no information is lost and the mapping remains a bijection — but only over the restricted domain where the constraint holds. Without such a constraint, collapsing two independent fields into one column destroys injectivity.
This gives us computed fields as a natural consequence. Two fields can reference the same column through different bijective transformations:
regular: String → column (identity)
inverted: String → invert_case(column) (bijection)
Because the transformations are bijections, both fields are readable AND writable.
Writing regular = "Hello" stores "Hello" in the column; inverted
automatically becomes "hELLO". Writing inverted = "hELLO" applies the inverse
to store "Hello"; regular is automatically "Hello". Data flow in both
directions is fully determined by the bijection — no special computed-field
machinery needed.
Computed Fields
Storage is the source of truth. Each field is a view of the underlying column(s) through its bijection. Computed fields are a direct consequence: when multiple fields reference the same column through different bijections, each field is a different view of the same stored data.
Schema representation
Each field stores a bijection pair:
- field_to_column: encode — compute column value from field value (inverse)
- column_to_field: decode — compute field value from column value (forward)
A reverse index maps each column to the set of fields that reference it.
Write propagation
When a field is set, the column value is determined, which determines all sibling fields:
1. Compute the column value: col = field_a.field_to_column(new_value)
2. Recompute each sibling field on the same column: field_b = field_b.column_to_field(col)
The composed transform between two fields sharing a column is:
field_b.column_to_field(field_a.field_to_column(value))
Conflict detection
If the user sets two fields that share a column in the same operation, the
resulting column values must agree. If
field_a.field_to_column(val_a) ≠ field_b.field_to_column(val_b), the write is
invalid and must be rejected.
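Write propagation and conflict detection can be sketched for the `regular`/`inverted` pair above. The function names here are illustrative, not Toasty API:

```rust
// Case inversion, the bijection used by the `inverted` field.
fn invert_case(s: &str) -> String {
    s.chars()
        .map(|c| {
            if c.is_ascii_lowercase() { c.to_ascii_uppercase() }
            else if c.is_ascii_uppercase() { c.to_ascii_lowercase() }
            else { c }
        })
        .collect()
}

// field_to_column halves for each field sharing the column.
fn regular_to_column(v: &str) -> String { v.to_string() }   // identity
fn inverted_to_column(v: &str) -> String { invert_case(v) } // inverse view

// Setting one field determines the column, which determines the sibling.
fn set_regular(v: &str) -> (String, String) {
    let col = regular_to_column(v);
    let inverted = invert_case(&col); // sibling's column_to_field
    (col, inverted)
}

// Both fields set in one operation: the computed column values must agree.
fn set_both(reg: &str, inv: &str) -> Result<String, &'static str> {
    let a = regular_to_column(reg);
    let b = inverted_to_column(inv);
    if a == b { Ok(a) } else { Err("conflicting writes to shared column") }
}

fn main() {
    let (col, inverted) = set_regular("Hello");
    assert_eq!(col, "Hello");
    assert_eq!(inverted, "hELLO");

    assert!(set_both("Hello", "hELLO").is_ok());  // same column value: accepted
    assert!(set_both("Hello", "HELLO").is_err()); // disagreement: rejected
}
```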
Bijective Primitives
Three categories of bijective primitives, each with encode/decode halves:
Type reinterpretation
Converts a single value between two types with the same information content.
Implemented as Expr::Cast in both directions.
Current pairs:
- Timestamp ↔ String (ISO 8601)
- Uuid ↔ String
- Uuid ↔ Bytes
- Date ↔ String
- Time ↔ String
- DateTime ↔ String
- Zoned ↔ String
- Timestamp ↔ DateTime
- Timestamp ↔ Zoned
- Zoned ↔ DateTime
- Decimal ↔ String
- BigDecimal ↔ String
- Integer widening/narrowing (i8 ↔ i16 ↔ i32 ↔ i64, etc.)
Affine transformations
Arithmetic transformations by a constant. Each is a bijection with a known inverse.
- x + k — inverse: x - k
- x * k (k ≠ 0) — inverse: x / k
- x * k + c (k ≠ 0) — inverse: (x - c) / k
Homomorphism properties (for x + k as representative):
- == homomorphic: ✓ — a == b ⟺ (a+k) == (b+k)
- < homomorphic: ✓ — a < b ⟺ (a+k) < (b+k)
- + homomorphic: ✗ — encode(a+b) = a+b+k ≠ encode(a)+encode(b) = a+b+2k
Note: x * k for negative k reverses ordering (< not homomorphic).
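The affine properties above are easy to confirm with concrete integers:

```rust
fn main() {
    let (a, b, k) = (3_i64, 7_i64, 10_i64);

    // x + k: == and < are homomorphic.
    assert_eq!(a == b, a + k == b + k);
    assert_eq!(a < b, a + k < b + k);

    // + is NOT homomorphic: encode(a + b) differs from encode(a) + encode(b) by k.
    assert_ne!((a + b) + k, (a + k) + (b + k));

    // x * k with k < 0: equality survives, ordering reverses.
    let neg = -2_i64;
    assert_eq!(a == b, a * neg == b * neg);
    assert!(a < b);
    assert!(a * neg > b * neg); // reversed
}
```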
Product (record)
Packs/unpacks multiple independent values into a fixed-size tuple.
- Encode: Expr::Record — combine values into a tuple
- Decode: Expr::Project — extract by index
Bijective because each component is independent and individually recoverable. Used for embedded structs (fields flattened into columns).
Coproduct (tagged union)
Encodes/decodes a discriminated union (enum) where the discriminant partitions the domain into disjoint subsets.
- Encode: Expr::Project — extract discriminant and per-variant fields
- Decode: Expr::Match — branch on discriminant, reconstruct variant via Expr::Record
Bijective if and only if:
- Arms are exhaustive (cover all discriminant values)
- Arms are disjoint (no overlapping discriminants)
- Each arm’s body is individually a bijection
This is a coproduct of bijections: if f_i: A_i → B_i is a bijection for each
variant i, the combined mapping on the tagged union Σ_i A_i → Σ_i B_i is
also a bijection.
Operator Homomorphism
Operator inventory
Current Toasty binary operators (BinaryOp): ==, !=, <, <=, >, >=.
Arithmetic operators (+, -) are not yet in the AST but are needed for
computed fields and interval arithmetic.
For homomorphism analysis, != is the negation of ==, and >=/<= are
derivable from </>. So the independent set is: ==, <, +.
Per-primitive homomorphism
Type reinterpretation:
| Encoding | == | < | + |
|---|---|---|---|
| Timestamp ↔ String | ✓ | ✓ (¹) | ✗ |
| Uuid ↔ String | ✓ | ✗ | n/a |
| Uuid ↔ Bytes | ✓ | ✗ | n/a |
| Date ↔ String | ✓ | ✓ (¹) | ✗ |
| Time ↔ String | ✓ | ✓ (¹) | ✗ |
| DateTime ↔ String | ✓ | ✓ (¹) | ✗ |
| Zoned ↔ String | ✓ | ✗ | ✗ |
| Timestamp ↔ DateTime | ✓ | ✓ | ✓ |
| Timestamp ↔ Zoned | ✓ | ✓ | ✓ |
| Zoned ↔ DateTime | ✓ | ✓ | ✓ |
| Decimal ↔ String | ✓ | ✗ | ✗ |
| BigDecimal ↔ String | ✓ | ✗ | ✗ |
| Integer widening | ✓ | ✓ | ✓ |
(¹) Requires canonical fixed-width serialization format. Lexicographic ordering matches semantic ordering only if Toasty guarantees consistent formatting (no variable-length subsecond digits, no timezone offset variations, etc.).
All type reinterpretations are injective, so == is always preserved. < and
+ depend on whether the target type’s native operations align with the source
type’s semantics.
Affine transformations:
| Encoding | == | < | + |
|---|---|---|---|
| x + k | ✓ | ✓ | ✗ |
| x * k (k>0) | ✓ | ✓ | ✗ |
| x * k (k<0) | ✓ | ✗ (reversed) | ✗ |
| x * k + c | ✓ | ✓ if k>0 | ✗ |
Product (record):
| Operator | Homomorphic? |
|---|---|
| == | ✓ — if each component preserves == |
| < | conditional — requires lexicographic comparison and each component preserves < |
| + | ✓ — if each component preserves + (component-wise) |
Coproduct (tagged union):
| Operator | Homomorphic? |
|---|---|
| == | ✓ — if discriminant + each arm preserves == |
| < | generally ✗ — cross-variant comparison is usually meaningless |
| + | ✗ — arithmetic across variants undefined |
Homomorphism under composition
Sequential (g ∘ f): if both f and g are homomorphic for an operator,
so is the composition. Proof: a op b ⟺ f(a) op f(b) ⟺ g(f(a)) op g(f(b)).
Parallel/product ((f(a), g(b))): preserves == if both f and g do.
Preserves < only if tuple comparison is lexicographic and both preserve <.
Coproduct: preserves == if each arm does. Does not generally preserve <.
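The sequential rule in miniature: composing two `<`-preserving encodings (an identity reinterpretation followed by an affine shift) yields a `<`-preserving encoding.

```rust
// f: identity reinterpretation (e.g., Timestamp viewed as epoch seconds).
fn enc1(epoch_seconds: i64) -> i64 { epoch_seconds }
// g: affine shift, which also preserves <.
fn enc2(x: i64) -> i64 { x + 1_000 }
// g ∘ f: preserved by the sequential composition rule.
fn composed(x: i64) -> i64 { enc2(enc1(x)) }

fn main() {
    let (a, b) = (5_i64, 42_i64);
    // a < b ⟺ f(a) < f(b) ⟺ g(f(a)) < g(f(b))
    assert_eq!(a < b, composed(a) < composed(b));
    assert_eq!(b < a, composed(b) < composed(a));
}
```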
Cross-encoding comparisons
When two operands use different encodings (e.g., field₁ uses Timestamp→i64,
field₂ uses Timestamp→i64+offset), can_distribute does not directly apply.
The comparison encode₁(a) op encode₂(b) mixes two encodings and may not
preserve semantics.
Fallback: decode both to model space and compare there.
decode₁(col₁) op decode₂(col₂)
This always produces correct results but may require SQL CAST or application-side evaluation.
Database independence
can_distribute does not take a database parameter. Database capabilities
determine which bijection is selected (e.g., PostgreSQL has native timestamps
→ identity mapping; SQLite does not → Timestamp↔i64). Once the bijection is
chosen, can_distribute is purely a property of that bijection and the operator.
The only edge case is if two databases use the same types but their operators
behave differently (e.g., string collation affecting <). This can be handled by
treating such behavioral differences as part of the encoding rather than adding a
database parameter.
Precision / Domain Restriction
Lossy encodings like #[column(type = timestamp(2))] involve two distinct steps:
1. Domain restriction (lossy, write-time): the user’s full-precision value is truncated to the representable domain. This is many-to-one — multiple inputs collapse to the same output. It is not part of the mapping.
2. Encoding (bijective): over the restricted domain (values with ≤2 fractional digits), the mapping is a perfect bijection — lossless round-trip.
The mapping framework only governs step 2. Step 1 is a write-time concern:
when the user assigns a value, it gets projected into the representable domain.
Analogous to integer narrowing (i64 → i32): the mapping between i32 values
and the stored column is bijective; the loss happens if you store a value outside
i32 range.
Nullability
Option<T> with None → NULL is a coproduct bijection:
- Domain partition: Option<T> = None | Some(T) — two disjoint cases.
- Encoding: None → NULL, Some(v) → encode(v) — each arm is individually bijective (unit ↔ NULL is trivially so; Some delegates to T’s encoding).
- Decoding: NULL → None, non-NULL → Some(decode(v)).
This satisfies the coproduct conditions (exhaustive, disjoint, per-arm bijective).
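A minimal sketch of this coproduct bijection, modeling NULL as `Option` in storage space and using integer widening as a stand-in per-arm encoding:

```rust
// Encode: None → NULL, Some(v) → per-arm encoding (i32 widened to i64 here).
fn encode(v: Option<i32>) -> Option<i64> {
    v.map(|x| x as i64)
}

// Decode: NULL → None, non-NULL → Some(decode(v)).
fn decode(col: Option<i64>) -> Option<i32> {
    col.map(|x| x as i32)
}

fn main() {
    // Exhaustive + disjoint + per-arm bijective ⇒ lossless round-trip.
    assert_eq!(decode(encode(None)), None);
    assert_eq!(decode(encode(Some(7))), Some(7));
}
```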
NULL breaks standard ==
SQL uses three-valued logic: NULL = NULL evaluates to NULL (falsy), not
TRUE. This means the standard == operator is not homomorphic over the
nullable encoding — the model-level None == None is true, but
NULL = NULL is not.
NULL-safe operators
All Toasty target databases provide a NULL-safe equality operator:
| Database | Operator |
|---|---|
| PostgreSQL | IS NOT DISTINCT FROM |
| MySQL | <=> |
| SQLite | IS |
Using the NULL-safe operator restores == homomorphism:
a == b ⟺ encode(a) IS NOT DISTINCT FROM encode(b).
Operator mapping
This means homomorphism is not just a property of (encoding, operator) — it is
a property of the triple (encoding, model_op, storage_op). The lowerer may need
to emit a different SQL operator than the one the user wrote:
- Non-nullable field: model == → SQL =
- Nullable field: model == → SQL IS NOT DISTINCT FROM (or <=>, IS)
can_distribute should return the storage-level operator to use, not just a
boolean. Signature sketch:
can_distribute(encoding, model_op) -> Option<storage_op>
None means the operator cannot be pushed to the DB. Some(op) means it can,
using the specified storage operator.
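A hypothetical Rust sketch of that signature — the enum names and the specific encodings are illustrative, not Toasty's actual types:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum ModelOp { Eq, Lt }

#[derive(Clone, Copy, PartialEq, Debug)]
enum StorageOp { Eq, IsNotDistinctFrom, Lt }

// Two example encodings from earlier sections.
enum Encoding {
    EpochSeconds { nullable: bool },
    CaseInverted,
}

// Maps (encoding, model operator) to the storage operator to emit,
// or None when the operator cannot be pushed to the database.
fn can_distribute(enc: &Encoding, op: ModelOp) -> Option<StorageOp> {
    match (enc, op) {
        // Nullable fields need the NULL-safe equality operator.
        (Encoding::EpochSeconds { nullable: true }, ModelOp::Eq) => {
            Some(StorageOp::IsNotDistinctFrom)
        }
        (Encoding::EpochSeconds { nullable: false }, ModelOp::Eq) => Some(StorageOp::Eq),
        // Epoch seconds preserve temporal ordering.
        (Encoding::EpochSeconds { .. }, ModelOp::Lt) => Some(StorageOp::Lt),
        // Case inversion preserves equality but reverses ordering.
        (Encoding::CaseInverted, ModelOp::Eq) => Some(StorageOp::Eq),
        (Encoding::CaseInverted, ModelOp::Lt) => None,
    }
}

fn main() {
    let nullable_ts = Encoding::EpochSeconds { nullable: true };
    assert_eq!(
        can_distribute(&nullable_ts, ModelOp::Eq),
        Some(StorageOp::IsNotDistinctFrom)
    );
    assert_eq!(can_distribute(&Encoding::CaseInverted, ModelOp::Lt), None);
}
```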
Ordering
NULL ordering is also database-specific (NULLS FIRST vs NULLS LAST). The
lowerer must ensure consistent behavior across backends, potentially by emitting
explicit NULLS FIRST/NULLS LAST clauses.
Lowering Algorithm
The lowerer transforms a model-level expression tree into a storage-level expression tree. The input contains field references and model-level literals. The output contains column references and storage-level values.
Core: lowering a binary operator
lower_binary_op(op, lhs, rhs):
// 1. Identify field references and look up their encodings
// from the schema/mapping.
lhs_encoding = lookup_encoding(lhs) if lhs is FieldRef, else None
rhs_encoding = lookup_encoding(rhs) if rhs is FieldRef, else None
// 2. Determine if the operator can distribute through the encoding.
// For single-column primitive encodings:
if both are FieldRefs with same encoding:
match can_distribute(encoding, op):
Some(storage_op):
// Both fields share the encoding — compare columns directly.
emit: column_lhs storage_op column_rhs
None:
// Decode both to model space.
emit: decode(column_lhs) op decode(column_rhs)
if one is FieldRef, other is Literal:
match can_distribute(field_encoding, op):
Some(storage_op):
// Encode the literal, compare in storage space.
emit: column storage_op encode(literal)
None:
// Decode the column to model space.
emit: decode(column) op literal
if both are Literals:
// Const-evaluate in model space.
emit: literal_lhs op literal_rhs
Encoding the literal
encode(literal) applies the field’s field_to_column bijection to the
model-level value, producing a storage-level value. For a UUID↔text encoding:
encode(UUID("abc-123")) → "abc-123".
Decoding the column
decode(column_ref) applies the field’s column_to_field bijection to the
column reference, wrapping it in the appropriate SQL expression. For UUID↔text:
decode(uuid_col) → CAST(uuid_col AS UUID).
If the database lacks the model type (e.g., SQLite has no UUID), decode is not expressible in SQL. The operation must be evaluated application-side or the query rejected.
Multi-column encodings (product / coproduct)
For fields that span multiple columns, == expands structurally:
lower_binary_op(==, coproduct_field, literal):
encoded = encode(literal)
// encoded is a tuple: (disc_val, col1_val, col2_val, ...)
// Expand into per-column comparisons:
result = TRUE
for each (column, encoded_value) in zip(field.columns, encoded):
col_encoding = encoding_for(column) // e.g., nullable text
match can_distribute(col_encoding, ==):
Some(storage_op):
result = result AND (column storage_op encoded_value)
None:
result = result AND (decode(column) == encoded_value)
emit: result
ORDER BY
lower_order_by(field):
encoding = lookup_encoding(field)
match can_distribute(encoding, <):
Some(_):
// Ordering is preserved in storage space.
emit: ORDER BY column
None:
// Must decode to model space for correct ordering.
emit: ORDER BY decode(column)
SELECT returning
Always decode — application needs model-level values:
lower_select_returning(field):
emit: decode(column) // column_to_field bijection
INSERT / UPDATE
Always encode — database needs storage-level values:
lower_insert_value(field, value):
emit: encode(value) // field_to_column bijection
Examples
WHERE uuid_col == UUID("abc-123"), UUID stored as text:
- LHS is FieldRef → encoding: UUID↔text, column: uuid_col
- RHS is literal: UUID("abc-123")
- can_distribute(UUID↔text, ==) → Some(=)
- Encode literal: "abc-123"
- Output: uuid_col = 'abc-123'
WHERE uuid_col < UUID("abc-123"), UUID stored as text:
- LHS is FieldRef → encoding: UUID↔text, column: uuid_col
- RHS is literal: UUID("abc-123")
- can_distribute(UUID↔text, <) → None
- Decode column: CAST(uuid_col AS UUID)
- Output: CAST(uuid_col AS UUID) < UUID('abc-123')
- If the DB lacks a UUID type → application-side evaluation or reject
WHERE contact == Contact::Phone { number: "123" }, coproduct encoding:
- LHS is FieldRef → coproduct encoding, columns: disc, phone_number, email_address
- RHS is literal → encode: (0, "123", NULL)
- Expand per-column:
  - disc = 0 (can_distribute(i64, ==) → Some(=))
  - phone_number = '123' (can_distribute(nullable text, ==) → Some(=))
  - email_address IS NULL (can_distribute(nullable text, ==) → Some(IS))
- Output: disc = 0 AND phone_number = '123' AND email_address IS NULL
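The first two examples can be sketched as runnable code. This models only the field-vs-literal case for the UUID↔text encoding and emits SQL as plain strings; the types are simplified stand-ins, not Toasty's real AST:

```rust
#[derive(Clone, Copy)]
enum Op {
    Eq,
    Lt,
}

/// can_distribute for UUID↔text: `==` survives the encoding (equality of
/// canonical text matches UUID equality), `<` does not (lexicographic text
/// order differs from UUID order).
fn can_distribute_uuid_text(op: Op) -> Option<&'static str> {
    match op {
        Op::Eq => Some("="),
        Op::Lt => None,
    }
}

fn lower_field_vs_literal(column: &str, op: Op, uuid_literal: &str) -> String {
    match can_distribute_uuid_text(op) {
        // Encode the literal, compare in storage space.
        Some(storage_op) => format!("{column} {storage_op} '{uuid_literal}'"),
        // Decode the column to model space.
        None => {
            let op_str = match op {
                Op::Eq => "=",
                Op::Lt => "<",
            };
            format!("CAST({column} AS UUID) {op_str} UUID('{uuid_literal}')")
        }
    }
}

fn main() {
    assert_eq!(
        lower_field_vs_literal("uuid_col", Op::Eq, "abc-123"),
        "uuid_col = 'abc-123'"
    );
    assert_eq!(
        lower_field_vs_literal("uuid_col", Op::Lt, "abc-123"),
        "CAST(uuid_col AS UUID) < UUID('abc-123')"
    );
}
```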
Schema Representation
Each field’s mapping is stored as a structured Bijection tree. This is the
single source of truth — encode/decode expressions are derived from it.
Bijection enum
#![allow(unused)]
fn main() {
enum Bijection {
/// No transformation — field type == column type.
Identity,
/// Lossless cast between two types with the same information content.
/// e.g., UUID↔text, Timestamp↔i64, integer widening.
Cast { from: Type, to: Type },
/// x*k + c (k ≠ 0). Inverse: (x - c) / k.
Affine { k: Value, c: Value },
/// Option<T> → nullable column.
/// Wraps an inner bijection with None↔NULL.
Nullable(Box<Bijection>),
/// Embedded struct → multiple columns.
/// Each component is an independent bijection on one field↔column pair.
Product(Vec<Bijection>),
/// Enum → discriminant column + per-variant columns.
Coproduct {
discriminant: Box<Bijection>,
variants: Vec<CoproductArm>,
},
/// Composition: apply `inner` first, then `outer`.
/// encode = outer.encode(inner.encode(x))
/// decode = inner.decode(outer.decode(x))
Compose {
inner: Box<Bijection>,
outer: Box<Bijection>,
},
}
struct CoproductArm {
discriminant_value: Value,
body: Bijection, // typically Product for data-carrying variants
}
}
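The Compose ordering is easy to get backwards, so here is a numeric sketch checking it on i64 values. Only the Affine and Compose arms are modeled, and the types are illustrative stand-ins (integer affine maps are only bijective onto their image, which is enough for the round-trip check):

```rust
enum Bij {
    /// x*k + c (k != 0). Inverse: (x - c) / k.
    Affine { k: i64, c: i64 },
    /// encode = outer(inner(x)); decode reverses: inner⁻¹(outer⁻¹(x)).
    Compose { inner: Box<Bij>, outer: Box<Bij> },
}

impl Bij {
    fn encode(&self, x: i64) -> i64 {
        match self {
            Bij::Affine { k, c } => x * k + c,
            Bij::Compose { inner, outer } => outer.encode(inner.encode(x)),
        }
    }

    fn decode(&self, x: i64) -> i64 {
        match self {
            Bij::Affine { k, c } => (x - c) / k,
            Bij::Compose { inner, outer } => inner.decode(outer.decode(x)),
        }
    }
}

fn main() {
    let b = Bij::Compose {
        inner: Box::new(Bij::Affine { k: 2, c: 1 }), // x → 2x + 1
        outer: Box::new(Bij::Affine { k: 3, c: 0 }), // x → 3x
    };
    // encode(5) = 3 * (2*5 + 1) = 33; decode undoes it in reverse order.
    assert_eq!(b.encode(5), 33);
    assert_eq!(b.decode(33), 5);
    // Round-trip holds: decode(encode(x)) == x.
    assert_eq!(b.decode(b.encode(-7)), -7);
}
```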
Methods on Bijection
#![allow(unused)]
fn main() {
impl Bijection {
/// Encode a model-level value to a storage-level value.
fn encode(&self, value: Value) -> Value;
/// Produce a decode expression: given a column reference (or tuple of
/// column references), return a model-level expression.
fn decode(&self, column_expr: Expr) -> Expr;
/// Query whether `model_op` can be pushed through this encoding.
/// Returns the storage-level operator to use, or None if the
/// operation must fall back to model space.
fn can_distribute(&self, model_op: BinaryOp) -> Option<StorageOp>;
/// Number of columns this bijection spans.
fn column_count(&self) -> usize;
}
}
can_distribute is defined recursively:
- Identity: always Some(model_op) — no transformation.
- Cast: lookup in the per-pair homomorphism table.
- Affine: == → Some(=); < → Some(<) if k > 0, None if k < 0.
- Nullable: delegates to inner, and may change the operator (e.g., == → IS NOT DISTINCT FROM).
- Product: == → Some(=) if all components return Some; < → only if the encoding is lexicographic and all components support <.
- Coproduct: == → Some if the discriminant and each arm return Some; < → generally None.
- Compose: Some only if both inner and outer return Some.
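These recursion rules can be sketched over a pared-down Bijection. Only ==/< and a few variants are modeled, and the storage operator is a plain string; the names mirror the schema section but this is not Toasty's actual implementation:

```rust
#[derive(Clone, Copy)]
enum Op {
    Eq,
    Lt,
}

enum Bijection {
    Identity,
    /// x*k + c (k != 0); only k matters for order preservation.
    Affine { k: i64 },
    Nullable(Box<Bijection>),
    Compose { inner: Box<Bijection>, outer: Box<Bijection> },
}

/// Storage operator to emit, or None = fall back to model space.
fn can_distribute(b: &Bijection, op: Op) -> Option<&'static str> {
    match b {
        // No transformation: the model operator passes through unchanged.
        Bijection::Identity => Some(match op {
            Op::Eq => "=",
            Op::Lt => "<",
        }),
        // == always survives an affine map; < only if k > 0 (order-preserving).
        Bijection::Affine { k } => match op {
            Op::Eq => Some("="),
            Op::Lt if *k > 0 => Some("<"),
            Op::Lt => None,
        },
        // Delegates to inner, but == must become NULL-safe equality.
        Bijection::Nullable(inner) => match (op, can_distribute(inner, op)) {
            (Op::Eq, Some(_)) => Some("IS NOT DISTINCT FROM"),
            (_, result) => result,
        },
        // Distributes only if both layers do; the outer layer's operator wins.
        Bijection::Compose { inner, outer } => {
            can_distribute(inner, op).and(can_distribute(outer, op))
        }
    }
}

fn main() {
    assert_eq!(can_distribute(&Bijection::Affine { k: -2 }, Op::Lt), None);
    let nullable = Bijection::Nullable(Box::new(Bijection::Identity));
    assert_eq!(can_distribute(&nullable, Op::Eq), Some("IS NOT DISTINCT FROM"));
}
```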
Per-field mapping
#![allow(unused)]
fn main() {
struct FieldMapping {
bijection: Bijection,
columns: Vec<ColumnId>, // columns this field maps to (1 for primitive, N for product/coproduct)
}
}
The model-level mapping::Model holds a FieldMapping per field, plus a
reverse index from columns to fields (for computed field propagation).
Verification
The framework should be formally verified using Lean 4 + Mathlib. Mathlib already provides the algebraic vocabulary (bijections, homomorphisms, products, coproducts). The plan:
- Define the primitives and composition rules in Lean
- Prove the general theorems once (composition preserves bijection, coproduct conditions, etc.)
- For each concrete primitive, state and prove its homomorphism properties
- Lean checks everything mechanically
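As a taste of what Mathlib already provides, two of the general theorems are essentially one-liners. A sketch (using Mathlib's bundled bijection type `Equiv`, written `≃`):

```lean
import Mathlib.Logic.Equiv.Basic

-- Composition of two bijections is a bijection: this is `Equiv.trans`.
example {α β γ : Type} (f : α ≃ β) (g : β ≃ γ) : α ≃ γ := f.trans g

-- Encode/decode round-trip: decode (encode x) = x, packaged as
-- `Equiv.symm_apply_apply`.
example {α β : Type} (f : α ≃ β) (x : α) : f.symm (f x) = x :=
  f.symm_apply_apply x
```

The per-primitive homomorphism proofs (e.g., that a UUID↔text cast preserves equality) would sit on top of this vocabulary.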
Engine-Level Pagination Design
Overview
This document describes the implementation of engine-level pagination in Toasty. The key principle is that pagination logic (limit+1 strategy, cursor extraction, etc.) should be handled by the engine, not in application-level code. This allows the engine to leverage database-specific capabilities (e.g., DynamoDB’s native cursor support) while providing compatibility for databases that don’t have native support (e.g., SQL databases).
Architecture Context
Statement System
- toasty_core::stmt::Statement represents a superset of SQL - "Toasty-flavored SQL"
- It contains both SQL concepts AND Toasty application-level concepts (models, paths, pagination)
- Limit::PaginateForward is a Toasty-level concept that must be transformed by the engine before reaching SQL generation
- By the time statements reach toasty-sql, they must contain ONLY valid SQL
Engine Pipeline
- Planner: Transforms Toasty statements into a pipeline of actions
- Actions: Executed by the engine, store results in VarStore
- VarStore: Stores intermediate results between pipeline steps
- ExecResponse: Final result containing values and optional metadata
Existing Patterns
- eval::Func: Pre-computed transformations that execute during pipeline execution
- partition_returning: Separates database-handled expressions from in-memory evaluations
- Output::project: Transforms raw database results before storing in VarStore
Design
Core Types
#![allow(unused)]
fn main() {
// In engine.rs
pub struct ExecResponse {
pub values: ValueStream,
pub metadata: Option<Metadata>,
}
pub struct Metadata {
pub next_cursor: Option<Expr>,
pub prev_cursor: Option<Expr>,
pub query: Query,
}
// In engine/plan/exec_statement.rs
pub struct ExecStatement {
pub input: Option<Input>,
pub output: Option<Output>,
pub stmt: stmt::Statement,
pub conditional_update_with_no_returning: bool,
/// Pagination configuration for this query
pub pagination: Option<Pagination>,
}
pub struct Pagination {
/// Original limit before +1 transformation
pub limit: u64,
/// Function to extract cursor from a row
/// Takes row as arg[0], returns cursor value(s)
pub extract_cursor: eval::Func,
}
}
VarStore Changes
The VarStore needs to be updated to store ExecResponse instead of ValueStream:
#![allow(unused)]
fn main() {
pub(crate) struct VarStore {
slots: Vec<Option<ExecResponse>>,
}
}
This allows pagination metadata to flow through the pipeline and be returned from engine::exec.
Implementation Plan
Phase 1: Update VarStore to ExecResponse [Mechanical Change]
This phase is a purely mechanical change to update the VarStore infrastructure. No pagination logic yet.
- Update VarStore (engine/exec/var_store.rs):
  - Change the storage type from ValueStream to ExecResponse
  - Update load() to return ExecResponse
  - Update store() to accept ExecResponse
  - Update dup() to clone the entire ExecResponse (including metadata)
- Update all action executors to wrap their results in ExecResponse:
  - For now, all actions will use metadata: None
  - Each action's result becomes: ExecResponse { values, metadata: None }
  - Actions to update: action_associate, action_batch_write, action_delete_by_key, action_exec_statement, action_find_pk_by_index, action_get_by_key, action_insert, action_query_pk, action_update_by_key, action_set_var
- Update pipeline execution (engine/exec.rs):
  - exec_pipeline returns ExecResponse
  - Handle VarStore returning ExecResponse
- Update main engine (engine.rs):
  - exec::exec now returns ExecResponse directly
  - Remove the temporary wrapping logic
This phase establishes the infrastructure without any behavioral changes. All existing tests should continue to pass.
Phase 2: Add Pagination to ExecStatement [Task 2]
- Add the Pagination struct to engine/plan/exec_statement.rs
- Add a pagination: Option<Pagination> field to ExecStatement
- No execution changes yet - just the structure
Phase 3: Planner Support for SQL Pagination [Task 3]
In planner/select.rs, add pagination planning logic:
#![allow(unused)]
fn main() {
impl Planner<'_> {
fn plan_select_sql(...) {
// ... existing logic ...
// Handle pagination. Clone the limit expression first so that
// `stmt` can still be borrowed mutably inside plan_pagination.
let pagination = match stmt.limit.clone() {
Some(Limit::PaginateForward { limit, .. }) => {
Some(self.plan_pagination(&mut stmt, &mut project, &limit)?)
}
_ => None,
};
self.push_action(plan::ExecStatement {
input,
output: Some(plan::Output { var: output, project }),
stmt: stmt.into(),
conditional_update_with_no_returning: false,
pagination,
});
}
fn plan_pagination(
&mut self,
stmt: &mut stmt::Query,
project: &mut eval::Func,
limit_expr: &stmt::Expr,
) -> Result<Pagination> {
let original_limit = self.extract_limit_value(limit_expr)?;
// Get ORDER BY clause (required for pagination)
let order_by = stmt.order_by.as_ref()
.ok_or_else(|| anyhow!("Pagination requires ORDER BY"))?;
// Check if ORDER BY is unique
let is_unique = self.is_order_by_unique(order_by, stmt);
// If not unique, append primary key as tie-breaker
if !is_unique {
self.append_pk_to_order_by(stmt)?;
}
// Ensure ORDER BY fields are in returning clause
let (added_indices, original_field_count) =
self.ensure_order_by_in_returning(stmt)?;
// Build cursor extraction function
let extract_cursor = self.build_cursor_extraction_func(
stmt,
&added_indices,
)?;
// Modify project function if we added fields
if !added_indices.is_empty() {
self.adjust_project_for_pagination(
project,
original_field_count,
added_indices.len(),
);
}
// Transform limit to +1 for next page detection
*stmt.limit.as_mut().unwrap() = Limit::Offset {
limit: (original_limit + 1).into(),
offset: None,
};
Ok(Pagination {
limit: original_limit,
extract_cursor,
})
}
}
}
Key helper methods:
- is_order_by_unique: checks whether the ORDER BY fields form a unique constraint
- append_pk_to_order_by: adds the primary key as a tie-breaker
- ensure_order_by_in_returning: adds ORDER BY fields to the SELECT if missing
- build_cursor_extraction_func: creates the eval::Func that extracts the cursor
- adjust_project_for_pagination: modifies the project function to filter out added fields
Phase 4: Executor Implementation [Task 4]
In engine/exec/exec_statement.rs:
#![allow(unused)]
fn main() {
impl Exec<'_> {
pub(super) async fn action_exec_statement(
&mut self,
action: &plan::ExecStatement,
) -> Result<()> {
// ... existing logic to execute statement ...
let res = if let Some(pagination) = &action.pagination {
self.handle_paginated_query(res, pagination, &action.stmt).await?
} else {
ExecResponse {
values: /* normal value stream */,
metadata: None,
}
};
self.vars.store(out.var, res);
Ok(())
}
async fn handle_paginated_query(
&mut self,
rows: Rows,
pagination: &Pagination,
stmt: &Statement,
) -> Result<ExecResponse> {
// Collect limit+1 rows
let mut buffer = Vec::new();
match rows {
Rows::Values(mut stream) => {
// `for await` is not valid Rust; drain the stream explicitly.
while let Some(value) = stream.next().await {
buffer.push(value?);
if buffer.len() as u64 > pagination.limit {
break;
}
}
}
_ => return Err(anyhow!("Pagination requires row results")),
}
// Check if there's a next page
let has_next = buffer.len() > pagination.limit as usize;
// Extract cursor if there's a next page
let next_cursor = if has_next {
// Get cursor from the LAST item we're keeping
let last_kept = &buffer[pagination.limit as usize - 1];
let cursor_value = pagination.extract_cursor.eval(&[last_kept.clone()])?;
// Truncate buffer to requested limit
buffer.truncate(pagination.limit as usize);
Some(stmt::Expr::Value(cursor_value))
} else {
None
};
Ok(ExecResponse {
values: ValueStream::from_vec(buffer),
metadata: Some(Metadata {
next_cursor,
prev_cursor: None, // TODO: implement in future
query: stmt.as_query().cloned().unwrap_or_default(),
}),
})
}
}
}
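The limit+1 strategy in `handle_paginated_query` can be illustrated in isolation. A minimal sketch with a plain `Vec` instead of a `ValueStream`, `String` rows standing in for real records, and hypothetical names (`Page`, `split_page`):

```rust
struct Page {
    items: Vec<String>,
    /// Cursor extracted from the last kept row, if a next page exists.
    next_cursor: Option<String>,
}

/// Fetch with `limit + 1`, keep `limit`, and use the extra row only as a
/// "has next page" signal — it is never returned to the caller.
fn split_page(fetched: Vec<String>, limit: usize) -> Page {
    let mut items = fetched;
    let has_next = items.len() > limit;
    let next_cursor = if has_next {
        items.truncate(limit);
        // The cursor comes from the LAST row we keep, not the extra row.
        items.last().cloned()
    } else {
        None
    };
    Page { items, next_cursor }
}

fn main() {
    // 3 rows fetched for limit 2 → next page exists, cursor from row "b".
    let page = split_page(vec!["a".into(), "b".into(), "c".into()], 2);
    assert_eq!(page.items, ["a", "b"]);
    assert_eq!(page.next_cursor.as_deref(), Some("b"));

    // Exactly 2 rows fetched for limit 2 → no next page.
    let page = split_page(vec!["a".into(), "b".into()], 2);
    assert!(page.next_cursor.is_none());
}
```

In the real executor the cursor is not the row itself but the result of `pagination.extract_cursor.eval(&[row])`; the truncation and "last kept row" logic are the same.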
Phase 5: Clean Up Application Layer [Task 5]
Remove the limit+1 logic from Paginate::collect:
#![allow(unused)]
fn main() {
pub async fn collect(self, db: &Db) -> Result<Page<M>> {
// Simply delegate to db.paginate - engine handles pagination
db.paginate(self.query).await
}
}
Update Db::paginate to use the metadata from ExecResponse:
#![allow(unused)]
fn main() {
pub async fn paginate<M: Model>(&self, statement: stmt::Select<M>) -> Result<Page<M>> {
let exec_response = engine::exec(self, statement.untyped.clone().into()).await?;
// Convert value stream to models
let mut cursor = Cursor::new(self.schema.clone(), exec_response.values);
let mut items = Vec::new();
while let Some(item) = cursor.next().await {
items.push(item?);
}
// Extract pagination metadata
let (next_cursor, prev_cursor) = match exec_response.metadata {
Some(metadata) => (metadata.next_cursor, metadata.prev_cursor),
None => (None, None),
};
Ok(Page::new(items, statement, next_cursor, prev_cursor))
}
}
Key Design Decisions
- Single Source of Truth: the extract_cursor function is the only place that knows how to extract cursors. No redundant order_by_indices.
- Type Safety: the cursor extraction function uses actual inferred types from the schema, not Type::Any.
- Automatic Tie-Breaking: the planner automatically appends the primary key to ORDER BY when needed for uniqueness.
- Transparent Field Addition: ORDER BY fields are added to the returning clause transparently, and filtered out via the project function.
- Metadata Threading: ExecResponse flows through VarStore, preserving metadata through the pipeline.
Testing Strategy
- Unit Tests: Test cursor extraction function generation
- Integration Tests: Test pagination with various ORDER BY configurations
- Database Tests: Ensure SQL generation is correct (no PaginateForward in SQL)
- End-to-End Tests: Verify pagination works across different databases
Future Enhancements
- Previous Page Support: Implement prev_cursor extraction and PaginateBackward
- DynamoDB Native Pagination: Leverage LastEvaluatedKey instead of limit+1
- Complex ORDER BY: Support expressions beyond simple column references
- Optimization: Cache cursor extraction functions for common patterns
Serialized Field Implementation Design
Builds on the #[serialize] bookkeeping already in place (attribute parsing,
SerializeFormat enum, FieldPrimitive.serialize field). This document covers
the runtime serialization/deserialization codegen.
User-Facing API
#![allow(unused)]
fn main() {
#[derive(Model)]
struct User {
#[key]
#[auto]
id: uuid::Uuid,
name: String,
#[serialize(json)]
tags: Vec<String>,
// nullable: the column may be NULL. The Rust type must be Option<T>.
// None maps to NULL; Some(v) is serialized as JSON.
#[serialize(json, nullable)]
metadata: Option<HashMap<String, String>>,
// Non-nullable Option: the entire Option value is serialized as JSON.
// Some(v) → `v` as JSON, None → `null` as JSON text (column is NOT NULL).
#[serialize(json)]
extra: Option<String>,
}
}
Fields annotated with #[serialize(json)] are stored as JSON text in a single
database column. The field’s Rust type must implement serde::Serialize and
serde::DeserializeOwned. The database column type defaults to String/TEXT.
Nullability
By default, serialized fields are not nullable. The entire Rust value —
including Option<T> — is serialized as-is into JSON text stored in a NOT NULL
column. This means None becomes the JSON text null, and Some(v) becomes
the JSON serialization of v.
To make the database column nullable, add nullable to the attribute:
#[serialize(json, nullable)]. When nullable is set:
- The Rust type must be Option<T>.
- None maps to a SQL NULL (no value stored).
- Some(v) serializes v as JSON text.
This is an explicit opt-in because the two behaviors are meaningfully different:
a user may legitimately want to serialize None as JSON null text in a NOT
NULL column (e.g., for a JSON API field where null is a valid value distinct
from “no row”).
Value Encoding
A serialized field stores a JSON string in the database. The value stream uses
Value::String for serialized fields, not the field’s logical Rust type.
Rust value ──serde_json::to_string──► Value::String(json) ──► DB column (TEXT)
DB column (TEXT) ──► Value::String(json) ──serde_json::from_str──► Rust value
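The two nullability behaviors can be contrasted with a toy encoder. This sketch hand-rolls the JSON text for `Option<&str>` to stay dependency-free; the real implementation goes through `serde_json::to_string`, and the `Value` enum here is a simplified stand-in for Toasty's value type:

```rust
#[derive(Debug, PartialEq)]
enum Value {
    String(String),
    Null,
}

/// Default (#[serialize(json)]): the whole Option is serialized, so the
/// column is NOT NULL and None becomes the JSON text `null`.
fn encode_non_nullable(v: Option<&str>) -> Value {
    Value::String(match v {
        Some(s) => format!("\"{s}\""),
        None => "null".to_string(),
    })
}

/// Opt-in (#[serialize(json, nullable)]): None maps to SQL NULL, and only
/// Some(v) is serialized as JSON text.
fn encode_nullable(v: Option<&str>) -> Value {
    match v {
        Some(s) => Value::String(format!("\"{s}\"")),
        None => Value::Null,
    }
}

fn main() {
    // Same Rust value, two different storage representations of None:
    assert_eq!(encode_non_nullable(None), Value::String("null".into()));
    assert_eq!(encode_nullable(None), Value::Null);
    assert_eq!(encode_nullable(Some("x")), Value::String("\"x\"".into()));
}
```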
Schema Changes
For serialized fields, field_ty bypasses <T as Primitive>::field_ty() and
constructs FieldPrimitive directly with ty: Type::String. The user’s Rust
type T does not need to implement Primitive — it only needs Serialize +
DeserializeOwned.
Nullability is determined by the nullable flag in the attribute, not by
inspecting the Rust type.
Remove serialize from Primitive::field_ty
Today Primitive::field_ty accepts a serialize argument so it can thread
SerializeFormat into the FieldPrimitive it builds. With this design,
serialized fields never go through Primitive::field_ty — codegen constructs
the FieldPrimitive directly. That means the serialize parameter is dead
for all callers and should be removed.
#![allow(unused)]
fn main() {
// Primitive trait (before):
fn field_ty(
storage_ty: Option<db::Type>,
serialize: Option<SerializeFormat>,
) -> FieldTy;
// Primitive trait (after):
fn field_ty(storage_ty: Option<db::Type>) -> FieldTy;
}
The default implementation drops the serialize field from the constructed
FieldPrimitive (it is always None when going through the trait). Embedded
type overrides (Embed, enum) already ignore both parameters.
Codegen changes:
#![allow(unused)]
fn main() {
// Non-serialized field (calls through the trait):
field_ty = quote!(<#ty as Primitive>::field_ty(#storage_ty));
nullable = quote!(<#ty as Primitive>::NULLABLE);
// Serialized field (constructed directly):
field_ty = quote!(FieldTy::Primitive(FieldPrimitive {
ty: Type::String,
storage_ty: #storage_ty,
serialize: Some(SerializeFormat::Json),
}));
nullable = #serialize_nullable; // literal bool from attribute
}
No type-level hack is needed — the nullable flag is parsed from the attribute
at macro expansion time and threaded through to schema registration as a
literal bool.
Codegen Changes
Primitive::load / Model::load
For serialized fields, the generated load code reads a String from the record
and deserializes it. The behavior depends on whether nullable is set:
#![allow(unused)]
fn main() {
// Non-nullable (default) — works for any T including Option<T>:
field_name: {
let json_str = <String as Primitive>::load(record[i].take())?;
serde_json::from_str(&json_str)
.map_err(|e| Error::from_args(
format_args!("failed to deserialize field '{}': {}", "field_name", e)
))?
},
// Nullable (#[serialize(json, nullable)]) — T must be Option<U>:
field_name: {
let value = record[i].take();
if value.is_null() {
None
} else {
let json_str = <String as Primitive>::load(value)?;
Some(serde_json::from_str(&json_str)
.map_err(|e| Error::from_args(
format_args!("failed to deserialize field '{}': {}", "field_name", e)
))?)
}
},
}
Non-serialized fields are unchanged: <T as Primitive>::load(record[i].take())?.
Reload (root model and embedded)
Reload match arms follow the same pattern: load as String, then deserialize.
For nullable fields, check null first.
Create builder setters
Serialized field setters accept the concrete Rust type (not impl IntoExpr<T>,
since T does not implement IntoExpr) and serialize to a String expression:
#![allow(unused)]
fn main() {
// Non-nullable (default) — accepts T directly (including Option<T>):
pub fn field_name(mut self, field_name: FieldType) -> Self {
let json = serde_json::to_string(&field_name).expect("failed to serialize");
self.stmt.set(index, <String as IntoExpr<String>>::into_expr(json));
self
}
// Nullable (#[serialize(json, nullable)]) — accepts Option<InnerType>:
pub fn field_name(mut self, field_name: Option<InnerType>) -> Self {
match &field_name {
Some(v) => {
let json = serde_json::to_string(v).expect("failed to serialize");
self.stmt.set(index, <String as IntoExpr<String>>::into_expr(json));
}
None => {
self.stmt.set(index, Expr::<String>::from_value(Value::Null));
}
}
self
}
}
Update builder setters
Same pattern as create: accept the concrete type, serialize to JSON, store as
String expression.
Dependencies
serde_json is added as an optional dependency of the toasty crate, gated
behind the existing serde feature:
# crates/toasty/Cargo.toml
[features]
serde = ["dep:serde_core", "dep:serde_json"]
[dependencies]
serde_json = { workspace = true, optional = true }
Generated code references serde_json through the codegen support module:
#![allow(unused)]
fn main() {
// crates/toasty/src/lib.rs, in codegen_support
#[cfg(feature = "serde")]
pub use serde_json;
}
If a user writes #[serialize(json)] without enabling the serde feature, the
generated code fails to compile because codegen_support::serde_json does not
exist. The compiler error points at the generated serde_json::from_str call.
Files Modified
| File | Change |
|---|---|
| crates/toasty/Cargo.toml | Add serde_json optional dep, update serde feature |
| crates/toasty/src/lib.rs | Re-export serde_json in codegen_support |
| crates/toasty/src/stmt/primitive.rs | Remove serialize param from Primitive::field_ty |
| crates/toasty-codegen/src/schema/field.rs | Parse nullable flag from #[serialize(...)] attribute |
| crates/toasty-codegen/src/expand.rs | Update Embed/enum field_ty overrides to drop serialize param |
| crates/toasty-codegen/src/expand/schema.rs | Construct FieldPrimitive directly for serialized fields; remove serialize arg from non-serialized field_ty call |
| crates/toasty-codegen/src/expand/embedded_enum.rs | Drop serialize arg from field_ty call |
| crates/toasty-codegen/src/expand/model.rs | Deserialize in expand_load_body() and expand_embedded_reload_body() |
| crates/toasty-codegen/src/expand/create.rs | Serialize in create setter for serialized fields |
| crates/toasty-codegen/src/expand/update.rs | Serialize in update setter, deserialize in reload arms |
| crates/toasty-driver-integration-suite/Cargo.toml | Add serde, serde_json deps, enable serde feature |
| crates/toasty-driver-integration-suite/src/tests/serialize.rs | Integration tests |
Integration Tests
New file serialize.rs in the driver integration suite. Test cases:
- Round-trip a Vec<String> field through create and read-back
- Round-trip a nullable Option<T> field with Some and None (SQL NULL) values
- Non-nullable Option<T> field: None round-trips as JSON null text (not SQL NULL)
- Update a serialized field and verify the new value persists
- Round-trip a custom struct with serde::Serialize + DeserializeOwned
Toasty ORM - Development Roadmap
This roadmap outlines potential enhancements and missing features for the Toasty ORM.
Overview
Toasty is an easy-to-use ORM for Rust that supports both SQL and NoSQL databases. This roadmap documents potential future work and feature gaps.
Feature Areas
Composite Keys
Composite Key Support (partial implementation)
- Composite foreign key optimization in query simplification
- Composite PK handling in expression rewriting and IN-list operations
- HasMany/BelongsTo relationships with composite foreign keys referencing composite primary keys
- Junction table / many-to-many patterns with composite keys
- DynamoDB driver: batch delete/update with composite keys, composite unique indexes
- Comprehensive test coverage for all composite key combinations
Query Capabilities
Query Ordering, Limits & Pagination
- Multi-column ordering convenience method (.then_by())
- Direct .limit() method for non-paginated queries
- .last() convenience method
- String operations: contains, starts with, ends with, LIKE (partial AST support)
- NOT IN
- Case-insensitive matching
- BETWEEN / range queries
- Relation filtering (filter by associated model fields)
- Field-to-field comparison
- Arithmetic operations in queries (add, subtract, multiply, divide, modulo)
- Aggregate queries and GROUP BY / HAVING
Data Types
Extended Data Types
- Embedded struct & enum support (partial implementation)
- Serde-serialized types (JSON/JSONB columns for arbitrary Rust types)
- Embedded collections (arrays, maps, sets, etc.)
Relationships & Loading
Partial Model Loading
- Allow models to have fields that are not loaded by default (e.g. a large body column on an Article model)
- Fields opt in via a #[deferred] attribute and must be wrapped in a Deferred<T> type
- By default, queries skip deferred fields; callers opt in with .include(Article::body) (same API as relation preloading)
- Accessing a Deferred<T> that was not loaded either returns an error or panics with a clear message
- Works with primitive types, embedded structs, and embedded enums — just a subset of columns in the same table
#![allow(unused)]
fn main() {
#[toasty::model]
struct Article {
    #[key]
    #[auto]
    id: u64,
    title: String,
    author: BelongsTo<User>,
    #[deferred]
    body: Deferred<String>, // not loaded unless explicitly included
}
// Load metadata only (no body column fetched)
let articles = Article::all().collect(&db).await?;
// Load with body
let articles = Article::all().include(Article::body).collect(&db).await?;
}
Relationships
- Many-to-many relationships
- Polymorphic associations
- Nested preloading (multi-level .include() support)
Query Building
Query Features
- Subquery improvements
- Better conditional/dynamic query building ergonomics
Database Function Expressions
- Allow database-side functions (e.g. NOW(), CURRENT_TIMESTAMP) as expressions in create and update operations
- User API: field setters accept toasty::stmt helpers like toasty::stmt::now() that resolve to core::stmt::ExprFunc variants
#![allow(unused)]
fn main() {
// Set updated_at to the database's current time instead of a Rust-side value
user.update()
    .updated_at(toasty::stmt::now())
    .exec(&db)
    .await?;
// Also usable in create operations
User::create()
    .name("Alice")
    .created_at(toasty::stmt::now())
    .exec(&db)
    .await?;
}
- Extend the ExprFunc enum in toasty-core with new function variants (e.g. Now)
- SQL serialization for each function across supported databases (NOW() for PostgreSQL/MySQL, datetime('now') for SQLite)
- Codegen: update field setter generation to accept both value types and function expressions
- Future: support additional scalar functions (e.g. COALESCE, LOWER, UPPER, LENGTH)
Raw SQL Support
- Execute arbitrary SQL statements directly
- Parameterized queries with type-safe bindings
- Raw SQL fragments within typed queries (escape hatch for complex expressions)
Data Modification
Upsert
- Insert-or-update: atomic INSERT ... ON CONFLICT DO UPDATE (PostgreSQL/SQLite), ON DUPLICATE KEY UPDATE (MySQL), MERGE (SQL Server/Oracle)
- Insert-or-ignore (DO NOTHING / INSERT IGNORE)
- Conflict target: by column(s), by constraint name, partial indexes (PostgreSQL)
- Column update control: update all non-key columns, a named subset, or a raw SQL expression
- Access to the proposed row via the EXCLUDED pseudo-table in the update expression
- Bulk upsert (multi-row VALUES)
- DynamoDB: PutItem (unconditional replace) vs. UpdateItem with a condition expression
Mutation Result Information
- Return affected row counts from update operations (how many records were updated)
- Return affected row counts from delete operations (how many records were deleted)
- Better result types that provide operation metadata
- Distinguish between “no rows matched” vs “rows matched but no changes needed”
Transactions
Atomic Batch Operations
- Cross-database atomic batch API
- Supported across SQL and NoSQL databases
- Type-safe operation batching
- All-or-nothing semantics
SQL Transaction API
- Manual transaction control for SQL databases
- BEGIN/COMMIT/ROLLBACK support
- Savepoints and nested transactions
- Isolation level configuration
Schema Management
Migrations
- Schema migration system
- Migration generation
- Rollback support
- Schema versioning
- CLI tools for schema management
Toasty Runtime Improvements
Concurrent Task Execution
- Replace the current ad-hoc background task with a proper in-flight task manager
- Execute independent parts of an execution plan concurrently
- Track and coordinate multiple in-flight tasks within a single query execution
Cancellation & Cleanup
- Detect when the caller drops the future representing query completion
- Perform clean cancellation on drop (rollback any incomplete transactions)
- Ensure no resource leaks or orphaned database state on cancellation
Internal Instrumentation & Metrics
- Instrument time spent in each execution phase (planning, simplification, execution, serialization)
- Track CPU time consumed by query planning to detect expensive plans
- Provide internal metrics for diagnosing performance bottlenecks
Performance
- Dedicated post-lowering optimization pass for expensive predicate analysis (run once, not per-node)
- Equivalence classes for transitive constraint reasoning (a = b AND b = 5 implies a = 5)
- Structured constraint representation (constant bindings, range bounds, exclusion sets)
- Targeted predicate normalization without full DNF conversion
Stored Procedures (Pre-Compiled Query Plans)
- Compile query plans once and execute them many times with different parameter values
- Skip the full compilation pipeline (simplification, lowering, HIR/MIR planning) on repeated calls
- Parameterized statement AST with Param slots for value substitution at execution time
- Pairs with database-level prepared statements for end-to-end optimization
Optimization Features
- Bulk inserts/updates
- Query caching
- Connection pooling improvements
Developer Experience
Ergonomic Macros
- toasty::query!() - Succinct query syntax that translates to the builder DSL
#![allow(unused)]
fn main() {
// Instead of: User::all().filter(...).order_by(...).collect(&db).await
toasty::query!(User, filter: ..., order_by: ...).collect(&db).await
}
- toasty::create!() - Concise record creation syntax
#![allow(unused)]
fn main() {
// Instead of: User::create().name("Alice").age(30).exec(&db).await
toasty::create!(User, name: "Alice", age: 30).exec(&db).await
}
- toasty::update!() - Simplified update syntax
#![allow(unused)]
fn main() {
// Instead of: user.update().name("Bob").age(31).exec(&db).await
toasty::update!(user, name: "Bob", age: 31).exec(&db).await
}
Tooling & Debugging
- Query logging
Safety & Security
Sensitive Value Flagging
- Flag sensitive fields (e.g. passwords, tokens, secrets) so they are automatically redacted in logs and debug output
- Attribute-based opt-in: `#[sensitive]` on model fields marks values that must never appear in plaintext outside the database
- All logging, query tracing, and error messages strip or mask flagged values
- Prevents accidental credential leakage in application logs, query dumps, and diagnostics
Trusted vs Untrusted Input
- Distinguish between values originating from untrusted user input and values produced internally by the query engine (e.g. literal numbers, generated keys)
- Engine-produced values can skip escaping/parameterization since they are known-safe, reducing unnecessary overhead
- Untrusted input continues to be parameterized or escaped to prevent SQL injection
- Enables more efficient SQL generation without weakening safety guarantees for external data
Notes
The roadmap documents describe potential enhancements and missing features. For information about what’s currently implemented, refer to the user guide or test the API directly.
Composite Key Support
Overview
Toasty has partial composite key support. Basic CRUD operations work for models with composite primary keys (both field-level #[key] and model-level #[key(partition = ..., local = ...)]), but several engine optimizations, relationship patterns, and driver operations panic or fall back when encountering composite keys.
This document catalogs the gaps, surveys how other ORMs handle composite keys, identifies common SQL patterns that require composite key support, and proposes a phased implementation plan.
Current State
What Works
Schema definition — Two syntaxes for composite keys:
#![allow(unused)]
fn main() {
// Field-level: multiple #[key] attributes
#[derive(Debug, toasty::Model)]
struct Foo {
#[key]
one: String,
#[key]
two: String,
}
// Model-level: partition/local keys (designed for DynamoDB compatibility)
#[derive(Debug, toasty::Model)]
#[key(partition = user_id, local = id)]
struct Todo {
#[auto]
id: uuid::Uuid,
user_id: uuid::Uuid,
title: String,
}
}
Generated query methods for composite keys:
- `filter_by_<field1>_and_<field2>()` — filter by both key fields
- `get_by_<field1>_and_<field2>()` — get a single record by both keys
- `filter_by_<field1>_and_<field2>_batch()` — batch get by key tuples
- `filter_by_<partition_field>()` — filter by partition key alone
- Comparison operators on local keys: `gt()`, `ge()`, `lt()`, `le()`, `ne()`, `eq()`
Database support:
- SQL databases (SQLite, PostgreSQL, MySQL): composite primary keys via field-level `#[key]`
- DynamoDB: partition/local key syntax (max 2 keys: 1 partition + 1 local)
Test coverage:
- `one_model_composite_key::batch_get_by_key` — basic CRUD with field-level composite keys
- `one_model_query` — partition/local key queries with range operators
- `has_many_crud_basic::has_many_when_fk_is_composite` — HasMany with composite FK (working)
- `embedded` — composite keys with embedded struct fields
- `examples/composite-key/` — end-to-end example application
What Does Not Work
The following locations contain `todo!()`, `assert!()`, or `panic!()` calls that block composite key usage:
Engine Simplification (5 locations)
| File | Line | Issue |
|---|---|---|
| `engine/simplify/expr_binary_op.rs` | 23-25 | `todo!("handle composite keys")` when simplifying equality on model references with composite PKs |
| `engine/simplify/expr_binary_op.rs` | 43-45 | `todo!("handle composite keys")` when simplifying binary ops on composite FK fields |
| `engine/simplify/expr_in_list.rs` | 30-32 | `todo!()` when optimizing IN-list expressions for models with composite PKs |
| `engine/simplify/lift_in_subquery.rs` | 92-96 | `assert_eq!(len, 1, "TODO: composite keys")` — subquery lifting restricted to single-field FKs |
| `engine/simplify/lift_in_subquery.rs` | 109-111, 145-148, 154-157 | Three more `todo!("composite keys")` in BelongsTo and HasOne subquery lifting |
| `engine/simplify/rewrite_root_path_expr.rs` | 18-19 | `todo!("composite primary keys")` when rewriting path expressions with key constraints |
Engine Lowering (2 locations)
| File | Line | Issue |
|---|---|---|
| `engine/lower/insert.rs` | 90-92 | `todo!()` when lowering inserts with BelongsTo relations that have composite FKs |
| `engine/lower.rs` | 893-896 | Unhandled else branch when lowering relationships with composite FKs |
DynamoDB Driver (4 locations)
| File | Line | Issue |
|---|---|---|
| `driver-dynamodb/op/update_by_key.rs` | 197 | `assert!(op.keys.len() == 1)` — batch update limited to single key |
| `driver-dynamodb/op/delete_by_key.rs` | 119-121 | `panic!("only 1 key supported so far")` — batch delete limited to single key |
| `driver-dynamodb/op/delete_by_key.rs` | 33 | `panic!("TODO: support more than 1 unique index")` |
| `driver-dynamodb/op/create_table.rs` | 113 | `assert_eq!(1, index.columns.len())` — composite unique indexes unsupported |
Stubbed Tests (2 tests)
| File | Test | Status |
|---|---|---|
| `has_many_crud_basic.rs` | `has_many_when_pk_is_composite` | Empty — not implemented |
| `has_many_crud_basic.rs` | `has_many_when_fk_and_pk_are_composite` | Empty — not implemented |
Design Constraints
- Auto-increment is intentionally forbidden with composite keys. The schema verifier rejects `#[auto(increment)]` on composite PK tables. UUID auto-generation is the supported alternative.
- DynamoDB limits composite keys to 2 columns (1 partition + 1 local). This is a DynamoDB limitation, not a Toasty limitation.
How Other ORMs Handle Composite Keys
Rust ORMs
Diesel — First-class composite key support. #[diesel(primary_key(col1, col2))] on the struct; find() accepts a tuple (val1, val2); Identifiable returns a tuple reference. BelongsTo works with composite keys via explicit foreign_key attribute. Compile-time type checking through generated code.
SeaORM — Supports composite keys via multiple #[sea_orm(primary_key)] field attributes. PrimaryKeyTrait::ValueType is a tuple. find_by_id() and delete_by_id() accept tuples. DAO pattern works fully. Composite foreign keys are less ergonomic but functional.
Python ORMs
SQLAlchemy — Gold standard for composite key support. Multiple primary_key=True columns define a composite PK. session.get(Model, (a, b)) for lookups. ForeignKeyConstraint at the table level handles composite FKs cleanly. Identity map uses tuples. All features (eager/lazy loading, cascades, relationships) work uniformly with composite keys.
Django — Added CompositePrimaryKey in Django 5.2 (2025) after years of surrogate-key-only design. pk returns a tuple. Model.objects.get(pk=(1, 2)) works. Composite FK support is still limited. Ecosystem (admin, REST frameworks, third-party packages) is catching up.
Tortoise ORM — No composite PK support. Surrogate key + unique constraint is the only option.
JavaScript/TypeScript ORMs
Prisma — @@id([field1, field2]) defines composite PKs. Auto-generates compound field names (field1_field2) for findUnique/update/delete. Multi-field @relation(fields: [...], references: [...]) for composite FKs. Fully type-safe generated client.
TypeORM — Multiple @PrimaryColumn() decorators. All operations use object-based where clauses ({ field1: val1, field2: val2 }). @JoinColumn accepts an array for composite FKs. save() does upsert based on all PK fields.
Sequelize — Supports composite PK definition but findByPk() does not work with composite keys (must use findOne({ where })). Composite FK support requires workarounds or raw SQL.
Drizzle — primaryKey({ columns: [col1, col2] }) in the table config callback. foreignKey({ columns: [...], foreignColumns: [...] }) for composite FKs. No special find-by-PK method; all queries use explicit where + and(). SQL-first philosophy.
Java/Kotlin
Hibernate/JPA — Two approaches: @IdClass (flat fields + separate ID class) and @EmbeddedId (nested object). PK class must implement Serializable, equals(), hashCode(). @JoinColumns (plural) for composite FKs. @MapsId connects relationship fields to embedded ID fields. Full relationship support.
Exposed (Kotlin) — PrimaryKey(col1, col2) in the table object. Only the DSL (SQL-like) API supports composite keys; the DAO (EntityClass) API does not. Relationships require manual joins.
Go ORMs
GORM — Multiple gorm:"primaryKey" tags. Composite FKs via foreignKey:Col1,Col2;references:Col1,Col2. Zero-value problem: PK column with value 0 is treated as “not set.”
Ent — No composite PK support by design (graph semantics, every node has a single ID). Unique composite indexes are the workaround.
Ruby
ActiveRecord (Rails 7.1+) — primary_key: [:col1, :col2] in migrations, self.primary_key = [:col1, :col2] in model. find([a, b]) for lookups. query_constraints: [:col1, :col2] for composite FK associations. Pre-7.1 required the composite_primary_keys gem.
Cross-ORM Summary
| ORM | Composite PK | Composite FK | Find by PK | Relationship Support |
|---|---|---|---|---|
| Diesel (Rust) | Yes | Yes | Tuple | Full |
| SeaORM (Rust) | Yes | Partial | Tuple | Full |
| SQLAlchemy (Python) | Yes | Yes | Tuple | Full |
| Django (Python) | 5.2+ | Limited | Tuple | Partial |
| Prisma (TS) | Yes | Yes | Generated compound | Full |
| TypeORM (TS) | Yes | Yes | Object | Full |
| Sequelize (JS) | Yes | Partial | Broken | Partial |
| Drizzle (TS) | Yes | Yes | Manual where | Manual |
| Hibernate/JPA | Yes | Yes | ID class | Full |
| GORM (Go) | Yes | Yes | Where clause | Full |
| ActiveRecord (Ruby) | 7.1+ | 7.1+ | Array | Partial |
Key takeaway: Mature ORMs (Diesel, SQLAlchemy, Hibernate) treat composite keys as first-class citizens where all operations work uniformly. The most common API pattern is tuple-based identity (find((a, b))). Composite foreign keys are universally harder than composite PKs — even established ORMs have rougher edges there.
Common SQL Patterns Requiring Composite Keys
1. Junction Tables (Many-to-Many)
The most common use case. The junction table’s PK is the combination of FKs to both related tables.
CREATE TABLE enrollment (
student_id INTEGER NOT NULL REFERENCES student(id),
course_id INTEGER NOT NULL REFERENCES course(id),
enrolled_at TIMESTAMP DEFAULT NOW(),
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id)
);
Junction tables often accumulate extra attributes (grade, enrolled_at, role) that make them first-class entities requiring full CRUD support, not just a hidden link table.
Toasty gap: Many-to-many relationships are listed as a separate roadmap item. Composite key support is a prerequisite — junction tables are inherently composite-keyed.
2. Multi-Tenant Data Isolation
Tenant ID appears as the first column in every composite PK, enabling partition-level isolation and efficient tenant-scoped queries.
CREATE TABLE tenant_document (
tenant_id UUID NOT NULL REFERENCES tenant(id),
document_id UUID NOT NULL DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
PRIMARY KEY (tenant_id, document_id)
);
-- All queries are scoped: WHERE tenant_id = $1 AND ...
Why composite PKs: Enforces isolation at the database level. PK index prefix enables efficient tenant-scoped queries. Maps directly to DynamoDB’s partition/local key model.
Toasty gap: The #[key(partition = ..., local = ...)] syntax already models this. The gaps are in relationship handling when both sides use composite keys.
3. Time-Series Data
CREATE TABLE sensor_reading (
sensor_id INTEGER NOT NULL,
recorded_at TIMESTAMP NOT NULL,
value DOUBLE PRECISION NOT NULL,
PRIMARY KEY (sensor_id, recorded_at)
);
Why composite PKs: Natural ordering by sensor then time. Range scans on recorded_at within a sensor are efficient. Supports table partitioning by time ranges.
4. Hierarchical Data (Closure Table)
CREATE TABLE category_closure (
ancestor_id INTEGER NOT NULL REFERENCES category(id),
descendant_id INTEGER NOT NULL REFERENCES category(id),
depth INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (ancestor_id, descendant_id)
);
5. Composite Foreign Keys Referencing Composite PKs
A child table references a parent with a composite PK — all parent PK columns appear in the child as FK columns.
CREATE TABLE order_item (
order_id INTEGER NOT NULL REFERENCES "order"(id),
item_number INTEGER NOT NULL,
PRIMARY KEY (order_id, item_number)
);
CREATE TABLE order_item_shipment (
id SERIAL PRIMARY KEY,
order_id INTEGER NOT NULL,
item_number INTEGER NOT NULL,
shipment_id INTEGER NOT NULL REFERENCES shipment(id),
FOREIGN KEY (order_id, item_number)
REFERENCES order_item(order_id, item_number)
);
Toasty gap: This is the hardest pattern. The engine simplification and lowering layers assume single-field FKs in multiple places. Fixing this is the core of the composite key work.
6. Versioned Records
CREATE TABLE document_version (
document_id INTEGER NOT NULL REFERENCES document(id),
version INTEGER NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (document_id, version)
);
7. Composite Unique Constraints vs Composite Primary Keys
Some applications prefer a surrogate PK with a composite unique constraint:
-- Surrogate PK + composite unique
CREATE TABLE enrollment (
id SERIAL PRIMARY KEY,
student_id INTEGER NOT NULL,
course_id INTEGER NOT NULL,
UNIQUE (student_id, course_id)
);
Trade-offs: surrogate PKs simplify FKs (single column) and URL design, but composite PKs are more storage-efficient and semantically meaningful. ORMs that don’t support composite PKs (Django pre-5.2, Tortoise, Ent) force the surrogate pattern.
Toasty should support both patterns — composite PKs for direct use and composite unique constraints for the surrogate approach.
Implementation Plan
Phase 1: Engine Simplification — Composite PK/FK Handling
Fix the todo!() panics in the engine simplification layer so that queries involving composite keys pass through without crashing, even if not fully optimized.
Files:
- `engine/simplify/expr_binary_op.rs` — Handle composite PKs and FKs in equality simplification. For composite keys, generate an AND of per-field comparisons.
- `engine/simplify/expr_in_list.rs` — Handle IN-list for composite PKs. Generate `(col1, col2) IN ((v1, v2), (v3, v4))` or an equivalent AND/OR tree.
- `engine/simplify/rewrite_root_path_expr.rs` — Rewrite path expressions for composite PKs.
Approach: Where a single-field operation currently destructures let [field] = &fields[..], extend to iterate over all fields and combine with AND expressions.
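The approach can be sketched with a toy expression type (not Toasty's actual `stmt` AST): a composite-key equality becomes an AND of per-field comparisons, while the single-field case is unchanged:

```rust
/// Toy expression type standing in for the engine's statement AST.
#[derive(Debug, PartialEq)]
enum Expr {
    Eq(String, i64), // column = value
    And(Vec<Expr>),
}

/// Single-field code often destructures `let [field] = &fields[..]`;
/// the composite version iterates over all fields and combines with AND.
fn key_equality(fields: &[&str], values: &[i64]) -> Expr {
    assert_eq!(fields.len(), values.len());
    let mut parts: Vec<Expr> = fields
        .iter()
        .zip(values)
        .map(|(f, v)| Expr::Eq(f.to_string(), *v))
        .collect();
    if parts.len() == 1 {
        parts.pop().unwrap() // keep single-field output unchanged
    } else {
        Expr::And(parts)
    }
}

fn main() {
    // (order_id, item_number) = (1, 2)  ->  order_id = 1 AND item_number = 2
    let expr = key_equality(&["order_id", "item_number"], &[1, 2]);
    assert_eq!(
        expr,
        Expr::And(vec![
            Expr::Eq("order_id".into(), 1),
            Expr::Eq("item_number".into(), 2),
        ])
    );
}
```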
Phase 2: Subquery Lifting for Composite FKs
Extend the subquery lifting optimization to handle composite foreign keys in BelongsTo and HasOne relationships.
Files:
- `engine/simplify/lift_in_subquery.rs` — Remove the `assert_eq!(len, 1)` and handle multi-field FKs. For the optimization path, generate AND of per-field comparisons. For the fallback IN-subquery path, generate tuple-based IN expressions or multiple correlated conditions.
Approach: The existing single-field logic maps `fk_field.source` to `fk_field.target`. For composite keys, do the same for each field pair and combine with AND.
Phase 3: Engine Lowering — Composite FK Relationships
Fix insert and relationship lowering to handle composite FKs.
Files:
- `engine/lower/insert.rs` — When lowering BelongsTo in insert operations, set all FK fields from the related record’s PK fields, not just one.
- `engine/lower.rs` — Handle composite FKs in relationship lowering. Generate multi-column join conditions.
Phase 4: DynamoDB Driver — Batch Operations with Composite Keys
Files:
- `driver-dynamodb/op/update_by_key.rs` — Support batch updates with multiple keys (iterate and issue individual UpdateItem calls if needed).
- `driver-dynamodb/op/delete_by_key.rs` — Support batch deletes. Remove the single-key panic.
- `driver-dynamodb/op/create_table.rs` — Support composite unique indexes (Global Secondary Indexes with multiple key columns where DynamoDB allows it).
Phase 5: Test Coverage
Fill in the stubbed tests and add new ones covering all composite key combinations:
Existing stubs to implement:
- `has_many_when_pk_is_composite` — Parent has composite PK, child has single FK pointing to it
- `has_many_when_fk_and_pk_are_composite` — Both sides have composite keys
New tests to add:
| Test | Description |
|---|---|
| `composite_pk_crud` | Full CRUD (create, read, update, delete) on a model with 2+ key fields |
| `composite_pk_three_fields` | Composite PK with 3 fields to test beyond the 2-field case |
| `composite_fk_belongs_to` | BelongsTo where the FK is composite (references a composite PK) |
| `composite_fk_has_one` | HasOne with composite FK |
| `composite_key_pagination` | Cursor-based pagination with composite PK ordering |
| `composite_key_batch_operations` | Batch get/update/delete with composite keys |
| `composite_key_scoped_queries` | Scoped queries (e.g., `user.todos().filter_by_id(...)`) with composite keys |
| `composite_key_update_non_key_fields` | Update non-key fields on a composite-keyed model |
| `composite_key_unique_constraint` | Composite unique constraint (not PK) behavior |
| `junction_table_pattern` | Many-to-many junction table with composite PK and extra attributes |
| `multi_tenant_pattern` | Tenant-scoped models with (tenant_id, entity_id) composite PKs |
Design Decisions
Tuple-Based Identity
Following Diesel and SQLAlchemy’s lead, composite key identity should be represented as tuples. The current generated methods (`get_by_field1_and_field2(val1, val2)`) are a good API. For batch operations, the tuple-of-references pattern (`filter_by_field1_and_field2_batch([(&a, &b), ...])`) is also solid.
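The tuple-identity pattern is easy to picture with a plain `HashMap` keyed by the composite key tuple (a standalone sketch, not Toasty code):

```rust
use std::collections::HashMap;

/// Records with composite PKs are keyed by a tuple of their key fields,
/// mirroring the `get_by_field1_and_field2(a, b)` style API.
#[derive(Debug, Clone, PartialEq)]
struct OrderItem {
    order_id: i64,
    item_number: i64,
    qty: u32,
}

fn main() {
    let mut identity: HashMap<(i64, i64), OrderItem> = HashMap::new();
    let item = OrderItem { order_id: 1, item_number: 2, qty: 3 };
    identity.insert((item.order_id, item.item_number), item.clone());

    // Lookup by the composite key tuple, as in find((a, b)) style APIs.
    assert_eq!(identity.get(&(1, 2)), Some(&item));
    assert_eq!(identity.get(&(1, 9)), None);
}
```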
AND Composition for Multi-Field Conditions
When a single-field operation like pk_field = value needs to become a composite operation, the standard approach is:
pk_field1 = value1 AND pk_field2 = value2
This maps cleanly to SQL WHERE clauses and DynamoDB key conditions. The engine’s `stmt::ExprAnd` already supports this.
IN-List with Composite Keys
For batch lookups, composite IN can be expressed as:
-- Row-value syntax (PostgreSQL, MySQL 8.0+, SQLite)
WHERE (col1, col2) IN ((v1a, v2a), (v1b, v2b))
-- Equivalent OR-of-ANDs (universal)
WHERE (col1 = v1a AND col2 = v2a) OR (col1 = v1b AND col2 = v2b)
The OR-of-ANDs form is more portable across databases. The engine should generate this form and let the SQL serializer optimize to row-value syntax where supported.
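Generating the portable form is a simple nested expansion. The sketch below renders the OR-of-ANDs as a SQL string for illustration; the engine would build expression nodes instead:

```rust
/// Expand a composite-key IN-list into the portable OR-of-ANDs form:
/// (col1 = v1a AND col2 = v2a) OR (col1 = v1b AND col2 = v2b) ...
fn in_list_sql(cols: &[&str], rows: &[Vec<&str>]) -> String {
    let disjuncts: Vec<String> = rows
        .iter()
        .map(|row| {
            let conjuncts: Vec<String> = cols
                .iter()
                .zip(row)
                .map(|(c, v)| format!("{c} = {v}"))
                .collect();
            format!("({})", conjuncts.join(" AND "))
        })
        .collect();
    disjuncts.join(" OR ")
}

fn main() {
    let sql = in_list_sql(
        &["col1", "col2"],
        &[vec!["v1a", "v2a"], vec!["v1b", "v2b"]],
    );
    assert_eq!(
        sql,
        "(col1 = v1a AND col2 = v2a) OR (col1 = v1b AND col2 = v2b)"
    );
}
```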
Composite FK Optimization
The subquery lifting optimization (lift_in_subquery.rs) currently rewrites:
-- Before: subquery
user_id IN (SELECT id FROM users WHERE name = 'Alice')
-- After: direct comparison
user_id = <alice_id>
For composite FKs, the rewrite becomes:
-- Before: correlated subquery
(order_id, item_number) IN (SELECT order_id, item_number FROM order_items WHERE ...)
-- After: direct comparison
order_id = <val1> AND item_number = <val2>
The same optimization logic applies — just iterated over each FK field pair.
Testing Strategy
- All new tests go in the integration suite (`toasty-driver-integration-suite`) to run against all database backends
- Use the existing `#[driver_test]` macro for multi-database testing
- Use the matrix testing infrastructure (`composite` dimension) where appropriate
- Each phase should have passing tests before moving to the next phase
- No unit tests in source code, per project convention
Query Ordering, Limits & Pagination
Overview
Toasty provides cursor-based pagination using keyset pagination, which offers consistent performance and works well across both SQL and NoSQL databases. The implementation converts pagination cursors into WHERE clauses rather than using OFFSET, avoiding the performance issues of traditional offset-based pagination.
Potential Future Work
Multi-column Ordering Convenience
Add .then_by() method for chaining multiple order clauses:
#![allow(unused)]
fn main() {
let users = User::all()
.order_by(User::FIELDS.status().asc())
.then_by(User::FIELDS.created_at().desc())
.paginate(10)
.collect(&db)
.await?;
}
Current workaround requires manual construction:
#![allow(unused)]
fn main() {
use toasty::stmt::OrderBy;
let order = OrderBy::from([
Post::FIELDS.status().asc(),
Post::FIELDS.created_at().desc(),
]);
let posts = Post::all()
.order_by(order)
.collect(&db)
.await?;
}
Implementation:
- File: `toasty-codegen/src/expand/query.rs`
- Add `.then_by()` method to the query builder
- Complexity: Medium
Direct Limit Method
Expose .limit() for non-paginated queries:
#![allow(unused)]
fn main() {
let recent_posts: Vec<Post> = Post::all()
.order_by(Post::FIELDS.created_at().desc())
.limit(5)
.collect(&db)
.await?;
}
Implementation:
- File: `toasty-codegen/src/expand/query.rs`
- Generate `.limit()` method
- Complexity: Low
Last Convenience Method
Get the last matching record:
#![allow(unused)]
fn main() {
let last_user: Option<User> = User::all()
.order_by(User::FIELDS.created_at().desc())
.last(&db)
.await?;
}
Implementation:
- File: `toasty-codegen/src/expand/query.rs`
- Generate `.last()` method
- Complexity: Low
Testing
Additional Test Coverage
Tests that could be added:
- Multi-column ordering
  - Verify correct ordering with multiple columns
  - Test tie-breaking behavior
- Direct `.limit()` method (when implemented)
  - Non-paginated queries with limit
  - Verify correct number of results
- `.last()` convenience method (when implemented)
  - Returns last matching record
  - Returns None when no matches
- Edge cases
  - Empty results with pagination
  - Single page results (no next/prev cursors)
  - Pagination beyond last page
  - Large page sizes
  - Page size of 1
Database-Specific Considerations
SQL Databases
- MySQL: uses `LIMIT n` for pagination (keyset approach via WHERE)
- PostgreSQL: uses `LIMIT n` for pagination (keyset approach via WHERE)
- SQLite: uses `LIMIT n` for pagination (keyset approach via WHERE)
All SQL databases use keyset pagination (WHERE clauses with cursors) rather than OFFSET for consistent performance.
NoSQL Databases
- DynamoDB:
- Limited ordering support (only on sort keys)
- Pagination via LastEvaluatedKey
- Cursor-based approach maps well to DynamoDB’s native pagination
- Needs validation and testing
How Keyset Pagination Works
Instead of using OFFSET, Toasty converts cursors to WHERE clauses:
-- Traditional OFFSET (slow for large offsets)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 10 OFFSET 10000;
-- Toasty's cursor approach (always fast)
SELECT * FROM posts
WHERE (created_at, id) < ('2024-01-15 10:30:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 10;
This provides:
- Consistent Performance: O(log n) regardless of page number
- Stable Results: New records don’t shift pagination boundaries
- Database Agnostic: Works efficiently on NoSQL databases
- Real-time Friendly: Handles concurrent insertions gracefully
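The cursor-to-predicate conversion can be modeled over an in-memory list, since Rust tuples compare lexicographically just like SQL row values (a standalone sketch, not Toasty's implementation):

```rust
/// Keyset pagination over a list sorted descending by (created_at, id):
/// the cursor is the last row of the previous page, and "next page" is a
/// filter, not an offset.
fn next_page(
    rows: &[(i64, i64)],        // (created_at, id), sorted descending
    cursor: Option<(i64, i64)>, // last (created_at, id) seen, if any
    limit: usize,
) -> Vec<(i64, i64)> {
    rows.iter()
        .copied()
        // WHERE (created_at, id) < (cursor_ts, cursor_id), lexicographically,
        // matching SQL row-value comparison semantics.
        .filter(|&row| cursor.map_or(true, |c| row < c))
        .take(limit)
        .collect()
}

fn main() {
    let rows = vec![(30, 3), (20, 2), (20, 1), (10, 1)];

    let page1 = next_page(&rows, None, 2);
    assert_eq!(page1, vec![(30, 3), (20, 2)]);

    // Resume from the last row of page 1; the (20, *) tie is broken by id.
    let page2 = next_page(&rows, page1.last().copied(), 2);
    assert_eq!(page2, vec![(20, 1), (10, 1)]);
}
```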
Notes
- Cursors (`stmt::Expr`) can be serialized at the application level if needed for web APIs
- Pagination requires an explicit ORDER BY clause to ensure consistent results
- Multi-column ordering works today via manual `OrderBy` construction
- The `.then_by()` convenience method would improve ergonomics but isn’t essential
Query Constraints & Filtering
Overview
This document identifies gaps in Toasty’s query constraint support compared to mature ORMs, and outlines potential additions for building web applications.
Terminology
A “query constraint” refers to any predicate used in the WHERE clause of a query. In Toasty, constraints are built using:
- Generated filter methods (`Model::filter_by_<field>()`) for indexed/key fields
- Generic `.filter()` method accepting `Expr<bool>` for arbitrary conditions
- `Model::FIELDS.<field>()` paths combined with comparison methods (`.eq()`, `.gt()`, etc.)
Core AST Support Without User API
These expression types exist in toasty-core (crates/toasty-core/src/stmt/expr.rs) and have SQL serialization, but lack a typed user-facing API on Path<T> or Expr<T>:
| Expression | Core AST | SQL Serialized | User API | Notes |
|---|---|---|---|---|
| LIKE | ExprPattern::Like | Yes | None | SQL serialization exists |
| Begins With | ExprPattern::BeginsWith | Yes | None | Converted to LIKE 'prefix%' in SQL |
| EXISTS | ExprExists | Yes | None on user API | Used internally by engine |
| COUNT | ExprFunc::Count | Yes | None | Internal use only |
ORM Comparison
The following table compares Toasty’s constraint support against seven mature ORMs, highlighting missing features:
| Feature | Toasty | Prisma | Drizzle | Django | SQLAlchemy | Diesel | SeaORM | Hibernate |
|---|---|---|---|---|---|---|---|---|
| Set Operations | | | | | | | | |
| NOT IN | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Range | | | | | | | | |
| BETWEEN | No | Via gt+lt | Yes | Yes | Yes | Yes | Yes | Yes |
| String Operations | | | | | | | | |
| LIKE | AST only | Via contains | Yes | Yes | Yes | Yes | Yes | Yes |
| Contains (substring) | No | Yes | Manual | Yes | Yes | Manual | Yes | Manual |
| Starts with | AST only | Yes | Manual | Yes | Yes | Manual | Yes | Manual |
| Ends with | No | Yes | Manual | Yes | Yes | Manual | Yes | Manual |
| Case-insensitive (ILIKE) | No | Yes | Yes | Yes | Yes | Pg only | No | Manual |
| Regex | No | No | No | Yes | Yes | No | No | No |
| Full-text search | No | Preview | No | Yes (Pg) | Dialect | Crate | No | Extension |
| Relation Filtering | | | | | | | | |
| Filter by related fields | No | Yes | Via join | Yes | Yes | Via join | Via join | Via join |
| Has related (some/none/every) | No | Yes | Via exists | Via exists | Yes | Via exists | Via join | Via exists |
| Aggregation | | | | | | | | |
| COUNT / SUM / AVG / etc. | No | Limited | Yes | Yes | Yes | Yes | Yes | Yes |
| GROUP BY | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| HAVING | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Advanced | | | | | | | | |
| Field-to-field comparison | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Arithmetic in queries | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Raw SQL escape hatch | No | Full query | Inline | Multiple | Inline | Inline | Inline | Native query |
| JSON field queries | No | Limited | Via raw | Yes | Yes | Pg | Via raw | No |
| CASE / WHEN | No | No | No | Yes | Yes | No | No | Yes |
| Dynamic/conditional filters | No | Spread undef | Pass undef | Chain | Chain | BoxableExpr | add_option | Build list |
Potential Future Work
Features with Existing Internal Support
These features have core AST and SQL serialization but need user-facing APIs:
String Pattern Matching
- Core AST: `ExprPattern::BeginsWith` and `ExprPattern::Like` exist with SQL serialization
- Needed:
  - Add `ExprPattern::EndsWith` and `ExprPattern::Contains` to core AST
  - Add `.contains()`, `.starts_with()`, `.ends_with()` on `Path<String>`
  - Add `.like()` for direct pattern matching
  - Handle LIKE special character escaping (`%`, `_`)
- Files: `crates/toasty/src/stmt/path.rs`, `crates/toasty-core/src/stmt/expr.rs`
- Use case: Search functionality (e.g., search users by name fragment)
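Escaping is the subtle part: a user-supplied fragment containing `%` or `_` must not act as a wildcard. A minimal sketch (the function name is invented, and it assumes the generated SQL declares `ESCAPE '\'`):

```rust
/// Escape LIKE metacharacters in a user-supplied fragment so it matches
/// literally when embedded in a pattern such as '%' || fragment || '%'.
fn escape_like(fragment: &str) -> String {
    let mut out = String::with_capacity(fragment.len());
    for ch in fragment.chars() {
        if matches!(ch, '%' | '_' | '\\') {
            out.push('\\'); // assumes ESCAPE '\' in the generated SQL
        }
        out.push(ch);
    }
    out
}

fn main() {
    // "50%_off" must match the literal string, not "50<anything>_off".
    assert_eq!(escape_like("50%_off"), "50\\%\\_off");
    assert_eq!(escape_like("plain"), "plain");
}
```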
NOT IN
- Current: `IN` exists but has no negated form
- Needed: `ExprNotInList`, or negate the `InList` expression, plus a `.not_in_list()` user API
- Files: `crates/toasty/src/stmt/path.rs`, `crates/toasty-core/src/stmt/expr.rs`
- Use case: Exclusion lists (e.g., “exclude these IDs from results”)
Features Needing New Implementation
Case-Insensitive String Matching
- Current: No support at any layer
- Needed: ILIKE support in SQL serialization (PostgreSQL native, LOWER() wrapper for SQLite/MySQL), plus user API
- Design consideration: How to handle cross-database differences (ILIKE is Pg-only, LOWER()+LIKE is universal but slower)
- Reference: Prisma (`mode: 'insensitive'`), Django (`__iexact`, `__icontains`)
- Use case: User-facing search (e.g., email lookup, name search)
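One possible shape for the dialect split (the `Dialect` enum and function below are invented for illustration; note that SQLite's LIKE is already ASCII case-insensitive by default, so real handling needs per-database care):

```rust
/// Illustrative dialect split: native ILIKE where available, a LOWER()
/// wrapper as the universal (but potentially slower) fallback.
enum Dialect {
    Postgres,
    Sqlite,
    Mysql,
}

fn ilike_sql(dialect: &Dialect, column: &str, param: &str) -> String {
    match dialect {
        // PostgreSQL has native ILIKE.
        Dialect::Postgres => format!("{column} ILIKE {param}"),
        // Universal fallback: wrap both sides in LOWER(), which defeats
        // plain column indexes unless an expression index exists.
        Dialect::Sqlite | Dialect::Mysql => {
            format!("LOWER({column}) LIKE LOWER({param})")
        }
    }
}

fn main() {
    assert_eq!(ilike_sql(&Dialect::Postgres, "email", "$1"), "email ILIKE $1");
    assert_eq!(
        ilike_sql(&Dialect::Sqlite, "email", "?"),
        "LOWER(email) LIKE LOWER(?)"
    );
}
```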
BETWEEN / Range Queries
- Current: Users must combine `.ge()` and `.le()` manually
- Needed: Syntactic sugar over AND(ge, le), or a dedicated `ExprBetween`
- File: `crates/toasty/src/stmt/path.rs`
- Reference: Drizzle (`between()`), Django (`__range`), Diesel (`.between()`)
- Use case: Date ranges, price ranges, numeric filtering
Relation/Association Filtering
- Current: Scoped queries exist but no way to filter a top-level query by related model fields
- Needed: JOIN or EXISTS subquery generation in the engine, plus user API design
- Complexity: High - requires significant engine work
- Reference: Prisma (`some`/`none`/`every`), Django (`__` traversal), SQLAlchemy (`.any()`/`.has()`)
- Use case: Filtering parents by child attributes (e.g., “users who have at least one order over $100”)
Field-to-Field Comparison
- Current: `Path::eq()` requires `IntoExpr<T>`, which accepts values but should also accept paths
- Needed: Ensure `Path<T>` implements `IntoExpr<T>` and codegen supports cross-field comparisons
- Reference: Django (`F()` expressions), SQLAlchemy (column comparison)
- Use case: Comparing two columns (e.g., “updated_at > created_at”, “balance > minimum_balance”)
Arithmetic Operations in Queries
- Current: No support - `BinaryOp` only includes comparison operators (Eq, Ne, Gt, Ge, Lt, Le)
- Needed:
  - Add arithmetic operators to the AST: `Add`, `Subtract`, `Multiply`, `Divide`, `Modulo`
  - SQL serialization for arithmetic expressions (standard across databases)
  - User API to build arithmetic expressions (e.g., `.add()`, `.multiply()`, operator overloading, or an expression builder)
  - Type handling for arithmetic results (ensure type safety)
- Files: `crates/toasty-core/src/stmt/op_binary.rs`, `crates/toasty-core/src/stmt/expr.rs`, `crates/toasty/src/stmt/path.rs`
- Reference:
  - Django: `F('price') * F('quantity') > 100`
  - SQLAlchemy: `column('price') * column('quantity') > 100`
  - Diesel: `price.eq(quantity * 2)`
  - Drizzle: `` sql`price * quantity > 100` ``
- Use cases:
  - Computed comparisons: `WHERE age <= 2 * years_in_school`
  - Price calculations: `WHERE price * quantity > 1000`
  - Time differences: `WHERE (end_time - start_time) > 3600`
  - Percentage calculations: `WHERE (actual / budget) * 100 > 110`
  - Complex business rules: `WHERE (base_price * (1 - discount_rate)) > minimum_price`
- Design considerations:
  - Should arithmetic create new expression types or extend `BinaryOp`?
  - How to handle type coercion (int vs float, time arithmetic)?
  - Support for parentheses and operator precedence
  - Whether to support the SELECT side (computed columns) or just WHERE clauses initially
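A toy AST shows how arithmetic operators compose with comparisons; the names are invented and in-memory evaluation stands in for SQL serialization:

```rust
/// Toy AST: arithmetic operators produce values that feed into
/// comparisons, e.g. price * quantity > 1000.
enum Expr {
    Lit(f64),
    Mul(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Gt(Box<Expr>, Box<Expr>),
}

/// Evaluate an expression; booleans are represented as 1.0 / 0.0.
fn eval(e: &Expr) -> f64 {
    match e {
        Expr::Lit(v) => *v,
        Expr::Mul(a, b) => eval(a) * eval(b),
        Expr::Sub(a, b) => eval(a) - eval(b),
        Expr::Gt(a, b) => (eval(a) > eval(b)) as i64 as f64,
    }
}

fn main() {
    // WHERE (base_price * (1 - discount_rate)) > minimum_price
    let pred = Expr::Gt(
        Box::new(Expr::Mul(
            Box::new(Expr::Lit(100.0)), // base_price
            Box::new(Expr::Sub(
                Box::new(Expr::Lit(1.0)),
                Box::new(Expr::Lit(0.25)), // discount_rate
            )),
        )),
        Box::new(Expr::Lit(60.0)), // minimum_price
    );
    assert_eq!(eval(&pred), 1.0); // 100 * 0.75 = 75, and 75 > 60
}
```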
Aggregate Queries
- Current: `ExprFunc::Count` exists internally but is not user-facing
- Needed: User-facing API, return type handling, integration with GROUP BY
- Complexity: High - requires significant API design
- Reference: Django’s annotation system, SQLAlchemy’s `func`
- Use case: Dashboards, analytics, summary views, pagination metadata
GROUP BY / HAVING
- Current: No support at any layer
- Needed: AST additions, SQL generation, engine support, user API
- Complexity: High
- Use case: Aggregate queries, reports, analytics, dashboards
Raw SQL Escape Hatch
- Current: No support
- Needed: Safe API for parameterized raw SQL fragments within typed queries
- Design consideration: Full raw queries vs. raw fragments within typed queries vs. both
- Reference: Drizzle (`` sql`...` `` templates), SQLAlchemy (`text()`), Diesel (`sql()`)
- Use case: Edge cases that the ORM can’t express
Dynamic / Conditional Query Building
- Current: Users can chain `.filter()` calls, but there is no ergonomic way to skip filters when parameters are `None`
- Needed: A pattern for optional filters
- Reference: SeaORM (`Condition::add_option()`), Prisma (spread undefined), Diesel (`BoxableExpression`)
- Use case: Search forms, filter UIs, API endpoints with optional parameters
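A minimal sketch of the pattern, loosely modeled on SeaORM's `Condition::add_option()` (all names are invented; real filters would be typed expressions, not strings):

```rust
/// Conditional filter builder: each Option parameter contributes a
/// predicate only when it is Some, so search forms can skip empty fields.
struct FilterSet {
    clauses: Vec<String>,
}

impl FilterSet {
    fn new() -> Self {
        Self { clauses: Vec::new() }
    }

    /// Add a clause only when the parameter is present.
    fn add_option(mut self, clause: Option<String>) -> Self {
        if let Some(c) = clause {
            self.clauses.push(c);
        }
        self
    }

    fn to_where(&self) -> String {
        if self.clauses.is_empty() {
            String::new()
        } else {
            format!("WHERE {}", self.clauses.join(" AND "))
        }
    }
}

fn main() {
    // Simulated search-form input: name provided, min_age left blank.
    let name: Option<&str> = Some("Alice");
    let min_age: Option<u32> = None;

    let filters = FilterSet::new()
        .add_option(name.map(|n| format!("name = '{n}'")))
        .add_option(min_age.map(|a| format!("age >= {a}")));

    // Only the provided parameter appears in the final query.
    assert_eq!(filters.to_where(), "WHERE name = 'Alice'");
}
```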
Full-Text Search
- Current: No support
- Complexity: High - database-specific implementations (PostgreSQL tsvector, MySQL FULLTEXT, SQLite FTS5)
- Design consideration: May be best as database-specific extensions rather than a unified API
- Use case: Content-heavy applications (blogs, e-commerce, documentation sites)
JSON Field Queries
- Current: No support
- Complexity: High - needs path traversal syntax, type handling, database-specific operators
- Dependency: Depends on JSON/JSONB data type support
- Reference: Django (`field__key__subkey`), SQLAlchemy (`column['key']`)
- Use case: Flexible/schemaless data within relational databases
Advanced / Niche Features
Regex Matching
- Use case: Power-user filtering, data validation queries
- Reference: Django (`__regex`, `__iregex`), SQLAlchemy (`regexp_match()`)
Array/Collection Operations
- Use case: PostgreSQL array columns, MongoDB array fields
- Dependency: Requires array type support first
- Reference: Prisma (`has`, `hasEvery`, `hasSome`), Django (ArrayField lookups)
CASE/WHEN Expressions
- Use case: Conditional logic within queries for complex business rules
- Reference: Django (`When()` / `Case()`), SQLAlchemy (`case()`)
Subquery Comparisons (ALL/ANY/SOME)
- Use case: Advanced filtering like “price > ALL(SELECT price FROM competitors)”
- Reference: Hibernate, SQLAlchemy (`all_()`, `any_()`)
IS DISTINCT FROM
- Use case: NULL-safe comparisons without special-casing IS NULL
- Reference: SQLAlchemy (only ORM with native support)
Implementation Considerations
Recommended Approach
Based on the analysis above, the following groupings maximize user value:
Group 1: Expose Existing Internals
Items with core AST and SQL serialization that only need user-facing methods:
- `.not_in_list()` on `Path<T>` (negate the existing `InList`)
Estimated scope: ~50 lines of user-facing API code + integration tests
Group 2: String Operations
Partial AST support that needs completion and exposure:
- Add `ExprPattern::EndsWith` and `ExprPattern::Contains` to the core AST
- Add SQL serialization for the new pattern variants
- Add `.contains()`, `.starts_with()`, `.ends_with()` to `Path<String>`
- Handle LIKE special-character escaping
Estimated scope: ~200 lines across core + SQL + user API
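The LIKE escaping step can be sketched in a few lines. This assumes backslash as the escape character (i.e. the generated SQL would use `LIKE ? ESCAPE '\'`); it is illustrative, not Toasty’s actual implementation.

```rust
// Sketch: escape LIKE metacharacters in a user-supplied substring so that
// `.contains("50%_off")` matches the literal text rather than treating
// `%` and `_` as wildcards. Assumes backslash as the ESCAPE character.
fn escape_like(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    for c in pattern.chars() {
        if c == '%' || c == '_' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

fn main() {
    let escaped = escape_like("50%_off");
    assert_eq!(escaped, "50\\%\\_off");
    // The serializer would then wrap it: LIKE '%50\%\_off%' ESCAPE '\'
    println!("LIKE '%{escaped}%' ESCAPE '\\'");
}
```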
Group 3: Ergonomic Improvements
- Case-insensitive matching (ILIKE / LOWER() wrapper)
- `.between()` convenience method
- `.like()` direct exposure
- Conditional/optional filter building helpers
Group 4: Structural Features
Requires deeper engine work:
- Relation filtering (JOIN/EXISTS generation)
- Aggregate functions (user-facing COUNT/SUM/etc.)
- GROUP BY / HAVING
- Raw SQL escape hatch
Reference Implementation Goals
A comprehensive query constraint system would allow users to:
- Search strings by substring, prefix, and suffix (case-sensitive and case-insensitive)
- Use NOT IN with literal lists and subqueries
- Filter by related model attributes
- Use at least basic aggregate queries (COUNT)
- Fall back to raw SQL for anything the ORM can’t express
This would put Toasty on par with the filtering capabilities of Diesel and SeaORM, and cover the vast majority of queries needed by typical web applications.
Query Engine Optimization Roadmap
Overview
The query engine currently performs simplification as a single VisitMut pass that
applies local rewrite rules bottom-up. This works well for straightforward
transformations (constant folding, tuple decomposition, association rewriting),
but it has structural limitations as the optimizer takes on more complex work.
This document tracks improvements to the query engine’s optimization infrastructure, focusing on predicate simplification and the compilation pipeline.
Current State
Simplification Pass
The simplifier (engine/simplify.rs) implements VisitMut and applies rules in
a single bottom-up traversal. Each node is visited once, simplified, and then
its parent is simplified with the updated children.
What works well:
- Local rewrites: constant folding, boolean identity, tuple decomposition
- Association rewriting and subquery lifting
- Match elimination (distributing binary ops over match arms)
Structural limitations:
- Rules fire during the walk, so ordering matters. A rule that produces expressions consumable by another rule only works if the consumer fires later in the same walk or the walk is re-run.
- Global analysis (e.g., detecting contradictions across an entire AND conjunction) must be done inline during the walk, mixing local and global concerns.
- Expensive analyses run on every AND node encountered, even when only a small fraction would benefit.
Contradicting Equality Detection
The simplifier currently detects a = c1 AND a = c2 (where c1 != c2) inline in
simplify_expr_and. This is O(n^2) in the number of equality predicates within a
single AND. While operand lists are typically small, the analysis runs on every
AND node during the walk, including intermediate nodes that are about to be
restructured by other rules.
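The same check can be done in a single pass with a map from column to its bound constant, which is the shape the post-lowering pass below would take. A minimal sketch, using `(&str, i64)` pairs as stand-ins for equality predicates in an AND list:

```rust
use std::collections::HashMap;

// Sketch: detect `a = c1 AND a = c2` (c1 != c2) in O(n) with one map,
// instead of O(n^2) pairwise comparison. The pair type is a stand-in for
// Toasty's real equality-predicate nodes.
fn has_contradicting_equality(equalities: &[(&str, i64)]) -> bool {
    let mut bindings: HashMap<&str, i64> = HashMap::new();
    for &(column, value) in equalities {
        // `insert` returns the previous binding for this column, if any.
        if let Some(prev) = bindings.insert(column, value) {
            if prev != value {
                return true; // same column bound to two different constants
            }
        }
    }
    false
}

fn main() {
    assert!(has_contradicting_equality(&[("a", 1), ("b", 2), ("a", 3)]));
    assert!(!has_contradicting_equality(&[("a", 1), ("a", 1), ("b", 2)]));
}
```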
Planned Improvements
Phase 1: Post-Lowering Optimization Pass
Move expensive predicate analysis out of the per-node simplifier and into a dedicated pass that runs once after lowering, against the HIR representation. At this point the statement is fully resolved to table-level expressions and the predicate tree is stable — no more association rewrites or field resolution changes will restructure it.
This pass would handle:
- Contradicting equality pruning
- Redundant predicate elimination
- Tautology detection
- `ExprLet` inlining (currently done at the end of `lower_returning`; should move here so all post-lowering expression rewrites live in one place)
Why after lowering: Before lowering, predicates reference model-level fields and contain relationship navigation that the lowering phase rewrites. Running global analysis before this rewriting is wasted work — the predicate tree will change. After lowering, the predicates are in their final structural form (column references, subqueries), so analysis results are stable.
Phase 2: Equivalence Classes
Build equivalence classes from equality predicates before running constraint
analysis. When the optimizer sees a = b AND b = c, it should know that a,
b, and c are all equivalent, enabling:
- Transitive contradiction detection: `a = b AND b = 5 AND a = 7` is a contradiction (`a` must be both 5 and 7), even though no single pair of predicates directly conflicts.
- Predicate implication: `a = 5 AND a > 3` — the second predicate is implied and can be dropped.
- Join predicate inference: If `a = b` and a filter constrains `a`, the same constraint applies to `b`.
Equivalence classes are a standard technique in query optimizers. The idea is to union-find expressions that are constrained to be equal, then check each class for conflicting constant bindings or range constraints.
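A minimal sketch of that union-find approach, with a per-class constant binding so transitive contradictions surface as soon as they arise. Types and method names are illustrative, not Toasty’s internals:

```rust
// Sketch: equivalence classes over expression ids via union-find, with a
// constant binding tracked per class root. `union` handles `a = b`
// predicates; `bind` handles `a = c` predicates against constants.
struct EquivClasses {
    parent: Vec<usize>,
    constant: Vec<Option<i64>>, // constant bound to the class root, if any
}

impl EquivClasses {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect(), constant: vec![None; n] }
    }

    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let p = self.parent[x];
            let root = self.find(p);
            self.parent[x] = root; // path compression
        }
        self.parent[x]
    }

    /// Merge the classes of `a` and `b`. Returns false on a contradiction,
    /// i.e. when the two classes carry different constants.
    fn union(&mut self, a: usize, b: usize) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return true;
        }
        match (self.constant[ra], self.constant[rb]) {
            (Some(x), Some(y)) if x != y => return false,
            (Some(x), _) => self.constant[rb] = Some(x),
            _ => {}
        }
        self.parent[ra] = rb;
        true
    }

    /// Bind a constant to `x`'s class. Returns false on a conflict.
    fn bind(&mut self, x: usize, c: i64) -> bool {
        let r = self.find(x);
        match self.constant[r] {
            Some(prev) => prev == c,
            None => {
                self.constant[r] = Some(c);
                true
            }
        }
    }
}

fn main() {
    // Expressions: 0 = a, 1 = b. Predicates: a = b, b = 5, a = 7.
    let mut ec = EquivClasses::new(2);
    assert!(ec.union(0, 1));
    assert!(ec.bind(1, 5));
    assert!(!ec.bind(0, 7)); // contradiction detected transitively
}
```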
Phase 3: Structured Constraint Analysis
Replace ad-hoc pairwise comparisons with a more structured representation of constraints. For each expression (or equivalence class), maintain:
- Constant binding: The expression must equal a specific value
- Range bounds: Upper/lower bounds from inequality predicates
- NOT-equal set: Values the expression cannot be (from
!=predicates)
With this structure, contradiction detection becomes a property check rather than a search: an expression with two different constant bindings, or a constant binding outside its range bounds, is immediately contradictory.
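A sketch of what such a constraint summary could look like, assuming integer-valued expressions for simplicity (the real representation would be generic over Toasty’s value types):

```rust
use std::collections::HashSet;

// Sketch: per-expression (or per-equivalence-class) constraint summary.
// Contradiction detection becomes a property check on this structure
// rather than a pairwise search over predicates.
#[derive(Default)]
struct Constraints {
    constant: Option<i64>,   // expression must equal this value
    lower: Option<i64>,      // inclusive lower bound from inequalities
    upper: Option<i64>,      // inclusive upper bound from inequalities
    not_equal: HashSet<i64>, // values excluded by `!=` predicates
}

impl Constraints {
    fn is_contradictory(&self) -> bool {
        // Two different constant bindings are rejected at insertion time;
        // here we check the constant against the range and NOT-equal set.
        if let Some(c) = self.constant {
            if self.lower.map_or(false, |lo| c < lo)
                || self.upper.map_or(false, |hi| c > hi)
                || self.not_equal.contains(&c)
            {
                return true;
            }
        }
        // An empty range is contradictory even without a constant binding.
        matches!((self.lower, self.upper), (Some(lo), Some(hi)) if lo > hi)
    }
}

fn main() {
    // a = 5 AND a >= 7: the constant falls below the lower bound.
    let c = Constraints { constant: Some(5), lower: Some(7), ..Default::default() };
    assert!(c.is_contradictory());

    // a = 5 AND a >= 3 AND a <= 10: satisfiable.
    let ok = Constraints { constant: Some(5), lower: Some(3), upper: Some(10), ..Default::default() };
    assert!(!ok.is_contradictory());
}
```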
Predicate Normalization (Not Full DNF)
Full conversion to disjunctive normal form (DNF) — where the entire predicate becomes an OR of ANDs — risks exponential blowup. A predicate with N AND-connected clauses of M OR-options each expands to M^N terms. This makes full DNF impractical as a general-purpose transformation.
Instead, apply targeted normalization:
- Flatten associative operators: Merge nested `AND(AND(...), ...)` and `OR(OR(...), ...)` into flat lists (already done).
- Canonicalize comparison direction: Ensure constants are on the right side of comparisons (already done).
- Limited distribution: Distribute AND over OR only in specific cases where it enables index utilization or constraint extraction, with a size budget to prevent blowup.
- OR-of-equalities to IN-list: Convert `a = 1 OR a = 2 OR a = 3` to `a IN (1, 2, 3)` for more efficient execution.
The goal is to normalize enough for the constraint analysis to work without paying the exponential cost of full DNF.
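The OR-of-equalities rewrite above can be sketched on a toy expression type (not Toasty’s AST). The rule only fires when every disjunct is an equality on the same column; anything else is left untouched:

```rust
// Sketch: rewrite `a = 1 OR a = 2 OR a = 3` into `a IN (1, 2, 3)`.
// `Expr` is a toy stand-in for the statement AST.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Eq(String, i64),          // column = constant
    Or(Vec<Expr>),
    InList(String, Vec<i64>), // column IN (c1, c2, ...)
}

fn or_to_in_list(expr: &Expr) -> Expr {
    if let Expr::Or(ops) = expr {
        let mut column: Option<&str> = None;
        let mut values = Vec::new();
        for op in ops {
            match op {
                Expr::Eq(col, v) if column.is_none() || column == Some(col.as_str()) => {
                    column = Some(col.as_str());
                    values.push(*v);
                }
                // Mixed columns or a non-equality disjunct: leave unchanged.
                _ => return expr.clone(),
            }
        }
        if let Some(col) = column {
            return Expr::InList(col.to_string(), values);
        }
    }
    expr.clone()
}

fn main() {
    let or = Expr::Or(vec![
        Expr::Eq("a".into(), 1),
        Expr::Eq("a".into(), 2),
        Expr::Eq("a".into(), 3),
    ]);
    assert_eq!(or_to_in_list(&or), Expr::InList("a".into(), vec![1, 2, 3]));
}
```

Because the rewrite never grows the expression (an IN-list is strictly smaller than the OR it replaces), it needs no size budget, unlike the distribution rule.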
Design Principles
- Run expensive analysis once, not per-node. The current simplifier intermixes cheap local rewrites with expensive global analysis. Separate them.
- Analyze after the predicate tree is stable. Post-lowering is the right point — predicates are resolved to columns and won’t be restructured.
- Build structure, then query it. Constructing equivalence classes and constraint summaries up front makes individual checks cheap.
- Budget-limited transformations. Any rewrite that can expand expression size (distribution, case expansion) must have a size limit.