Toasty Query Engine
This document provides a high-level overview of the Toasty query execution engine for developers working on engine internals. It describes the multi-phase pipeline that transforms user queries into database operations.
Overview
The Toasty engine is a multi-database query compiler and runtime that executes ORM operations across SQL and NoSQL databases. It transforms a user’s query (represented as a Statement AST) into a sequence of executable actions through multiple compilation phases.
Execution Model
The final output is a mini program executed by an interpreter. Think of it like a small virtual machine or bytecode interpreter, though there is no control flow (yet):
- Instructions (Actions): Operations like “execute this SQL”, “filter these results”, “merge child records into parents”
- Variables: Storage slots, or registers, that hold intermediate results between instructions
- Linear Execution: Instructions run in sequence (no control flow - no branches or loops, yet). Eventually, the interpreter will be smart about concurrency and execute independent operations in parallel when possible.
- Interpreter: The engine executor reads each instruction, fetches inputs from variables, performs the operation, and stores outputs back to variables
For example, loading users with their todos:
SELECT users.id, users.name, (
    SELECT todos.id, todos.title
    FROM todos
    WHERE todos.user_id = users.id
) FROM users WHERE ...
compiles to a program like:
$0 = ExecSQL("SELECT * FROM users WHERE ...")
$1 = ExecSQL("SELECT * FROM todos WHERE user_id IN ...")
$2 = NestedMerge($0, $1, by: user_id)
return $2
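The instruction/variable model above can be sketched as a tiny interpreter. This is a hypothetical, heavily simplified illustration: `Value`, `Action`, and `run` are illustrative names, real actions execute database operations rather than returning constants, and the real engine is asynchronous.

```rust
// Hypothetical sketch of the action/variable execution model.
// Real actions hit the database; here inputs are stubbed as constants.

#[derive(Debug, Clone, PartialEq)]
enum Value {
    // (id, payload) pairs standing in for loaded records.
    Rows(Vec<(u64, String)>),
}

enum Action {
    // Store a constant result set into a variable slot.
    Const { output: usize, rows: Vec<(u64, String)> },
    // Merge child rows into parent rows by key, storing into `output`.
    NestedMerge { parent: usize, child: usize, output: usize },
}

fn run(actions: &[Action], num_vars: usize) -> Vec<Option<Value>> {
    let mut vars: Vec<Option<Value>> = vec![None; num_vars];
    for action in actions {
        match action {
            Action::Const { output, rows } => {
                vars[*output] = Some(Value::Rows(rows.clone()));
            }
            Action::NestedMerge { parent, child, output } => {
                let Some(Value::Rows(parents)) = vars[*parent].clone() else { panic!("missing input") };
                let Some(Value::Rows(children)) = vars[*child].clone() else { panic!("missing input") };
                // For each parent, append the payloads of matching children.
                let merged = parents
                    .iter()
                    .map(|(id, name)| {
                        let kids: Vec<String> = children
                            .iter()
                            .filter(|(cid, _)| cid == id)
                            .map(|(_, title)| title.clone())
                            .collect();
                        (*id, format!("{name} [{}]", kids.join(", ")))
                    })
                    .collect();
                vars[*output] = Some(Value::Rows(merged));
            }
        }
    }
    vars
}

fn main() {
    // $0 = users, $1 = todos, $2 = NestedMerge($0, $1)
    let program = vec![
        Action::Const { output: 0, rows: vec![(1, "alice".into())] },
        Action::Const { output: 1, rows: vec![(1, "buy milk".into())] },
        Action::NestedMerge { parent: 0, child: 1, output: 2 },
    ];
    let vars = run(&program, 3);
    println!("{:?}", vars[2]);
}
```

Linear execution with numbered slots keeps the interpreter trivial today while leaving room for the planned concurrency: any two actions whose input slots do not overlap with each other's outputs could run in parallel.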
The compilation pipeline below transforms user queries into this instruction/variable representation. Each phase brings the query closer to this final executable form.
Compilation Pipeline
User Query (Statement AST)
↓
[Verification] - Validate statement structure (debug builds only)
↓
[Simplification] - Normalize and optimize the statement AST
↓
[Lowering] - Convert to HIR for dependency analysis
↓
[Planning] - Build MIR operation graph
↓
[Execution Planning] - Convert to action sequence with variables
↓
[Execution] - Run actions against database driver
↓
Result Stream
Phase 1: Simplification
Location: engine/simplify.rs
The simplification phase normalizes and optimizes the statement AST before planning.
Key Transformations
- Association Rewriting: Converts relationship navigation (e.g., user.todos()) into explicit subqueries with foreign key filters
- Subquery Lifting: Transforms IN (SELECT ...) expressions into more efficient join-like operations
- Expression Normalization: Simplifies complex expressions (e.g., flattening nested ANDs/ORs, constant folding)
- Path Expression Rewriting: Resolves field paths and relationship traversals into explicit column references
- Empty Query Detection: Identifies queries that will return no results
Example: Association Simplification
// user.todos().delete() generates:
Delete {
    from: Todo,
    via: User::todos, // relationship traversal
    ...
}

// After simplification:
Delete {
    from: Todo,
    filter: todo.user_id IN (SELECT id FROM users WHERE ...)
}
Converting relationship navigation into explicit filters early means downstream phases only need to handle standard query patterns with filters and subqueries - no special-case logic for each relationship type.
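The expression-normalization step can be pictured as a small rewrite pass. The sketch below is hypothetical (the real AST in toasty_core::stmt is far richer); it shows the two normalizations named above: flattening nested ANDs and folding constant boolean operands.

```rust
// Hypothetical, simplified expression AST illustrating normalization:
// AND(true, AND(a, b)) becomes AND(a, b); AND(false, x) becomes false.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Const(bool),
    Column(&'static str),
    And(Vec<Expr>),
}

fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::And(operands) => {
            let mut flat = Vec::new();
            for op in operands {
                match simplify(op) {
                    // Flatten AND(AND(a, b), c) into AND(a, b, c).
                    Expr::And(inner) => flat.extend(inner),
                    // Drop `true` operands: AND(true, x) == x.
                    Expr::Const(true) => {}
                    // A `false` operand makes the whole AND false.
                    Expr::Const(false) => return Expr::Const(false),
                    other => flat.push(other),
                }
            }
            match flat.len() {
                0 => Expr::Const(true),
                1 => flat.pop().unwrap(),
                _ => Expr::And(flat),
            }
        }
        other => other,
    }
}

fn main() {
    let e = Expr::And(vec![
        Expr::Const(true),
        Expr::And(vec![Expr::Column("a"), Expr::Column("b")]),
    ]);
    println!("{:?}", simplify(e));
}
```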
Phase 2: Lowering
Location: engine/lower.rs
Lowering converts a simplified statement into HIR (High-level Intermediate Representation) - a collection of related statements with tracked dependencies.
Toasty tries to maximize what the target database can handle natively, only decomposing queries when necessary. For example, a query like User::find_by_name("John").todos().all() contains a subquery. SQL databases can execute this as SELECT * FROM todos WHERE user_id IN (SELECT id FROM users WHERE name = 'John'). DynamoDB cannot handle subqueries, so lowering splits this into two statements: first fetch user IDs, then query todos with those IDs.
The HIR tracks a dependency graph between statements - which statements depend on results from others, and which columns flow between them. This graph can contain cycles when preloading associations. For example:
SELECT users.id, users.name, (
    SELECT todos.id, todos.title
    FROM todos
    WHERE todos.user_id = users.id
) FROM users WHERE ...
The users query must execute first to provide IDs for the todos subquery, but the todos results must be merged back into the user records. This creates a cycle: users → todos → users.
The lowering phase handles:
- Statement Decomposition: Breaking queries into sub-statements when the database can’t handle them directly
- Dependency Tracking: Which statements must execute before others
- Argument Extraction: Identifying values passed between statements (e.g., a loaded model’s ID used in a child query’s filter)
- Relationship Handling: Processing relationship loads and nested queries
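The dependency tracking above amounts to a small directed graph over statement indices. A hypothetical sketch (illustrative names, not the real HIR types) of how such a graph can answer "does statement A depend on statement B?" and detect the users → todos → users cycle:

```rust
use std::collections::HashSet;

// Hypothetical sketch of the HIR dependency graph: statement indices with
// "depends on" edges. Cycles are permitted here; planning breaks them later.
struct DepGraph {
    // (statement, statement it depends on)
    edges: Vec<(usize, usize)>,
}

impl DepGraph {
    // Direct dependencies of `stmt`.
    fn deps_of(&self, stmt: usize) -> Vec<usize> {
        self.edges
            .iter()
            .filter(|(s, _)| *s == stmt)
            .map(|(_, d)| *d)
            .collect()
    }

    // True if `from` transitively depends on `to`.
    fn depends_on(&self, from: usize, to: usize) -> bool {
        let mut stack = self.deps_of(from);
        let mut seen = HashSet::new();
        while let Some(n) = stack.pop() {
            if n == to {
                return true;
            }
            if seen.insert(n) {
                stack.extend(self.deps_of(n));
            }
        }
        false
    }

    // A statement that transitively depends on itself sits on a cycle.
    fn on_cycle(&self, stmt: usize) -> bool {
        self.depends_on(stmt, stmt)
    }
}

fn main() {
    // 0 = users, 1 = todos: todos needs user IDs for its filter, and users
    // needs the todos results merged back — a cycle.
    let g = DepGraph { edges: vec![(1, 0), (0, 1)] };
    println!("users on a cycle: {}", g.on_cycle(0));
}
```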
Lowering Algorithm
Lowering transforms model-level statements to table-level statements through a visitor pattern that rewrites each part of the statement AST:
- Table Resolution: InsertTarget::Model, UpdateTarget::Model, etc. become their corresponding table references
- Returning Clause Transformation: Returning::Model is replaced with Returning::Expr containing the expanded column expressions
- Field Reference Resolution: Model field references are converted to table column references
- Include Expansion: Association includes become subqueries in the returning clause
The TableToModel mapping (built during schema construction) drives the transformation. It contains an expression for each model field that maps to its corresponding table column(s). This supports more than a 1-1 mapping—a model field can be derived from multiple columns or a column can map to multiple fields. Association fields are initialized to Null in this mapping.
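A minimal sketch of what such a mapping might look like, using hypothetical types (the real TableToModel lives in the schema layer and is richer):

```rust
// Hypothetical sketch of the TableToModel mapping: one expression per model
// field. Names and shapes are illustrative, not the real schema types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    // A field backed directly by one table column.
    Column(&'static str),
    // A single model field derived from several columns (more than 1-1).
    Concat(Vec<Expr>),
    // Placeholder for association fields, replaced during lowering.
    Null,
}

struct TableToModel {
    // Indexed by model field: id, name, email, todos.
    field_exprs: Vec<Expr>,
}

fn main() {
    let map = TableToModel {
        field_exprs: vec![
            Expr::Column("id"),
            Expr::Column("first_and_last_name"), // renamed column
            Expr::Column("email"),
            Expr::Null, // `todos` association starts as Null
        ],
    };
    println!("{:?}", map.field_exprs.last());
}
```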
When lowering encounters a Returning::Model { include } clause:
- Call table_to_model.lower_returning_model() to get the base column expressions
- For each path in the include list, call build_include_subquery() to generate a subquery that selects the associated records
- Replace the Null placeholder in the returning expression with the generated subquery
Lowering Examples
Example 1: Simple query
Given a model with a renamed column:
#[derive(Model)]
struct User {
    #[key]
    #[auto]
    id: u64,
    #[column(name = "first_and_last_name")]
    name: String,
    email: String,
}
// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
// Note: At model-level, no specific fields are selected

// After lowering
SELECT id, first_and_last_name, email FROM users WHERE id = ?
Example 2: Query with association
// Before lowering (toasty_core::stmt::Statement)
SELECT MODEL FROM User WHERE id = ?
INCLUDE todos

// After lowering
SELECT id, first_and_last_name, email, (
    SELECT id, title, user_id FROM todos WHERE todos.user_id = users.id
) FROM users WHERE id = ?
Phase 3: Planning
Location: engine/plan.rs
Planning converts HIR into MIR (Middle-level Intermediate Representation) - a directed acyclic graph of operations, both database queries and in-memory transformations. Edges represent data dependencies: an operation cannot execute until all operations it depends on have completed and produced their results.
Since the HIR graph can contain cycles, planning must break them to produce a DAG. This is done by introducing intermediate operations that batch-load data and merge results (e.g., NestedMerge).
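Breaking the users → todos → users cycle amounts to batching: execute the parent query first, collect its keys, fetch all children with a single query, and merge in memory with NestedMerge. A hypothetical sketch of the key-collection half (illustrative helper, not an engine API):

```rust
// Hypothetical sketch of cycle-breaking by batching: rather than one
// correlated subquery per user row, collect the parent keys already loaded
// and fetch all children with a single IN query, then merge in memory.
fn batch_child_query(parent_ids: &[u64]) -> String {
    let ids: Vec<String> = parent_ids.iter().map(|id| id.to_string()).collect();
    format!("SELECT * FROM todos WHERE user_id IN ({})", ids.join(", "))
}

fn main() {
    println!("{}", batch_child_query(&[1, 2, 3]));
}
```

(The real engine builds a statement AST and binds the IDs as parameters rather than splicing them into a SQL string; the string form here only illustrates the shape of the batched query.)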
Operation Types
The MIR supports various operation types (see engine/mir.rs for details):
SQL operations:
- ExecStatement - Execute a SQL query (SELECT, INSERT, UPDATE, DELETE)
- ReadModifyWrite - Optimistic locking (read, modify, conditional write). Exists as a separate operation because the read result must be processed in-memory to compute the write, which ExecStatement cannot express.
Key-value operations (NoSQL):
- GetByKey, DeleteByKey, UpdateByKey - Direct key access
- QueryPk, FindPkByIndex - Key lookups via queries or indexes
In-memory operations:
- Filter, Project - Transform and filter results
- NestedMerge - Merge child records into parent records
- Const - Constant values
Phase 4: Execution Planning
Location: engine/plan/execution.rs
Execution planning converts the MIR logical plan into a concrete sequence of actions that can be executed. This phase:
- Assigns variable slots for storing intermediate results
- Converts each MIR operation into an execution action
- Maintains topological ordering to ensure dependencies execute first
Action Types
Actions mirror MIR operations but include concrete variable bindings:
SQL actions:
- ExecStatement: Execute a SQL query (SELECT, INSERT, UPDATE, DELETE)
- ReadModifyWrite: Optimistic locking (read, modify, conditional write)
Key-value actions (NoSQL):
- GetByKey: Batch fetch by primary key
- DeleteByKey: Delete records by primary key
- UpdateByKey: Update records by primary key
- QueryPk: Query primary keys
- FindPkByIndex: Find primary keys via secondary index
In-memory actions:
- Filter: Apply in-memory filter to a variable’s data
- Project: Transform records
- NestedMerge: Merge child records into parent records
- SetVar: Set a variable to a constant value
Phase 5: Execution
Location: engine/exec.rs
The execution phase is the interpreter that runs the compiled program. It iterates through actions, reading inputs from variables, performing operations, and storing outputs back to variables.
Execution Loop
The interpreter follows a simple pattern:
- Initialize variable storage
- For each action in sequence:
- Load input data from variables
- Perform the operation (database query or in-memory transform)
- Store the result in the output variable
- Return to the user the result from the final variable (the last action’s output)
Variable Lifetime
The engine tracks how many times each variable is referenced by downstream actions. A variable may be used by multiple actions (e.g., the same user records merged with both todos and comments). When the last action that needs a variable completes, the variable’s value is dropped to free memory.
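This reference counting can be sketched as follows (hypothetical, simplified types; the real variable storage holds value streams, not vectors):

```rust
// Hypothetical sketch of variable lifetime tracking: each slot carries a
// remaining-use count; the value is dropped when the last consumer reads it.
struct Slot {
    value: Option<Vec<u64>>, // stand-in for buffered rows
    remaining_uses: usize,
}

struct Vars {
    slots: Vec<Slot>,
}

impl Vars {
    fn load(&mut self, idx: usize) -> Vec<u64> {
        let slot = &mut self.slots[idx];
        slot.remaining_uses -= 1;
        if slot.remaining_uses == 0 {
            // Last consumer: move the value out, freeing the slot's memory.
            slot.value.take().expect("value already dropped")
        } else {
            // More consumers remain: hand out a copy, keep the original.
            slot.value.clone().expect("value already dropped")
        }
    }
}

fn main() {
    let mut vars = Vars {
        slots: vec![Slot { value: Some(vec![1, 2]), remaining_uses: 2 }],
    };
    let a = vars.load(0); // one use left: value retained
    let b = vars.load(0); // last use: value moved out, slot freed
    assert_eq!(a, b);
    assert!(vars.slots[0].value.is_none());
}
```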
Driver Interaction
The execution phase is the only part of the engine that communicates with database drivers. The driver interface is intentionally simple: a single exec() method that accepts an Operation enum. This enum includes variants for both SQL operations (QuerySql, Insert) and key-value operations (GetByKey, QueryPk, FindPkByIndex, DeleteByKey, UpdateByKey).
Each driver implements whichever operations it supports. SQL drivers handle QuerySql natively while key-value drivers handle GetByKey, QueryPk, etc. The planner uses driver.capability() to determine which operations to generate for each database type.
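The shape of that boundary can be sketched as a trait with a single entry point. This is a hypothetical illustration following the prose above, not the actual driver trait: the real interface is asynchronous, its Operation enum has more variants, and its response type carries value streams.

```rust
// Hypothetical sketch of the driver boundary: one exec() entry point over an
// Operation enum. Names follow the prose description, not the real trait.
enum Operation {
    QuerySql(String),
    GetByKey(Vec<u64>),
}

enum Response {
    Rows(Vec<String>),
}

trait Driver {
    fn exec(&self, op: Operation) -> Response;
}

struct InMemoryDriver;

impl Driver for InMemoryDriver {
    fn exec(&self, op: Operation) -> Response {
        match op {
            // A SQL driver would send the query text to the database.
            Operation::QuerySql(sql) => Response::Rows(vec![format!("ran: {sql}")]),
            // A key-value driver resolves each key directly.
            Operation::GetByKey(keys) => {
                Response::Rows(keys.iter().map(|k| format!("record {k}")).collect())
            }
        }
    }
}

fn main() {
    let driver = InMemoryDriver;
    let Response::Rows(rows) = driver.exec(Operation::GetByKey(vec![1, 2]));
    println!("{rows:?}");
}
```

Keeping the surface to a single method means adding a new backend is mostly a matter of implementing exec() for the operations the planner will actually emit, as advertised by driver.capability().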