Introduction
formawasm is a backend compiler that turns a formalang Intermediate Representation (IR) module into a WebAssembly component — a .wasm binary that any standards-compliant runtime can execute.
formalang frontend ──► IrModule ──► formawasm ──► .wasm component ──► host runtime
formawasm is a backend: it doesn't parse .fv source files itself, and it doesn't run the resulting wasm. Both are jobs for other libraries (formalang for parsing, wasmtime / wasmi / a browser engine for execution). formawasm only emits bytes.
Every public formalang declaration becomes a typed entry point in the component's interface. The boundary is described in WIT (Wasm Interface Types), the small Interface Definition Language of the Component Model. formawasm generates the WIT file automatically from the public surface of each IR module — the host never hand-writes WIT.
Two ways to read these docs
The book is split into two halves; pick the entry point that matches what you want to do.
- Embedding formawasm in your application → start with Quickstart, then Using the Library and Hosting a Component. The Boundary Policy and Type Mapping chapters explain what formalang values look like when they cross into your host code.
- Contributing to formawasm or extending the backend → start with Architecture, then Crate Layout and Lowering. Extending the Backend covers adding new IR variants or runtime helpers; Testing and Contributing describe the project's quality bar.
Status
Phases 1 through 5 are closed; the backend produces a Component-Model artifact for every milestone. See Feature Coverage for the per-IR-variant breakdown and the project's CHANGELOG.md for a phase-by-phase history.
Quickstart
The fastest path from a formalang source file to a runnable WebAssembly component is the bundled formawasm binary.
Compile a source file
Save this as id.fv:
pub fn id(x: I32) -> I32 { x }
Build the CLI and run it:
cargo build --release --bin formawasm
./target/release/formawasm id.fv
# wrote id.wasm (… bytes) from id.fv
By default the output filename is the input with .fv swapped for .wasm. Use -o <path> to override:
formawasm id.fv -o build/id.component.wasm
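The default-name rule above can be sketched in a few lines of Rust. This is only an illustration of the rule as stated, not the CLI's actual code; `default_output` is a hypothetical helper name.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: pick the output path for a compile.
/// An explicit `-o` path wins; otherwise the input's extension
/// is swapped for `.wasm`.
fn default_output(input: &Path, explicit: Option<&Path>) -> PathBuf {
    match explicit {
        Some(path) => path.to_path_buf(),
        None => input.with_extension("wasm"),
    }
}
```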
If the formalang frontend rejects the source, the diagnostic is printed in the upstream's standard format and the CLI exits non-zero. Backend errors print their typed Display form and also exit non-zero.
Inspect the WIT interface
The emitted .wasm is a Component-Model artifact. The standalone wasm-tools CLI prints the WIT interface formawasm generated:
wasm-tools component wit id.wasm
For the id example above, you should see:
package formawasm:generated;
world component {
export id: func(x: s32) -> s32;
}
Every pub formalang function appears as a WIT export; every extern function appears as an import. Public structs and enums become record and variant declarations on a types interface — see Type Mapping.
Run the component
formawasm doesn't ship a runtime. The standard host-side library is wasmtime; the Hosting a Component chapter shows the full Rust-side wiring for instantiating a component, calling exports, and supplying host-provided imports.
What's next
- Embed the backend directly in your build instead of shelling out to formawasm → Using the Library.
- Understand exactly which formalang types and constructs the backend supports → Feature Coverage.
- See the rules that govern what can cross the WIT boundary → Boundary Policy.
Examples
The repo's examples/ directory has runnable .fv files plus a walkthrough that uses the wasmtime CLI — no Rust host code required.
Install the toolchain
# formawasm CLI
cargo install formawasm
# wasmtime CLI
curl https://wasmtime.dev/install.sh -sSf | bash
# Or: brew install wasmtime
Compile and run
The repo's numbered .fv files each declare one or more pub fn
exports plus a pub fn run_checks() that asserts the expected
outputs. Compile any of them with the CLI:
formawasm examples/02_generics_pair_result.fv
wasmtime run --invoke 'pair-sum()' examples/02_generics_pair_result.wasm
# 3
Examples like 12_numeric_primitives.fv carry several primitive-typed
exports in one component:
formawasm examples/12_numeric_primitives.fv
wasmtime run --invoke 'i32-add(7, 35)' examples/12_numeric_primitives.wasm # 42
wasmtime run --invoke 'i64-mul(6, 7)' examples/12_numeric_primitives.wasm # 42
To exercise the embedded assert(...) calls in each example's
run-checks export, run the test harness — it wires assert to
a host function that traps on false, so passing means every
embedded assertion held:
cargo test --test examples
Limits of wasmtime --invoke
--invoke is built for primitive parameter and return types. Components whose signatures involve string, list<T>, record, or variant need a richer host:
- A Rust wrapper built with wasmtime::component::bindgen! (see Hosting a Component).
- jco: JavaScript host bindings, runs in Node.js or the browser.
- wasmtime serve: HTTP-world components without per-feature glue.
The fully-walked-through commands and notes live in examples/README.md in the repo.
Using the Library
formawasm exposes a single backend type, WasmBackend, that implements formalang's Backend trait. The formawasm CLI drives the same backend internally; using it from your own crate gives you control over which IR passes run, how diagnostics are reported, and where the bytes go.
Add the dependency
In your Cargo.toml:
[dependencies]
formalang = "0.0.5-beta"
formawasm = "0.0.1-beta"
formawasm is pre-1.0; pinning to an exact patch is recommended until a 0.1.0 line lands.
End-to-end example
use formalang::{
    FileSystemResolver, Pipeline, compile_to_ir_with_resolver,
    ir::{ClosureConversionPass, DeadCodeEliminationPass, MonomorphisePass, ResolveReferencesPass},
};
use formawasm::WasmBackend;
use std::path::PathBuf;

fn build_component(source_path: &str) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    let source = std::fs::read_to_string(source_path)?;

    // Resolve `use foo::bar` against the source file's parent directory.
    let resolver = FileSystemResolver::new(
        PathBuf::from(source_path).parent().unwrap_or(".".as_ref()).to_path_buf(),
    );
    let module = compile_to_ir_with_resolver(&source, resolver)
        .map_err(|errors| format!("{} compile errors", errors.len()))?;

    // Standard codegen pipeline. Order matters: monomorphise specializes
    // generics, resolve-references stamps typed IDs, closure-conversion
    // lifts every closure to a top-level function, DCE strips dead code.
    let mut pipeline = Pipeline::new()
        .pass(MonomorphisePass::default())
        .pass(ResolveReferencesPass::new())
        .pass(ClosureConversionPass::new())
        .pass(DeadCodeEliminationPass::new());

    let bytes = pipeline.emit(module, &WasmBackend::new())?;
    Ok(bytes)
}
The four passes shown are the canonical pre-codegen sequence: skipping any of them violates an invariant the backend relies on (see Architecture). The CLI runs the same sequence.
The Backend trait
WasmBackend implements formalang::pipeline::Backend:
pub trait Backend {
    type Output;
    type Error;
    fn generate(&self, module: &IrModule) -> Result<Self::Output, Self::Error>;
}
For WasmBackend:
- Output = Vec<u8>: the wrapped component bytes.
- Error = WasmBackendError: a typed enum covering preflight failures, lowering errors, WIT emission, component wrapping, and (when enabled) the optional wasm-opt and validation steps.
You can also call WasmBackend::new().generate(&module) directly if you've built the IR by hand or don't need the pass infrastructure.
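To make the associated-type shape concrete, here is a minimal, self-contained sketch of the same trait with a toy implementor. The `IrModule` stub, `SummaryBackend`, and its `max_functions` limit are all hypothetical stand-ins invented for illustration; they are not formalang or formawasm types.

```rust
/// Stand-in for formalang's `IrModule`; only a function count is modelled.
struct IrModule {
    function_count: usize,
}

/// Same shape as the trait shown above.
trait Backend {
    type Output;
    type Error;
    fn generate(&self, module: &IrModule) -> Result<Self::Output, Self::Error>;
}

/// Hypothetical backend that "emits" a text summary instead of wasm bytes.
struct SummaryBackend {
    max_functions: usize,
}

impl Backend for SummaryBackend {
    type Output = String;
    type Error = String;

    fn generate(&self, module: &IrModule) -> Result<String, String> {
        // A typed error path: refuse modules above the configured limit.
        if module.function_count > self.max_functions {
            return Err(format!("too many functions: {}", module.function_count));
        }
        Ok(format!("{} function(s)", module.function_count))
    }
}
```

A real backend would pick `Vec<u8>` and a structured error enum for the two associated types, as WasmBackend does.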
Optional steps
WasmBackend is configured with a builder-style API:
use formawasm::WasmBackend;

// Re-validate the wrapped component bytes through `wasmparser`
// before returning. Off by default; surfaces backend bugs as
// `WasmBackendError::Validation` instead of as runtime failures
// inside the embedding host.
let backend = WasmBackend::new().with_validation();
Build-time options live behind cargo features — see Cargo Features.
Observability
The backend is instrumented with tracing. The default no-subscriber path is essentially free; install a subscriber if you want to see per-stage timings and byte counts:
use tracing_subscriber::EnvFilter;

tracing_subscriber::fmt()
    .with_env_filter(EnvFilter::from_default_env())
    .init();
Then run with RUST_LOG=formawasm=debug to see entries for preflight, survey, lower_module, emit_wit, and wrap_component, with byte sizes attached to each stage.
Re-exports
formawasm re-exports the formalang IR types it consumes, so callers don't need a separate formalang dependency for the type surface:
pub use formalang::ir::{IrModule, IrFunction, ResolvedType, /* … */};
pub use formalang::pipeline::{Backend, Pipeline, IrPass};
You only need a direct formalang dependency if you're calling the parser (compile_to_ir, compile_to_ir_with_resolver) or constructing IR via the upstream constructors.
Hosting a Component
Once formawasm has emitted a .wasm component, you need a runtime to execute it. This chapter walks through the wasmtime Rust API end-to-end; the same component can also be loaded by browser-based runtimes (jco), wasmi, or any other Component-Model-compliant engine.
Minimal: call an exported function
Given an id.wasm from the Quickstart:
use wasmtime::{Config, Engine, Store};
use wasmtime::component::{Component, Linker};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut config = Config::new();
    config.wasm_component_model(true); // required
    let engine = Engine::new(&config)?;

    let bytes = std::fs::read("id.wasm")?;
    let component = Component::from_binary(&engine, &bytes)?;

    let linker = Linker::<()>::new(&engine);
    let mut store = Store::new(&engine, ());
    let instance = linker.instantiate(&mut store, &component)?;

    let id = instance.get_typed_func::<(i32,), (i32,)>(&mut store, "id")?;
    let (got,) = id.call(&mut store, (42,))?;
    assert_eq!(got, 42);
    Ok(())
}
A few notes:
- wasm_component_model(true) is required: the default wasmtime::Engine only loads core wasm modules.
- Function names cross the boundary in kebab-case. A formalang pub fn call_host(...) is call-host at the WIT layer and call-host in get_typed_func.
- Argument and return types are tuples. A function returning a single value still surfaces as (T,).
Supplying host imports
Any formalang extern fn becomes a wasm import that the host must provide before instantiation:
use wasmtime::component::Linker;

let mut linker = Linker::<()>::new(&engine);

// World-level imports show up at the linker's root namespace
// under their kebab-case WIT name.
linker
    .root()
    .func_wrap("host-double", |_store, (n,): (i32,)| Ok((2_i32 * n,)))?;

let instance = linker.instantiate(&mut store, &component)?;
let call_host = instance.get_typed_func::<(i32,), (i32,)>(&mut store, "call-host")?;
let (got,) = call_host.call(&mut store, (21,))?;
assert_eq!(got, 42);
If the host doesn't supply every import the component declares, linker.instantiate(...) returns an error.
Strings, lists, and records
For non-primitive boundary types, prefer wasmtime::component::bindgen! to generate strongly-typed Rust wrappers from the WIT file. The dynamic get_typed_func API works for primitives, but bindgen! handles records, variants, lists, and options without you spelling out the canonical-ABI layout.
wasmtime::component::bindgen!({
    path: "wit/component.wit",
    world: "component",
});
Then call exports through the generated trait surface with native Rust types.
Picking a runtime
| Runtime | Strengths | Where it's at home |
|---|---|---|
| wasmtime | First-class Component Model, JIT + AOT | Server-side, CLIs, embedded Rust applications |
| wasmi | Pure-Rust interpreter, no_std | Resource-constrained or sandboxing-sensitive embeddings |
| jco | JavaScript/TypeScript host bindings | Browser, Node.js |
| Browsers (with js-component-tools) | Native execution in the page | Web frontends |
formawasm-emitted components are runtime-agnostic; the choice is yours and depends on where the component runs.
Zero-export components
A formalang module without any pub fn (or with only pub struct / pub enum declarations) emits a valid component with no exports. It's still loadable, just not callable from the host. Useful for type-only modules — see the WIT examples in Type Mapping.
Boundary Policy
A formalang module compiled by formawasm has two layers:
- Inside the component: the full formalang language is supported — closures, generic instantiations (after monomorphisation), all aggregate kinds, virtual dispatch, the works.
- At the public boundary (pub fn, extern fn, pub struct, pub enum signatures): a strict subset, defined by what WIT can express.
This chapter is about that boundary. The full type-by-type table lives in Type Mapping; this page covers the rules.
What crosses the boundary
| Category | Allowed |
|---|---|
| Primitives | I32, I64, F32, F64, Boolean, String, Path, Regex |
| Containers | Optional<T>, Array<T>, Dictionary<K, V> (over boundary-allowed K / V) |
| Aggregates | IrStruct → record, IrEnum → variant, named tuples → record { name: T, … } |
| Functions | pub fn → component export, extern fn → component import |
A few specifics worth highlighting:
- Multi-field variant payloads lower as positional tuple<T0, T1, …> arms. Field names don't survive the boundary, but the layout planner lays them out in declaration order so the index→field mapping stays stable.
- Named tuples map to record. formalang tuples carry field names; formawasm deliberately does not use WIT's positional tuple for top-level tuples, because that would lose the names.
- Path and Regex are represented as WIT string at the boundary; their identity is preserved internally inside the component.
Rejected at pre-flight
The pre-flight pass refuses to lower a module that contains any of:
- Closure-typed values in public signatures. Closures live entirely inside a component; they can't cross the WIT boundary because there is no canonical-ABI representation for "function pointer plus environment".
- Generic traits. Matches the existing MonomorphisePass constraint upstream: every trait must be specialized before it reaches the backend.
- Unresolved type parameters (ResolvedType::TypeParam). MonomorphisePass is responsible for stamping these out; if any survive into the backend it's an upstream bug.
- ResolvedType::Error. Sentinel value; its presence indicates a frontend bug.
Pre-flight failures surface as WasmBackendError::Preflight(_) with a typed PreflightError payload pointing at the offending IR node.
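The rejection rules can be pictured as a recursive walk over every type in a public signature. The sketch below is a simplified illustration, not formawasm's preflight code: `Ty` is a cut-down stand-in for ResolvedType, the error payloads are assumptions, and only the variant names PublicClosureSignature, TypeParamPresent, and ErrorTypePresent come from this book.

```rust
/// Simplified stand-in for `ResolvedType`; the real enum has more variants.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    I32,
    Str,
    Closure,
    TypeParam(String),
    Error,
    Array(Box<Ty>),
}

/// Payload shapes here are assumptions made for the example.
#[derive(Debug, PartialEq)]
enum PreflightError {
    PublicClosureSignature,
    TypeParamPresent(String),
    ErrorTypePresent,
}

/// Walk one type from a public signature and fail on the first violation.
fn check_public_type(ty: &Ty) -> Result<(), PreflightError> {
    match ty {
        Ty::I32 | Ty::Str => Ok(()),
        Ty::Closure => Err(PreflightError::PublicClosureSignature),
        Ty::TypeParam(name) => Err(PreflightError::TypeParamPresent(name.clone())),
        Ty::Error => Err(PreflightError::ErrorTypePresent),
        // Containers are only as boundary-safe as their element type.
        Ty::Array(inner) => check_public_type(inner),
    }
}

/// Check every parameter, then the return type.
fn check_signature(params: &[Ty], ret: &Ty) -> Result<(), PreflightError> {
    for param in params {
        check_public_type(param)?;
    }
    check_public_type(ret)
}
```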
Why these rules
The split between "inside the component" and "across the boundary" comes from the Component Model. Core wasm only knows i32 / i64 / f32 / f64 and linear memory; the Component Model layers a typed ABI on top, and WIT is how that ABI is described. Anything WIT can't represent — closures, unresolved generics, language-internal placeholders — can't cross.
Inside, none of those restrictions apply. The backend has the full power of core wasm available: linear memory, tables, multi-value returns, indirect calls, custom helpers. Closures are lowered through funcref tables, virtual dispatch through per-trait vtables, strings and dictionaries through bump-allocated layouts.
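For readers unfamiliar with bump allocation, the sketch below models it over a byte vector standing in for linear memory. It is a generic illustration of the technique, assuming power-of-two alignments and little-endian stores (as in wasm); `BumpMemory` and its methods are hypothetical names, not the backend's runtime helpers.

```rust
/// Toy linear memory with a bump pointer. Nothing is ever freed.
struct BumpMemory {
    bytes: Vec<u8>,
    next_free: usize,
}

impl BumpMemory {
    fn new(size: usize) -> Self {
        BumpMemory { bytes: vec![0; size], next_free: 0 }
    }

    /// Reserve `size` bytes aligned to `align` (a power of two).
    /// Returns the offset of the block, or `None` when memory is exhausted.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = (self.next_free + align - 1) & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.bytes.len() {
            return None;
        }
        self.next_free = end;
        Some(start)
    }

    /// Store an i32 in little-endian byte order.
    fn store_i32(&mut self, offset: usize, value: i32) {
        self.bytes[offset..offset + 4].copy_from_slice(&value.to_le_bytes());
    }

    /// Load an i32 previously written with `store_i32`.
    fn load_i32(&self, offset: usize) -> i32 {
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&self.bytes[offset..offset + 4]);
        i32::from_le_bytes(buf)
    }
}
```

Allocation is a single pointer bump, which is why it suits short-lived component calls where nothing needs to be freed individually.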
What this means for your API design
When you decide what to mark pub, think about the boundary:
- A pure-data record or enum crosses cleanly: pub struct Point { x: I32, y: I32 } is fine.
- An enum with a closure-typed payload does not cross, even if every other variant is plain. Move the closure-bearing variant to a private enum, or accept that the type stays inside.
- A function that takes or returns a closure does not cross. Closures are how you compose internal logic, not how you talk to the host.
The internal-vs-public distinction maps onto exactly the same distinction WIT makes between world exports and interface items not surfaced into the world. formawasm enforces it at pub-time so you can't accidentally write a non-portable signature.
Type Mapping
Every formalang type that crosses the WIT boundary maps to a WIT shape. The full table is below; for the rules governing what may cross, see Boundary Policy.
Internal types (closures, ranges, the bump-allocator's free pointer) live entirely inside the core module and never appear in the WIT file.
Primitives
| formalang | WIT |
|---|---|
| I32 / I64 | s32 / s64 |
| U32 / U64 | u32 / u64 |
| F32 / F64 | f32 / f64 |
| Boolean | bool |
| String, Path, Regex | string |
Containers
| formalang | WIT |
|---|---|
| Optional<T> | option<T> |
| Array<T> | list<T> |
| Dictionary<K, V> | list<tuple<K, V>> |
| named tuple (x: I32, y: I32) | record { x: s32, y: s32 } |
Aggregates
| formalang | WIT |
|---|---|
| IrStruct | record |
| IrEnum (unit arm) | variant arm with no payload |
| IrEnum (single-payload arm) | variant arm with one payload type |
| IrEnum (multi-field payload) | variant arm carrying tuple<T0, T1, …> |
Examples
A function with primitives only:
pub fn id(x: I32) -> I32 { x }
emits:
package formawasm:generated;
world component {
export id: func(x: s32) -> s32;
}
A pub struct:
pub struct Point { x: I32, y: I32 }
emits:
package formawasm:generated;
interface types {
record point {
x: s32,
y: s32,
}
}
world component {
use types.{point};
}
An enum with mixed arms:
pub enum Action {
    reset
    add(value: I32)
    replace(x: I32, y: I32)
}
emits:
package formawasm:generated;
interface types {
variant action {
reset,
add(s32),
replace(tuple<s32, s32>),
}
}
world component {
use types.{action};
}
Identifier conversion
formalang identifiers use snake_case or camelCase; WIT identifiers are required to be kebab-case. The backend converts every identifier crossing the boundary:
- call_host → call-host
- Point → point
- MyEnum → my-enum
This applies to function names, type names, record field names, and variant arm names. Inside a component, the original identifier is preserved (and visible in the wasm name custom section for debugger tooling); only the boundary view is kebab-cased.
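A conversion that reproduces the three examples above might look like the following. It is a sketch under simple assumptions (ASCII identifiers, every uppercase letter starts a new word), so acronym runs such as HTTPServer would be split letter by letter; `to_kebab` is a hypothetical name and the backend's real rules may differ.

```rust
/// Convert a snake_case, camelCase, or PascalCase identifier to kebab-case.
/// Assumes ASCII input; every uppercase letter is treated as a word start.
fn to_kebab(ident: &str) -> String {
    let mut out = String::with_capacity(ident.len() + 4);
    for ch in ident.chars() {
        if ch == '_' {
            // Underscores become a single separator; none at the very start.
            if !out.is_empty() && !out.ends_with('-') {
                out.push('-');
            }
        } else if ch.is_ascii_uppercase() {
            // A capital starts a new word unless it opens the identifier.
            if !out.is_empty() && !out.ends_with('-') {
                out.push('-');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}
```

The same helper is handy on the host side for deriving the export name to pass to get_typed_func from a formalang function name.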
Internal-only types
These never appear in WIT and are documented here only so you know they exist:
| Type | Internal representation |
|---|---|
| Range<T> | { start, end } pair in linear memory |
| Closure | After closure-conversion: (funcref, env_ptr) pair |
| Anonymous tuple | Anonymous record laid out by the layout planner |
| Vtable | Flat array of funcref-table indices, one per trait method |
If you mark a function pub whose signature mentions any of these as a top-level type, pre-flight rejects it. (Some — like anonymous structs nested inside a pub record's fields — flow through transparently because the layout planner handles them.)
Feature Coverage
Every formalang IR construct maps to a compile phase. The tables below record exactly what's lowered today and how. Inside a module, every feature is supported; the Boundary Policy restrictions apply only to types appearing in pub signatures.
IrExpr variants
| Variant | Phase | Notes |
|---|---|---|
| Literal | 1a | Numeric, boolean literals; string literals in 2 |
| Reference (dotted path) | 1a | Resolved to local, global, or function reference |
| LetRef | 1a | Local-variable read |
| SelfFieldRef | 1b | self.field inside methods |
| FieldAccess | 1b | obj.field |
| BinaryOp (numeric / boolean / comparison) | 1a | Direct Wasm instructions |
| BinaryOp::Add on String | 2 | String concatenation runtime helper |
| BinaryOp::Range | 1c | Lowers to {start, end} pair in linear memory |
| BinaryOp::Eq/Ne on String | 2 | String equality runtime helper |
| UnaryOp | 1a | Neg, Not |
| If | 1a | Maps to Wasm if/else |
| Block | 1a | Sequence of statements + result expression |
| For (over Array) | 1c | loop + br_if with index counter |
| For (over Range) | 1c (I32 / I64) + post-Phase-4 housekeeping (F32 / F64) | Same lowering for all four numeric primitives. Float ranges advance by 1.0 per iteration; the output buffer is sized to ceil(end - start) so fractional gaps don't overrun. |
| Match | 1b | br_table on enum tag, payload extraction by offset |
| FunctionCall (direct) | 1a | Wasm call instruction |
| MethodCall (Static dispatch) | 1b | Resolved to direct call at compile time |
| MethodCall (Virtual dispatch) | 3 | Vtable lookup + call_indirect |
| StructInst | 1b | Bump-allocate, write fields, return pointer |
| EnumInst | 1b | Allocate tag + payload, return pointer |
| Array (literal) | 1c | Allocate {ptr, len, cap} header + element buffer |
| Tuple (literal) | 1b | Treated as anonymous struct |
| DictLiteral | 2 | Sorted-pairs array v1 |
| DictAccess | 2 | Lookup runtime helper |
| Closure | — | Eliminated upstream by closure-conversion IrPass |
| ClosureRef { funcref, env_struct } | 1b | Synthesised by closure conversion; lowered as funcref index + env pointer |
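The ceil(end - start) sizing for float ranges in the For (over Range) row can be checked with a small model. The sketch assumes a half-open range (end excluded), which is what the formula implies; both function names are hypothetical.

```rust
/// Number of slots needed for `start..end` when the loop advances by 1.0.
/// Assumes a half-open range; an empty or reversed range needs no slots.
fn float_range_len(start: f64, end: f64) -> usize {
    if end <= start {
        0
    } else {
        (end - start).ceil() as usize
    }
}

/// Materialise the range by stepping, to compare against the computed size.
fn float_range_values(start: f64, end: f64) -> Vec<f64> {
    let mut out = Vec::with_capacity(float_range_len(start, end));
    let mut x = start;
    while x < end {
        out.push(x);
        x += 1.0;
    }
    out
}
```

For 0.5..3.0 the loop visits 0.5, 1.5, and 2.5, and ceil(2.5) is also 3, so the buffer is exactly large enough even though the last step stops short of the end.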
ResolvedType variants
| Type | Phase | Notes |
|---|---|---|
| Primitive(I32/I64/F32/F64) | 1a | Native Wasm valtypes |
| Primitive(Boolean) | 1a | Lowered as i32 (0 or 1) |
| Primitive(Never) | 1a | Zero-sized; functions returning Never emit unreachable |
| Primitive(String) | 2 | Linear-memory {ptr, len} |
| Primitive(Path) | 2 | Same layout as String; identity preserved internally |
| Primitive(Regex) | 2 | Same layout as String; identity preserved internally |
| Struct | 1b | Heap-allocated record |
| Enum | 1b | Tag (i32) + padded payload |
| Tuple | 1b | Same layout as anonymous Struct |
| Array<T> | 1c | {ptr, len, cap} |
| Range<T> | 1c | {start, end} over numeric T |
| Optional<T> | 2 | Tag + payload, or null-pointer trick for reference types |
| Dictionary<K, V> | 2 | Sorted-pairs array v1 |
| Closure { param_tys, return_ty } | 1b | Funcref index + env pointer; intramodule only |
| External { module_path, name, … } | 5+ | Upstream-blocked. compile_to_ir_with_resolver returns one IrModule and discards imported-module IRs; backend has nothing to resolve External against. |
| Generic { base, args } | — | Eliminated by upstream MonomorphisePass |
| TypeParam | — | Pre-flight rejection |
| Trait | — | Banned as a value at semantic time upstream |
| Error | — | Pre-flight rejection (frontend invariant violation) |
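The "sorted-pairs array" representation listed for Dictionary<K, V> can be illustrated in plain Rust. This is a sketch of the general idea only: the table says the pairs are sorted, while the use of binary search for lookup is an assumption of this example, and both function names are hypothetical.

```rust
/// Build the sorted-pairs form: entries ordered by key.
fn build_sorted_pairs(mut entries: Vec<(String, String)>) -> Vec<(String, String)> {
    entries.sort_by(|a, b| a.0.cmp(&b.0));
    entries
}

/// Look a key up by binary search over the sorted pairs.
/// (Binary search is an assumption; a linear scan would also be valid.)
fn lookup<'a>(pairs: &'a [(String, String)], key: &str) -> Option<&'a str> {
    pairs
        .binary_search_by(|(k, _)| k.as_str().cmp(key))
        .ok()
        .map(|index| pairs[index].1.as_str())
}
```

Keeping the pairs sorted also makes the structure map directly onto a flat list of key/value tuples, with no hashing state to carry around.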
ParamConvention variants
| Convention | Phase | Lowering |
|---|---|---|
| Let (default) | 1a | Pass by value (or by pointer for aggregates) |
| Mut | 1b | Pass pointer into caller's frame; callee mutates in place |
| Sink | 1b | Move semantics: caller relinquishes the buffer; callee owns it |
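If you come from Rust, the three conventions have close analogues in ordinary Rust signatures. The functions below are only an analogy to build intuition, not formalang code or the backend's lowering; all names are made up for the example.

```rust
/// Analogue of `Let`: the callee only reads; the caller's data is unchanged.
fn sum_let(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Analogue of `Mut`: the callee gets a pointer into the caller's storage
/// and mutates it in place.
fn push_mut(values: &mut Vec<i32>, extra: i32) {
    values.push(extra);
}

/// Analogue of `Sink`: ownership moves into the callee, so the caller
/// can no longer use the buffer afterwards.
fn total_sink(values: Vec<i32>) -> i32 {
    values.into_iter().sum()
}
```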
DispatchKind variants
| Dispatch | Phase | Lowering |
|---|---|---|
| Static { impl_id } | 1b | Direct call to a known function index |
| Virtual { trait_id, method_name } | 3 | Per-trait vtable in linear memory; call_indirect |
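Virtual dispatch through a vtable of table indices can be modelled without any wasm at all. In this sketch a constant array of function pointers stands in for the funcref table and each vtable holds one index per trait method; the Greet-style functions and every name here are hypothetical.

```rust
/// Signature shared by every implementation of the (hypothetical) trait method.
type Method = fn(&str) -> String;

fn greet_formal(name: &str) -> String {
    format!("Good day, {}.", name)
}

fn greet_casual(name: &str) -> String {
    format!("hey {}", name)
}

/// Stand-in for a wasm funcref table: functions addressed by index.
const FUNC_TABLE: [Method; 2] = [greet_formal, greet_casual];

/// A vtable is a flat array of table indices, one slot per trait method.
struct Vtable {
    methods: [usize; 1],
}

const FORMAL_VTABLE: Vtable = Vtable { methods: [0] };
const CASUAL_VTABLE: Vtable = Vtable { methods: [1] };

/// Slot of the `greet` method inside every vtable for this trait.
const GREET_SLOT: usize = 0;

/// Virtual call: read the slot from the vtable, then call through the table,
/// which is the role `call_indirect` plays in wasm.
fn call_virtual(vtable: &Vtable, slot: usize, arg: &str) -> String {
    let table_index = vtable.methods[slot];
    FUNC_TABLE[table_index](arg)
}
```

Static dispatch skips both lookups: the callee's index is known at compile time, so the call site names it directly.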
Patterns
formalang's IR flattens patterns to variant-name + simple bindings (no nested patterns, guards, or-patterns, or range patterns at the IR level). The wasm lowering handles this directly via br_table on the variant tag plus offset-based payload extraction. BindingPattern destructuring in let bindings is also flattened upstream into simple Let nodes.
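The tag-plus-offset scheme can be simulated over a byte buffer. The sketch reuses the Action enum from Type Mapping and assumes an i32 tag at offset 0 followed by i32 payload fields starting at offset 4; the tag numbering, the buffer size, and what each arm computes are all assumptions made for the example.

```rust
const TAG_RESET: i32 = 0;
const TAG_ADD: i32 = 1;
const TAG_REPLACE: i32 = 2;

/// Payload fields start right after the 4-byte tag in this model.
const PAYLOAD_OFFSET: usize = 4;

/// Read a little-endian i32 at a byte offset.
fn read_i32(mem: &[u8], at: usize) -> i32 {
    i32::from_le_bytes([mem[at], mem[at + 1], mem[at + 2], mem[at + 3]])
}

/// Encode an action: tag, then up to two i32 payload fields.
/// The buffer is padded to the widest variant (two fields).
fn write_action(tag: i32, payload: &[i32]) -> Vec<u8> {
    let mut mem = vec![0u8; 12];
    mem[0..4].copy_from_slice(&tag.to_le_bytes());
    for (i, value) in payload.iter().enumerate() {
        let at = PAYLOAD_OFFSET + 4 * i;
        mem[at..at + 4].copy_from_slice(&value.to_le_bytes());
    }
    mem
}

/// Branch on the tag, then pull payload fields out by fixed offset.
/// `replace(x, y)` yields `x + y` here purely for illustration.
fn apply(counter: i32, action: &[u8]) -> i32 {
    match read_i32(action, 0) {
        TAG_RESET => 0,
        TAG_ADD => counter + read_i32(action, PAYLOAD_OFFSET),
        TAG_REPLACE => read_i32(action, PAYLOAD_OFFSET) + read_i32(action, PAYLOAD_OFFSET + 4),
        _ => unreachable!("tag outside the declared variants"),
    }
}
```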
Operators
BinaryOp: Add, Sub, Mul, Div, Mod, Lt, Gt, Le, Ge, Eq, Ne, And, Or, Range. UnaryOp: Neg, Not.
Operator lowering is type-dispatched — BinaryOp::Add on I32 lowers to a single Wasm instruction, but on String it calls the __str_concat runtime helper. The dispatch table lives in src/lower/binary_op.rs.
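A type-dispatched lowering decision can be written as a single match on the operand type. The sketch below is not the contents of src/lower/binary_op.rs; `Ty`, `Lowered`, and `lower_add` are hypothetical, and only the `__str_concat` helper name and the one-instruction case for I32 come from the text above.

```rust
/// Operand types relevant to `Add` in this example.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    I32,
    I64,
    F32,
    F64,
    Str,
}

/// What the lowerer decides to emit for one operator application.
#[derive(Debug, PartialEq)]
enum Lowered {
    /// A single core-wasm instruction, named by its text-format mnemonic.
    Instruction(&'static str),
    /// A call to a runtime helper function.
    HelperCall(&'static str),
}

/// Pick the lowering for `BinaryOp::Add` from the operand type.
fn lower_add(operand: Ty) -> Lowered {
    match operand {
        Ty::I32 => Lowered::Instruction("i32.add"),
        Ty::I64 => Lowered::Instruction("i64.add"),
        Ty::F32 => Lowered::Instruction("f32.add"),
        Ty::F64 => Lowered::Instruction("f64.add"),
        Ty::Str => Lowered::HelperCall("__str_concat"),
    }
}
```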
Phase milestones
The phases referenced above correspond to the project's milestone tests. Each milestone hand-builds an IrModule exercising the phase's features and runs it under wasmtime end-to-end:
| Phase | Milestone test | What it proves |
|---|---|---|
| 1a | tests/backend_smoke.rs | Recursive fibonacci runs |
| 1b | tests/milestone_1b.rs | Counter struct + Action enum + methods (mut self, Match) |
| 1c | tests/sieve.rs | Sieve of Eratosthenes returns list<bool> |
| 2 | tests/milestone_2.rs | greet(role: String) -> String exercising Dictionary<String,String>, I32? Some-wrap, string concatenation |
| 3 | tests/milestone_3.rs | Trait Greet dispatches across two impls |
| 4 | tests/milestone_4.rs | Host-provided host_double extern called from call_host |
The full phase-by-phase history is in CHANGELOG.md.
Cargo Features
formawasm has two optional cargo features. Both are off by default; enabling them only adds dependencies (and compile cost) for callers that opt in.
wasm-opt
Runs binaryen's wasm-opt over the emitted core module before component wrapping.
Enable in your Cargo.toml:
[dependencies]
formawasm = { version = "0.0.1-beta", features = ["wasm-opt"] }
Or, when compiling formawasm directly:
cargo build --features wasm-opt
make test-wasm-opt
When the feature is on, WasmBackend::generate invokes the optimizer at -Os (size-leaning) with Feature::All enabled, so multi-table, reference types, and bulk-memory survive the pass. Output is typically 30–50% smaller than the unoptimized core module on real workloads.
The pass is also exposed as a free function — formawasm::optimize_core_module(&bytes) — for callers that obtain core wasm by other means and want to run the same post-pass without going through WasmBackend.
The wasm-opt crate compiles binaryen from source on first build, which adds several minutes to a clean compile. Production users typically gate this behind a release-only profile.
dwarf
Attaches DWARF debug sections (.debug_info, .debug_abbrev, .debug_line, .debug_str) to the emitted core module so source-level debuggers can map wasm addresses back to formalang source lines.
[dependencies]
formawasm = { version = "0.0.1-beta", features = ["dwarf"] }
Granularity is function-level: one subprogram DIE per user function with name + decl_file + decl_line + low_pc / high_pc, plus a .debug_line row pointing at each function's first source line. Per-statement line tables can layer on later.
The feature pulls in gimli and the IR-side IrSpan data formalang attaches to every node. Default-feature builds skip the dependency entirely.
with_validation() (always available)
Not a cargo feature, but worth mentioning here: WasmBackend::new().with_validation() enables a wasmparser::Validator re-check against the wrapped component bytes before returning. Surfaces internal-error conditions (malformed core wasm slipping through, canonical-ABI mismatches) as WasmBackendError::Validation instead of as runtime failures inside the embedding host.
The check adds one validation pass per generate call. It is off by default for speed; turn it on in production when the extra correctness check is worth the cost.
Combining features
The two cargo features and with_validation() are independent and compose cleanly:
[dependencies]
formawasm = { version = "0.0.1-beta", features = ["wasm-opt", "dwarf"] }
Order of operations in WasmBackend::generate when all are enabled:
preflight ──► survey ──► lower_module ──► [wasm-opt post-pass]
──► emit_wit ──► wrap_component ──► [validation]
DWARF sections are emitted by lower_module, so they ride through the wasm-opt pass intact (binaryen preserves custom sections by default).
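The effect of the optional steps on the stage order can be spelled out as a small function. It only restates the diagram above; `stages` and its two flags are hypothetical names for illustration.

```rust
/// List the stages that run, in order, for a given configuration.
/// `wasm_opt` models the cargo feature; `validation` models `with_validation()`.
fn stages(wasm_opt: bool, validation: bool) -> Vec<&'static str> {
    let mut out = vec!["preflight", "survey", "lower_module"];
    if wasm_opt {
        out.push("wasm-opt");
    }
    out.push("emit_wit");
    out.push("wrap_component");
    if validation {
        out.push("validation");
    }
    out
}
```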
Troubleshooting
Common errors when compiling formalang to a WebAssembly component, and what they mean.
Pre-flight rejections
Pre-flight runs first; failures here mean the IR shape violates an invariant the backend expects. Each one points at the IR variant that triggered it.
PreflightError::ClosureExprPresent
A bare IrExpr::Closure survived into the backend.
Cause: the codegen pipeline didn't run ClosureConversionPass before invoking the backend. The pass lifts every closure to a top-level function plus a synthetic env struct, leaving only IrExpr::ClosureRef for the backend to consume.
Fix: ensure your Pipeline includes ClosureConversionPass between MonomorphisePass and DeadCodeEliminationPass. The formawasm CLI binary wires the canonical sequence; if you're driving the backend directly, mirror it.
PreflightError::PublicClosureSignature
A pub fn has a closure-typed parameter or return value.
Cause: closures can't cross the WIT boundary — there's no canonical-ABI representation for "function pointer + environment". See Boundary Policy.
Fix: make the function private (drop pub), or replace the closure parameter with a concrete enum / struct that the host can construct.
PreflightError::TypeParamPresent / PreflightError::GenericTraitPresent
An unresolved type parameter or a generic trait survived into the backend.
Cause: MonomorphisePass is responsible for specializing every generic before codegen. If one slips through, the pass either didn't run or hit a bug.
Fix: confirm MonomorphisePass runs first in your pipeline. If it does and the error persists, it's an upstream bug — open an issue with the source that reproduces it.
PreflightError::ErrorTypePresent
A ResolvedType::Error sentinel reached the backend.
Cause: this is an internal placeholder the formalang frontend uses while error recovery is in flight. Its presence in the IR returned to a backend means the frontend produced a partially-typed module despite compile_to_ir returning Ok.
Fix: open an issue against formalang — a successful compile should never carry Error types into the IR.
Lowering errors
These surface from lower_module and indicate a feature the backend doesn't yet support, or a malformed IR shape.
WitEmitError::NotYetSupported
The WIT emitter encountered a public-surface type it can't represent today. Most boundary-relevant types are supported (see Type Mapping); this error generally means a feature still in flight.
Fix: check the Feature Coverage tables. If the variant isn't listed as supported, the workaround is to keep that type internal — drop the pub qualifier on the offending declaration.
LayoutError::*
The layout planner couldn't compute a memory layout for a type. Usually because the IR points at a struct/enum that wasn't registered in the module — typically a bug in the IR construction.
Fix: if you hand-built the IR, confirm every StructId / EnumId referenced from a ResolvedType exists in module.structs / module.enums. If you used compile_to_ir, this is an upstream bug.
Component-wrap errors
ComponentWrapError
wit-component failed to wrap the core module + WIT into a Component-Model artifact. Usually the WIT and the core module disagree about a function signature — the canonical-ABI lowering didn't match the WIT type.
Fix: this is a backend bug. Run with the validation step on (WasmBackend::new().with_validation()) to surface the malformed wasm earlier. Open an issue with the source that reproduces.
Runtime errors (under wasmtime)
Errors during component instantiation or function calls aren't formawasm errors per se — they come from wasmtime — but a few are common enough to call out.
"import cm32p2::host-foo not provided"
Your component declares an extern fn host_foo but the host didn't supply it via Linker::root().func_wrap(...).
Fix: see the Hosting a Component chapter on supplying imports.
"function not found: foo"
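You called instance.get_typed_func(&mut store, "foo") with a name that doesn't match any WIT export. The usual cause is passing the snake_case formalang name (for example foo_bar) where the export is kebab-case (foo-bar).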
Fix: WIT identifiers are kebab-case; formalang pub fn foo_bar exports as foo-bar. See Type Mapping.
unexpected trap
A wasm trap — typically an arithmetic check (i32_div_s by zero, integer overflow on signed division, out-of-bounds index access) or an unreachable instruction (functions with Never return type). The trap's BacktraceFrame chain should point at the offending function.
Fix: enable the dwarf feature for source-level line numbers in the trap backtrace.
Where to ask
- Backend bugs / feature requests: formawasm issues
- Frontend / language questions: formalang issues
- Component Model / WIT questions: bytecodealliance/wit-component
Architecture
formawasm is a single-crate compiler backend. Its job is to turn one IrModule into one Vec<u8> — wrapped Component-Model bytes — and surface every error as a typed value. This page traces the journey from input IR to output bytes.
The pipeline
WasmBackend::generate(&module) runs up to seven stages, in order (the two bracketed ones are optional):
preflight ──► survey ──► lower_module ──► [wasm-opt] ──►
emit_wit ──► wrap_component ──► [validate]
Each stage surfaces a typed error. Most stages live in their own source module; the two optional steps share src/backend.rs.
| Stage | Module | Job |
|---|---|---|
| preflight::check | src/preflight.rs | Reject leftover IrExpr::Closure, public closure-typed signatures, generic traits, ResolvedType::TypeParam, ResolvedType::Error. Fail fast. |
| survey::survey | src/survey.rs | Walk IrModule; classify every top-level item as export / import / internal. Returns a PublicSurface. |
| module_lowering::lower_module | src/module_lowering.rs | Plan memory layouts, declare runtime helpers, declare extern imports under cm32p2, declare funcref tables for closures + trait methods, build per-function bodies, concatenate static-data segments. Returns core wasm bytes. |
| optimize_core_module (optional) | src/backend.rs | Behind the wasm-opt cargo feature: run binaryen at -Os over the core bytes with Feature::All enabled. |
| wit::emit_wit | src/wit.rs | Walk the public surface; emit import / export lines plus record / variant declarations for public structs / enums. |
| component::wrap_component | src/component.rs | Feed core module bytes + WIT to wit-component::ComponentEncoder. Returns wrapped component bytes. |
| validate_component (optional) | src/backend.rs | When constructed via with_validation: run wasmparser::Validator against the wrapped bytes. |
Why this shape
A few decisions are worth calling out, because they constrain how new features compose:
Preflight is a separate stage, not interleaved. A bad IR shape should fail before we do any work — the lowering paths can then assume well-formed input and skip defensive checks.
The survey runs before lowering. Knowing the export and import sets upfront lets the lowerer commit to function-index allocations early, so it never has to reorder or renumber.
Layouts are planned bottom-up, lowering top-down. The layout planner (src/layout.rs) computes one record per type before any function body is emitted; the lowerer then resolves i32_load / i32_store offsets against compile-time constants. This is what makes per-method dispatch a direct index into a vtable instead of a runtime map lookup.
Validation is opt-in. wit-component already validates internally during wrap, so a defensive wasmparser re-check on the hot path would add a cost that production users should not have to pay. Tests that construct backends via with_validation() get the safety net; the default builder skips it.
The optimizer pass runs on core wasm, not on the wrapped component. Binaryen's component-model support is still young, and the canonical-ABI wrappers / cabi_realloc export are easier to keep intact when wrapping happens after.
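The "preflight first, typed errors throughout" shape can be illustrated with a toy pipeline. Everything in this sketch is hypothetical (ToyModule, StageError, and the *_sketch functions are stand-ins, not formawasm types); it only shows how a separate rejection stage lets the later stage assume well-formed input.

```rust
/// Hypothetical error type: one variant per stage that can fail.
#[derive(Debug, PartialEq)]
enum StageError {
    Preflight(String),
    Lowering(String),
}

/// Stand-in for the IR: records only what the toy stages inspect.
struct ToyModule {
    has_leftover_closure: bool,
    body: Vec<u8>,
}

/// Rejects bad input shapes before any other work happens.
fn preflight_sketch(module: &ToyModule) -> Result<(), StageError> {
    if module.has_leftover_closure {
        return Err(StageError::Preflight("leftover closure".to_owned()));
    }
    Ok(())
}

/// Runs after preflight, so it does not re-check for closures.
fn lower_sketch(module: &ToyModule) -> Result<Vec<u8>, StageError> {
    if module.body.is_empty() {
        return Err(StageError::Lowering("empty body".to_owned()));
    }
    Ok(module.body.clone())
}

/// Stages run in order; the first typed error short-circuits the rest.
fn generate_sketch(module: &ToyModule) -> Result<Vec<u8>, StageError> {
    preflight_sketch(module)?;
    lower_sketch(module)
}
```

The caller gets back either bytes or a value naming the stage that failed, which is the property the real pipeline is described as having.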
Boundary representation
The backend has two representational regimes:
- Inside the core module: full power of core wasm. Linear memory for aggregates, tables for funcrefs, multi-value returns where they help. Lowering is free to use any wasm proposal we've enabled.
- At the WIT boundary: the canonical ABI. Aggregates flow as (ptr, len) pairs or pointers into linear memory; the cabi_realloc export gives the host a hook into the component's allocator.
Translating between the two regimes happens in two places: WIT-generated parameter-split wrappers (lift inbound boundary values to internal pointers) and return-shape wrappers (lower outbound internal pointers to canonical-ABI return values). Both are emitted by lower_module alongside user functions.
Per-module compile
The lower_module stage is the heavy lifter. Inside a single call:
- Plan layouts for every aggregate type (plan_struct, plan_enum, plan_array, plan_range, plan_optional, plan_string, plan_dictionary, plan_vtable).
- Declare runtime helpers: bump allocator (__alloc), string equality (__str_eq), string concatenation (__str_concat), canonical-ABI realloc (cabi_realloc).
- Declare extern imports under the cm32p2 namespace (canonical-ABI 32-bit-platform-2 mangling per wit-component). Imports occupy the leading region of the function-index space.
- Declare funcref tables: one for closures, one per trait for vtable dispatch.
- Lower each function body via the src/lower/* submodules; each lower_* function appends instructions to the caller's InstructionSink.
- Concatenate static data: string-pool bytes + per-impl vtables get written into a single passive data segment seeded into linear memory at startup.
- Emit the wasm name custom section so debug tooling resolves func[N] back to the source identifier.
The resulting Vec<u8> is a fully-formed core wasm module, ready for wit-component to wrap.
Where the IR comes from
formawasm doesn't parse .fv source files itself. The expected pre-codegen pipeline is:
Pipeline::new()
    .pass(MonomorphisePass::default())      // specialize generics
    .pass(ResolveReferencesPass::new())     // stamp typed IDs
    .pass(ClosureConversionPass::new())     // lift closures
    .pass(DeadCodeEliminationPass::new())   // strip unreachable
    .emit(module, &WasmBackend::new())
Skipping MonomorphisePass leaves ResolvedType::Generic / TypeParam in the IR, which preflight rejects. Skipping ClosureConversionPass leaves IrExpr::Closure in the IR, which preflight also rejects. The other two passes are quality-of-life rather than correctness — ResolveReferencesPass lets the backend skip name resolution at lowering time, and DeadCodeEliminationPass keeps the emitted bytes small.
Crate Layout
formawasm is intentionally flat. Every top-level concern lives in its own src/ module; only the per-IR-variant lowering family is grouped under a lower/ subdirectory.
src/
lib.rs # public surface re-exports
backend.rs # WasmBackend, Backend impl, optional wasm-opt + validation
preflight.rs # rejection of unsupported IR shapes
survey.rs # public-surface classification
layout.rs # memory-layout planning (struct, enum, array, range,
# optional, string, dictionary, vtable)
types.rs # IR ResolvedType → wasm valtype mapping
ident.rs # source-name → kebab-case helper
string_pool.rs # compile-time string-literal interning
module.rs # core-Wasm ModuleBuilder
module_lowering.rs # IrModule → core wasm bytes (orchestration)
wit.rs # WIT auto-generation
component.rs # core module + WIT → component
dwarf.rs # (feature: dwarf) DWARF debug-section emission
lower/ # per-IrExpr lowering (one submodule per family)
mod.rs # shared types + lower_expr dispatcher
aggregate.rs # struct/enum/tuple instantiation, field access
binary_op.rs # type-dispatched binary operators
block.rs # Block + Let + per-source scratch-local planning
call.rs # FunctionCall, MethodCall (static + virtual)
control.rs # If, For (over Range and Array)
literal.rs # numeric / boolean / string literals
optional.rs # Some-wrap coercion at let / return / args / fields
reference.rs # Reference, LetRef, SelfFieldRef
unary_op.rs # Neg, Not
bin/
formawasm.rs # source.fv → output.wasm CLI driver
docs/ # this book (mdBook source)
tests/ # one file per IR construct + per phase milestone
plans/ # forward-looking plan notes (decisions in flight)
Module responsibilities
lib.rs
Public re-exports. Anything a downstream crate imports (use formawasm::WasmBackend) goes through here. Also re-exports the upstream IR types (IrModule, IrFunction, ResolvedType) so callers don't need a separate formalang dependency.
backend.rs
WasmBackend itself. Implements formalang::pipeline::Backend, holds the validation toggle, and orchestrates the seven-stage pipeline. The optimize_core_module free function lives here too, compiled only when the wasm-opt cargo feature is enabled.
preflight.rs
The rejection pass. Every PreflightError variant carries a human-readable breadcrumb pointing at the offending IR node. Failure means an upstream pipeline step was skipped or an upstream invariant was violated.
survey.rs
Classifies top-level items into exports / imports / internal types. Returns a PublicSurface that downstream stages consume — lower_module for function-index allocation, wit::emit_wit for which items appear in the WIT file.
layout.rs
The memory-layout planner. One plan_* function per type family, each producing a record of per-field offsets, total size, and alignment. All sizes follow the Component-Model canonical ABI. The lowerer resolves i32_load / i32_store offsets against the constants this module returns.
module_lowering.rs
The bridge between IrModule and module::ModuleBuilder. Walks the IR module, builds a FunctionMap ahead of time so recursive calls resolve, then lowers each function body and plugs it into a fresh ModuleBuilder before returning the encoded module bytes.
lower/
Per-IrExpr lowering. Each lower_* function appends instructions to the caller's InstructionSink and assumes the surrounding stack discipline is maintained by the caller. Helpers do not emit a closing end — that's the function-body framer's job.
The submodules are split by IR variant family rather than by phase, so adding a new operator (e.g. a new BinaryOp::Bitwise) means editing exactly one file.
wit.rs
WIT text generation. Walks the public surface, emits the world block plus any interface type declarations, and returns a String ready for wit-component::ComponentEncoder.
component.rs
The final wrap. Takes core module bytes + WIT text, runs them through wit-component::ComponentEncoder, returns the component bytes.
string_pool.rs
Compile-time string-literal interning. Every string literal lowered by lower::literal is added to the pool; on module finalization, the pool's bytes are written into a passive data segment with a single per-literal (ptr, len) header pair.
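As a rough model of what interning means here, the sketch below deduplicates literals and packs them contiguously, handing back a (ptr, len) pair for each. StringPoolSketch and its fields are hypothetical; the real pool's internal layout is deliberately unspecified, and whether it deduplicates exactly this way is an assumption.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

/// Hypothetical model of a compile-time string pool: literals are
/// deduplicated and packed contiguously, each addressed by (ptr, len).
#[derive(Default)]
struct StringPoolSketch {
    bytes: Vec<u8>,
    index: HashMap<String, (u32, u32)>,
}

impl StringPoolSketch {
    /// Interns `literal` and returns its (offset, length) within the pool.
    /// Returns None if the pool would outgrow a 32-bit address space.
    fn intern(&mut self, literal: &str) -> Option<(u32, u32)> {
        if let Some(&entry) = self.index.get(literal) {
            return Some(entry);
        }
        let ptr = u32::try_from(self.bytes.len()).ok()?;
        let len = u32::try_from(literal.len()).ok()?;
        // Reject literals whose end offset would not fit in a u32.
        ptr.checked_add(len)?;
        self.bytes.extend_from_slice(literal.as_bytes());
        self.index.insert(literal.to_owned(), (ptr, len));
        Some((ptr, len))
    }
}
```

Offsets here are relative to the start of the pool; in an emitted module they would be rebased to wherever the static-data segment lands in linear memory.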
dwarf.rs
Behind the dwarf cargo feature: DWARF .debug_info / .debug_abbrev / .debug_line / .debug_str custom-section construction from the IrSpan data formalang attaches to every IR node.
Public-API surface
What's pub from this crate (per lib.rs):
- WasmBackend, WasmBackendError: the headline type and its error.
- Backend (re-exported from formalang): the trait WasmBackend implements.
- IrModule, IrFunction, ResolvedType, Pipeline, etc.: re-exports so callers don't need a separate formalang dep.
- lower_module, emit_wit, wrap_component plus their error types: for callers who want to drive individual stages.
- plan_struct, plan_enum, plan_array, … plus their *Layout records: so layout-aware tools can introspect the same memory shapes the backend uses.
- PublicSurface: for callers who want to inspect the surface classification without re-walking the IR.
Internal helpers (string_pool, ident) stay pub(crate). The boundary between "public" and "internal" is enforced at lint time: #[deny(unreachable_pub)] keeps the surface from leaking accidentally.
Tests
tests/ mirrors the lowering layout: one file per IR variant (lower_struct_inst.rs, lower_array.rs, lower_match.rs, …), one file per layout family (layout_struct.rs, layout_array.rs, …), plus per-phase milestone tests (milestone_1b.rs, milestone_2.rs, …, sieve.rs). See Testing for conventions.
tests/snapshots/ holds insta snapshots of the WIT emitter output — every WIT shape we generate is captured as a .snap file so divergence trips a review-gated diff.
Lowering
This page is the deep-dive into how a formalang IrExpr becomes core wasm. The Architecture chapter covers the surrounding pipeline; this one is about the inside of lower_module.
Memory model
Every aggregate value — IrStruct, IrEnum, Tuple, Array header, Range, Optional, String header, Dictionary header — lives in linear memory through a bump allocator. The wasm representation of an aggregate is a single i32 pointer; methods, field access, match arms, and call sites pass that pointer through wasm locals.
The decision to keep all aggregates on the heap (rather than splitting small ones onto the wasm value stack) is recorded in Stack vs Heap for Small Aggregates.
Linear-memory layout
Every formawasm-emitted module has one linear memory and a __heap_ptr global pointing at the next free byte. The bump allocator (__alloc(size: i32) -> i32) returns the current __heap_ptr and advances it; there's no free list and no reclamation. Components that need GC bring their own — see the "Roadmap" section in CHANGELOG.md.
The static-data segment occupies the leading region of linear memory:
- String-pool bytes — every string literal interned at compile time, packed contiguously.
- Per-impl vtables — flat arrays of funcref-table indices, one per trait method per impl.
- Per-literal (ptr, len) headers: pre-built String headers pointing at the pool.
__heap_ptr is initialized to the byte just past the static data; bump allocations start there.
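The allocator's behavior is simple enough to model on the host. The sketch below is a hypothetical stand-in (BumpSketch is not a formawasm type) that mirrors the description: the heap starts just past the static data, each allocation returns the current pointer and advances it, and nothing is reclaimed. Alignment handling is left out because this chapter does not describe it.

```rust
/// Hypothetical host-side model of the module's bump allocator.
struct BumpSketch {
    /// Mirrors the __heap_ptr global: address of the next free byte.
    heap_ptr: u32,
}

impl BumpSketch {
    /// Starts the heap just past a static-data region of `static_len` bytes.
    fn new(static_len: u32) -> Self {
        BumpSketch { heap_ptr: static_len }
    }

    /// Models __alloc(size): returns the current heap pointer and advances
    /// it. Nothing is ever freed. Returns None on 32-bit address overflow,
    /// leaving the heap pointer unchanged.
    fn alloc(&mut self, size: u32) -> Option<u32> {
        let ptr = self.heap_ptr;
        self.heap_ptr = ptr.checked_add(size)?;
        Some(ptr)
    }
}
```

With 64 bytes of static data, the first 12-byte allocation lands at address 64 and the next allocation at 76.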
Aggregate layouts
| Type | Layout |
|---|---|
| IrStruct | Fields in declaration order, each at the next offset rounded up to its alignment |
| IrEnum | i32 discriminant tag at offset 0, padded payload following |
| Tuple | Same as IrStruct (anonymous record) |
| Array<T> | { ptr: i32, len: i32, cap: i32 } header pointing at element buffer |
| Range<T> | { start: T, end: T } |
| Optional<T> | { tag: i32, payload: T } |
| String / Path / Regex | { ptr: i32, len: i32 } |
| Dictionary<K, V> | Sorted-pairs array v1: { ptr, len, cap } over (K, V) pairs |
| Vtable | Flat array of i32 funcref-table indices, one per trait method |
Size and alignment follow the Component-Model canonical ABI: bool is 1 byte, s32/f32 are 4 bytes aligned to 4, s64/f64 are 8 bytes aligned to 8. Each aggregate's total size is rounded up to its own alignment.
Aggregate-typed fields (a struct field whose type is itself a struct) lower as 4-byte pointers — the nested aggregate gets its own bump allocation. This keeps every aggregate's layout stable regardless of how its fields are typed.
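The struct rule (fields in declaration order, each offset rounded up to the field's alignment, total size rounded up to the struct's alignment) can be worked through in a short sketch. LayoutSketch, align_up, and plan_struct_sketch are hypothetical names for illustration; the real planner is plan_struct in src/layout.rs and its record type may look different.

```rust
/// Hypothetical layout record: per-field offsets plus total size and alignment.
#[derive(Debug, PartialEq)]
struct LayoutSketch {
    offsets: Vec<u32>,
    size: u32,
    align: u32,
}

/// Rounds `value` up to the next multiple of `align`.
/// `align` must be a nonzero power of two; returns None on overflow.
fn align_up(value: u32, align: u32) -> Option<u32> {
    let mask = align.checked_sub(1)?;
    Some(value.checked_add(mask)? & !mask)
}

/// Lays out fields in declaration order. Each field is a (size, align) pair.
fn plan_struct_sketch(fields: &[(u32, u32)]) -> Option<LayoutSketch> {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut cursor = 0u32;
    let mut max_align = 1u32;
    for &(size, align) in fields {
        let offset = align_up(cursor, align)?;
        offsets.push(offset);
        cursor = offset.checked_add(size)?;
        max_align = max_align.max(align);
    }
    // The total size is rounded up to the largest field alignment.
    let size = align_up(cursor, max_align)?;
    Some(LayoutSketch { offsets, size, align: max_align })
}
```

For a struct with fields bool, s32, s64, bool, passing [(1, 1), (4, 4), (8, 8), (1, 1)] gives offsets 0, 4, 8, 16, a total size of 24, and an alignment of 8. A nested struct field would be passed as (4, 4), since it lowers to a pointer.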
Runtime helpers
The lowerer emits a small set of helper functions into every module. They're conceptually a runtime; technically they're just wasm functions the lowerer wires up alongside user code.
| Helper | Signature | Job |
|---|---|---|
| __alloc | (size: i32) -> i32 | Bump-allocate size bytes, return pointer |
| __str_eq | (a_ptr: i32, a_len: i32, b_ptr: i32, b_len: i32) -> i32 | Byte-equal compare; returns 0 or 1 |
| __str_concat | (a_ptr: i32, a_len: i32, b_ptr: i32, b_len: i32) -> i32 | Allocate result buffer, memory.copy both inputs, return pointer to a fresh {ptr, len} header |
| cabi_realloc | canonical-ABI signature | Host-callable hook into __alloc so wit-component's lift wrappers can allocate inbound buffers in our memory |
The string built-ins formalang's prelude declares (String::len, is_empty, slice, starts_with, contains, byte_at) wire to additional helpers via prelude_helper_index. slice is zero-copy (returns a fresh {ptr, len} header pointing into the existing buffer); byte_at traps on out-of-range; contains runs a naive O(n·m) substring search.
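To make the helper semantics concrete, here is a host-side model that treats a byte slice as linear memory. The function names are hypothetical and the emitted helpers are wasm, not Rust; treating an empty needle as a match is also an assumption, since the text above does not say how contains handles it.

```rust
/// Reads `len` bytes at `ptr` from a byte slice standing in for linear
/// memory. Returns None where the wasm access would be out of bounds.
fn read(memory: &[u8], ptr: u32, len: u32) -> Option<&[u8]> {
    let start = ptr as usize;
    let end = start.checked_add(len as usize)?;
    memory.get(start..end)
}

/// Models __str_eq: byte-equal compare over two (ptr, len) pairs,
/// returning 0 or 1 like the helper.
fn str_eq_sketch(memory: &[u8], a: (u32, u32), b: (u32, u32)) -> Option<i32> {
    let lhs = read(memory, a.0, a.1)?;
    let rhs = read(memory, b.0, b.1)?;
    Some(i32::from(lhs == rhs))
}

/// Models a naive O(n*m) substring search, as described for `contains`.
/// Assumption: an empty needle counts as contained.
fn contains_sketch(haystack: &[u8], needle: &[u8]) -> bool {
    if needle.is_empty() {
        return true;
    }
    haystack.windows(needle.len()).any(|window| window == needle)
}
```

Note that str_eq_sketch compares two ranges of the same memory, which is also why a zero-copy slice works: a new header can simply point into an existing buffer.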
Per-IrExpr lowering
src/lower/mod.rs declares lower_expr, the recursive dispatcher. Each variant routes to a function in one of the family submodules:
| Family | File | Variants |
|---|---|---|
| Aggregates | aggregate.rs | StructInst, EnumInst, Tuple, FieldAccess |
| Binary ops | binary_op.rs | BinaryOp (type-dispatched) |
| Blocks | block.rs | Block, Let, IrBlockStatement::Assign |
| Calls | call.rs | FunctionCall, MethodCall (static + virtual), CallClosure |
| Control | control.rs | If, For (over Range and Array), Match |
| Literals | literal.rs | Literal (numeric / boolean / string), Array (literal), DictLiteral, nil |
| Optional coercion | optional.rs | Some-wrap at let / return / if / match / args / aggregate fields |
| References | reference.rs | Reference, LetRef, SelfFieldRef |
| Unary ops | unary_op.rs | Neg, Not |
A few non-obvious lowering choices:
Match uses br_table on the discriminant tag. No string compare, no nested ifs — straight i32_load of the tag at offset 0, then br_table to the matching arm's body. Payload bindings are extracted with offset loads against the variant layout.
MethodCall static dispatch becomes call; virtual dispatch becomes call_indirect. The vtable lookup is i32_load from vtable_base + method_idx * 4, giving a funcref-table index, which call_indirect consumes.
For over Range<T> and Array<T> both lower as loop + br_if comprehensions producing a fresh Array<body_ty> result. The pre-walk (walk_count in block.rs) reserves one i32 scratch local per construction so nested allocations don't clobber each other.
Closures, after ClosureConversionPass, are (funcref, env_ptr) pairs. They are invoked indirectly through a funcref table and call_indirect, with the env pointer prepended to the user-visible argument list.
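The virtual-dispatch address arithmetic described above can be checked with a small host-side model. This is a sketch over a byte slice standing in for linear memory; i32_load and vtable_slot are hypothetical Rust functions, not the emitted wasm. The same i32_load at offset 0 of an enum pointer models the tag read that Match performs before br_table.

```rust
/// Reads a little-endian i32 from a byte slice standing in for linear
/// memory. Returns None where the wasm load would be out of bounds.
fn i32_load(memory: &[u8], addr: u32) -> Option<i32> {
    let start = addr as usize;
    let end = start.checked_add(4)?;
    let bytes = memory.get(start..end)?;
    Some(i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Models the vtable lookup: load the funcref-table index stored at
/// vtable_base + method_idx * 4. call_indirect would then consume it.
fn vtable_slot(memory: &[u8], vtable_base: u32, method_idx: u32) -> Option<i32> {
    let offset = method_idx.checked_mul(4)?;
    let addr = vtable_base.checked_add(offset)?;
    i32_load(memory, addr)
}
```

Because the layout is fixed at compile time, method_idx is a constant at every call site, so the lookup is one load rather than a search.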
Type-dispatched operators
Operator lowering depends on the operand type, not just the operator. BinaryOp::Add on I32 lowers to i32.add; on String it calls __str_concat; on F64 it lowers to f64.add. The dispatch table lives in src/lower/binary_op.rs — adding a new operand type means extending exactly one match.
Comparison operators (Eq, Ne, Lt, etc.) work the same way, with a separate dispatch table. String equality routes to __str_eq; numeric equality routes to the appropriate i32.eq / i64.eq / f32.eq / f64.eq.
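A dispatch table of this kind might look like the sketch below. OperandType, Lowered, and the two functions are hypothetical; the I32, F64, and String cases for Add and all the Eq cases come from the text above, while i64.add and f32.add are filled in by analogy. The matches list every variant explicitly, in line with the project's wildcard_enum_match_arm rule.

```rust
/// Hypothetical operand types for the sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OperandType {
    I32,
    I64,
    F32,
    F64,
    Str,
}

/// What an operator lowers to in this model.
#[derive(Debug, PartialEq)]
enum Lowered {
    /// A single core-wasm instruction, by its text-format name.
    Instr(&'static str),
    /// A call to a runtime helper, by name.
    HelperCall(&'static str),
}

/// Add: numeric types map to one instruction, strings to a helper call.
fn lower_add_sketch(ty: OperandType) -> Lowered {
    match ty {
        OperandType::I32 => Lowered::Instr("i32.add"),
        OperandType::I64 => Lowered::Instr("i64.add"),
        OperandType::F32 => Lowered::Instr("f32.add"),
        OperandType::F64 => Lowered::Instr("f64.add"),
        OperandType::Str => Lowered::HelperCall("__str_concat"),
    }
}

/// Eq uses a separate table with the same shape.
fn lower_eq_sketch(ty: OperandType) -> Lowered {
    match ty {
        OperandType::I32 => Lowered::Instr("i32.eq"),
        OperandType::I64 => Lowered::Instr("i64.eq"),
        OperandType::F32 => Lowered::Instr("f32.eq"),
        OperandType::F64 => Lowered::Instr("f64.eq"),
        OperandType::Str => Lowered::HelperCall("__str_eq"),
    }
}
```

Adding a new operand type to OperandType makes both matches fail to compile until they are extended, which is the point of keeping them exhaustive.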
Boundary trampolines
Every public function gets a pair of wrappers around it:
- A lift wrapper that takes canonical-ABI parameters (split (ptr, len) for strings/lists, etc.) and assembles internal pointer arguments.
- A lower wrapper that takes the internal return value and produces canonical-ABI return shapes.
These are emitted by lower_module alongside the user function. The cabi_realloc export lets the host allocate inbound buffers in our linear memory before lift; this is how wit-component smuggles strings and lists across the boundary without copying twice.
DWARF
Behind the dwarf cargo feature, lower_module emits four debug sections (.debug_info, .debug_abbrev, .debug_line, .debug_str). Granularity is function-level today: one subprogram DIE per user function with name + decl_file + decl_line + low_pc / high_pc, plus a .debug_line row pointing at each function's first source line. Per-statement line tables are a follow-up.
The IR-side IrSpan data formalang attaches to every node provides the source coordinates; dwarf.rs translates them into gimli's writer types and module_lowering.rs wires the resulting bytes into custom sections.
Extending the Backend
Most contributions to formawasm fall into one of three buckets: adding support for a new IR variant, lifting a feature from "pre-flight rejected" to "lowered", or wiring a new runtime helper. This page walks through each.
Adding a new IrExpr variant
Suppose formalang adds IrExpr::Bitwise { op, left, right } for bitwise operators. The work splits across four files:
- src/types.rs: if the new variant introduces a wasm-valtype mapping the existing body_value_type doesn't already cover, extend the dispatch.
- src/lower/mod.rs: add a match arm in lower_expr routing the new variant to a new lower_bitwise function in an appropriate submodule (binary_op.rs if it's binary, a new file if it's a new family).
- src/lower/<submodule>.rs: write the lower_bitwise function. Append wasm instructions to the caller's InstructionSink; do not emit a closing end (the function-body framer handles that).
- tests/lower_bitwise.rs: hand-build an IrModule exercising the new variant and run it through WasmBackend::generate. Validate as a Component-Model artifact and, where possible, instantiate under wasmtime to confirm the runtime semantics.
If the new variant has a public-surface footprint (it produces a type that crosses the WIT boundary), also extend src/wit.rs and add an insta snapshot under tests/snapshots/.
Lifting a feature from preflight rejection
If the IR carries something the backend currently refuses (e.g. a public closure-typed signature became expressible later), the work is:
- Remove the rejection in src/preflight.rs for the variant in question.
- Implement the lowering following the steps above.
- Update tests/preflight.rs to drop the now-obsolete rejection case.
- Update Feature Coverage to mark the variant as supported, and add a row to Boundary Policy if it crosses.
The opposite direction also happens — adding a new rejection case for a corner the backend can't actually handle. Same shape: extend PreflightError, add a test that exercises the rejection, document the case in Troubleshooting.
Wiring a new runtime helper
Runtime helpers live alongside user functions in the emitted module — there's no separate "runtime" object file. Adding one is:
- Declare the helper's wasm signature as part of the module skeleton in module_lowering::lower_module.
- Emit the helper's body as part of the helper-bootstrap section. Conventionally helpers use the __ prefix (__alloc, __str_eq, …).
- Resolve the helper's function index in the FunctionMap so callers can call it directly.
- Use it from a lower_* function by emitting the right call <__helper_index> instruction sequence.
For helpers that the formalang prelude exposes through extern impl (e.g. String::contains), the wiring also touches prelude_helper_index so the IR-level method-call site resolves to the helper instead of an external symbol.
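The index-resolution step can be pictured with a toy function map. FunctionMapSketch is hypothetical and not the backend's FunctionMap; the only ordering the docs state is that imports occupy the leading region of the function-index space, so placing helpers before user functions here is an assumption.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

/// Hypothetical function map: names to wasm function indices.
struct FunctionMapSketch {
    indices: HashMap<String, u32>,
}

impl FunctionMapSketch {
    /// Assigns indices in order: imports first, then helpers, then user
    /// functions. Returns None if there are more than u32::MAX functions.
    fn build(imports: &[&str], helpers: &[&str], users: &[&str]) -> Option<Self> {
        let mut indices = HashMap::new();
        let names = imports.iter().chain(helpers).chain(users);
        for (position, name) in names.enumerate() {
            let index = u32::try_from(position).ok()?;
            indices.insert((*name).to_owned(), index);
        }
        Some(FunctionMapSketch { indices })
    }

    /// Looks up the index a `call` instruction would reference.
    fn index_of(&self, name: &str) -> Option<u32> {
        self.indices.get(name).copied()
    }
}
```

Because every index is fixed before any body is lowered, a lower_* function can emit a call to a helper, or a recursive call to a user function, without a later renumbering pass.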
Adding a new memory layout
If a new container type lands (say, Set<T>):
- Add plan_set to src/layout.rs returning a SetLayout record. Follow the canonical-ABI sizing rules (1 byte bool, 4 bytes s32/f32, 8 bytes s64/f64, fields aligned to their own alignment, total size aligned to the max field alignment).
- Re-export SetLayout from src/lib.rs so external tools can introspect it.
- Use the layout in lowering: typically lower::aggregate for construction, lower::reference for access.
Layouts are planned bottom-up — every aggregate type is laid out before any function body is emitted. The lowerer resolves offsets against the layout records as compile-time constants.
Phase milestones
Each lowering family has a milestone test that exercises the family end-to-end through the full pipeline. When you add or extend a feature, the right kind of test to add is a follow-up assertion in the relevant milestone, or a dedicated test under tests/ if the feature is large enough to warrant its own file.
Existing milestones:
| Phase | Test | Coverage |
|---|---|---|
| 1a | tests/backend_smoke.rs::fibonacci_… | Recursive functions, primitives, If |
| 1b | tests/milestone_1b.rs | Structs, enums, methods (mut self), Match |
| 1c | tests/sieve.rs | Arrays, ranges, For, recursive helpers |
| 2 | tests/milestone_2.rs | Strings, Optional, Dictionary, string ops |
| 3 | tests/milestone_3.rs | Trait Greet across two impls (virtual dispatch) |
| 4 | tests/milestone_4.rs | Host-provided extern fn (component import) |
A new phase generally introduces a new milestone test. Naming follows milestone_<phase>.rs.
Pattern: hand-build an IR for a test
The milestone tests don't go through the formalang frontend — they hand-build the IR using the upstream constructors so the backend gets exactly the shape under test. The pattern is:
use formalang::ir::{IrModule, IrFunction, IrFunctionParam, IrExpr, /* … */};
use formalang::ast::{PrimitiveType, ParamConvention};
use formawasm::{Backend, WasmBackend};

let mut module = IrModule::new();
module.functions.push(IrFunction {
    name: "id".to_owned(),
    generic_params: Vec::new(),
    params: vec![IrFunctionParam {
        binding_id: BindingId(0),
        name: "x".to_owned(),
        external_label: None,
        ty: Some(ResolvedType::Primitive(PrimitiveType::I32)),
        default: None,
        convention: ParamConvention::Let,
        span: IrSpan::default(),
    }],
    return_type: Some(ResolvedType::Primitive(PrimitiveType::I32)),
    body: Some(/* IrExpr tree */),
    extern_abi: None,
    attributes: Vec::new(),
    doc: None,
    span: IrSpan::default(),
});

let bytes = WasmBackend::new().generate(&module)?;
This is intentional. Hand-building IR keeps the backend tests independent of the frontend's current syntax — a parser change can't break a backend test, and a backend test can exercise an IR shape the parser doesn't yet emit.
Testing
formawasm's tests are the primary specification of correct behavior. The strict-clippy + typed-error setup means runtime failures are typed errors; the test suite confirms they get raised at the right shapes and that the happy path produces bytes that validate, instantiate, and run.
Test layout
Tests live under tests/, one file per concern:
tests/
backend_smoke.rs # WasmBackend::generate end-to-end
cli.rs # formawasm CLI integration tests
layout_*.rs # one per layout family
lower_*.rs # one per IR variant family
milestone_*.rs # per-phase end-to-end milestones
sieve.rs # Phase 1c milestone (named for the algorithm)
preflight.rs # rejection cases
survey.rs # public-surface classification
types.rs # ResolvedType → wasm valtype mapping
wit.rs # WIT emission
wit_snapshots.rs # insta snapshots of emitted WIT
component.rs # component wrap
wasm_opt_size.rs # (feature: wasm-opt) size-comparison
The split mirrors src/lower/: one source submodule, one test file. Adding a new IR variant means adding both src/lower/<family>.rs::lower_<variant> and tests/lower_<variant>.rs.
Test conventions
Tests return Result<(), TestError> where TestError = Box<dyn std::error::Error + Send + Sync>. Use ? to propagate errors; never bare assert! / panic! — return Err(...) instead.
type TestError = Box<dyn std::error::Error + Send + Sync>;
type TestResult = Result<(), TestError>;

#[test]
fn empty_module_generates_a_valid_component() -> TestResult {
    let backend = WasmBackend::new();
    let module = IrModule::new();
    let bytes = backend.generate(&module)?;
    validate_component(&bytes)
}
This pattern is enforced by the panic_in_result_fn clippy lint. It also matches the project-wide style of typed errors over panics: a test failure is a value, not a control-flow exception.
What every milestone test does
End-to-end milestone tests follow a consistent shape:
- Hand-build the IR for the feature (see Extending the Backend for the pattern).
- Run through the full pipeline: typically via Pipeline::emit(module, &WasmBackend::new()), sometimes directly via WasmBackend::new().generate(&module) if no pre-codegen passes are needed.
- Validate as a Component-Model artifact using wasmparser::Validator with WasmFeatures::default().
- Instantiate under wasmtime with Config::wasm_component_model(true).
- Call exports and assert on results using instance.get_typed_func (or, for richer types, the bindgen! macro on a generated WIT file).
Example, condensed from tests/backend_smoke.rs:
let bytes = backend.generate(&module)?;
validate_component(&bytes)?;

let mut config = Config::new();
config.wasm_component_model(true);
let engine = Engine::new(&config)?;
let component = Component::from_binary(&engine, &bytes)?;
let linker = Linker::<()>::new(&engine);
let mut store = Store::new(&engine, ());
let instance = linker.instantiate(&mut store, &component)?;

let fib = instance.get_typed_func::<(i32,), (i32,)>(&mut store, "fib")?;
let (got,) = fib.call(&mut store, (10,))?;
if got != 55 {
    return Err(format!("fib(10) = {got}, want 55").into());
}
Snapshot tests for WIT
tests/wit_snapshots.rs uses insta to capture the full WIT text emitted for representative IR shapes. Each snapshot lives in tests/snapshots/wit_snapshots__<test>.snap and is committed alongside the test.
When the WIT emitter changes, snapshots may diverge. Run:
cargo insta review
to walk through diffs and accept or reject each one. CI compares against the committed snapshots and fails if any diverge without a corresponding accept.
Running tests
The Makefile wraps cargo test with nice -n 19 ionice -c 3 so a heavy compile doesn't starve the desktop:
make test # default-features
make test-wasm-opt # with the wasm-opt cargo feature on
TEST_ARGS='--test sieve' make test # one file
For CI parity:
make ci # fmt + clippy + doc + test + deny + test-wasm-opt
The make ci target compiles binaryen on a clean tree (the wasm-opt step), so expect a few minutes the first time.
What's not tested directly
A few things deliberately don't have unit tests:
- The bump allocator: exercised by every milestone test that constructs an aggregate. A direct test would lock in the __alloc ABI, which we may want to evolve toward GC.
- String-pool internal layout: same. Exercised through string-literal lowering tests.
- __heap_ptr's exact starting value: depends on static-data size; tested via "the module instantiates" rather than by spelling out the integer.
When in doubt: a new feature gets a lower_<feature>.rs test. A change that touches only internal layout gets covered by the existing milestone that exercises the feature end-to-end.
Contributing
Brief operational guide for working on formawasm. Strict clippy, typed errors, tests return Result, no panics in production paths.
Where things live
| File | Purpose |
|---|---|
| README.md | Project pitch: short, user-facing. |
| book.toml + docs/ | This book. Built with mdbook build. |
| CHANGELOG.md | Phase-by-phase history. The "Roadmap" section captures what's left. |
| Cargo.toml | Single source of lint levels ([lints.*]). |
| clippy.toml | Behavioral clippy thresholds and acronym list. |
| deny.toml | License + advisory gates (run via make deny). |
| Makefile | Local check shortcuts (make check runs the full suite). |
| .github/workflows/ci.yml | Same gates as make, in CI. |
Code style
Comments
- Short and to the point — one sentence is usually enough.
- Explain why a decision was made or what architectural constraint it satisfies.
- Never explain what the code already says (rename a variable instead).
- Module-level //! comments: purpose of the module and its relationship to the rest of the system. One short paragraph max.
- Struct/enum doc comments: one line stating what it represents and its role.
- Method/function doc comments: only when the signature does not make the intent obvious, or when there is a non-obvious invariant the caller must respect.
Errors
- Always typed. Use thiserror for crate-level error enums.
- Never unwrap() / expect() / panic!() outside of tests (clippy enforces this).
- Tests return Result<(), TestError> where TestError = Box<dyn std::error::Error + Send + Sync>. Use ? to propagate; never bare assert! / panic!; return Err(...) instead.
Suppressions
- #[allow] is forbidden by lint. Use #[expect(reason = "...")] with a real explanation when you must override a lint.
Async
- Only where genuinely needed. Don't make a function async speculatively.
- Never hold a lock across .await. Clippy enforces await_holding_lock.
Preferred crates
- thiserror (v2): all error handling.
- strum (with derive): enum ↔ string conversions.
Microcommit cadence
One commit per microcommit. Before each commit verify:
make check # fmt + clippy + doc + test
Subject line: <scope>: <verb-phrase> (e.g. lower: emit i32 add via wasm-encoder). Body explains why, not what.
The full local suite — including cargo deny and the wasm-opt feature path — runs as:
make ci
That mirrors the GitHub Actions workflow.
Quality gates
The strict lint configuration in Cargo.toml::[lints.*] is the canonical source. Highlights:
- No silent failures: unwrap_used, expect_used, panic, todo, unimplemented, unreachable, exit are all deny. Use typed Result everywhere.
- No print macros: print_stdout, print_stderr, dbg_macro are deny. Use tracing for diagnostics. The CLI binary is the one exception, gated by an #[expect(reason = "...")] at the binary's top.
- No silent overflow / wrap / truncation / sign loss: arithmetic_side_effects, integer_division, modulo_arithmetic, cast_possible_truncation, cast_possible_wrap, cast_sign_loss, cast_precision_loss are all deny. Use checked arithmetic or document the invariant via #[expect(reason = "...")].
- No await_holding_lock: async work that mixes locks and .await is a deadlock waiting to happen.
- Match exhaustiveness: wildcard_enum_match_arm is deny. Every enum match lists every variant explicitly.
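In practice these gates push code toward a particular shape. The sketch below is a hypothetical example (element_offset and OffsetError are not from the codebase) of an offset computation written the way the lints require: checked arithmetic instead of bare + and *, a typed error instead of a panic, and a match with no wildcard arm. The Display and Error impls are written by hand only to keep the example dependency-free; the project itself uses thiserror for this.

```rust
use std::fmt;

/// Hypothetical typed error for an offset computation.
#[derive(Debug, PartialEq)]
enum OffsetError {
    Overflow,
}

impl fmt::Display for OffsetError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Every variant is listed explicitly; there is no wildcard arm.
        match self {
            OffsetError::Overflow => write!(f, "element offset overflows u32"),
        }
    }
}

impl std::error::Error for OffsetError {}

/// Computes base + index * stride with checked arithmetic, so overflow
/// becomes a typed error instead of a silent wrap or a panic.
fn element_offset(base: u32, index: u32, stride: u32) -> Result<u32, OffsetError> {
    index
        .checked_mul(stride)
        .and_then(|scaled| base.checked_add(scaled))
        .ok_or(OffsetError::Overflow)
}
```

A caller propagates the failure with ?, which keeps the no-panic rule intact all the way up to the backend's public error type.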
The strict-clippy + typed-error setup mirrors the reference Rust setup at ~/projects/smid/smid-ws0.
CI
.github/workflows/ci.yml runs the same gates as make ci:
- cargo fmt --check
- cargo clippy --all-targets -- -D warnings
- cargo doc --no-deps with RUSTDOCFLAGS="-D warnings" (catches dead intra-doc links and private-item leaks)
- cargo test
- cargo deny check (license + advisory gates)
- cargo test --features wasm-opt (parallel job; the optional post-pass can't bit-rot)
A failing CI gate blocks merge. Don't bypass with --no-verify — fix the underlying issue.
Working in the book
The book's source lives in docs/; build it with:
mdbook build # output goes to ./book/
mdbook serve # local dev server with live reload
./book/ is .gitignored — only the docs/ source is committed.
When you change the book, update docs/SUMMARY.md if you've added or removed pages. mdBook only emits pages listed in SUMMARY.md, so an unlinked file silently disappears from the rendered output.
Stack vs Heap for Small Aggregates
Status: design note / decision recorded
Last updated: 2026-05-02
An early open question for the backend was whether aggregates inside the core module could live in wasm locals (cheap) instead of bump-allocated linear memory (uniform). This document records the analysis and the decision.
Decision: keep the current uniform-heap design through Phase 5. Revisit only if a real workload surfaces aggregate-allocation as a measurable bottleneck. If it does, the work belongs in its own phase, not as an opportunistic addition.
Current shape (Phase 1b through Phase 4)
Every aggregate value — IrStruct, IrEnum, Tuple, Array
header, Range, Optional, String header, Dictionary header
— lives in linear memory through the bump allocator. The wasm
representation of an aggregate is a single i32 pointer; methods,
field access, match arms, and call sites pass that pointer
through wasm locals.
Lowering paths that depend on this convention:
- Construction (StructInst, EnumInst, Tuple, Array, range binary op, nil literal, Some-wrap, String literal, DictLiteral): bump-allocate, write fields, push pointer.
- Field access (FieldAccess, SelfFieldRef): pointer + offset load.
- Match: scrutinee pointer in a scratch local, tag load at offset 0, payload binding loads at variant offsets.
- Method call: self pointer is the first param, treated as local 0.
- For-loops: array headers + element buffers separately allocated; out_buf and out_header written at finalization.
- Optional coercion (lower_coerced): allocate Optional cell, tag + payload writes, push pointer.
- Boundary lifting/lowering: canonical-ABI wrappers split string / list parameters into (ptr, len) pairs and reassemble against our internal header pointer; cabi_realloc allocates inbound buffers in our linear memory.
The pre-walk (walk_count in block.rs) reserves one i32 scratch
local per construction so nested allocations don't clobber each
other.
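The pointer convention described above can be modelled in a few lines of host-side Rust. This is only a toy model under stated assumptions: LinearMemory, make_point, and point_y are invented names, the memory is a plain byte vector, and nothing here is the backend's real allocator or emitter. It shows the construction pattern (bump-allocate, write fields, return the pointer) and the field-access pattern (pointer + offset load).

```rust
use std::convert::{TryFrom, TryInto};

/// Toy stand-in for wasm linear memory with a monotonic bump pointer.
struct LinearMemory {
    bytes: Vec<u8>,
    next: u32,
}

impl LinearMemory {
    fn new(size: usize) -> Self {
        LinearMemory { bytes: vec![0; size], next: 0 }
    }

    /// Reserve `size` bytes; `None` when the memory is exhausted.
    fn alloc(&mut self, size: u32) -> Option<u32> {
        let ptr = self.next;
        let end = ptr.checked_add(size)?;
        if usize::try_from(end).ok()? > self.bytes.len() {
            return None;
        }
        self.next = end;
        Some(ptr)
    }

    /// Counterpart of an `i32.store` at `ptr + offset` (little-endian).
    fn store_i32(&mut self, ptr: u32, offset: u32, value: i32) -> Option<()> {
        let start = usize::try_from(ptr.checked_add(offset)?).ok()?;
        let end = start.checked_add(4)?;
        self.bytes.get_mut(start..end)?.copy_from_slice(&value.to_le_bytes());
        Some(())
    }

    /// Counterpart of an `i32.load` at `ptr + offset` (little-endian).
    fn load_i32(&self, ptr: u32, offset: u32) -> Option<i32> {
        let start = usize::try_from(ptr.checked_add(offset)?).ok()?;
        let end = start.checked_add(4)?;
        let raw: [u8; 4] = self.bytes.get(start..end)?.try_into().ok()?;
        Some(i32::from_le_bytes(raw))
    }
}

/// Construction: bump-allocate, write fields, hand back the pointer.
fn make_point(mem: &mut LinearMemory, x: i32, y: i32) -> Option<u32> {
    let ptr = mem.alloc(8)?;
    mem.store_i32(ptr, 0, x)?;
    mem.store_i32(ptr, 4, y)?;
    Some(ptr)
}

/// Field access: pointer + offset load (here, the second field).
fn point_y(mem: &LinearMemory, ptr: u32) -> Option<i32> {
    mem.load_i32(ptr, 4)
}
```

In this model the aggregate is the u32 pointer and nothing else; callers pass it around the way the lowered code passes an i32 through wasm locals.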
What "stack-allocated aggregates" would mean
An alternative path: aggregates whose total size fits a budget
(say, two wasm value slots) live in wasm locals — directly as
multi-value tuples — instead of bump-allocated linear memory.
A Point { x: I32, y: I32 } carried as a pair of locals beats
the heap path's alloc(8) + i32_store + i32_store + i32_load + i32_load round-trip.
The wins are real:
- Two locals + multi-value return is cheaper than allocate + write + read + read.
- The bump-allocator's monotonic pointer doesn't grow for every short-lived Point; long-running components don't bloat their linear memory just because a tight loop constructs lots of small aggregates.
- Wasm engines can keep stack-allocated locals in physical registers more aggressively than memory-resident values.
Why we're not building it now
Cost: invasive across every aggregate site
Aggregates appear in roughly twenty-five lowering paths (everything listed in "Current shape" above, plus the symmetric read paths). Each one needs both a heap-mode and a stack-mode emitter, plus a discriminator at the construction site. Some sites can't easily go stack-mode at all:
- mut self methods mutate fields through the implicit self pointer at wasm-local 0. A stack-mode self would need either multi-value parameters + multi-value returns (the callee returns the mutated tuple) or a pointer-back-to-caller's-locals convention that wasm doesn't have. Heap stays.
- Aggregates flowing across function calls generally need pointers: multi-value param/return is supported but cumbersome, and breaks down once the aggregate is recursive or generic. Pure leaf functions could opt in, but the rule for "when does this aggregate flow vs. stay local" is non-obvious.
- Aggregates crossing the WIT boundary must follow the canonical ABI: pointer-based, anchored in linear memory. cabi_realloc is the host's hook into our allocator. Mixing stack and heap at the boundary requires reassembly trampolines.
- Aggregates inside arrays / dictionaries are pointer-of-aggregate values today (the array element buffer holds i32 pointers). Stack-mode would change the buffer's element stride per type. The layout planner already handles this for primitives; extending to size-classed aggregates needs a third "stack-aggregate" bucket.
The pre-walk + scratch-local infrastructure, the canonical-ABI
wrappers, and the optional-coercion pipeline are all wired
against the pointer convention. Each would pick up a parallel
discriminator. The function-body planner (block.rs) would grow a
mode field; aggregate-construction lowerings would duplicate; reads
would duplicate; method dispatch would grow two arms.
Benefit: speculative for our workload
The PLAN's milestones cover programs that allocate few
aggregates: the sieve allocates one Boolean array of length
~30; milestone_1b allocates Counter / Action enum values one per
method call; milestone_2 allocates one Dictionary + ~3 strings.
None of them shows heap-aggregate allocation as a
bottleneck: the bump allocator runs in O(1), the working set
stays in cache, and wasmtime's JIT optimizes the load/store
patterns.
For a real consumer that needs the perf — say, a tight numeric
loop that constructs a Vec2 { x: F32, y: F32 } per iteration —
the right tooling is profiling first, not a speculative refactor.
Composition: forces decisions in code today that shouldn't bind us
Picking a size threshold (aggregates ≤ 8 bytes go stack) bakes a
heuristic into every aggregate site. The threshold's right value
is workload-dependent; encoding it now forces every Phase-5+
feature to compose around the chosen threshold. A threshold that's
right for tight numeric loops is wrong for components that pass
many medium-sized aggregates around.
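A sketch of what such a discriminator might look like makes the coupling visible. Everything here is hypothetical (Placement, aggregate_size, place); the point is only that the same aggregate lands in a different mode as soon as the threshold constant moves, and every construction and read site would have to agree on that answer.

```rust
/// Hypothetical placement decision made at each aggregate site.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Placement {
    Stack,
    Heap,
}

/// Total size of an aggregate from its field sizes in bytes.
/// Returns `None` on overflow instead of wrapping.
fn aggregate_size(field_sizes: &[u32]) -> Option<u32> {
    field_sizes
        .iter()
        .try_fold(0u32, |acc, &size| acc.checked_add(size))
}

/// The size-threshold heuristic: small aggregates go to locals.
fn place(size_bytes: u32, threshold_bytes: u32) -> Placement {
    if size_bytes <= threshold_bytes {
        Placement::Stack
    } else {
        Placement::Heap
    }
}
```

With a threshold of 8 bytes, a two-field I32 aggregate is stack-mode and a four-field one is heap-mode; raising the threshold to 16 flips the second one, which would change its calling convention everywhere it is used.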
What would change the call
If any of these surface, revisit:
- Profiling shows aggregate allocation > 5% of runtime in a real consumer's workload.
- A wasm GC proposal lands and we adopt it. GC'd structs replace the bump allocator entirely; stack-mode for small aggregates becomes a compile-time selector, not a parallel path.
- The wasm memargs proposal grows multi-result-return for pointer-bearing aggregates. Cheaper to flow stack values across call boundaries.
- A user-facing #[stack] annotation on aggregate definitions. Removes the heuristic by letting the source author opt in per-type.
Until then: uniform heap is correct, simple, and fast enough.
Status
Resolved: we picked heap; this note records why. Phase 5+ items compose against the uniform-heap convention without worrying about a future split. If profiling on a real workload ever flips the call, the work spins up as its own phase with a dedicated milestone.
Function Attributes (#[inline] / #[no_inline] / #[cold])
Status: design note / decision recorded
Last updated: 2026-05-02
formalang lets source authors annotate functions with codegen
hints — inline, no_inline, cold — and preserves them
through the IR as IrFunction.attributes: Vec<FunctionAttribute>.
This document records why the WebAssembly backend doesn't honor
those attributes today and the conditions under which the
decision should be revisited.
Decision: ignore function attributes; delegate all
inlining / placement decisions to wasm-opt. Revisit if a real
consumer surfaces a workload where the attributes' intent
materially differs from binaryen's heuristics.
Why ignoring is correct today
Wasm has no first-class equivalent for any of the three attributes:
- Inline / NoInline: no wasm instruction or section marks a function as "always inline" or "never inline". Inlining decisions live entirely in the optimizer (binaryen, wasm-opt).
- Cold: no wasm equivalent of a .text.cold section or branch-prediction hint. Wasm engines decide hot/cold placement internally based on profile data they collect at runtime.
Three options for the backend:
Option A — pass attributes through to wasm-opt
Binaryen has no documented input format for "the source author
asked to inline this function". The closest thing, marking a
function __attribute__((always_inline))-style, works for C/C++
inputs but doesn't have a wasm-encoder side. We'd have to
pre-process the module ourselves, mark functions, then run a custom
pass; binaryen's stock pipeline ignores all of it.
The result would either be code that pre-empts binaryen's inlining heuristics (rarely the right call — binaryen has good heuristics tuned for size-vs-speed) or code that adds marker custom sections binaryen doesn't read.
Option B — emit a custom section listing attributes
Ship a formawasm.attributes custom section keyed by function
index, listing the attribute set per function. Useful for tooling
that wants to introspect the original intent (debuggers, IDE
plugins). Doesn't actually affect codegen — binaryen still ignores
the marker, the engine still ignores it.
This is plausible but speculative: nobody currently reads such a section. It's also non-standard, so different tools would read different formats.
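For a sense of scale, the sketch below hand-encodes such a section. The outer framing is the standard wasm custom-section layout (section id 0, a LEB128 byte length, then a length-prefixed name followed by arbitrary data). The payload layout is purely an assumption made up for this example: an entry count, then one (function index, attribute bitmask) pair per entry. The constants and function names are likewise hypothetical.

```rust
use std::convert::TryFrom;

// Hypothetical attribute bitmask values; not a defined format.
const INLINE: u8 = 0b001;
const NO_INLINE: u8 = 0b010;
const COLD: u8 = 0b100;

/// Append `value` as unsigned LEB128, the integer encoding wasm uses.
fn leb128_u32(mut value: u32, out: &mut Vec<u8>) {
    loop {
        // Masked to 7 bits, so the narrowing cast cannot lose data.
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Assumed payload: count, then (function index, bitmask) per entry.
fn attributes_payload(entries: &[(u32, u8)]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    leb128_u32(u32::try_from(entries.len()).ok()?, &mut out);
    for &(func_index, mask) in entries {
        leb128_u32(func_index, &mut out);
        out.push(mask);
    }
    Some(out)
}

/// Wrap `payload` in a wasm custom section with the given name.
fn custom_section(name: &str, payload: &[u8]) -> Option<Vec<u8>> {
    let mut contents = Vec::new();
    leb128_u32(u32::try_from(name.len()).ok()?, &mut contents);
    contents.extend_from_slice(name.as_bytes());
    contents.extend_from_slice(payload);

    let mut section = vec![0x00]; // section id 0 = custom
    leb128_u32(u32::try_from(contents.len()).ok()?, &mut section);
    section.extend_from_slice(&contents);
    Some(section)
}
```

The encoding itself is cheap to emit; the open question remains who would read it and which payload format they would agree on.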
Option C — ignore and document
The IR field stays preserved through the pipeline (closure-conv,
DCE, fold). Backends that can express the attributes (LLVM IR
emitter would have inlinehint / coldcc, JVM emitter would
have nothing) honor them. The wasm backend doesn't, and says so.
This is what we're doing. Cost: zero. Benefit: zero. Risk: the implicit user expectation that "inline" means something. Mitigated by the source-language documentation explaining that backends choose how to honor hints.
What would change the call
If any of these surface, revisit:
- Profiling shows a real workload where binaryen's inlining heuristic underperforms a manual hint by a measurable margin (>5%). A --respect-inline-hints codegen flag could then pre-process the module to bias binaryen's choice.
- A wasm proposal lands defining function attributes natively (e.g. a custom section along the lines of the branch-hinting proposal). Honoring would become a question of emitting the right wasm-spec-defined section.
- A non-codegen consumer needs the attributes. A dataflow analyzer, profile-guided optimizer, or IDE plugin might want "user said inline" data alongside the wasm bytecode. A custom section with a documented format is the natural bridge.
Until then: the IR's attributes field is passed through for
the future, but never consulted at codegen time. No diagnostic
fires when a function is annotated; the annotation is silently
preserved in the IR shape and silently ignored by the wasm
backend.
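In code terms, the behaviour amounts to a lowering step that never reads the field. The sketch below assumes a drastically simplified IrFunction: the attributes field and the Inline / NoInline / Cold variants follow the description in this note, while the name and body fields and the lower function are placeholders invented for illustration.

```rust
/// Attribute variants as described in this note.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FunctionAttribute {
    Inline,
    NoInline,
    Cold,
}

/// Simplified stand-in for the IR function; `name` and `body` are
/// placeholders, not the real IR shape.
#[derive(Debug, Clone)]
struct IrFunction {
    name: String,
    attributes: Vec<FunctionAttribute>,
    body: Vec<u8>,
}

/// Placeholder lowering: the output depends on the body only.
/// `func.attributes` is deliberately never consulted.
fn lower(func: &IrFunction) -> Vec<u8> {
    func.body.clone()
}
```

Two functions that differ only in their attributes lower to identical bytes, and the attributes are still present on the IR value afterwards for any backend or tool that wants them.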